Received date December 22, 2009

Accepted date May 20, 2010

Arabic Speaker Independent Continuous Automatic Speech Recognition Based on a

Author Mohammad Abushariah1, 2, Raja Ainon1, Roziati Zainuddin1, Moustafa Elshafei3, and Othman Khalifa4

Abstract This paper describes and proposes an efficient and effective framework for the design and development of a speaker-independent continuous automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The speech corpus contains a total of 415 sentences recorded by 40 (20 male and 20 female) native Arabic speakers from 11 different Arab countries representing the three major regions (Levant, Gulf, and Africa) of the Arab world. The proposed Arabic speech recognition system is based on the Carnegie Mellon University (CMU) Sphinx tools, and the Cambridge HTK tools were also used at some testing stages. The speech engine uses 3-emitting-state Hidden Markov Models (HMMs) for tri-phone-based acoustic models. Based on experimental analysis of about 7 hours of training speech data, the acoustic model performs best using a continuous observation probability model of 16 Gaussian mixture distributions, with the state distributions tied to 500 senones. The language model contains both bi-grams and tri-grams. For similar speakers with different sentences, the system obtained a word recognition accuracy of 92.67% and 93.88% and a Word Error Rate (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers with similar sentences, the system obtained a word recognition accuracy of 95.92% and 96.29% and a WER of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers with different sentences, the system obtained a word recognition accuracy of 89.08% and 90.23% and a WER of 15.59% and 14.44% with and without diacritical marks, respectively.
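The reported accuracy and WER figures do not sum to 100% (for example, 92.67% + 11.27%). One common reading, which is an assumption here since the abstract does not give the formulas, is that word recognition accuracy counts only substitutions and deletions as errors, while WER also counts insertions. The sketch below illustrates that reading with a standard minimum-edit-distance alignment; the function names and the toy word sequences are hypothetical and are not taken from the paper or from the Sphinx or HTK tools.

```python
from typing import Dict, List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for one minimum-edit
    alignment of a hypothesis word sequence against a reference."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (total_edits, subs, dels, ins) for ref[:i] vs hyp[:j].
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            e, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, n)          # words match, no edit
            else:
                diag = (e + 1, s + 1, d, n)  # substitution
            e, s, d, n = table[i - 1][j]
            delete = (e + 1, s, d + 1, n)    # reference word missing
            e, s, d, n = table[i][j - 1]
            insert = (e + 1, s, d, n + 1)    # extra hypothesis word
            table[i][j] = min(diag, delete, insert, key=lambda t: t[0])
    _, subs, dels, ins = table[-1][-1]
    return subs, dels, ins


def score(ref: List[str], hyp: List[str]) -> Dict[str, float]:
    """Percent of reference words recognized correctly, and WER.
    Assumed definitions: correct = (N - S - D) / N, WER = (S + D + I) / N."""
    subs, dels, ins = align_counts(ref, hyp)
    n = len(ref)
    return {
        "percent_correct": 100.0 * (n - subs - dels) / n,
        "wer": 100.0 * (subs + dels + ins) / n,
    }


# Toy example: one substitution (b -> x) and one insertion (e).
reference = "a b c d".split()
hypothesis = "a x c d e".split()
result = score(reference, hypothesis)
print(result)
```

In this toy case three of the four reference words are recognized, so the percent correct is 75%, while the extra inserted word raises the WER to 50%. Under these assumed definitions the two numbers differ by exactly the insertion rate, which would account for a gap like the one in the reported results.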

The International Arab Journal of Information Technology, Vol. 9, No. 1, January 2012

Mohammad Abushariah received two bachelor's degrees in management information systems and information technology from the International Islamic University Malaysia in 2005 and 2006, respectively. He obtained his Master's degree in software engineering from the University of Malaya in 2007. Currently, he is working towards his PhD in computer science and information technology at the University of Malaya, specializing in Arabic automatic continuous speech recognition. He has over 10 publications in IEEE international conferences and technical reports. His research interests include Arabic speech processing, text and speech corpora, and language resources production. He is a member of IEEE and IACSIT.

Raja Ainon is an associate professor in the Department of Software Engineering at the University of Malaya. Her current research areas include HMM-based speech synthesis and recognition for the Malay and Arabic languages, and fuzzy-genetic algorithms. She is the author of more than 30 scholarly articles in automatic timetabling, text compression, expert systems, computational linguistics, fuzzy-genetic algorithms, emotional text-to-speech synthesis, and speech recognition. Currently, she is heading the computational research group at the University of Malaya.

Roziati Zainuddin works at the Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya. Her areas of interest are intelligent multimedia, image and speech processing, computational fluid dynamics, bio-medical informatics, computer vision and visualisation, and e-learning. Her research work has been published in several international journal and conference publications. Her research outcomes have won several awards at software exhibitions. Her professional duties include reviewing articles, editing journals, supervising research students, and serving as an external examiner.

Moustafa Elshafei received his PhD (with Dean List) from McGill University, Canada, in electrical engineering in 1982. Since then, he has accumulated a unique blend of nine years of industrial experience and over 17 years of academic experience. He is co-inventor/sole inventor of several US and international patents. He has over 120 publications in international journals, conferences, and technical reports. He was the PI/CoI of many funded projects and was also involved in many internally funded or industry-funded projects. His research interests include Arabic speech processing, digital signal processing, and intelligent instrumentation. He is a member of IEEE, ISA, and SPE.

Othman Khalifa received his Bachelor's degree in electronic engineering from Garyounis University, Libya, in 1986. He obtained his Master's degree in electronics science engineering and his PhD from Newcastle University, UK, in 1996 and 2000, respectively. He worked in industry for eight years. Currently, he is a professor and head of the Department of Electrical and Computer Engineering, International Islamic University Malaysia. His research interests are communication systems, digital image/video processing, coding and compression, wavelets, fractals, and pattern recognition. He has published more than 150 papers in international journals and conferences. He has been awarded more than 30 medals at different exhibitions and has secured more than 9 research grants. He is an SIEEE member.

URL: https://iajit.org/paper/3391

,abstract={This paper describes and proposes an efficient and effective framework for the design and development of a speaker-independent continuous automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The speech corpus contains a total of 415 sentences recorded by 40 (20 male and 20 female) native Arabic speakers from 11 different Arab countries representing the three major regions (Levant, Gulf, and Africa) of the Arab world. The proposed Arabic speech recognition system is based on the Carnegie Mellon University (CMU) Sphinx tools, and the Cambridge HTK tools were also used at some testing stages. The speech engine uses 3-emitting-state Hidden Markov Models (HMMs) for tri-phone-based acoustic models. Based on experimental analysis of about 7 hours of training speech data, the acoustic model performs best using a continuous observation probability model of 16 Gaussian mixture distributions, with the state distributions tied to 500 senones. The language model contains both bi-grams and tri-grams. For similar speakers with different sentences, the system obtained a word recognition accuracy of 92.67% and 93.88% and a Word Error Rate (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers with similar sentences, the system obtained a word recognition accuracy of 95.92% and 96.29% and a WER of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers with different sentences, the system obtained a word recognition accuracy of 89.08% and 90.23% and a WER of 15.59% and 14.44% with and without diacritical marks, respectively.},
keywords={Arabic automatic speech recognition, Arabic speech corpus, phonetically rich and balanced, acoustic model, statistical language model},
ISSN={2413-9351},
month={Jan}}

AB - This paper describes and proposes an efficient and effective framework for the design and development of a speaker-independent continuous automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The speech corpus contains a total of 415 sentences recorded by 40 (20 male and 20 female) native Arabic speakers from 11 different Arab countries representing the three major regions (Levant, Gulf, and Africa) of the Arab world. The proposed Arabic speech recognition system is based on the Carnegie Mellon University (CMU) Sphinx tools, and the Cambridge HTK tools were also used at some testing stages. The speech engine uses 3-emitting-state Hidden Markov Models (HMMs) for tri-phone-based acoustic models. Based on experimental analysis of about 7 hours of training speech data, the acoustic model performs best using a continuous observation probability model of 16 Gaussian mixture distributions, with the state distributions tied to 500 senones. The language model contains both bi-grams and tri-grams. For similar speakers with different sentences, the system obtained a word recognition accuracy of 92.67% and 93.88% and a Word Error Rate (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers with similar sentences, the system obtained a word recognition accuracy of 95.92% and 96.29% and a WER of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers with different sentences, the system obtained a word recognition accuracy of 89.08% and 90.23% and a WER of 15.59% and 14.44% with and without diacritical marks, respectively.