The International Arab Journal of Information Technology (IAJIT), Vol. 9, No. 1, January 2012


Arabic Speaker-Independent Continuous Automatic Speech Recognition Based on a Phonetically Rich and Balanced Speech Corpus

This paper proposes an efficient and effective framework for the design and development of a speaker-independent continuous automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The speech corpus contains a total of 415 sentences recorded by 40 (20 male and 20 female) Arabic native speakers from 11 different Arab countries representing the three major regions (Levant, Gulf, and Africa) of the Arab world. The proposed Arabic speech recognition system is based on the Carnegie Mellon University (CMU) Sphinx tools; the Cambridge HTK tools were also used at some testing stages. The speech engine uses 3-emitting-state Hidden Markov Models (HMM) for tri-phone based acoustic models. Based on experimental analysis of about 7 hours of training speech data, the acoustic model performs best using a continuous observation probability model of 16 Gaussian mixture distributions, with the state distributions tied to 500 senones. The language model contains both bi-grams and tri-grams. For similar speakers with different sentences, the system obtained word recognition accuracies of 92.67% and 93.88% and Word Error Rates (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers with similar sentences, the system obtained word recognition accuracies of 95.92% and 96.29% and WERs of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers with different sentences, the system obtained word recognition accuracies of 89.08% and 90.23% and WERs of 15.59% and 14.44% with and without diacritical marks, respectively.
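The accuracy and WER figures above follow the standard ASR scoring convention: WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in the minimum-edit-distance alignment of the hypothesis against the reference, and N is the reference word count. (This is also why accuracy and WER need not sum to exactly 100%: word accuracy as commonly reported ignores insertions, while WER counts them.) As a minimal illustrative sketch, not taken from the paper (the function name and alignment details are assumptions), the metric can be computed with a word-level Levenshtein distance:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER = (S + D + I) / N via word-level Levenshtein distance.

    Not the paper's scoring code; a standard dynamic-programming sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" yields one substitution and one deletion over four reference words, a WER of 0.5.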

Mohammad Abushariah received two bachelor degrees, in management information systems and in information technology, from the International Islamic University Malaysia in 2005 and 2006, respectively. He obtained his Master's degree in software engineering from the University of Malaya in 2007. He is currently working towards his PhD in computer science and information technology at the University of Malaya, specializing in Arabic automatic continuous speech recognition. He has over 10 publications in IEEE international conferences and technical reports. His research interests include Arabic speech processing, text and speech corpora, and language resources production. He is a member of IEEE and IACSIT.

Raja Ainon is an associate professor in the Department of Software Engineering at the University of Malaya. Her current research areas include HMM-based speech synthesis and recognition for the Malay and Arabic languages, and fuzzy-genetic algorithms. She is the author of more than 30 scholarly articles on automatic timetabling, text compression, expert systems, computational linguistics, fuzzy-genetic algorithms, emotional text-to-speech synthesis, and speech recognition. She currently heads the computational research group at the University of Malaya.

Roziati Zainuddin works at the Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya. Her areas of interest are intelligent multimedia, image and speech processing, computational fluid dynamics, bio-medical informatics, computer vision and visualisation, and e-learning. Her research work has been published in several international journals and conference proceedings. Several awards have been won for her research outcomes at software exhibitions. Her professional duties include reviewing articles, editing journals, supervising research students, and serving as an external examiner.

Moustafa Elshafei received his PhD (with Dean's List) in electrical engineering from McGill University, Canada, in 1982. Since then, he has accumulated a unique blend of nine years of industrial experience and over 17 years of academic experience. He is the co-inventor or sole inventor of several US and international patents. He has over 120 publications in international journals, conferences, and technical reports. He was the PI/Co-I of many funded projects and was also involved in many internally funded or industry-funded projects. His research interests include Arabic speech processing, digital signal processing, and intelligent instrumentation. He is a member of IEEE, ISA, and SPE.

Othman Khalifa received his Bachelor's degree in electronic engineering from Garyounis University, Libya, in 1986. He obtained his Master's degree in electronics science engineering and his PhD from Newcastle University, UK, in 1996 and 2000, respectively. He worked in industry for eight years. He is currently a professor and head of the Department of Electrical and Computer Engineering, International Islamic University Malaysia. His areas of research interest are communication systems, digital image/video processing, coding and compression, wavelets, fractals, and pattern recognition. He has published more than 150 papers in international journals and conferences. He has been awarded more than 30 medals at different exhibitions and secured more than 9 research grants. He is a SIEEE member.