The International Arab Journal of Information Technology (IAJIT), Vol. 18, No. 1, January 2021


Middle Eastern and North African English Speech Corpus (MENAESC): Automatic Identification of

This study explores English accents in the Arab world. Although limited resources exist for speech corpora that attempt to automatically identify the degree of accent of Arabic speakers of English, there is no speech corpus dedicated to Arabic speakers of English in the Middle East and North Africa (MENA). To fill this gap, we collected a range of speech samples to create a linguistic resource that we call the Middle Eastern and North African English Speech Corpus (MENAESC). In addition to the “accent approach” applied in automatic language/dialect recognition, we also applied the “macro-accent approach”, employing Mel-Frequency Cepstral Coefficients (MFCC), energy, and Shifted Delta Cepstra (SDC) features with a Gaussian Mixture Model-Universal Background Model (GMM-UBM) classifier, to four accents (Egyptian, Qatari, Syrian, and Tunisian). These four were chosen from eleven accents that were selected for their high population density in the location where the experiments were carried out. Using the Equal Error Rate (EER%) to assess how effectively the system identifies MENA English accents on MENAESC, we obtained EERs of 1.5% to 2% with the “accent approach” and 2% to 3.5% with the “macro-accent approach”. The results also showed that, of the four accents included, the Qatari accent scored the lowest EER% in all tests performed. Taken together, system effectiveness is affected not only by the approach used but also by the size and characteristics of the MENAESC database. It also depends on the English proficiency of the Arabic speakers and on the influence of their mother tongue.
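The abstract lists SDC features alongside MFCC and energy. As a rough illustration of how SDC vectors are commonly built from a cepstral matrix, the Python sketch below stacks k delta blocks spaced P frames apart. The parameter values (d=1, P=3, k=7, a configuration widely used in language identification), the edge-padding rule at utterance boundaries, and the function name `shifted_delta_cepstra` are all assumptions for illustration; the abstract does not state which SDC parameters the MENAESC experiments used.

```python
import numpy as np


def shifted_delta_cepstra(cep, d=1, p=3, k=7):
    """Shifted Delta Cepstra (SDC) for a (T, N) cepstral matrix.

    Assumed N-d-P-k convention: for every frame t, concatenate the k delta
    vectors  c(t + i*p + d) - c(t + i*p - d)  for i = 0 .. k-1.
    Frames that fall outside the utterance reuse the first/last frame
    (edge padding), which is one common convention, not the only one.
    Returns an array of shape (T, N * k).
    """
    cep = np.asarray(cep, dtype=float)
    T = cep.shape[0]
    # Pad d frames before and d + (k-1)*p frames after, so every shifted
    # index used below stays inside the padded array.
    pad_before, pad_after = d, d + (k - 1) * p
    padded = np.pad(cep, ((pad_before, pad_after), (0, 0)), mode="edge")

    blocks = []
    for i in range(k):
        shift = i * p
        # Original frame t sits at row t + pad_before of `padded`.
        ahead = padded[pad_before + shift + d: pad_before + shift + d + T]
        behind = padded[pad_before + shift - d: pad_before + shift - d + T]
        blocks.append(ahead - behind)
    return np.hstack(blocks)


if __name__ == "__main__":
    # Hypothetical input: 100 frames of 7 cepstral coefficients (random
    # numbers standing in for real MFCCs).
    mfcc = np.random.default_rng(0).normal(size=(100, 7))
    sdc = shifted_delta_cepstra(mfcc)
    print(sdc.shape)  # → (100, 49)
```

In a GMM-UBM front end of the kind the abstract describes, such SDC vectors would typically be concatenated frame by frame with the static MFCCs and a log-energy column before the models are trained.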

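The reported results are Equal Error Rates. The sketch below shows one simple way to compute an EER from detection scores, for example GMM-UBM log-likelihood ratios (accent-model log-likelihood minus UBM log-likelihood) for target and non-target trials. It is a plain threshold sweep over the observed scores, not the authors' scoring code (the abstract does not describe it), and `equal_error_rate` is a hypothetical name; evaluation toolkits often interpolate along the DET curve instead, so values may differ slightly on small trial sets.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Return the EER (as a fraction in [0, 1]) from detection scores.

    Every observed score is tried as a threshold. The miss rate is the
    fraction of target trials scored below the threshold; the false-alarm
    rate is the fraction of non-target trials scored at or above it. The
    EER is taken as the mean of the two rates where they are closest.
    """
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.unique(np.concatenate([tgt, non]))
    miss = np.array([(tgt < th).mean() for th in thresholds])
    false_alarm = np.array([(non >= th).mean() for th in thresholds])
    idx = np.argmin(np.abs(miss - false_alarm))
    return (miss[idx] + false_alarm[idx]) / 2.0


if __name__ == "__main__":
    # Hypothetical log-likelihood-ratio scores for one accent detector.
    eer = equal_error_rate([2.1, 1.4, 0.9], [-0.5, 0.3, 1.0])
    print(round(eer * 100, 1))  # → 33.3 (EER in percent)
```

Multiplying by 100 gives the EER% figure used in the abstract.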
Sara Chellali is a Ph.D. student at the “Ecole nationale Supérieure d'Informatique (ESI, ex INI)”, Algiers, Algeria. She received a Magister degree in Computer Science and a Master's degree in Didactics of French as a Foreign Language from the University of Amar Telidji, Laghouat, Algeria, and an Engineer degree in Computer Systems from ESI. She is currently a teacher-researcher at the “École Normale Supérieure de Laghouat (ENSL)”, Laghouat, Algeria. Her research covers language processing, with particular emphasis on accent/dialect identification, as well as speech processing, deep learning, machine learning, pattern recognition, and the didactics of science (mathematics).

Somaya Al-Maadeed is a professor in the Computer Science and Engineering Department at Qatar University. She received her Ph.D. degree in computer science from the University of Nottingham, U.K., in 2004. She has supervised students on research projects related to pattern recognition and Arabic recognition. She is currently the Head of the Computer Science Department, Qatar University, and the Coordinator of the Computer Vision Research Group, Qatar University. She collaborates extensively with national and international institutions and with industry. She is the principal investigator of several funded research projects that have attracted approximately five million dollars in recent years. She has published extensively in computer vision and pattern recognition and has delivered workshops on teaching programming to undergraduate students. She has also attended workshops on higher education strategy, assessment methods, and interactive teaching.
In 2015, she was elected Chair of the IEEE Qatar Section. She and her team achieved the best performance in the signature verification competitions at ICDAR 2011 and ICDAR 2015.

Ouassila Kenai is a Ph.D. student in speech communication at USTHB, Algiers, Algeria. She holds a Magister degree in automatic speech processing from the Scientific and Technical Research Center for the Development of the Arabic Language (CRSTDLA), Algeria, and an Engineer degree in communication (electronics) from USTHB, Algeria. She currently teaches at the Institute of Performing Arts and Audiovisual Trades (ISMAS), Algiers, Algeria, and also works as a teacher and consultant in the audiovisual field in several public and private institutions. Her research interests include speaker recognition (she proposed a new VAD-based architecture for speaker diarization/detection systems, which was the subject of a published article), artificial intelligence, bioinformatics, speech and language processing, and forensic recognition, on which she has published several conference papers.

Maamar Ahfir received his “Ingeniorat” in Electronics and his “Magister” in Optoelectronics from the University of Blida (Algeria) in 1990 and 1997, respectively. He received the E-science Doctorate degree in Electronics from the “Ecole Nationale Polytechnique (ENP)” of Algiers (Algeria) in 2008. He was a Lecturer at the University of Laghouat (Algeria) from 1997 to 2019 and Head of the Informatics Department of the Technical College of Jizane (Saudi Arabia) from 2001 to 2002. Since 2019, he has been an Associate Professor at the University of Médéa (Algeria), and since 2004 he has been a Visiting Researcher at the Applied DSP and VLSI Systems Laboratory of the University of Westminster, London, UK. His areas of interest include room acoustics and the processing of speech and human heart sounds (phonocardiograms).

Walid Hidouci is a professor of computer science at the “École nationale Supérieure d'Informatique (ESI)” in Algiers. He leads the “Advanced Database Systems” team in the LCSI research laboratory. His main research interests are database systems, data structures, artificial intelligence, operating systems, and parallel programming.