Speech-Based Techniques for Emotion Detection in Natural Arabic Audio Files
Emotion detection is one of the greatest challenges in Natural Language Processing (NLP). Often referred to as emotion recognition, it is the process of identifying a person’s feelings or emotions, such as happiness, sadness, or anger. Emotions are strong feelings about a person's situation or relationships with others; they are mental states that affect human behavior and interactions. In this paper, we propose an approach for emotion detection in audio files, focusing on a natural Arabic audio dataset and applying four Machine Learning (ML) classifiers: Sequential Minimal Optimization (SMO), Random Forest (RF), K-Nearest Neighbours (KNN), and Simple Logistic (SL). The classification experiments were conducted using sixteen acoustic feature sets. Several acoustic features were explored, including Mel Frequency Cepstral Coefficients (MFCC), Mel spectrogram, spectral contrast, Zero Crossing Rate (ZCR), and intensity. The experimental results show that the SMO and SL classifiers achieved the highest overall accuracy, 83.82%, when using the combination of all acoustic features (MFCC, Mel spectrogram, spectral contrast, ZCR, and intensity). Additionally, the RF and KNN classifiers yielded competitive results, with accuracies of 81.71% and 77.34%, respectively. These results suggest that combining multiple acoustic features significantly enhances the performance of emotion detection models, especially for complex emotions in natural Arabic audio datasets.
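To make the idea of combining acoustic features concrete, the following is a minimal sketch of two of the simpler features named above, ZCR and intensity, computed per frame and averaged into one vector per clip. It is an illustration only and not the authors' pipeline: the frame length, hop size, use of RMS energy as the intensity measure, the synthetic sine-wave "clips", and all function names are assumptions made for this example.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_intensity(frame):
    """Root-mean-square energy, used here as a stand-in for intensity (assumption)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def clip_features(samples, frame_len=400, hop=160):
    """Average the per-frame ZCR and RMS into one [zcr, intensity] vector per clip.

    frame_len=400 and hop=160 (25 ms and 10 ms at 16 kHz) are assumed values.
    """
    frames = frame_signal(samples, frame_len, hop)
    zcr = [zero_crossing_rate(f) for f in frames]
    rms = [rms_intensity(f) for f in frames]
    return [sum(zcr) / len(zcr), sum(rms) / len(rms)]

# Two synthetic one-second signals at 16 kHz standing in for audio clips.
SAMPLE_RATE = 16000
low_tone = [0.5 * math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
            for n in range(SAMPLE_RATE)]
high_tone = [0.1 * math.sin(2 * math.pi * 2000 * n / SAMPLE_RATE)
             for n in range(SAMPLE_RATE)]

low_feats = clip_features(low_tone)    # loud, low-pitched: low ZCR, high intensity
high_feats = clip_features(high_tone)  # quiet, high-pitched: high ZCR, low intensity
```

In a fuller setup, spectral features such as MFCC, Mel spectrogram, and spectral contrast would typically come from an audio library, and their per-clip statistics would be concatenated with the two values above before being passed to a classifier.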
The International Arab Journal of Information Technology, Vol. 22, No. 1, January 2025