Optimizing Machine Learning-based Sentiment Analysis Accuracy in Bilingual Sentences via Preprocessing Techniques
With recent advances in Natural Language Processing (NLP) technologies, it is becoming more feasible to process, analyze, and understand the sentiments that users express in reviews of the products and services they use. Despite the latest improvements in this field, little attention has been given to multilingual sentiment analysis. In this article, a framework is presented for sentiment analysis in Arabic and English using two datasets (ASTD, AJGT) along with their translations. Preprocessing techniques, including n-gram tokenization, Arabic-specific stop-word removal, punctuation removal, repeated-character removal, part-of-speech tagging, stemming, and lemmatization, are applied. Four machine learning classifiers, namely Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM), are employed. We highlight existing specialized research in sentiment analysis for Arabic and English, as well as the techniques employed in each. Furthermore, the impact of preprocessing on accuracy for both Arabic and English is investigated through a separate experiment for each step. Experimental results on the ASTD dataset show close performance across classifiers, with the SVM classifier achieving the highest accuracy of 70%. However, accuracy varied more on the AJGT dataset, where the NB classifier yielded the best accuracy at approximately 87%. The experiments on the datasets translated from Arabic to English did not exhibit significant differences, although some features performed slightly better on the Arabic datasets.
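To make the surface-level preprocessing steps concrete, the following is a minimal sketch of punctuation removal, repeated-character removal, stop-word removal, and n-gram tokenization using only the Python standard library. It is an illustration under stated assumptions, not the framework's implementation: the function names, the tiny stop-word lists, and the choice to shorten character runs to two are all hypothetical, and part-of-speech tagging, stemming, and lemmatization are left out because they need language-specific resources.

```python
import re
import unicodedata

# Hypothetical, tiny stop-word lists for illustration only; the stop-word
# lists actually used in the article are not given in this section.
ARABIC_STOP_WORDS = {"في", "من", "على", "عن", "إلى", "هذا"}
ENGLISH_STOP_WORDS = {"the", "a", "an", "is", "of", "to", "in", "this"}


def remove_punctuation(text):
    """Replace every Unicode punctuation character (Arabic included) with a space."""
    return "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )


def collapse_repeats(text, max_run=2):
    """Shorten any run of one repeated character to at most max_run characters.

    max_run=2 is an assumption: it keeps legitimate doubles such as "good"
    while shortening elongated words such as "sooooo".
    """
    pattern = r"(.)\1{%d,}" % max_run
    return re.sub(pattern, lambda m: m.group(1) * max_run, text)


def tokenize(text, stop_words):
    """Lowercase, clean, split on whitespace, and drop stop words."""
    cleaned = collapse_repeats(remove_punctuation(text.lower()))
    return [tok for tok in cleaned.split() if tok not in stop_words]


def ngrams(tokens, n):
    """Return all contiguous word n-grams, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


english_tokens = tokenize("This is sooooo good!!!", ENGLISH_STOP_WORDS)
print(english_tokens)             # ['soo', 'good']
print(ngrams(english_tokens, 2))  # ['soo good']

arabic_tokens = tokenize("هذا الفيلم جمييييل جدا!!!", ARABIC_STOP_WORDS)
print(arabic_tokens)
print(ngrams(arabic_tokens, 2))
```

Writing each step as its own function makes it possible to switch steps on and off one at a time, which mirrors the idea of measuring the effect of each preprocessing step in a separate experiment.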
Mohammed Maree received the Ph.D. degree in Information Technology from Monash University. He has published articles in various high-impact journals and conferences, such as ICTAI, Knowledge-Based Systems, IEEE Access, Behaviour and Information Technology, Journal on Computing and Cultural Heritage, Information Development and the Journal of Information Science. He is also a Committee Member/Reviewer of several conferences and journals, such as the World Wide Web, Computational Intelligence, and Expert Systems journals. He has supervised a number of Master’s and PhD students in the fields of knowledge engineering, data analysis, information retrieval, natural language processing, and hybrid intelligent systems. He began his career as the Manager of Research and Development at gSoft Technology Solution Inc. Then, he worked as the Director of Research and QA with Dimensions Consulting Company. Subsequently, he joined the Faculty of Engineering and Information Technology (EIT), Arab American University, Palestine (AAUP), as a full-time Lecturer. From September 2014 to August 2016, he was the Head of the Multimedia Technology Department, and from September 2016 to August 2018, he was the Head of the Information Technology Department. Subsequently, he was appointed as the Assistant to the Vice President for Academic Affairs at the Arab American University from August 2021 to July 2023. In addition to his work at AAUP, he worked as a Consultant for SocialDice and Dimensions Consulting companies. Dr. Mohammed is currently an Associate Professor of Information Technology and the Dean of Faculty of Information Technology at the Arab American University.
Mujahed Eleyat is an Assistant Professor of Computer Science at the Arab American University (AAUP) in Palestine. He was awarded a Ph.D. scholarship in Norway and received his Ph.D. from the Norwegian University of Science and Technology in 2014. During his Ph.D. studies, he worked for three years at a Norwegian company called miraim as and did research in the field of high-performance computation and gas flow networks. Before that, he obtained a scholarship from the USA, called the Presidential Scholarship, to study at the university in Arkansas, where he received his Master's degree in Computer Science. In addition to teaching at AAUP for more than 10 years, Dr. Eleyat was the Head of the Department of Computer Systems Engineering for 6 years and the Assistant to the Academic Vice President for one year. He has been the Dean of the Faculty of Engineering and Information Technology since August 2019. Dr. Eleyat is also a member of High Performance and Embedded Architecture and Compilation (HiPEAC), and his areas of expertise include High Performance Computing, Embedded Systems, and Natural Language Processing.
Enas Mesqali received a B.S. degree in Information and Communications Technology from Al-Quds Open University, Jenin, Palestine, in 2013. She is currently pursuing a master's degree in Computer Science at the Arab American University, Jenin, Palestine (AAUP).