The International Arab Journal of Information Technology (IAJIT), Vol. 21, No. 2, March 2024

Optimizing Machine Learning-based Sentiment Analysis Accuracy in Bilingual Sentences via Preprocessing Techniques

With recent advances in Natural Language Processing (NLP), it is becoming increasingly feasible to process, analyze, and understand the sentiments that users express in reviews of the products and services they use. Despite these improvements, multilingual sentiment analysis has received little attention. This article presents a framework for sentiment analysis in Arabic and English that uses two Arabic datasets, the Arabic Sentiment Tweets Dataset (ASTD) and the Arabic Jordanian General Tweets (AJGT) corpus, together with their English translations. The preprocessing techniques applied include n-gram tokenization, Arabic-specific stop-word removal, punctuation removal, repeated-character removal, part-of-speech tagging, stemming, and lemmatization. Four machine learning classifiers are employed: Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM). We also review existing research on Arabic and English sentiment analysis and the techniques each study employs. Furthermore, the impact of preprocessing on accuracy in both languages is investigated through a separate experiment for each preprocessing step. On the ASTD dataset, the classifiers performed similarly, with SVM achieving the highest accuracy of 70%. Results on the AJGT dataset varied more across classifiers, with NB yielding the best accuracy of approximately 87%. Experiments on the English translations of the Arabic datasets showed no significant differences from the Arabic results, although some features performed slightly better on the original Arabic datasets.
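Several of the preprocessing steps named above can be illustrated with a short standard-library Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`preprocess`, `remove_punctuation`, `remove_repeated_chars`, `ngrams`) are hypothetical, the stop-word lists are tiny placeholders, and the repeated-character rule (collapse runs of three or more identical characters to one for Arabic and to two for English) is an assumed heuristic. Part-of-speech tagging, stemming, and lemmatization are omitted because they require external tools such as NLTK for English or a dedicated Arabic stemmer.

```python
import re
import string
from typing import List, Set

# Hypothetical, deliberately tiny stop-word lists for illustration only.
ARABIC_STOP_WORDS: Set[str] = {"في", "من", "على", "إلى", "عن", "هذا", "هذه"}
ENGLISH_STOP_WORDS: Set[str] = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}

# ASCII punctuation plus the Arabic comma, semicolon, and question mark.
PUNCTUATION: Set[str] = set(string.punctuation) | {"،", "؛", "؟"}

# A character followed by two or more copies of itself (a run of 3+).
REPEATED_CHARS = re.compile(r"(.)\1{2,}")


def remove_punctuation(text: str) -> str:
    """Replace every punctuation mark with a space so adjacent words stay separate."""
    return "".join(" " if ch in PUNCTUATION else ch for ch in text)


def remove_repeated_chars(text: str, lang: str) -> str:
    """Collapse elongated runs such as "رائعععع" or "soooo" (assumed heuristic).

    Arabic usually marks doubled letters with a shadda rather than by repetition,
    so runs collapse to one character; English has genuine double letters
    ("good"), so runs collapse to two.
    """
    keep = r"\1" if lang == "ar" else r"\1\1"
    return REPEATED_CHARS.sub(keep, text)


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text: str, lang: str, max_n: int = 2) -> List[str]:
    """Turn raw text ("ar" or "en") into unigram..max_n-gram features."""
    text = text.lower()  # str.lower() leaves Arabic script unchanged
    text = remove_punctuation(text)
    text = remove_repeated_chars(text, lang)
    stop_words = ARABIC_STOP_WORDS if lang == "ar" else ENGLISH_STOP_WORDS
    tokens = [tok for tok in text.split() if tok not in stop_words]
    features: List[str] = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features


print(preprocess("The movie is soooo good!!!", "en"))
# ['movie', 'soo', 'good', 'movie soo', 'soo good']
print(preprocess("الفيلم في السينما رائعععع!!", "ar"))
# ['الفيلم', 'السينما', 'رائع', 'الفيلم السينما', 'السينما رائع']
```

Because stop words are removed before n-grams are formed in this sketch, a bigram such as "الفيلم السينما" can join words that were not adjacent in the original text; forming n-grams first and filtering afterwards is an equally reasonable choice, and which one suits a given dataset is an empirical question of the kind the per-step experiments address.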
