The International Arab Journal of Information Technology (IAJIT)


Classifying Sentiment of Dialectal Arabic Reviews: A Semi-Supervised Approach

Omar Al-Harbi
Arab Internet users tend to use dialectal words to express how they feel about products, services, and places. Although the Arabic dialects are derived from the formal Arabic language, they differ from it in several aspects. Arabic sentiment analysis has recently attracted considerable attention from researchers. A considerable amount of research has been conducted on Modern Standard Arabic (MSA), but little work has focused on dialectal Arabic. The presence of dialect in Arabic texts makes Arabic sentiment analysis challenging, because dialect usually does not follow specific rules in writing or speech. In this paper, we implement a semi-supervised approach for sentiment polarity classification of dialectal reviews that also contain MSA. We combined a dialectal sentiment lexicon with four learning algorithms to perform the polarity classification, namely Support Vector Machines (SVM), Naïve Bayes (NB), Random Forest, and K-Nearest Neighbor (K-NN). To select the features with which the classifiers perform best, we used three feature evaluation methods, namely Correlation-based Feature Selection, Principal Components Analysis, and SVM Feature Evaluation. In the experiment, we applied the approach to a manually collected data set. The experimental results show that the approach yielded its highest classification accuracy, 92.3%, with the SVM algorithm.
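To make the idea of combining a sentiment lexicon with a learning algorithm concrete, the following minimal Python sketch derives lexicon-based features from a review and classifies it with K-NN, one of the four algorithms named above. It is an illustration under stated assumptions, not the implementation used in the paper: the lexicon entries, the training reviews, the three-feature representation, and the function names `lexicon_features`, `knn_predict`, and `classify` are all hypothetical, and the sketch covers neither the semi-supervised procedure nor the feature selection step.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the paper's dialectal lexicon is not reproduced here.
POSITIVE = {"ممتاز", "رائع", "حلو"}
NEGATIVE = {"سيء", "زفت", "بطيء"}

# Hypothetical labeled reviews, used only to illustrate the pipeline.
TRAIN_REVIEWS = [
    ("الفندق ممتاز والموقع رائع", "positive"),
    ("الاكل حلو", "positive"),
    ("الموقع ممتاز", "positive"),
    ("المكان سيء", "negative"),
    ("المطعم زفت والنت بطيء", "negative"),
    ("النت بطيء", "negative"),
]


def lexicon_features(review):
    """Map a review to (positive hits, negative hits, their difference)."""
    tokens = review.split()  # whitespace tokenization, an assumption
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos, neg, pos - neg)


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(review, k=3):
    """Classify a review using lexicon features and K-NN."""
    train = [(lexicon_features(text), label) for text, label in TRAIN_REVIEWS]
    return knn_predict(train, lexicon_features(review), k)


if __name__ == "__main__":
    print(classify("المكان رائع"))
    print(classify("الفندق زفت"))
```

In this sketch the lexicon supplies the features and the classifier learns from labeled examples built on them, so either part could be swapped out, for example by replacing K-NN with an SVM from a machine learning library.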

The International Arab Journal of Information Technology, Vol. 16, No. 6, November 2019

Omar Al-Harbi is an assistant professor at the Department of Computer & Information at Jazan Community College, Jazan University, Kingdom of Saudi Arabia. He obtained his PhD in Computer Science with specialization in Artificial Intelligence from Islamic Science University of Malaysia (USIM) in 2013. He previously obtained his master's degree in Information Technology from Northern University of Malaysia (Universiti Utara Malaysia, UUM) in 2009. Dr. Omar Al-Harbi also obtained his bachelor's degree in Computer Science from Jerash University, Jordan in 2007. He has over 8 years of teaching experience. His research interests include natural language processing (NLP), word sense disambiguation (WSD), sentiment analysis, and question answering systems (QA).