The International Arab Journal of Information Technology (IAJIT), Vol. 21, No. 3, May 2024


Detecting Spam Reviews in Arabic by Deep Learning

Consumers frequently rely on online reviews when making decisions about purchases, hotel bookings, car rentals, and other choices, as online shopping has grown in popularity over the past few years. Reviews are now crucial to both customers and businesses. Because writing fake reviews can bring financial gain, opinion spam activity has increased. Some unethical companies hire workers to write reviews that influence consumers’ purchasing decisions; detecting spam reviews is therefore an important task. We compiled a large dataset of Arabic reviews, labeled as spam or non-spam through a crowdsourcing approach, and applied deep learning algorithms to detect spam reviews. To the best of our knowledge, no prior study has used deep learning to classify reviews written in Arabic. We used Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) models, and both achieved an accuracy of 97%. To further improve the results, we addressed the class imbalance in the dataset with oversampling and undersampling techniques. Both techniques improved the precision, recall, and F1-score for spam reviews. For example, with the CNN, the F1-score for the spam class increased from 79% to 90% with undersampling and to 82% with oversampling.
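As an illustration of the class-balancing step and the per-class metrics mentioned in the abstract, the following is a minimal sketch in plain Python. It assumes simple random undersampling and random oversampling; the abstract does not say which resampling variants were used, and the function names (random_undersample, random_oversample, f1_for_class) and the toy labels are hypothetical, not taken from the paper.

```python
import random
from collections import Counter


def _group_by_class(texts, labels):
    """Collect the texts belonging to each class label."""
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    return by_class


def random_undersample(texts, labels, seed=0):
    """Randomly drop examples so every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = min(len(items) for items in by_class.values())
    pairs = [(t, lab) for lab, items in by_class.items()
             for t in rng.sample(items, target)]
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [lab for _, lab in pairs]


def random_oversample(texts, labels, seed=0):
    """Randomly duplicate examples so every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = max(len(items) for items in by_class.values())
    pairs = []
    for lab, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((t, lab) for t in items + extra)
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [lab for _, lab in pairs]


def f1_for_class(y_true, y_pred, positive):
    """Return (precision, recall, F1) for one class, e.g. the spam class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy imbalanced data (hypothetical): 2 spam reviews, 8 non-spam reviews.
    texts = [f"review {i}" for i in range(10)]
    labels = ["spam"] * 2 + ["non-spam"] * 8
    print(Counter(random_undersample(texts, labels)[1]))
    print(Counter(random_oversample(texts, labels)[1]))
```

In a setup like this, resampling would normally be applied to the training split only, so that the test set keeps the original class distribution. Reporting precision, recall, and F1 for the spam class separately matters because overall accuracy can stay high on imbalanced data even when the minority class is detected poorly.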
