Neural Networks and Sentiment Features for Extremist Content Detection in Arabic Social Media

Authors Hanen Himdi, Fatimah Alhayan, Khaled Shaalan

Keywords #BERT #sentiment analysis #deep learning #transformer-based models

Abstract

The proliferation of extremist content on social media poses critical threats to societal stability, necessitating advanced detection mechanisms. Despite substantial research on extremist content detection in various languages, Arabic remains significantly underexplored. This study introduces a novel approach to detecting extremist posts in Arabic using neural networks. The proposed models combine Arabic Bidirectional Encoder Representations from Transformers (AraBERT), a Multi-Layer Perceptron (MLP), and Sentiment Features (SFs). Among the tested models, the best configuration (fine-tuned AraBERT with an integrated MLP and SFs) achieved 98% accuracy in detecting extremist Arabic tweets. The same model also generalized to real-world extremist posts from VKontakte, where it achieved 81% accuracy. These findings indicate that combining AraBERT, an MLP, and SFs improves extremist content detection, and they highlight the potential of neural network-based solutions for combating harmful online content.
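The abstract does not say how the sentiment features are integrated with AraBERT and the MLP. As a minimal sketch of one plausible reading, the code below assumes simple feature fusion: a sentence embedding is concatenated with a small sentiment vector and passed through an MLP that outputs two class probabilities. Everything here is hypothetical and for illustration only: the embedding is a random stand-in for an AraBERT sentence vector, the three sentiment scores, the layer sizes, and the function names are invented, and the weights are untrained.

```python
import math
import random


def mlp_forward(x, layers):
    """Feed-forward pass. Each layer is (weights, biases), weights as rows per output unit."""
    for i, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
    return x


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, illustration only)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def classify(text_embedding, sentiment_features, layers):
    """Assumed fusion step: concatenate both vectors, then apply the MLP."""
    fused = list(text_embedding) + list(sentiment_features)
    return softmax(mlp_forward(fused, layers))


rng = random.Random(0)
EMB_DIM, SF_DIM, HIDDEN = 8, 3, 4  # toy sizes, not taken from the paper
layers = [init_layer(EMB_DIM + SF_DIM, HIDDEN, rng), init_layer(HIDDEN, 2, rng)]

embedding = [rng.gauss(0, 1) for _ in range(EMB_DIM)]  # stand-in for an AraBERT sentence vector
sentiment = [0.7, 0.1, 0.2]  # hypothetical negative / neutral / positive scores

probs = classify(embedding, sentiment, layers)
print(probs)  # two probabilities: [non-extremist, extremist] under this sketch's labeling
```

In a real pipeline the stand-in embedding would come from a fine-tuned encoder and the MLP weights would be learned jointly or on top of it; the sketch only shows the shape of the data flow.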

The International Arab Journal of Information Technology, Vol. 22, No. 3, May 2025

Hanen Himdi is an Assistant Professor of Computer Science and Artificial Intelligence in the College of Computer Science and Engineering, University of Jeddah, KSA. She is a computer scientist with a Ph.D. in Computer Science from the University of Strathclyde, Scotland, UK. Her research interests are machine learning, natural language processing, and textual analysis. Her current research focuses on deep learning and the creation of AI models that make use of cutting-edge learning techniques.

Fatimah Alhayan is an Assistant Professor of Computer Science in the College of Computer and Information Science at Princess Noura University, Saudi Arabia. She holds a Ph.D. in Computer Science from the University of Strathclyde, Scotland, UK. Her research interests include information credibility, data mining, computational social science, machine learning, and Natural Language Processing (NLP) in both English and Arabic.

Khaled Shaalan is Co-Chair of the Faculty of Engineering and IT at The British University in Dubai, UAE, where he holds the rank of Full Professor of Computer Science and AI. He has gained significant academic experience and insight into complex ICT issues in many industrial and governmental domains over a career spanning more than 30 years. His areas of interest are Artificial Intelligence (AI), natural language understanding, knowledge management, health informatics, education technology, e-business, cybersecurity, and smart government services. He is ranked among the top 2% of scientists worldwide according to a study led by Dr Ioannidis and his research team at Stanford University. He is also ranked as one of the top computer scientists in the UAE according to the Research.com index.