The International Arab Journal of Information Technology (IAJIT)

A New Data Reduction Technique for Efficient Arabic Data Sentiment Analysis

Sentiment Analysis (SA) has become a popular means of determining opinions and feelings from textual data. The huge volume of text fed to sentiment analysis models slows their execution and requires a large amount of memory. Data reduction and feature extraction can therefore improve these models’ time complexity and memory usage. However, the reduction process should not degrade the performance of the classification models that split textual data according to its polarity. In this work, we present an analytical study, conducted on Arabic datasets, of the role of data reduction techniques in improving analysis time and accuracy. A structured performance assessment of the extracted features is produced. Bidirectional Encoder Representations from Transformers (BERT) models are used as a data reduction tool, and their performance is compared with that of the Term Frequency-Inverse Document Frequency (TF-IDF) model. The results show that the features extracted via BERT models are more valuable for sentiment analysis tasks and can reduce the time required by eight different classifiers. For example, the performance of the Random Forest Classifier (RFC) improved by 3% when BERT models were used for feature extraction rather than the TF-IDF method, and the time taken by the RFC was reduced to one-tenth of the time it needed when TF-IDF was used as the feature extraction tool.
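To make the data reduction argument concrete, the following minimal sketch computes TF-IDF vectors in plain Python and compares their dimensionality with a fixed-size BERT sentence embedding. It is an illustration only, not the authors' pipeline. The helper `tfidf_vectors`, the toy corpus, and the `tf * log(N / df)` weighting variant are assumptions made for this example. The constant 768 is the hidden size of BERT-base models, and the abstract does not say which BERT variants the study used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf * log(N / df) weighting.

    Hypothetical helper for illustration; real toolkits apply
    smoothing and normalization that are omitted here.
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors


# Toy Arabic corpus (hypothetical, not taken from the paper's datasets).
corpus = [
    "الخدمة ممتازة جدا",          # "the service is very excellent"
    "الخدمة سيئة جدا",            # "the service is very bad"
    "الفندق نظيف والموقع ممتاز",  # "the hotel is clean and the location is excellent"
]

vocab, vectors = tfidf_vectors(corpus)

# One TF-IDF feature per distinct token, so the width grows with the corpus.
tfidf_dim = len(vocab)
# A BERT-base sentence embedding has a fixed width regardless of corpus size.
BERT_BASE_DIM = 768

print(f"TF-IDF feature dimension: {tfidf_dim}")
print(f"BERT-base embedding dimension: {BERT_BASE_DIM}")
```

On a realistic corpus the TF-IDF vocabulary typically runs to tens of thousands of terms, so a fixed-width dense embedding gives each classifier far fewer input features to process, which is consistent with the shorter classifier times reported above.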
