The International Arab Journal of Information Technology (IAJIT)


Arabic Quran Verses Authentication Using Deep Learning and Word Embeddings

With the growth of the Internet, algorithms have come to govern nearly every aspect of digital content. Ironically, given its Arabic roots, Arabic Quranic content has yet to benefit fully from computational linguistics, especially since the advent of artificial intelligence algorithms. The proliferation of Islamic websites and applications has led to the widespread distribution of digital Quranic content. Unfortunately, such content is largely unmoderated and is rarely checked against a reliable source. It is quite difficult, especially for non-native speakers of Arabic, to distinguish authentic Quranic verses from non-Quranic Arabic text. Text processing techniques that fall outside the field of Natural Language Processing (NLP) give poorer results, especially on Arabic texts. To address this problem, we explore Word Embeddings (WE) with Deep Learning (DL) techniques to identify Quranic verses in Arabic textual content. The proposed approach is evaluated using twelve different word embedding models with two popular classifiers for binary classification: Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The experimental results show that the proposed approach outperforms traditional methods in distinguishing Quranic verses from other Arabic text, with an accuracy of 98.33%.
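As a rough illustration of the kind of pipeline the abstract describes (word embeddings feeding a convolutional binary classifier), the following sketch runs a single forward pass in plain Python. It is not the authors' architecture: the embedding table, filters, and dense weights are random placeholders rather than trained or pretrained values, the two sentences are toy inputs, and every name and size (`EMBED_DIM`, `KERNEL_SIZE`, `NUM_FILTERS`, `predict`) is an assumption made for the example.

```python
import math
import random

random.seed(0)

EMBED_DIM = 8      # hypothetical embedding size
KERNEL_SIZE = 2    # hypothetical convolution window, in tokens
NUM_FILTERS = 4    # hypothetical number of convolution filters


def build_vocab(sentences):
    """Map each whitespace-separated token to an integer id (0 = unknown)."""
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def random_matrix(rows, cols):
    """Placeholder weights; a real system would load or train these."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def embed(sentence, vocab, table):
    """Look up one embedding vector per token; unseen tokens map to id 0."""
    return [table[vocab.get(token, 0)] for token in sentence.split()]


def conv1d_relu_maxpool(vectors, filters):
    """Slide each filter over token windows, apply ReLU, keep the max per filter."""
    # Pad inputs shorter than the window with zero vectors.
    while len(vectors) < KERNEL_SIZE:
        vectors = vectors + [[0.0] * EMBED_DIM]
    pooled = []
    for filt in filters:  # each filter holds KERNEL_SIZE * EMBED_DIM weights
        best = 0.0  # ReLU output is never negative, so 0.0 is a safe floor
        for start in range(len(vectors) - KERNEL_SIZE + 1):
            window = [x for vec in vectors[start:start + KERNEL_SIZE] for x in vec]
            activation = sum(w * x for w, x in zip(filt, window))
            best = max(best, activation)
        pooled.append(best)
    return pooled


def predict(sentence, vocab, table, filters, dense):
    """Return a score in (0, 1); with trained weights, 1 would mean 'Quranic'."""
    features = conv1d_relu_maxpool(embed(sentence, vocab, table), filters)
    logit = sum(w * f for w, f in zip(dense, features))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: the Basmala and an everyday sentence ("the weather is nice today").
sentences = ["بسم الله الرحمن الرحيم", "الطقس جميل اليوم"]
vocab = build_vocab(sentences)
table = random_matrix(len(vocab), EMBED_DIM)
filters = random_matrix(NUM_FILTERS, KERNEL_SIZE * EMBED_DIM)
dense = random_matrix(1, NUM_FILTERS)[0]

scores = [predict(s, vocab, table, filters, dense) for s in sentences]
for sentence, score in zip(sentences, scores):
    print(f"{score:.3f}  {sentence}")
```

Because the weights are random, the printed scores carry no meaning; the sketch only shows how token ids, embedding vectors, convolution windows, pooling, and the sigmoid output fit together. In practice the embedding table would come from a pretrained Arabic model and the remaining weights would be learned from labelled Quranic and non-Quranic text.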

The International Arab Journal of Information Technology, Vol. 19, No. 4, July 2022

Zineb Touati-Hamad received her Master's degree from Tebessa University in 2019. Presently, she is a Ph.D. student at LAMIS Laboratory. Her research interests include Machine Learning, Natural Language Processing, and Information Systems.

Mohamed Ridda Laouar received his Ph.D. degree from Valenciennes University in 2005. Presently, he is a professor at Tebessa University and LAMIS Laboratory. His current research interests are Decision Making, Artificial Intelligence, and Information Systems.

Issam Bendib received his Ph.D. degree from Annaba University in 2018. Presently, he is an associate professor at Tebessa University. His current research interests are Information Retrieval and Machine Learning.

Saqib Hakak received his Ph.D. degree from Kuala Lumpur University in 2018. Presently, he is an associate professor at the University of New Brunswick, Canada. His current research interests are Cybersecurity, Internet of Things, Natural Language Processing, and Cloud and Edge Computing.