The International Arab Journal of Information Technology (IAJIT)

Embedding Search for Quranic Texts based on Large Language Models

Semantic search is the process of retrieving relevant information from a large corpus of texts based on the meaning and context of the query. This paper explores the use of large language models for semantic search of Quranic texts. The Quran, the central religious text of Islam, contains rich and complex linguistic and semantic features that pose challenges for traditional keyword-based search methods. This study investigates a semantic search approach that uses Large Language Model (LLM) embeddings and assesses their performance against a baseline embedding-based search method, using a set of queries that represent different levels of semantic complexity. The study also discusses the limitations and implications of using large language models for semantic search of Quranic texts and suggests directions for future research. A key finding is that LLM embeddings remain consistently effective across varying levels of semantic complexity, which suggests that they can capture deep semantic connections. A second finding is that the state-of-the-art transformer AraT5 outperforms LLM embeddings on low-level semantic searches, indicating potential for further fine-tuning of LLMs on Arabic text corpora.
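The retrieval step described above (embed the query and the verses, then rank verses by similarity to the query) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `toy_embed` is a hypothetical hashed bag-of-words stand-in for an LLM or AraT5 embedding model. Cosine similarity is assumed as the ranking score. The verse IDs and texts in `TOY_CORPUS` are invented. `precision_at_k` is an assumed evaluation metric, since no metric is named here.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]

# Toy corpus with hypothetical verse IDs and invented English glosses (illustration only).
TOY_CORPUS: Dict[str, str] = {
    "v1": "patience in hardship",
    "v2": "charity to the poor",
    "v3": "prayer and patience",
}


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in for an LLM embedding model.

    Builds a hashed bag-of-words vector. It captures lexical overlap only,
    whereas a real LLM or AraT5 encoder would return dense semantic vectors.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # Deterministic position-weighted character hash, so runs are reproducible.
        h = sum(ord(c) * (i + 1) for i, c in enumerate(token))
        vec[h % dim] += 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def search(query: str, corpus: Dict[str, str],
           embed: Callable[[str], Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Rank corpus entries by cosine similarity to the query embedding."""
    q = embed(query)
    # A real system would precompute and index the verse embeddings once.
    scored = [(vid, cosine(q, embed(text))) for vid, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def precision_at_k(ranked_ids: List[str], relevant: set, k: int) -> float:
    """Fraction of the top-k results judged relevant (assumed metric)."""
    return sum(1 for vid in ranked_ids[:k] if vid in relevant) / k


if __name__ == "__main__":
    for vid, score in search("patience and prayer", TOY_CORPUS, toy_embed):
        print(vid, round(score, 3))
```

Because `embed` is a parameter, comparing LLM embeddings with a baseline amounts to calling `search` with two different embedding functions on the same query set and scoring each ranking, for example with `precision_at_k`. With the toy data, the query "patience and prayer" ranks v3 first, then v1, then v2.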
