The International Arab Journal of Information Technology (IAJIT)


Generating Sense Inventories for Ambiguous Arabic Words

Word sense disambiguation is the process of selecting the appropriate meaning of an ambiguous word according to its context. In this research, we generate several Arabic sense inventories from different pre-trained embeddings, such as Aravec, Fasttext, and Arabic-News embeddings, using an unsupervised approach based on a graph-based word sense induction algorithm. The resulting inventories are evaluated to investigate their efficiency in Arabic word sense disambiguation and sentence similarity. Results show that the Aravec-Twitter inventory achieves the best accuracy, 0.47, for 50 neighbors; its accuracy is close to that of the Fasttext inventory for 200 neighbors and similar to that of the Arabic-News inventory for 100 neighbors. For sentence similarity, we test replacing ambiguous words with their sense vectors using all sense inventories, and the results show that the Aravec-Twitter sense inventory provides a better correlation value.
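The pipeline summarized above (nearest neighbors from a pre-trained embedding, a graph over those neighbors, clusters as senses, and sense vectors substituted into sentences) can be illustrated with a minimal sketch. It is not the algorithm used in this research: the abstract does not specify the graph construction or clustering step, so connected components over a thresholded similarity graph are used here as a simple stand-in. The toy 3-dimensional vectors, the English placeholder words, the threshold value, and all function names are assumptions made for illustration only.

```python
import math

# Hypothetical toy embeddings; English placeholders stand in for Arabic words.
# "ayn" plays the role of an ambiguous word (eye vs. water spring).
TOY_EMB = {
    "ayn":    [0.7, 0.7, 0.0],
    "eye":    [1.0, 0.1, 0.0],
    "vision": [0.9, 0.2, 0.0],
    "water":  [0.1, 1.0, 0.0],
    "well":   [0.2, 0.9, 0.0],
    "doctor": [0.8, 0.0, 0.3],
    "river":  [0.0, 0.9, 0.3],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest_neighbors(word, emb, k):
    """The k words most similar to `word`, excluding the word itself."""
    scored = [(cosine(emb[word], vec), other)
              for other, vec in emb.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

def induce_senses(word, emb, k=4, edge_threshold=0.8):
    """Cluster the neighbors of `word`; each cluster is treated as one sense."""
    neighbors = nearest_neighbors(word, emb, k)
    # Ego graph without the target word: link neighbors that resemble each other.
    adjacency = {n: set() for n in neighbors}
    for i, a in enumerate(neighbors):
        for b in neighbors[i + 1:]:
            if cosine(emb[a], emb[b]) >= edge_threshold:
                adjacency[a].add(b)
                adjacency[b].add(a)
    # Stand-in clustering step: connected components via depth-first search.
    senses, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], []
        while stack:
            node = stack.pop()
            cluster.append(node)
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        senses.append({
            "members": sorted(cluster),
            "vector": mean_vector([emb[m] for m in cluster]),
        })
    return senses

def disambiguate(context_words, senses, emb):
    """Index of the sense whose vector is closest to the mean context vector."""
    context = mean_vector([emb[w] for w in context_words if w in emb])
    return max(range(len(senses)),
               key=lambda i: cosine(context, senses[i]["vector"]))

def sentence_vector(words, emb, target, sense_vector):
    """Mean sentence vector with the ambiguous word replaced by a sense vector."""
    vectors = [sense_vector if w == target else emb[w]
               for w in words if w in emb]
    return mean_vector(vectors)

if __name__ == "__main__":
    inventory = induce_senses("ayn", TOY_EMB, k=4)
    chosen = inventory[disambiguate(["doctor"], inventory, TOY_EMB)]
    print(chosen["members"])
    print(sentence_vector(["ayn", "doctor"], TOY_EMB, "ayn", chosen["vector"]))
```

With these toy vectors, the four neighbors of the placeholder word split into two clusters, {eye, vision} and {water, well}, and a context containing "doctor" selects the first. The parameter `k` corresponds to the number of neighbors varied in the evaluation (50, 100, 200); a real inventory would be built from a full pre-trained model rather than a hand-written dictionary.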


Marwah Alian has been a PhD candidate at Princess Sumaya University for Technology since 2015. She received her B.Sc. degree in Computer Science from Hashemite University in 1995 and her M.Sc. degree in Computer Science from Jordan University in 2007. Her research interests are in the fields of e-learning systems, data mining, and natural language processing.

Arafat Awajan is a Full Professor and the president of Mutah University. He taught at Princess Sumaya University for Technology (PSUT). He received his PhD degree in Computer Science from the University of Franche-Comté, France, in 1987. He has held various administrative and academic positions at the Royal Scientific Society and Princess Sumaya University for Technology: Head of the Department of Computer Science (2000-2003), Head of the Department of Computer Graphics and Animation (2005-2006), Dean of the King Hussein School for Information Technology (2004-2007), Director of the Information Technology Center, RSS (2008-2010), Dean of Student Affairs (2011-2014), and Dean of the King Hussein School for Computing Sciences (2014-2017). He is currently the vice president of the university (PSUT). His research interests include natural language processing, Arabic text mining, and digital image processing.