The International Arab Journal of Information Technology (IAJIT)



Syntactic Annotation in the I3rab Dependency Treebank

Arabic dependency parsers perform poorly compared with parsers for other languages. Recent work has extensively investigated how lexical-level annotation in a dependency treebank affects the overall performance of dependency parsers. This paper focuses on the impact of coarse-grained and fine-grained dependency relations on the performance of Arabic dependency parsers. It also introduces the annotation rules for the I3rab dependency treebank. Experimentally, the results show that an appropriate set of dependency relations improves the performance of an Arabic dependency parser by up to 27.55%.


Dana Halabi is a PhD candidate in Computer Science (CS) at Princess Sumaya University for Technology (PSUT), Jordan. Her research interests include Arabic NLP, big data, machine learning, and deep learning.

Arafat Awajan is a full professor of computer science at Mutah University and Princess Sumaya University for Technology. He received his PhD degree in computer science from the University of Franche-Comte, France, in 1987. He has held different academic positions at the Royal Scientific Society, Princess Sumaya University for Technology, and Mutah University. He was appointed chair of the Computer Science Department (2000-2003) and chair of the Computer Graphics and Animation Department (2005-2006) at PSUT. He served as dean of the King Hussein School for Information Technology from 2004 to 2007, Dean of Student Affairs from 2011 to 2014, director of the Information Technology Center in the Royal Scientific Society from 2008 to 2010, dean of the King Hussein School for Computing Sciences from 2014 to 2017, and vice president of PSUT from 2017 to 2020. He is currently the president of Mutah University (Jordan). His research interests include natural language processing, text compression, and image processing.

Ebaa Fayyoumi was born in Kuwait in 1978. She received the B.Sc. degree from Hashemite University, Zarqa, Jordan, in 2000, the M.Sc. degree from the University of Jordan, Amman, Jordan, in 2002, and the Ph.D. degree from Carleton University, Ottawa, ON, Canada, in 2008. She has been with the Faculty of Prince Hussein Bin Abdalla II for Information Technology, Hashemite University, since 2008. Prior to joining Hashemite University, she was a Lecturer at Carleton University.
She was with Princess Sumaya University for Technology from 2016 to 2018. She is a member of the Natural Language Processing (NLP) group in Amman, Jordan. Her current research interests include statistical syntactical pattern recognition, micro-aggregation techniques, secure statistical databases, machine learning, applied algorithms, mobile applications, e-learning, and natural language processing. She has received many awards during her academic career, including the Carleton University Medal for Outstanding Graduate Work in 2008.

The International Arab Journal of Information Technology, Vol. 18, No. 3A, Special Issue 2021

Appendix A: Annotation Examples

Figure 1. “محمد وسالم طالبان مجتهدان”, “Muhammad and Salem are two diligent students”.
Figure 2. “إن الشمس مشرقة”, “The sun is shining”.
Figure 3. “كان محمد يقرأ هو الكتاب في المكتبة”, “Muhammad was reading the book in the library”.
Figure 4. “انتصر القائد سعيد”, “Commander Saeed won”.
Figure 5. “لا يقود محمد السيارة مسرعا”, “Mohamed does not drive the car fast”.
Figure 6. “سنذهب في رحلة غدا”, “We'll go on a trip tomorrow”.
Figure 7. “شاهدت الرجل نفسه مرتين”, “I watched the same guy twice”.
Figure 8. “اشتريت حصانا سريعا”, “I bought a fast horse”.
Figure 9. “اشتريت الكتاب بعشرين دينار”, “I bought the book for twenty dinars”.