The International Arab Journal of Information Technology (IAJIT)


Building a Syntactic-Semantic Interface for a Semi-Automatically Generated TAG for Arabic

Syntactic and semantic resources play an important role in various Natural Language Processing (NLP) tasks by providing information about the correct structural representation of sentences and their meaning. To date, there is no wide-coverage electronic grammar for the Arabic language. In this context, we present a new approach for building a Tree Adjoining Grammar (TAG) that represents the syntax and semantics of Modern Standard Arabic. This grammar is produced semi-automatically with the eXtensible MetaGrammar (XMG) description language. First, the syntax of Arabic is described using the Arab-XMG meta-grammar. Then, semantic information is added by introducing a frame-based semantic dimension into the meta-grammar, exploiting lexical resources such as ArabicVerbNet. Finally, the link between semantics and syntax is established through a syntax-semantics interface that allows sentence meaning to be constructed through semantic role labeling. Experiments were performed to check grammar coverage as well as the syntactic-semantic analysis. The results show that the generated grammar covers the basic syntactic structures of Arabic sentences and the different phrasal structures with a precision of about 92%. Moreover, they confirm the effectiveness of the proposed approach: we were able to parse a set of sentences semantically and build their semantic representations with a precision of about 72%.
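The two precision figures can be checked with a short calculation. The sketch below assumes the success and failure counts that appear in the paper's results chart (321 and 26 for syntactic analysis, 232 and 89 for semantic analysis). The pairing of counts with categories is an assumption inferred from the reported percentages, and the function name `precision` is purely illustrative.

```python
def precision(successes: int, failures: int) -> float:
    """Return the share of successfully analysed sentences, in percent."""
    total = successes + failures
    if total == 0:
        raise ValueError("no sentences were analysed")
    return 100.0 * successes / total


# Counts as read from the results chart (assumed pairing, see the text above).
syntactic = precision(successes=321, failures=26)
semantic = precision(successes=232, failures=89)

print(f"Syntactic analysis: {syntactic:.2f}%")
print(f"Semantic analysis:  {semantic:.2f}%")
```

Under this reading, the semantic total (232 + 89 = 321) equals the number of syntactic successes, which would be consistent with semantic analysis being run only on the sentences that were parsed successfully. The abstract does not state this explicitly.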

[Figure: success and failure counts. Syntactic analysis (92.5%): 321 successes, 26 failures. Semantic analysis (72.27%): 232 successes, 89 failures.]

The International Arab Journal of Information Technology, Vol. 15, No. 3A, Special Issue 2018

Cherifa Ben Khelil received her Master's in Software Engineering from the Higher Institute of Computer Science Ariana, Tunisia, and is pursuing her doctoral degree under joint supervision between the National School of Computer Sciences (ENSI), University of La Manouba in Tunisia, and the University of Orleans in France. Her research interests are related to Natural Language Processing, in particular grammar generation to represent the syntax and semantics of the Arabic language.

Chiraz Ben Othmane Zribi is a professor at the National School of Computer Science, University of La Manouba, Tunisia, and a researcher at the RIADI-GDL laboratory. She received her PhD in computer science in 1998 from Paris XI University, France. Her principal research interests are in the area of Arabic language processing. Her recent work has focused on natural language parsing, detection and correction of errors, generation of dictionaries, and knowledge retrieval.

Denys Duchier has been Professor of Computer Science at Université d'Orléans, France, since 2006. He received his PhD from Yale University, United States, in 1991. After postdoctoral fellowships at University of Ottawa and University of Vancouver, Canada, he moved in 1996 to Saarland University, Germany, where he worked on the design and implementation of the Oz programming language. His research interests focus on the application of constraints in computational linguistics, and on the design and implementation of programming languages.

Yannick Parmentier is an Associate Professor at Université de Lorraine, France. He received his PhD in Computer Science from Henri Poincaré University in Nancy, France, in 2007. During his PhD, he took part in the design and implementation of the XMG description language and its application to the formal description of French. In 2007-2008, he was a postdoctoral fellow at the University of Tübingen, Germany, where he worked on symbolic parsing. From 2009 to 2017, he was an Associate Professor at the University of Orléans, working on constraint-based approaches in computational linguistics.