Exploring the Potential of Schemes in Building NLP

Arabic is known for its sparseness, which makes it difficult to process automatically. The Arabic language is based on schemes: lemmas are produced by derivation from roots and schemes. This property offers two major advantages. First, the “hidden side” of Arabic formed by schemes suffers much less from sparseness, since schemes form a finite set. Second, schemes retain a large number of the language's features in a much smaller vocabulary. Schemes therefore have great potential for building accurate natural language processing tools for Arabic. In this work, we explore this potential by building NLP tools that rely entirely on schemes. The work covers text classification and Probabilistic Context-Free Grammar (PCFG) parsing.
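The root-and-scheme derivation described above can be illustrated with a small sketch. It is not the representation used in this work: the Latin transliteration, the digit-slot notation for schemes (1, 2, 3 marking the root consonants), and the function names `apply_scheme` and `extract_scheme` are all assumptions made for illustration.

```python
# Illustrative sketch only: transliterated Arabic, with schemes written as
# templates whose digits 1..3 mark the positions of the root consonants.
# This notation and these helper names are assumptions, not the paper's.

def apply_scheme(root: str, scheme: str) -> str:
    """Derive a lemma by filling the numbered slots of a scheme with root consonants."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in scheme)


def extract_scheme(lemma: str, root: str) -> str:
    """Recover the scheme of a lemma by replacing root consonants, in order, with slot numbers.

    Greedy left-to-right matching; it would mislabel a lemma whose affix
    repeats a root consonant before the root itself appears.
    """
    out = []
    i = 0  # index of the next root consonant to match
    for ch in lemma:
        if i < len(root) and ch == root[i]:
            i += 1
            out.append(str(i))
        else:
            out.append(ch)
    if i != len(root):
        raise ValueError(f"root {root!r} not found in lemma {lemma!r}")
    return "".join(out)


# Six lemmas from two roots (k-t-b "write", d-r-s "study").
lemmas = {
    "kataba": "ktb", "kātib": "ktb", "maktūb": "ktb",
    "darasa": "drs", "dāris": "drs", "madrūs": "drs",
}

# Mapping lemmas to schemes shrinks the vocabulary: six lemmas, three schemes.
schemes = {extract_scheme(lemma, root) for lemma, root in lemmas.items()}
print(len(lemmas), "lemmas ->", len(schemes), "schemes:", sorted(schemes))
```

In this toy vocabulary, lemmas from different roots collapse onto the same scheme, which is the sense in which a scheme-level vocabulary is smaller and less sparse than a lemma-level one.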

[23] Zrigui M., Ayadi R., Mars M., and Maraoui M., Arabic Text Classification Framework based on Latent Dirichlet Allocation, Journal of Computing and Information Technology, vol. 20, no. 2, pp. 125-140, 2012.

Mohamed Achraf Ben Mohamed is a PhD student in the Faculty of Economic Sciences and Management of Sfax, Tunisia. He is a member of LaTICE Laboratory, Monastir unit (Tunisia). His areas of interest include natural language processing, computer assisted language learning and machine learning.

Souheyl Mallat received his BSc degree in computer science from the Higher Institute of Applied Science and Technology of Sousse, Tunisia and his MSc degree from the Faculty of Sciences of Monastir, Tunisia. He is a member of LaTICE Laboratory, Monastir unit (Tunisia). His areas of interest include natural language processing, data mining and information retrieval.

Mohamed Amine Nahdi received his BA degree in computer science from the Faculty of Sciences of Monastir, Tunisia and his MA from the Grenoble Institute of Technology, France. He is a member of LaTICE Laboratory in Tunisia and LIDILEM laboratory in Grenoble, France.

Mounir Zrigui is an associate professor at the University of Monastir, Tunisia. He received his PhD degree from the Paul Sabatier University, Toulouse, France in 1987 and his HDR in computer science from the Stendhal University, Grenoble, France in 2008. He has more than 25 years of experience, including teaching and research, in all aspects of automatic processing of natural language (written and oral).