The International Arab Journal of Information Technology (IAJIT)


An Enhanced Corpus for Arabic Newspapers

In this paper, we propose an enhanced approach to building a dedicated corpus of comments from Algerian Arabic newspapers. The approach extends an existing one by enriching the available corpus and adding an annotation step that follows the Model, Annotate, Train, Test, Evaluate, Revise (MATTER) cycle. The corpus is built by collecting comments from the websites of three well-known Algerian newspapers. Three classifiers (support vector machines, naïve Bayes, and k-nearest neighbors) were used to classify the comments into positive and negative classes. To measure the influence of stemming on the results, classification was tested both with and without stemming. The results show that stemming does not considerably improve classification, owing to the nature of Algerian comments, which are tied to the Algerian Arabic dialect. These promising results motivate us to improve our approach, especially in handling sentences that are not written in Standard Arabic, in particular dialectal and French ones.
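The with/without stemming comparison described above can be sketched as follows. This is a minimal sketch under stated assumptions: it implements only a hand-written multinomial naive Bayes (one of the three classifiers named), and it uses a hypothetical light stemmer, `light_stem`, whose prefix and suffix lists are chosen purely for illustration; the paper does not say which stemmer or toolkit was used. The tiny `TRAIN` and `TEST` comments are invented toy examples, not data from the corpus.

```python
import math
from collections import Counter

# Hypothetical light stemmer (illustrative assumption, not the paper's stemmer):
# strips at most one common Standard Arabic prefix and one suffix,
# keeping at least two characters of the word.
PREFIXES = ("وال", "بال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ة", "ه")


def light_stem(token: str) -> str:
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[: -len(s)]
            break
    return token


def tokenize(text: str, stem: bool = False) -> list[str]:
    """Whitespace tokenization, optionally followed by light stemming."""
    tokens = text.split()
    return [light_stem(t) for t in tokens] if stem else tokens


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: list[str]) -> str:
        v = len(self.vocab)

        def score(c: str) -> float:
            # Out-of-vocabulary tokens are ignored.
            return self.prior[c] + sum(
                math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                for w in doc
                if w in self.vocab
            )

        # max() keeps the first class in sorted order when scores tie.
        return max(self.classes, key=score)


def evaluate(train, test, stem: bool) -> float:
    """Train on `train`, return accuracy on `test`, with or without stemming."""
    model = NaiveBayes().fit([tokenize(t, stem) for t, _ in train], [y for _, y in train])
    hits = sum(model.predict(tokenize(t, stem)) == y for t, y in test)
    return hits / len(test)


# Invented toy comments (hypothetical, for illustration only).
TRAIN = [
    ("مقال رائع", "pos"),
    ("شكرا جزيلا", "pos"),
    ("مقال سيء", "neg"),
    ("كلام فارغ", "neg"),
]
TEST = [("التعليقات الرائعة", "pos")]

if __name__ == "__main__":
    for stem in (False, True):
        print(f"stem={stem}: accuracy={evaluate(TRAIN, TEST, stem):.2f}")
```

In this toy case the unstemmed test comment shares no token with the training vocabulary, so the prediction falls back on the tied priors, whereas stripping the article ال and the final ة maps الرائعة onto رائع. Affix rules of this kind target Standard Arabic morphology; dialectal spellings may not carry these affixes, which is one plausible reason stemming can help less on dialectal comments.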

Hichem Rahab is currently an Assistant Professor in the Department of Mathematics and Computer Science at the University of Khenchela, Algeria. He obtained his Master's degree in Computer Science from Batna University, Algeria, in 2012. His research interests include machine learning, Arabic opinion mining, and sentiment analysis.

Abdelhafid Zitouni received his PhD in Computer Science from the University of Constantine, Algeria, in 2008. He is currently a Professor at the University of Constantine 2 Abdelhamid Mehri. His research interests include cloud computing, security, and Arabic text mining. He has published many articles in international journals and conferences, and he has peer-reviewed conference and journal papers in these research areas.

Mahieddine Djoudi received a PhD in Computer Science from the University of Nancy, France, in 1991. His PhD thesis research was on acoustic-phonetic decoding for Standard Arabic speech recognition. He is currently working in the Computer Science Department, Faculty of Fundamental and Applied Sciences, at the University of Poitiers, France, and is a member of the TechNE Technology Enhanced Learning Research Laboratory. His main scientific interests are e-learning, mobile learning, cloud computing, information literacy, and learning analytics. He has published over 100 scientific papers. He also serves on program committees and as an editor or reviewer for international journals and conference proceedings.