The International Arab Journal of Information Technology (IAJIT)


A New Approach for Arabic Named Entity Recognition

Named Entity Recognition (NER) plays an important role in Natural Language Processing (NLP) research, since it enables the detection of proper nouns in unstructured text. NER facilitates searching, retrieving, and extracting information, because the significant information in a text is usually located around proper names. This paper proposes an efficient approach that identifies Named Entities (NE) in Arabic texts without requiring morphological analysis, syntactic analysis, or gazetteers. The goal of our approach is to provide a general framework for Arabic NE recognition. Within this framework, the system learns to recognize NE automatically and induces NE systematically, starting from sample NE instances used as seeds. The method takes advantage of the web: it learns from a web corpus. The seeds are used to identify contexts on the web that denote NE, and those contexts in turn identify new NE. In a thorough experimental evaluation, the performance of our approach, measured by recall, precision, and F-measure, is promising. We obtained an overall F-measure of 83%.
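The seed-to-context-to-entity loop described above, together with the recall, precision, and F-measure scoring, can be illustrated with a minimal sketch. This is not the authors' implementation: a tiny in-memory English corpus stands in for the web corpus, a context is assumed to be one token on each side of an entity, and the names `extract_contexts`, `apply_contexts`, `bootstrap`, and `evaluate` are hypothetical.

```python
from collections import Counter


def _context(tokens, i, window):
    """Return the (left, right) token windows around position i."""
    left = tuple(tokens[max(0, i - window):i])
    right = tuple(tokens[i + 1:i + 1 + window])
    return left, right


def extract_contexts(sentences, entities, window=1):
    """Count the contexts in which already-known entities occur."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in entities:
                counts[_context(tokens, i, window)] += 1
    return counts


def apply_contexts(sentences, contexts, window=1):
    """Return every token that appears inside one of the given contexts."""
    found = set()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if _context(tokens, i, window) in contexts:
                found.add(tok)
    return found


def bootstrap(sentences, seeds, rounds=2, min_count=1):
    """Alternate between learning contexts from entities and entities from contexts."""
    entities = set(seeds)
    for _ in range(rounds):
        counts = extract_contexts(sentences, entities)
        trusted = {ctx for ctx, n in counts.items() if n >= min_count}
        new = apply_contexts(sentences, trusted) - entities
        if not new:
            break
        entities |= new
    return entities


def evaluate(predicted, gold):
    """Precision, recall, and F-measure of a predicted entity set against a gold set."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy corpus (an assumption for illustration; the paper uses a web corpus).
corpus = [
    "lives in Tunis today".split(),
    "lives in Cairo today".split(),
    "works in Cairo now".split(),
    "works in Taif now".split(),
    "works in finance now".split(),
    "visited Sfax twice".split(),
    "eats bread now".split(),
]
gold = {"Tunis", "Cairo", "Taif", "Sfax"}

found = bootstrap(corpus, seeds={"Tunis"})
precision, recall, f_measure = evaluate(found, gold)
print(sorted(found), precision, recall, f_measure)
```

In this toy run the seed `Tunis` yields the context ("in", "today"), which finds `Cairo`; `Cairo` then contributes the context ("in", "now"), which finds `Taif` but also the non-entity `finance`, while `Sfax` is missed because it never shares a context with a known entity. The `min_count` threshold is one simple way to limit this kind of drift by trusting only contexts seen several times.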

[1] Aboaoga M. and Ab-Aziz M., Arabic Person Names Recognition by Using a Rule Based Approach, Journal of Computer Science, vol. 9, no. 7, pp. 922-927, 2013.

[2] Abuleil S., Extracting Names From Arabic Text for Question-Answering Systems, in Proceeding of the 7th International Conference on Coupling Approaches, Coupling Media, and Coupling Languages for Information Retrieval, Vaucluse, pp. 638-647, 2004.

[3] Al-Jumaily H., Martínez P., Martínez-Fernández J., and Goot E., A Real Time Named Entity Recognition System for Arabic Text Mining, Journal of Language Resources and Evaluation, vol. 46, no. 4, pp. 543-563, 2012.

$$\text{Recall} = \frac{\text{number of correct NE recognized by the system}}{\text{number of correct NE in the corpus}} \tag{11}$$

$$\text{Precision} = \frac{\text{number of correct NE recognized by the system}}{\text{number of NE given by the system}} \tag{12}$$

$$\text{F-measure} = \frac{2 \times (\text{recall} \times \text{precision})}{\text{recall} + \text{precision}} \tag{13}$$

[4] Althobaiti M., Kruschwitz U., and Poesio M., Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia, in Proceeding of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, Sweden, pp. 106-115, 2014.

[5] Karaa W., Named Entity Recognition Using Web Document Corpus, International Journal of Managing Information Technology, vol. 3, no. 1, pp. 46-56, 2011.

[6] Benajiba Y., Diab M., and Rosso P., Arabic Named Entity Recognition Using Optimized Feature Sets, in Proceeding of the Conference on Empirical Methods in Natural Language Processing, Stroudsburg, pp. 284-293, 2008.

[7] Benajiba Y., Diab M., and Rosso P., Using Language Independent and Language Specific Features to Enhance Arabic Named Entity Recognition, The International Arab Journal of Information Technology, vol. 6, no. 5, pp. 464-472, 2009.

[8] Benajiba Y., Zitouni I., Diab M., and Rosso P., Arabic Named Entity Recognition: Using Features Extracted from Noisy Data, in Proceeding of the 48th Annual Meeting of the Association for Computational Linguistics, ACL, pp. 281-285, 2010.

[9] Benajiba Y., Rosso P., and Ruiz J., ANERsys: An Arabic Named Entity Recognition system based on Maximum Entropy, in Proceeding of 8th International Conference on Computational Linguistics and Intelligent Text Processing, Mexico, pp. 143-153, 2007.

[10] Daille B. and Morin E., Reconnaissance Automatique des Noms Propres de la Langue Écrite: Les Récentes Réalisations, Traitement Automatique des Langues, vol. 41, no. 3, pp. 601-621, 2000.

[11] Denis F., Gilleron R., and Letouzey F., Learning from Positive and Unlabeled Examples, Journal of Theoretical Computer Science, vol. 348, no. 1, pp. 70-83, 2005.

[12] Fourour N. and Morin E., Apport du Web Dans la Reconnaissance des Entités Nommées, Revue Québécoise de Linguistique, vol. 32, no. 1, pp. 41-60, 2003.

[13] Kilgarriff A. and Grefenstette G., Introduction to the Special Issue on the Web as Corpus, Journal of Computational Linguistics, vol. 29, no. 3, pp. 333-347, 2003.

[14] Kiryakov A., Popov B., Terziev I., Manov D., and Ognyanoff D., Semantic Annotation, Indexing, and Retrieval, The Journal of Web Semantics, Elsevier, vol. 2, no. 1, pp. 49-79, 2004.

[15] Kumaran G. and Allan J., Text Classification and Named Entities for New Event Detection, in Proceeding of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, pp. 297-304, 2004.

[16] Maloney J. and Niv M., TAGARAB: A Fast, Accurate Arabic Name Recognizer using High Precision Morphological Analysis, in Proceeding of the Workshop on Computational Approaches to Semitic Languages, Montreal, pp. 8-15, 1998.

[17] Nadeau D. and Sekine S., A Survey of Named Entity Recognition and Classification, Lingvisticae Investigationes, vol. 30, no. 1, pp. 3-26, 2007.

[18] Salton G., Wong A., and Yang C., A Vector Space Model for Information Retrieval, Journal of the American Society for Information Science, vol. 8, no. 11, pp. 613-620, 1975.

[19] Samy D., Moreno A., and Guirao J., A Proposal for an Arabic Named Entity Tagger Leveraging a Parallel Corpus, in Proceedings of the International Conference on Recent Advances in Natural Language Processing, Borovets, pp. 459-465, 2004.

[20] Shaalan K., A Survey of Arabic Named Entity Recognition and Classification, Journal of Computational Linguistics, vol. 40, no. 2, pp. 469-510, 2014.

[21] Shaalan K. and Oudah M., A Hybrid Approach to Arabic Named Entity Recognition, Journal of Information Science, vol. 40, no. 1, pp. 67-87, 2014.

[22] Shaalan K. and Raza H., NERA: Named Entity Recognition for Arabic, The Journal of the American Society for Information Science and Technology, vol. 60, no. 8, pp. 1652-1663, 2009.

[23] Quinlan J., C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.

[24] Zaghouani W., RENAR: A Rule-Based Arabic Named Entity Recognition System, ACM Transactions on Asian Language Information Processing, vol. 11, no. 1, pp. 1-13, 2012.

[25] Zribi I., Hammami S., and Belguith L., L'apport d'une Approche Hybride Pour la Reconnaissance des Entités Nommées en Langue Arabe, in Proceeding of the International Conference: Traitement Automatique des Langues Naturelles, Montréal, pp. 1-6, 2010.

The International Arab Journal of Information Technology, Vol. 14, No. 3, May 2017

Wahiba Karaa is currently an assistant professor in the Department of Computer Science at Taif University, Saudi Arabia. She received her Master's degree from Paris III, New Sorbonne, France, and her PhD from Paris 7 Jussieu, France. Her research interests include natural language processing, document annotation, information retrieval, text mining, data mining, and image mining. She is a member of the editorial board of several international journals and Editor-in-Chief of the International Journal of Image Mining (Inderscience Publishers).

Thabet Slimani received a PhD in Computer Science from the University of Tunisia. He is currently an assistant professor in the Computer Science Department at Taif University, Saudi Arabia, and a member of the LARODEC laboratory (University of Tunisia). His research interests are mainly related to the Semantic Web, data mining, text mining, business intelligence, knowledge management, and web services. He has published his research in international conferences and peer-reviewed journals. He also serves as a journal reviewer.