The International Arab Journal of Information Technology (IAJIT)



Exploring the Performance of Farasa and CAMeL Taggers for Arabic Dialect Tweets

Part-of-Speech (POS) tagging is a fundamental step in Natural Language Processing (NLP) and a prerequisite for many applications, such as information extraction, machine translation, and grammar checking. Successful POS taggers have been developed for many languages, including Arabic. The spread of social media has increased the diversity of dialects in written text, as people use them in their online communications. As a result, it has become harder to automatically classify words that humans understand easily but computers do not. In addition, most Arabic POS research focuses on Modern Standard Arabic (MSA), while Dialectal Arabic (DA) receives less attention. This paper evaluates the performance of two Arabic taggers on dialectal Arabic tweets to determine which one is more appropriate, a finding that can help improve existing taggers for dialectal Arabic tweets. We used the Farasa and CAMeL taggers, which are commonly used to analyze Arabic text and are considered the best taggers for Arabic. The results indicate that the CAMeL tagger outperformed the Farasa tagger, with accuracies of 92% and 83%, respectively. In other words, a hybrid POS tagger trained on both MSA and DA returns better results than one trained on MSA alone.
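The accuracy comparison reported above can be illustrated with a minimal sketch. It assumes that accuracy means the share of tokens whose predicted tag matches a manually assigned gold tag; the abstract does not spell out the metric, so this is an assumption. The function name `tagging_accuracy`, the tag labels, and the toy tag sequences are hypothetical and are not data or code from the paper, and no Farasa or CAMeL API is called.

```python
from typing import Sequence


def tagging_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tags")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical tags for a five-token tweet (illustrative only, not from the paper).
gold_tags = ["NOUN", "VERB", "PART", "NOUN", "ADJ"]
tagger_a = ["NOUN", "VERB", "PART", "NOUN", "NOUN"]  # one mismatch
tagger_b = ["NOUN", "NOUN", "PART", "VERB", "NOUN"]  # three mismatches

acc_a = tagging_accuracy(gold_tags, tagger_a)
acc_b = tagging_accuracy(gold_tags, tagger_b)
print(f"tagger A: {acc_a:.2f}, tagger B: {acc_b:.2f}")
```

In practice, two taggers may segment a tweet differently and use different tagsets, so their outputs would likely need to be aligned to the gold tokens and mapped to a common tagset before a comparison of this kind is meaningful.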

 

