The International Arab Journal of Information Technology (IAJIT)


Tunisian Arabic Chat Alphabet Transliteration Using Probabilistic Finite State Transducers

The Internet plays a growing role in Tunisians' lives, especially since the 2011 revolution. Tunisian Internet users increasingly rely on social networks, blogs, and similar platforms, where they favor the Tunisian Arabic chat alphabet, a Latin-script writing of Tunisian Arabic. However, few tools have been developed for processing Tunisian Arabic in this context. In this paper, we propose a machine for transliterating the Tunisian Arabic chat alphabet into Tunisian Arabic, based on weighted finite state transducers and using a Tunisian Arabic lexicon, aebWordNet (aeb is the ISO 639-3 code of Tunisian Arabic), and a Tunisian Arabic morphological analyzer. Weighted finite state transducers allow us to model the transcription behavior of Tunisian Internet users when they write Tunisian Arabic chat alphabet texts. This writing has no standard format, but it follows a regular relation. Moreover, the machine uses aebWordNet and the Tunisian Arabic morphological analyzer to validate the generated transliterations. Our approach achieves good results compared with existing Arabic chat alphabet-Arabic transliteration tools such as EiKtub.
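The idea summarized above, weighted rewriting followed by lexicon validation, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule table RULES, its probabilities, the function names, and the one-word lexicon are all hypothetical and chosen only for illustration. The sketch behaves like a single-state weighted transducer in which each arc rewrites one chat-alphabet grapheme into an Arabic string, and it keeps the best (highest-probability) path for each output, as in Viterbi-style decoding.

```python
# Hypothetical rewrite rules: chat-alphabet grapheme -> [(Arabic output, probability)].
# The probabilities are invented for this example, not learned from any corpus.
RULES = {
    "5": [("خ", 1.0)],
    "kh": [("خ", 1.0)],
    "l": [("ل", 1.0)],
    # A short vowel is often left unwritten in Arabic script, so "a" may map
    # to the empty string or to alif.
    "a": [("", 0.6), ("ا", 0.4)],
}


def transliterate(word, rules=RULES):
    """Return (arabic, probability) candidates for word, best first.

    best[i] maps each Arabic prefix reachable after consuming word[:i]
    to the probability of its best path.
    """
    best = [dict() for _ in range(len(word) + 1)]
    best[0][""] = 1.0
    for i in range(len(word)):
        for prefix, prob in best[i].items():
            for source, outputs in rules.items():
                if not word.startswith(source, i):
                    continue
                j = i + len(source)
                for target, rule_prob in outputs:
                    candidate = prefix + target
                    score = prob * rule_prob
                    if score > best[j].get(candidate, 0.0):
                        best[j][candidate] = score
    return sorted(best[-1].items(), key=lambda item: -item[1])


def validate(candidates, lexicon):
    """Keep only the candidates attested in the lexicon (a stand-in for
    the aebWordNet and morphological-analyzer check)."""
    return [(arabic, prob) for arabic, prob in candidates if arabic in lexicon]


if __name__ == "__main__":
    toy_lexicon = {"خلل"}  # hypothetical one-word lexicon
    for arabic, prob in validate(transliterate("5alal"), toy_lexicon):
        print(arabic, round(prob, 2))
```

With these toy rules, the chat-alphabet word 5alal yields four candidates, and the lexicon filter keeps only the attested one. A real system would estimate the rule weights from an aligned training corpus and would typically compose transducers with a toolkit rather than enumerate paths by hand.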

Appendix A

Table 10. Examples from the training corpus.

Corpus No.    TACA word    Manual transliteration 1    Manual transliteration 2
1             5alal        خَلَل                        خَلَل
2             akhaw        أكهاو                       أكهو
3             ittasalt     إتَّصَلت                      إتَّصَلت
4             alihom       عليهم                       عليهم
5             aumourek     أومورِك                      أمورِك
6             menha        مِنها                        مِنها
7             orang        أرَنج                        أورونج

Nadia Karmani (IEEE Student Member since 2012) graduated in Information Systems and New Technologies in 2007. She obtained a PhD in Information Systems Engineering in 2017. She is a member of the REGIM-Lab. research laboratory on intelligent machines.

Hsan Soussou obtained a PhD in Information Systems Engineering in 2011. He is the manager of the electronic journal “Tunisie numérique”. He has been the founder and manager of the company MDSoft since 2005.

Adel Alimi (IEEE Student Member’91, Member’96, Senior Member’00) graduated in Electrical Engineering in 1990. He obtained a PhD and then an HDR, both in Electrical & Computer Engineering, in 1995 and 2000 respectively. He has been a full Professor in Electrical Engineering at the University of Sfax, ENIS, since 2006. He is the founder and director of the REGIM-Lab. research laboratory on intelligent machines.