The International Arab Journal of Information Technology (IAJIT), Vol. 16, No. 3, May 2019


Machine Translation Infrastructure for Turkic

This study presents MT-Turk, a multilingual, extensible machine translation infrastructure for grammatically similar Turkic languages. MT-Turk has multi-word support and is designed around a combined rule-based translation approach that unites the strengths of the interlingual and transfer approaches. This design makes the infrastructure easy to extend with new Turkic languages: a newly added language can serve as both source and destination language, giving two-way extensibility. The infrastructure is further strengthened by its ability to learn from previous translations and to use the suggestions of previous users for disambiguation. Finally, MT-Turk is evaluated on three Turkic languages (Turkish, Kirghiz, and Kazan) using the BiLingual Evaluation Understudy (BLEU) metric; the suggestion system improved translation success by 43.66% on average. Although the lack of linguistic resources affected the success of the system negatively, the study introduces an extensible infrastructure that can learn from previous translations.
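Since the evaluation relies on BLEU, a minimal sketch of how a sentence-level BLEU score can be computed may help readers unfamiliar with the metric. It assumes a single reference translation, uniform n-gram weights, and no smoothing. The function names `ngrams` and `bleu` and the example tokens are illustrative assumptions; this is not the evaluation code used for MT-Turk.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with uniform weights and no smoothing (illustrative)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            # Without smoothing, a single zero precision zeroes the whole score.
            return 0.0
        log_precision_sum += math.log(matched / total) / max_n
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates that are not longer than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    # Hypothetical token lists, chosen only to exercise the function.
    reference = "the cat sat on the mat".split()
    candidate = "the cat sat on a mat".split()
    print(round(bleu(candidate, reference), 4))
```

An identical candidate and reference score 1.0, and a candidate sharing no words with the reference scores 0.0. Corpus-level BLEU, which is what is normally reported, pools the n-gram counts over all sentences before taking the precisions rather than averaging sentence scores.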


Emel Alkim received her B.Sc., M.Sc. and Ph.D. in Computer Engineering from Dokuz Eylul University, Izmir, Turkey. Her main research areas are natural language processing and machine translation.

Yalçın Çebi received his B.Sc., M.Sc. and Ph.D. in Mining Engineering from Dokuz Eylul University, Izmir, Turkey. His main research areas include natural language processing, machine translation and wireless sensor and actor networks.