A Deep Learning based Arabic Script Recognition System: Benchmark on KHAT

Authors Riaz Ahmad, Saeeda Naz, Muhammad Afzal, Sheikh Rashid, Marcus Liwicki, Andreas Dengel

Keywords #Handwritten Arabic text recognition #deep learning #data augmentation

Abstract

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text-lines. The paper contributes in three main aspects: (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step prunes extra white space and de-skews skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflections. Combining data augmentation with the deep learning approach raises the Character Recognition (CR) rate from a baseline of 75.08% to 80.02%, an absolute gain of 4.94 percentage points.
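
The abstract reports results as a Character Recognition (CR) rate but does not define it here. A minimal sketch of one common way to compute such a rate is shown below, assuming CR is derived from the Levenshtein edit distance between each ground-truth text-line and the recognized text-line, normalized by the total number of ground-truth characters. This definition is an assumption, and the function names `edit_distance` and `character_recognition_rate` are hypothetical; the paper's exact formula may differ.

```python
from typing import List, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # transforming ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_recognition_rate(pairs: List[Tuple[str, str]]) -> float:
    """Assumed CR definition: percentage of ground-truth characters not
    accounted for by edit errors, over (ground_truth, prediction) pairs."""
    total = sum(len(gt) for gt, _ in pairs)
    if total == 0:
        raise ValueError("ground truth contains no characters")
    errors = sum(edit_distance(gt, pred) for gt, pred in pairs)
    return 100.0 * (total - errors) / total


if __name__ == "__main__":
    # Toy text-lines (not taken from KHATT): one substitution in ten characters.
    sample = [("abcd", "abxd"), ("abcdef", "abcdef")]
    print(character_recognition_rate(sample))
```

Because the distance is computed over Unicode code points, Arabic diacritics stored as separate combining characters would each count as one character under this sketch; whether the benchmark counts them that way is not stated in the abstract.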

Riaz Ahmad received his PhD degree from the Technical University of Kaiserslautern, Germany, in 2018. Currently, he is working as a faculty member at Shaheed Benazir Bhutto University, Sheringal, Pakistan. His areas of research include document image analysis, image processing and Optical Character Recognition. More specifically, his work examines the challenges posed by cursive script languages in the field of OCR systems.

Saeeda Naz received her PhD degree in Jan 2016 (Hazara University, Pakistan). She has been an Assistant Professor and Head of the Computer Science Department at GGPGC No.1, Abbottabad, Higher Education Department of the Government of KPK, Pakistan, since 2008. Her areas of interest are Optical Character Recognition, Pattern Recognition, Machine Learning, Medical Imaging, Natural Language Processing and Multimedia.

Muhammad Afzal received his PhD degree in 2016 (German Research Center for Artificial Intelligence, University of Kaiserslautern, Germany). His research interests include generic segmentation frameworks for natural, document and medical images, and scene text detection and recognition.

Sheikh Rashid obtained his Doctor of Engineering (Dr.-Ing.) from the University of Kaiserslautern and the German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany. Currently, he is working as a director at the Artificial Intelligence Research Lab, Al Khwarizmi Institute of Computer Science (KICS), UET Lahore, Pakistan.

Marcus Liwicki received his PhD degree from the University of Bern, Switzerland, in 2007. Currently, he is an apl.-professor at the University of Kaiserslautern and a senior assistant at the University of Fribourg. His research interests include machine learning, pattern recognition, artificial intelligence, human computer interaction, digital humanities, knowledge management, document analysis, and graph matching.

Andreas Dengel is a member of the Management Board as well as Scientific Director at the German Research Center for Artificial Intelligence (DFKI) in Kaiserslautern, where he leads the Smart Data & Knowledge Services Research Department. In 1993 he became a Professor at the Computer Science Department of the University of Kaiserslautern. Since 2009 he has also held an Honorary Professorship at the Dept. of Computer Science and Intelligent Systems, Graduate School of Engineering of the Osaka Prefecture University.