The International Arab Journal of Information Technology (IAJIT)



A Data Base for Arabic Handwritten Text Recognition Research

In this paper we present a new database for off-line Arabic handwriting recognition, together with several preprocessing procedures. We designed, collected and stored a database of Arabic handwriting (AHDB). The result is a database of handwritten Arabic text that is unique both in its size and in the number of different writers involved. We further designed an innovative, simple yet powerful in-place tagging procedure for the database, which enables us to extract the bitmaps of words at will. We also built a preprocessing class that contains some useful preprocessing operations. Finally, the most popular words in Arabic writing were found for the first time using a specially designed program.
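The abstract does not describe how the word-frequency program works. As a rough illustration only, the following Python sketch shows one way the most frequent words in an Arabic text could be counted. The function name `most_common_words`, the Unicode ranges used for tokenising, and the choice to strip diacritics and tatweel are assumptions made for this example, not details taken from the paper.

```python
import re
from collections import Counter

# Assumption: short-vowel diacritics (U+064B..U+0652) and the tatweel
# elongation mark (U+0640) are removed so that spelling variants of the
# same word are counted together.
_MARKS = re.compile(r"[\u064B-\u0652\u0640]")

# Assumption: a "word" is a maximal run of basic Arabic letters
# (U+0621..U+064A); digits, punctuation and Latin text are ignored.
_WORD = re.compile(r"[\u0621-\u064A]+")


def most_common_words(text, top_n=20):
    """Return the top_n most frequent Arabic words as (word, count) pairs."""
    cleaned = _MARKS.sub("", text)
    words = _WORD.findall(cleaned)
    return Counter(words).most_common(top_n)


if __name__ == "__main__":
    sample = "في البيت من في الى في من"
    for word, count in most_common_words(sample, top_n=3):
        print(word, count)
```

A ranked list of this kind could guide which words to include on a collection form, although the selection criteria the authors actually used are not stated here.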

 


Somaya Al-Ma’adeed is a PhD student at the School of Computer Science and Information Technology, University of Nottingham, UK. She received her BSc in computer science from Qatar University and MSc in mathematics and computing from Alexandria University, Egypt. Previously, she worked as an assistant teacher at Qatar University. Her main interest is in Arabic handwriting recognition.

Dave Elliman is a professor of applied computing at the School of Computer Science and Information Technology, University of Nottingham, UK. His main interests are document recognition, especially graphics symbols and linework, hand-printed character and cursive script recognition, classifiers and neuro-fuzzy systems, genetic algorithms and swarm intelligence, financial modelling, knowledge representation and ontologies.

Colin Higgins is a senior lecturer at the School of Computer Science and Information Technology, University of Nottingham, UK. His main interests are in cursive script recognition, pen computing and a multi-modal intelligent design aid via the designer's apprentice project.