The International Arab Journal of Information Technology (IAJIT)



A New Method for Curvilinear Text line Extraction and Straightening of Arabic Handwritten Text

Line extraction is a critical step in layout analysis, which is one of the main subtasks of Document Image Analysis. This paper presents a new method for curvilinear text line extraction and straightening in Arabic handwritten documents. The proposed method consists of two distinct steps. First, text lines are extracted using a morphological dilation operation. Second, each extracted text line is straightened in two sub-steps: coarse tuning of the text line orientation based on the Hough transform, followed by fine tuning based on centroid alignment of the connected components that form the text line. The proposed approach has been extensively tested on samples from two benchmark datasets, KFUPM Handwritten Arabic TexT (KHATT) and Arabic Handwriting DataBase (AHDB). Experimental results show that the proposed method is capable of detecting and straightening curvilinear text lines, even in challenging Arabic handwritten documents.
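The first step, extracting text lines with a morphological dilation, can be illustrated with a minimal sketch. It assumes the page is a binary image given as a list of rows (1 = ink), that the structuring element is a fixed horizontal bar of width `2*half_width + 1`, and that blobs are labelled with 8-connectivity. The function names and the `half_width` parameter are hypothetical; the abstract does not specify the structuring element the authors use. Horizontal dilation bridges the gaps between words and sub-words on the same line, so each dilated blob is treated as one text line.

```python
from collections import deque


def dilate_horizontal(img, half_width):
    """Binary dilation with a flat 1 x (2*half_width + 1) horizontal bar.

    img is a list of equal-length rows of 0/1 values (1 = ink)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Paint the bar centred on every ink pixel, clipped at the borders.
                for xx in range(max(0, x - half_width), min(w, x + half_width + 1)):
                    out[y][xx] = 1
    return out


def label_components(img):
    """8-connected component labelling by breadth-first search.

    Returns (labels, count); labels[y][x] is 0 for background, else 1..count."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny in range(max(0, cy - 1), min(h, cy + 2)):
                        for nx in range(max(0, cx - 1), min(w, cx + 2)):
                            if img[ny][nx] and not labels[ny][nx]:
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def extract_lines(img, half_width=3):
    """Hypothetical line extractor: dilate horizontally, label the blobs, and
    assign each original ink pixel (x, y) to the blob that covers it."""
    labels, count = label_components(dilate_horizontal(img, half_width))
    lines = [[] for _ in range(count)]
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if value:
                # Dilation never removes ink, so every ink pixel has a label >= 1.
                lines[labels[y][x] - 1].append((x, y))
    return lines
```

The bar width is the main trade-off in this sketch: if it is too narrow, a single line fragments into several blobs; if it is too wide, neighbouring curvilinear lines that come close to each other may merge into one blob.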
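For the coarse-tuning sub-step, a Hough transform can estimate a line's dominant tilt, and the line can then be rotated by the opposite angle. The sketch below parametrises lines as rho = x cos(theta) + y sin(theta) and searches only near-horizontal orientations. The following details are illustrative assumptions, not values from the paper: the line is a non-empty list of (x, y) ink coordinates with y pointing down, the search range is ±20° in 0.5° steps, rho bins are 1 pixel wide, every ink pixel votes, and the function names are hypothetical.

```python
import math
from collections import Counter


def hough_skew(points, max_angle=20.0, step=0.5):
    """Estimate the tilt alpha (degrees) of a near-horizontal text line.

    points: non-empty list of (x, y) ink coordinates, y pointing down.
    A line y = x*tan(alpha) + c has normal angle theta = 90 + alpha, so each
    point votes for rho = x*cos(theta) + y*sin(theta) = -x*sin(alpha) + y*cos(alpha).
    Only alpha in [-max_angle, max_angle] is searched (an assumption), with
    1-pixel rho bins; the alpha whose accumulator column has the highest
    single peak wins."""
    best_alpha, best_votes = 0.0, -1
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        alpha = -max_angle + i * step
        a = math.radians(alpha)
        sin_a, cos_a = math.sin(a), math.cos(a)
        # One accumulator column: number of votes per rounded rho at this angle.
        column = Counter(round(-x * sin_a + y * cos_a) for x, y in points)
        peak = max(column.values())
        if peak > best_votes:
            best_alpha, best_votes = alpha, peak
    return best_alpha


def rotate(points, alpha_deg):
    """Rotate points by -alpha about the origin so that a line tilted by alpha
    becomes horizontal (its points then share the same y)."""
    a = math.radians(alpha_deg)
    c, s = math.cos(a), math.sin(a)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]
```

Rotating about the origin also shifts the line as a whole. In practice, one would likely rotate about the line's own centre and re-rasterise the result. The abstract does not give these details.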
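The fine-tuning sub-step aligns the centroids of the connected components that make up a line. The abstract does not spell out the rule. One plausible reading, sketched here as an assumption, is to shift each component vertically so that its centroid falls on a common reference row, taken here as the median of the component centroids. Components are non-empty lists of (x, y) points, and the function names are hypothetical.

```python
from statistics import median


def centroid(component):
    """Mean (x, y) of a non-empty component given as a list of (x, y) points."""
    n = len(component)
    return (sum(p[0] for p in component) / n, sum(p[1] for p in component) / n)


def align_centroids(components):
    """Shift every component vertically so that its centroid lies on the median
    centroid row of the line; horizontal positions are left unchanged."""
    ref_y = median(centroid(c)[1] for c in components)
    aligned = []
    for comp in components:
        dy = ref_y - centroid(comp)[1]
        aligned.append([(x, y + dy) for x, y in comp])
    return aligned
```

Arabic dots and other diacritics form separate connected components. Applying this shift to every component would therefore pull them onto the main stroke, so a practical version would presumably skip small components or move them together with their parent component.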


[1] Abuhaiba I., Datta S., and Holt M., Line Extraction and Stroke Ordering of Text Pages, in Proceedings of 3rd International Conference on Document Analysis and Recognition, Montreal, pp. 390-393, 1995.

[2] Al-Dmour A. and Fraij F., Segmenting Arabic Handwritten Documents into Text Lines and Words, International Journal of Advancements in Computing Technology, vol. 6, no. 3, pp. 109- 119, 2014.

[3] Al-Ma adeed S., Elliman D., and Higgins C., A Data Base for Arabic Handwritten Text Recognition Research, The international Arab Journal of Information Technology, vol. 1, no. 1, pp. 117-121, 2004.

[4] Al-Nashashibi M., Neagu D., and Yaghi A., An Improved Root Extraction Technique for Arabic Words, in Proceedings of 2nd International Conference on Computer Technology and Development, Cairo, pp. 264-269, 2010.

[5] Al-Rashdi S. and Arockiasamy S., Adopting Quadrilateral Arabic Roots in Search Engine of E-library System, International Journal of Recent Research in Social Science and Humanities, vol. 1, no. 1, pp. 47-53, 2014.

[6] Bennasri A., Zahour A., and Taconet B., Extraction Des Lignes D un Texte Manuscrit Arabe, in Proceedings of Vision Interface 99, Canada, pp. 42-48, 1999.

[7] Bhowmik T., Roy A., and Roy U., Character Segmentation for Handwritten Bangla Words Using Artificial Neural Network, in Proceedings of International Workshop on Neural Networks and Learning in Document Analysis and Recognition, Seoul, pp. 28-32, 2005.

[8] Bloomberg D., Kopec G., and Dasari L., Measuring Document Image Skew and Orientation, in Proceeding of IS & T/SPIE EI 95 Conference, San Jose, pp. 302-316, 1995.

[9] Boussellaa W., Zahour A., Elabed H., Benabdelhafid A., and Alimi A., Unsupervised Block Covering Analysis for Text-Line Segmentation of Arabic Ancient Handwritten Document Images, in Proceedings of International Conference on Pattern Recognition, Istanbul, pp. 1929-1932, 2010.

[10] Bukhari S., Shafait F., and Breuel T., Segmentation of Curled Textlines using Active Contours, in Proceedings of the 8th IAPR Workshop on Document Analysis Systems, Nara, pp. 270-277, 2008.

[11] Bukhari S., Shafait F., and Breuel T., Performance Evaluation of Curled Textlines Segmentation Algorithms, in Proceedings of 9th IAPR Workshop on Document Analysis Systems, DAS 10, Boston, 2010.

[12] Canny J., A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679-698, 1986.

[13] Fethi G., Mondher M., Snoussi M., and Margner V., Segmentation of Handwritten and Printed Arabic Documents, in Proceedings of Workshop on Signal and Document Processing, Hammamet, pp. 1-5, 2012.

[14] Gonzalez R., Woods R., and Eddins S., Digital Image Processing Using Matlab, Reading, MA: Addison-Wesley, 2004.

[15] Kasturi R., O'Gorman L., and Govindaraju V., Document Image Analysis: A Primer, Sadhana, vol. 27, no. 1, pp. 3-22, 2002. A New Method for Curvilinear Text line Extraction and Straightening ... 887

[16] Khayyat M., Lam L., Suen C., Yin F., and Liu C., Arabic Handwritten Text Line Extraction by Applying an Adaptive Mask to Morphological Dilation, in Proceedings of 10th IAPR International Workshop on Document Analysis Systems, Gold Cost, pp. 100-104, 2012.

[17] Kumar J., Abd-Almageed W., Kang L., and Doermann D., Handwritten Arabic Text Line Segmentation using Affinity Propagation, in Proceeding of the 9th IAPR International Workshop on Document Analysis Systems, Boston, pp. 135-142, 2010.

[18] Li Y., Zheng Y., Doermann D., and Jaeger S., Script- Independent Text Line Segmentation in Freestyle Handwritten Documents, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 8, pp. 1313-1329, 2008.

[19] Lutf M., You X., and Li H., Offline Arabic Handwriting Identification Using Language Diacritics, in Proceedings of International Conference on Pattern Recognition, Istanbul, pp. 1912-1915, 2010.

[20] Mahmoud S., Ahmad I., Al-Khatib W., Alshayeb M., TanvirParvez M., M rgner V., and Fink G., KHATT: An Open Arabic Offline Handwritten Text Database, Pattern Recognition, vol. 47, no. 3, pp. 1096-1112, 2014.

[21] Marqoues O., Practical Image and Video Processing Using MATLAB, Wiley, 2011.

[22] Nagabhusan P. and Alaei A., Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-Technique, International Journal on Computer Science and Engineering, vol. 2, no. 4, pp. 907-916, 2010.

[23] Nandini N., Murthy K., and Kumar H., Estimation of Skew Angle in Binary Document Images using Hough Transform, World Academy of Science, Engineering and Technology, vol. 2, no. 6, pp. 44-49, 2008.

[24] O Gorman L. and Kasturi R., Document Image Analysis, Computer Society Executive Briefing, 2009.

[25] Ouwayed N. and Belaid A., A General Approach for Multi-oriented Text Line Extraction of Handwritten Documents, International Journal on Document Analysis and Recognition, vol. 15, no. 4, pp. 1-18, 2011.

[26] Oztop E., Mulayim A., Atalay V., and Yarman Vural F., Repulsive Attractive Network for Baseline Extraction on Document Images, Signal Processing, vol. 75, no. 1, pp. 1-10, 1999.

[27] Peake G. and Tan T., Script and Language Identification from Document Images, in Proceedings of the British Machine Vision Conference (BMVC97), Essex, pp. 610-619,1997.

[28] Razak Z., Zulkiflee k., Idris M., and Yaacob M., Off-Line Handwriting Text Line Segmentation: A Review, International Journal of Computer Science and Network Security, vol. 8, no. 7, pp. 12-20, 2008.

[29] Yi L., Zheng Y., Doermann D., and Jaeger S., Script-Independent Text Line Segmentation in Freestyle Handwritten Documents, Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 8, pp. 1313-1329, 2008.

[30] Zahour A., Taconet B., and Ramdane S., Contribution `a la Segmentation De Textes Manuscrits Anciens, in Proceedings of Confrence Internationale Francophone Sur l Ecrit et le Document, La Rochelle, 2004.

[31] Zahour A., Taconet B., Likforman L. and Boussella W., Overlapping and Multi-Touching Text-Line Segmentation by Block Covering analysis, Pattern Analysis and Applications, vol. 12, no. 4, pp. 335 -351, 2009.

[32] Zheng Y., Li H., and Doermann D., A Model- Based Line Detection Algorithm in Documents, in Proceedings of 7th International Conference on Document Analysis and Recognition, Edinburgh pp. 44-48, 2003. 888 The International Arab Journal of Information Technology, Vol. 15, No. 5, September 2018 Ayman Al-Dmour received his BSc in Electronic - Communication Engineering in 1994 from Jordan University of Science and Technology, Irbid, Jordan. He pursued his MSc and PhD in 2003 and 2006, respectively, both in Computer Information Systems in the Arab Academy for Banking and Financial Sciences, Amman, Jordan. At Al-Hussein Bin Talal University (AHU), he has led the Department of Computer Information Systems, the Computer and Information Technology Center and the College of Information Technology. His research interests are in Arabic language processing, data compression and computer education. Ibrahim El rube' received his M.Sc. degree in Computer Engineering from Arab Academy for Science and Technology, Egypt in 1999 and his Ph.D. in Systems Design Engineering from the University of Waterloo in 2005. Currently, he is working as associate professor at the Computer Engineering Department in Taif University, Taif-KSA. His research interests include pattern recognition and image processing. Laiali Almazaydeh is a an assistant professor of Software Engineering at Al-Hussein Bin Talal University (AHU) in Jordan. She received a B.S. in Computer Science from Al- Hussein Bin Talal University and an M.S. in Computer Information Systems from The Arab Academy for Banking and Financial in 2003 and 2007, respectively. She received her Ph.D. in Computer Science and Engineering at the University of Bridgeport in2013, USA. Her research interests involve the wireless sensor networks, image processing and human computer interaction.