The International Arab Journal of Information Technology (IAJIT)


Arabic Character Extraction and Recognition using

This research presents an original technique for Arabic character extraction and recognition that aims at a higher recognition rate. Numerous techniques for character and text extraction have been proposed in earlier decades, but very few of them address the Arabic character set. A survey of the literature found that earlier implementations do not attain a 100% recognition rate. The proposed technique is novel: it traverses the characters in a given text, marks their directions, viz. North-South (NS), East-West (EW), North East-South West (NE-SW), North West-South East (NW-SE), etc., in an array, and compares that array with the pre-defined codes of every character in the dataset. Experiments were conducted on Arabic news videos and on documents taken from the Arabic Printed Text Image (APTI) database, and the results are very promising, with a recognition rate of 98.1%. The proposed algorithm can replace the existing algorithms used in present Arabic Optical Character Recognition (AOCR) systems.
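
The direction-marking idea can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this work: the pixel-step mapping, the collapsing of repeated labels, the exact-match comparison, and the names `direction_codes`, `recognize`, and `TOY_DATASET` are all assumptions made for illustration. The sketch assumes a character has already been reduced to an ordered path of 8-connected (row, column) pixels.

```python
# Direction labels are those named in the abstract; the mapping from pixel
# steps to labels is an assumption. Image coordinates are used: the row
# index grows southward and the column index grows eastward.
STEP_TO_DIRECTION = {
    (-1, 0): "NS", (1, 0): "NS",
    (0, -1): "EW", (0, 1): "EW",
    (-1, 1): "NE-SW", (1, -1): "NE-SW",
    (-1, -1): "NW-SE", (1, 1): "NW-SE",
}


def direction_codes(path):
    """Return the direction labels along a pixel path, with repeats collapsed."""
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in STEP_TO_DIRECTION:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-connected")
        label = STEP_TO_DIRECTION[step]
        # Collapsing runs of the same label is an assumption of this sketch.
        if not codes or codes[-1] != label:
            codes.append(label)
    return codes


def recognize(path, dataset):
    """Return the dataset key whose stored codes equal the path's codes, else None."""
    codes = direction_codes(path)
    for name, reference in dataset.items():
        if codes == reference:
            return name
    return None


# Hypothetical toy dataset; these are not the character codes used in the paper.
TOY_DATASET = {
    "vertical_bar": ["NS"],
    "corner": ["NS", "EW"],
}

path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(direction_codes(path))         # ['NS', 'EW']
print(recognize(path, TOY_DATASET))  # corner
```

Exact matching keeps the sketch short; a practical system would likely need a tolerance for noisy paths, which is outside the scope of this illustration.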

The International Arab Journal of Information Technology, Vol. 15, No. 3, May 2018

Abdul Khader Saudagar received his Bachelor of Engineering (B.E.), Master of Technology (M.Tech.), and Doctor of Philosophy (PhD) in Computer Science & Engineering in 2001, 2006, and 2010, respectively. His areas of interest are Artificial Image Processing, E-Commerce, Information Technology, Databases, and Web and Mobile Application Development. He has 6 years of teaching experience at both undergraduate (UG) and postgraduate (PG) levels and is presently working as an Assistant Professor in the Department of Information Systems, College of Computer & Information Sciences, Al Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, The Kingdom of Saudi Arabia. Dr. Saudagar has published a number of research papers in national and international conferences and in international journals. He is a member of various professional bodies such as IACSIT, IAENG, and ISTE, and serves as an editorial board member and reviewer for many international journals.

Habeeb Mohammed is working as a lecturer in the Department of Computer Science, Al Imam Mohammad Ibn Saud Islamic University, Riyadh, The Kingdom of Saudi Arabia. He completed his M.C.A. (Master of Computer Applications) in 1999. He has 15 years of teaching experience and is a certified Java professional.