The International Arab Journal of Information Technology (IAJIT)


Complementary Approaches Built as Web Service for Arabic Handwriting OCR Systems via Amazon

Offering Arabic Optical Character Recognition (OCR) as a web service remains a major challenge for handwritten document recognition. Many approaches, methods, algorithms, and techniques have been proposed to build powerful Arabic OCR web services, but none has succeeded when large volumes of Arabic handwritten documents must be processed. Intensive experiments and observations have revealed that some existing approaches and techniques are complementary and can be combined to improve the recognition rate. Designing and implementing these sophisticated complementary approaches as web services is, however, complex: they require substantial computing power to reach an acceptable recognition speed, especially for large document collections. One possible solution is to exploit distributed computing architectures such as cloud computing. This paper describes the design and implementation of Arabic Handwriting Recognition as a web service (AHR web service) based on the complementary K-Nearest Neighbor (K-NN)/Support Vector Machine (SVM) approach, deployed via the Amazon Elastic MapReduce (EMR) model. The experiments were conducted in a cloud computing environment on a real, large-scale handwriting dataset, the IFN/ENIT database (Institut für Nachrichtentechnik (IFN)/Ecole Nationale d’Ingénieurs de Tunis (ENIT)). J-Sim (Java Simulator) was used to generate and analyze statistical results. The experimental results show that the Amazon EMR model is a very promising framework for improving the performance of large-scale Arabic Handwriting Recognition (AHR) web services.
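The abstract does not say how the K-NN and SVM stages are combined or how the work is split across EMR nodes, so the following is only an illustrative sketch under stated assumptions. It assumes one common arrangement: K-NN decides when its nearest neighbours agree, and the sample is otherwise deferred to a second-stage decision function standing in for a trained SVM. Plain Python generators emulate the map and reduce phases; the toy features, the labels "alif" and "ba", the weights `SVM_W` and `SVM_B`, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import dist

# Toy 2-D feature vectors (hypothetical); real features would be
# extracted from handwritten word images.
TRAIN = [
    ((0.0, 0.0), "alif"), ((0.2, 0.1), "alif"), ((0.1, 0.3), "alif"),
    ((1.0, 1.0), "ba"), ((0.9, 1.2), "ba"), ((1.1, 0.8), "ba"),
]

# Hypothetical fixed weights standing in for a trained linear SVM:
# a positive score means "ba", otherwise "alif".
SVM_W, SVM_B = (1.0, 1.0), -1.1


def knn_vote(x, k=3):
    """Return the majority label among the k nearest samples and its vote share."""
    nearest = sorted(TRAIN, key=lambda sample: dist(x, sample[0]))[:k]
    label, count = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label, count / k


def svm_decide(x):
    """Evaluate the linear decision function w.x + b."""
    score = sum(w * xi for w, xi in zip(SVM_W, x)) + SVM_B
    return "ba" if score > 0 else "alif"


def classify(x, k=3, min_agreement=1.0):
    """Accept the K-NN label if the neighbours agree enough, else defer to the SVM stand-in."""
    label, agreement = knn_vote(x, k)
    if agreement >= min_agreement:
        return label, "knn"
    return svm_decide(x), "svm"


def map_phase(doc_id, samples):
    """Map step: emit (doc_id, (label, deciding_stage)) for each sample."""
    for x in samples:
        yield doc_id, classify(x)


def reduce_phase(pairs):
    """Reduce step: group the emitted results by document."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


docs = {"doc1": [(0.1, 0.1), (1.0, 0.9)], "doc2": [(0.5, 0.55)]}
results = reduce_phase(
    pair for doc_id, samples in docs.items() for pair in map_phase(doc_id, samples)
)
print(results)
```

In this toy run the two samples of `doc1` are settled by unanimous K-NN votes, while the sample in `doc2` lies between the two clusters and is passed to the second stage. In an actual EMR deployment each map task would process a partition of the documents on a separate node.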

The International Arab Journal of Information Technology, Vol. 15, No. 3, May 2018

Hassen Hamdi received his Master's degree in Computer Science from the University of Sfax, Tunisia, in 2008. He is currently a Lecturer in Computer Science at the Faculty of Computing and Information Technology at Taibah University, Saudi Arabia, and is pursuing his PhD research at the Multimedia, InfoRmation Systems and Advanced Computing Laboratory, University of Sfax, Tunisia. His research interests include Arabic OCR, distributed systems, performance analysis, and network security.

Maher Khemakhem received his Master of Science and Ph.D. degrees from the University of Paris 11 (Paris Sud, Orsay), France, in 1984 and 1987, respectively, and his Habilitation accreditation from the University of Sfax, Tunisia, in 2008. He is currently an Associate Professor of Computer Science at the Faculty of Computing and Information Technology at King Abdulaziz University, Jeddah, Saudi Arabia. His research interests include Arabic OCR, distributed systems, performance analysis, and network security.

Aisha Zaidan received her BSc in Computer Information Systems from Zarqa Private University in 2004 and her MSc in Computer Science from Jordan University of Science and Technology in 2012. She is currently working as a lecturer at Taibah University. Her areas of interest include web applications, data mining, and virtual reality.