The International Arab Journal of Information Technology (IAJIT)


Dynamic Random Forest for the Recognition of Arabic Handwritten Mathematical Symbols with A

Mathematics has a number of characteristics that distinguish it from conventional text and make it a challenging area for recognition. These include principally its two-dimensional structure and the diversity of the symbols used, especially in the Arabic context. Recognition of mathematical formulas requires solving three sub-problems: segmentation, symbol recognition, and symbol arrangement analysis. In this paper we focus on the Arabic mathematical symbol recognition step. This is a challenging task because of the large symbol set, with many similar-looking symbols used in Arabic mathematics, and the great variability found in human writing. The strength of the selected features and the effectiveness of the classifier are the two key factors determining the performance of a handwritten symbol recognition system. We propose a novel Shape Context (SH) descriptor and explore its combination with a modified Chain Code Histogram (CCH) and a Histogram of Oriented Gradients (HOG) at the descriptor extraction level. For classification we use a Dynamic Random Forest (DRF) model, which has the advantage of efficiently modelling the interaction among trees to determine the right prediction. The results obtained on the Handwritten Arabic Mathematical Dataset (HAMF) show that the DRF yields a significant improvement in accuracy compared with the standard static RF and Support Vector Machines (SVM).
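To make the chain-code feature mentioned above more concrete, the following is a minimal sketch of a plain Freeman 8-direction Chain Code Histogram computed from a closed contour. It is not the modified CCH proposed in the paper, whose details are not given in the abstract. The function names, the direction numbering, and the toy contour are illustrative assumptions.

```python
from collections import Counter

# Freeman 8-direction codes: index = code, value = (dx, dy) step,
# counter-clockwise starting from east, with y pointing up (an assumed convention).
FREEMAN_STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1),
                 (-1, 0), (-1, -1), (0, -1), (1, -1)]
STEP_TO_CODE = {step: code for code, step in enumerate(FREEMAN_STEPS)}


def chain_code(contour):
    """Return the Freeman chain code of a closed, 8-connected contour.

    `contour` is a list of (x, y) integer points; the last point is
    implicitly connected back to the first.
    """
    if not contour:
        raise ValueError("contour must contain at least one point")
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:] + contour[:1]):
        step = (x1 - x0, y1 - y0)
        if step not in STEP_TO_CODE:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        codes.append(STEP_TO_CODE[step])
    return codes


def chain_code_histogram(contour):
    """Return an 8-bin histogram of chain codes, normalised to sum to 1."""
    codes = chain_code(contour)
    counts = Counter(codes)
    return [counts.get(code, 0) / len(codes) for code in range(8)]


# Hypothetical toy contour: a 2x2 square traced counter-clockwise.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
histogram = chain_code_histogram(square)
```

For the toy square, only the four axis-aligned directions occur, each with the same frequency. In a recognition pipeline such a histogram would typically be concatenated with other descriptors before being passed to a classifier.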

The International Arab Journal of Information Technology, Vol. 15, No. 3A, Special Issue 2018

Ibtissem Ali received the Diploma of Computer Science Engineering and the Master's diploma in 2010 and 2013, respectively, from the National Engineering School of Sousse, Tunisia. She is currently a Ph.D. student and a member of the document analysis and processing team of the research laboratory LATIS (Laboratory of Advanced Technology and Intelligent Systems). Her research interests include handwritten mathematical recognition, Arabic optical character recognition, document analysis, computer vision and pattern recognition.

Mohamed Mahjoub is an associate professor in signal and image processing at the National Engineering School of Sousse (ENISo) and a member of the Laboratory of Advanced Technology and Intelligent Systems (LATIS). His research interests include dynamic Bayesian networks, computer vision, pattern recognition, HMM and data retrieval. He is a member of IEEE, and his main results have been published in international journals and conferences.