The International Arab Journal of Information Technology (IAJIT)


A Fast High Precision Skew Angle Estimation of Digitized Documents

In this paper, we address the problem of automatic skew angle estimation for scanned documents. Document skew occurs frequently, caused by incorrect positioning of the document or a handling error during scanning, and it degrades the subsequent steps of automatic text analysis and recognition. It is therefore essential, before proceeding to these steps, to check the document for skew and to correct it. The difficulty of this check stems from the presence of graphic zones, sometimes dominant, which have a considerable impact on the accuracy of text skew angle estimation. We also note the importance of preprocessing for improving both the accuracy and the computational cost of skew estimation approaches. These two elements have been taken into account in the design and development of a new approach to skew angle estimation and correction. Our approach is based on local binarization, followed by horizontal smoothing with the Run Length Smoothing Algorithm (RLSA), detection of horizontal contours, and the Hierarchical Hough Transform (HHT). The algorithms involved have been chosen to make the skew estimation accurate, fast, and robust, particularly to graphic dominance, and suitable for real-time application. Experimental tests show the effectiveness of our approach on a representative database from the Document Image Skew Estimation Contest (DISEC) of the International Conference on Document Analysis and Recognition (ICDAR).
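To make the pipeline named in the abstract more concrete, the following is a minimal sketch of three of its stages: horizontal RLSA smoothing, extraction of horizontal (bottom) contour points, and a coarse-to-fine Hough search over the skew angle. It is an illustration under stated assumptions, not the authors' implementation: the input is assumed to be already binarized (1 = ink, 0 = background), the coarse-to-fine angle refinement is only one possible reading of a "hierarchical" Hough transform, and all function names, thresholds, and step sizes are hypothetical.

```python
import math

def rlsa_horizontal(row, threshold):
    """Fill background runs of length <= threshold lying between two ink pixels."""
    out = list(row)
    last_fg = None
    for x, v in enumerate(row):
        if v == 1:
            if last_fg is not None and 0 < x - last_fg - 1 <= threshold:
                for k in range(last_fg + 1, x):
                    out[k] = 1
            last_fg = x
    return out

def bottom_contour_points(img):
    """Return (x, y) of ink pixels whose lower neighbour is background or the image border."""
    h = len(img)
    pts = []
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if v == 1 and (y + 1 == h or img[y + 1][x] == 0):
                pts.append((x, y))
    return pts

def hough_score(points, angle_deg, rho_step=1.0):
    """Sum of squared accumulator counts for one candidate angle.

    Points lying on lines tilted by angle_deg share the same value of
    rho = y*cos(a) - x*sin(a), so a sharply peaked histogram gives a high score.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    bins = {}
    for x, y in points:
        r = round((y * c - x * s) / rho_step)
        bins[r] = bins.get(r, 0) + 1
    return sum(n * n for n in bins.values())

def estimate_skew(points, max_angle=15.0, steps=(1.0, 0.1, 0.01)):
    """Coarse-to-fine search: each level scans around the previous best angle.

    Angles are in degrees, measured in image coordinates (y grows downward).
    The search range and step sizes are illustrative assumptions.
    """
    center, span = 0.0, max_angle
    for step in steps:
        n = int(round(span / step))
        candidates = [center + k * step for k in range(-n, n + 1)]
        center = max(candidates, key=lambda ang: hough_score(points, ang))
        span = step  # next level only explores one coarse step on each side
    return center

if __name__ == "__main__":
    # Synthetic text baselines tilted by 3.4 degrees (hypothetical test data).
    true_angle = 3.4
    slope = math.tan(math.radians(true_angle))
    pts = [(x, round(y0 + x * slope)) for y0 in (50, 100, 150) for x in range(400)]
    print("estimated skew (degrees):", estimate_skew(pts))
```

In a full pipeline, each row of the binarized image would first pass through `rlsa_horizontal` so that characters merge into line-shaped blobs, and `bottom_contour_points` would then reduce those blobs to a small set of baseline points. Voting only with these points, and scanning fine angle steps only near the coarse optimum, is what keeps the cost of the Hough stage low in this sketch.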

Merouane Chettat is a Ph.D. student at UMBB University, Algeria. Currently, he is a member of the LIMOSE laboratory. His current research interests include image processing and computer vision.

Djamel Gaceb received his Ph.D. in computer science from INSA of Lyon in 2009. Currently, he is working on business document image recognition, analysis, and industrial vision.

Soumia Belhadi is a senior lecturer at the University of Blida. Currently, she is a member of the LIMOSE Laboratory at UMBB University. She is working on image processing and computer vision.