Curved Text Detection in Scenic Images via Proposal-Free Panoptic Segmentation and Deep Learning
Curved text poses a significant challenge for detection ‘in the wild’, primarily because of the inherent variability in text orientation and the distortions that can be introduced during image acquisition. Standard text detection models have been observed to achieve low accuracy on text printed on curved surfaces. A variety of deep learning (DL)-based models have been proposed to fill this gap, but they have achieved only low to moderate success. This paper implements a DL-based model for the detection and recognition of text in scene images. The proposed approach combines image processing techniques, namely grayscale conversion, noise removal with a median filter, normalization, and Otsu’s binarization, with a panoptic segmentation technique to achieve the desired text detection performance. A synthetic dataset is created to compensate for the scarcity of character-level annotations and multi-oriented text. The performance of the proposed approach is evaluated using several metrics, and the results are compared against existing techniques such as You Only Look Once version 5 (YOLOv5), HDBNet, Bidirectional Perspective Network (BiP-Net), and Res18-LVT. The results show that the proposed approach outperforms these existing models, achieving a precision of 98.7%, a recall of 91.7%, and an F1 score of 94.5%.
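The preprocessing chain named in the abstract (grayscale conversion, median filtering, normalization, and Otsu's binarization) can be sketched in NumPy as follows. This is a minimal illustration under assumptions the source does not state: BT.601 luma weights for the grayscale step, a 3x3 median window with edge replication at the borders, and min-max stretching to 0..255 as the normalization. The paper's actual parameters may differ.

```python
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Weighted RGB-to-gray conversion (ITU-R BT.601 luma weights, assumed)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights


def median_filter(img: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k median filter on a 2-D image, replicating edge pixels at the borders."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    # Stack every shifted view of the k x k window and take the per-pixel median.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return np.median(np.stack(windows), axis=0)


def normalize(img: np.ndarray) -> np.ndarray:
    """Min-max stretch to the full 0..255 range, returned as uint8."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round((img - lo) * 255.0 / (hi - lo)).astype(np.uint8)


def otsu_threshold(img: np.ndarray) -> int:
    """Return the threshold t that maximizes between-class variance (pixels > t are foreground)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # probability of class 0 (values <= t)
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean of class 0
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def preprocess(rgb: np.ndarray):
    """Gray -> median filter -> normalize -> Otsu; returns (binary mask, threshold).

    The mask marks pixels brighter than the threshold; for dark text on a
    light background the polarity would need to be inverted.
    """
    gray = to_grayscale(rgb)
    denoised = median_filter(gray, 3)
    norm = normalize(denoised)
    t = otsu_threshold(norm)
    return (norm > t).astype(np.uint8), t


if __name__ == "__main__":
    rgb = np.full((20, 20, 3), 30, dtype=np.uint8)
    rgb[5:15, 5:15] = 220  # bright block standing in for text
    rgb[2, 2] = 255        # isolated salt-noise pixel
    binary, t = preprocess(rgb)
    print(t, binary[10, 10], binary[2, 2])
```

Otsu's method selects the cutoff from the gray-level histogram itself, so no hand-tuned threshold is needed; the median filter runs first so that isolated noise pixels do not survive binarization.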
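The abstract does not describe how the panoptic model groups text pixels into instances. As a rough illustration of what "proposal-free" means, the sketch below starts from a per-pixel semantic text mask and groups pixels bottom-up, without any region proposals, using 4-connected components. Connected components are a deliberately simple stand-in for the learned grouping used by proposal-free methods (for example, center and offset voting). `TEXT_CLASS`, `LABEL_DIVISOR`, and `min_area` are hypothetical names and values, and the class-times-divisor id encoding is an assumed convention.

```python
from collections import deque

import numpy as np

TEXT_CLASS = 1        # hypothetical semantic id for "text"; 0 is background "stuff"
LABEL_DIVISOR = 1000  # assumed encoding: panoptic_id = class_id * LABEL_DIVISOR + instance_id


def label_components(mask: np.ndarray) -> np.ndarray:
    """4-connected component labelling by breadth-first search; 0 stays background."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    next_id = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                next_id += 1
                labels[y, x] = next_id
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = next_id
                            queue.append((ny, nx))
    return labels


def to_panoptic(semantic: np.ndarray, min_area: int = 1) -> np.ndarray:
    """Merge a semantic text mask and its bottom-up instance grouping into one panoptic id map."""
    instances = label_components(semantic == TEXT_CLASS)
    panoptic = np.zeros(semantic.shape, dtype=np.int32)  # background stuff keeps id 0
    next_id = 0
    for inst in range(1, instances.max() + 1):
        region = instances == inst
        if region.sum() < min_area:  # drop tiny fragments that are likely noise
            continue
        next_id += 1
        panoptic[region] = TEXT_CLASS * LABEL_DIVISOR + next_id
    return panoptic
```

Every pixel ends up with exactly one id (either background or one text instance), which is the property that distinguishes a panoptic map from separate semantic and instance outputs.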
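The abstract states only that the synthetic dataset addresses missing character-level annotations and limited orientation variety. One hypothetical way to obtain both is to lay characters out along a circular arc and record each character's center, rotation angle, and rotated box corners, as sketched below. The function name, the arc geometry, and the annotation fields are illustrative assumptions rather than the authors' generator, and rendering glyphs onto a background image is omitted.

```python
import math


def layout_on_arc(text, cx, cy, radius, start_deg, char_w, char_h):
    """Per-character annotations for text laid out along the top of a circle (hypothetical).

    Image coordinates (y points down); angles in degrees. Characters are placed
    left to right, walking clockwise from start_deg, with a fixed advance char_w.
    """
    step = math.degrees(char_w / radius)  # arc length s = r * theta  ->  theta = s / r
    annotations = []
    for i, ch in enumerate(text):
        phi = start_deg - (i + 0.5) * step  # position angle of the character center
        rad = math.radians(phi)
        x = cx + radius * math.cos(rad)
        y = cy - radius * math.sin(rad)     # minus sign because image y grows downward
        angle = 90.0 - phi                  # tangent (reading) direction in image coords
        a = math.radians(angle)
        corners = []
        # Rotate the axis-aligned character box (TL, TR, BR, BL) about its center.
        for dx, dy in ((-char_w / 2, -char_h / 2), (char_w / 2, -char_h / 2),
                       (char_w / 2, char_h / 2), (-char_w / 2, char_h / 2)):
            corners.append((x + dx * math.cos(a) - dy * math.sin(a),
                            y + dx * math.sin(a) + dy * math.cos(a)))
        annotations.append({"char": ch, "center": (x, y), "angle": angle, "corners": corners})
    return annotations


if __name__ == "__main__":
    # 30-degree step per character, text centered on the top of the circle.
    for ann in layout_on_arc("ABC", 100, 100, 50, 135, 50 * math.pi / 6, 20):
        print(ann["char"], round(ann["angle"], 1))
```

Varying `radius` and `start_deg` across samples would produce text at many curvatures and orientations, each with a per-character box, which is the kind of coverage the abstract says the synthetic data is meant to supply.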
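Precision, recall, and F1 score for text detection are typically computed by matching predicted text regions to ground-truth regions at an intersection-over-union (IoU) threshold. The sketch below assumes boolean pixel masks (as a segmentation model would produce), greedy one-to-one matching, and an IoU threshold of 0.5; the exact matching protocol used in the paper is not stated in the abstract.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / union if union else 0.0


def evaluate(pred_masks, gt_masks, iou_thr=0.5):
    """Greedy one-to-one matching; returns (precision, recall, f1).

    Each prediction, in list order, claims the unmatched ground-truth mask with
    the highest IoU, and counts as a true positive if that IoU >= iou_thr.
    """
    matched_gt = set()
    tp = 0
    for p in pred_masks:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gt_masks):
            if j in matched_gt:
                continue
            iou = mask_iou(p, g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thr:
            matched_gt.add(best_j)
            tp += 1
    fp = len(pred_masks) - tp  # unmatched predictions
    fn = len(gt_masks) - tp    # missed ground-truth regions
    precision = tp / (tp + fp) if pred_masks else 0.0
    recall = tp / (tp + fn) if gt_masks else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In practice predictions are usually sorted by confidence before greedy matching, so that high-confidence detections claim ground-truth regions first.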
The International Arab Journal of Information Technology, Vol. 21, No. 5, September 2024