The International Arab Journal of Information Technology (IAJIT)


Arabic Text Detection on Traffic Panels in Natural Scenes

Detecting and recognizing Traffic Panels (TP) and the text they display are important use cases for Advanced Driver Assistance Systems (ADAS). In recent years, extracting textual information from TP and signs has emerged as a challenging computer vision problem, particularly for the Arabic language. Furthermore, the significant rise in road traffic accidents in Arabic-speaking countries has caused substantial financial losses and loss of human life. This is largely attributed to the limited number of diverse traffic sign datasets and the absence of a reliable system for TP detection. Implementing warning and guidance systems for drivers not only addresses this issue but also paves the way for integrating intelligent components into future vehicles, offering decision support for transitioning to semi-automatic or fully automatic driving based on the driver’s health condition. This work addresses two main challenges. First, we develop a new Arabic dataset, the Syphax Traffic Panels dataset (STP), which provides high-quality images of Arabic TP captured under the diverse conditions of natural scenes in “Sfax,” a city in Tunisia. Second, we propose a deep learning method for detecting Arabic text on TP: we evaluate the performance of state-of-the-art algorithms in this context and then enhance the architecture of the best-performing one. The experiments show promising results, confirming the contribution of our dataset to this research area, and even more encouraging results from the enhancements made to the proposed method. The dataset is publicly available on IEEE DataPort: https://dx.doi.org/10.21227/5zd9-pe55.
