The International Arab Journal of Information Technology (IAJIT), Vol. 20, No. 2, March 2023



Improved YOLOv3-tiny for Silhouette Detection Using Regularisation Techniques

Although recent advances in Deep Learning (DL) have achieved high accuracy on many Computer Vision (CV) tasks, detecting humans in video streams remains a challenging problem. Several studies have therefore focused on regularisation techniques to prevent overfitting, one of the most fundamental issues in Machine Learning (ML). This paper thoroughly examines these techniques and proposes an improved You Only Look Once (YOLO) v3-tiny based on a modified neural network and an adjusted hyperparameter configuration file. The experimental results, validated in two tests, show that the proposed method outperforms the original YOLOv3-tiny model. The first test, which applies only data augmentation techniques, shows that the proposed approach reaches higher accuracy than the original YOLOv3-tiny model: accuracy on the Visual Object Classes (VOC) test dataset increases by 32.54% over the initial model. The second test, which combines the three tasks, shows that the combined method gains over the existing model: accuracy on the labelled CrowdHuman test dataset rises by 22.7% over the data-augmentation model.
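As an illustrative sketch only (not the authors' implementation), the kind of data-augmentation operations commonly used when training YOLO-family detectors can be expressed as below: a horizontal flip with the matching update to YOLO-format bounding boxes, and a simple exposure jitter. The function names and the toy 4x4 image are hypothetical.

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Horizontally flip an image and adjust YOLO-format boxes.

    boxes: rows of [class_id, x_center, y_center, width, height],
    with coordinates normalised to [0, 1].
    """
    flipped = image[:, ::-1].copy()          # reverse the column axis
    boxes = np.asarray(boxes, dtype=float).copy()
    if boxes.size:
        boxes[:, 1] = 1.0 - boxes[:, 1]      # mirror x_center about the image midline
    return flipped, boxes

def exposure_jitter(image, scale):
    """Scale pixel intensities, clipping to the valid 8-bit range."""
    return np.clip(image.astype(float) * scale, 0, 255).astype(np.uint8)

# Example: flip a 4x4 single-channel image with one box centred at x = 0.25
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
flipped, new_boxes = hflip_with_boxes(img, [[0, 0.25, 0.5, 0.5, 0.5]])
```

After the flip, the box's x_center moves from 0.25 to 0.75 while its width and height are unchanged; applying the same transform to image and labels together is what keeps augmented training data consistent.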


Donia Ammous obtained her bachelor degree at the Faculty of Sciences of Sfax (FSS), Tunisia, in 2008. She received her MS degree in Electrical Engineering from the National School of Engineering of Sfax (ENIS), Tunisia, in 2012. She is currently a PhD student in the Laboratory of Electronics and Information Technology (LETI), ENIS, University of Sfax. Her main research activities include image/video processing with H.264/AVC, lossless video compression, cryptography and data security, remote sensing, UAVs, computer vision, and deep learning.

Achraf Chabbouh received his engineering degree from the Higher School of Communication of Tunis. He currently works as a university teacher at the Higher Institute of Technological Studies of Sidi Bouzid, Tunisia. He is an experienced team player with a strong technical background, especially in artificial intelligence and web and mobile application technology. He coordinates multiple complex IT projects with many stakeholders in different fields, such as retail, agriculture, geospatial, and finance.

Awatef Edhib received her engineering degree from the National Engineering School of Sfax (ENIS) in 2018. She currently works as a Research and Development AI engineer for Sogimel. She is passionate about artificial intelligence and innovation, with a strong technical background in artificial intelligence, especially deep learning and computer vision.

Ahmed Chaari received his Ph.D. in Automation and Industrial Engineering from Lille University, France, in 2009. He worked as an IT Program Manager for several companies in France, Sweden, and Portugal from 2010 to 2018. He is currently General Manager at Anavid France. His research interests include artificial intelligence, computer vision, and data analysis.

Fahmi Kammoun received the DEA degree in automatic control and signal processing from the University of Pierre et Marie Curie (Paris VI), France, in 1987, and the Ph.D. degree in signal processing from the University of Orsay (Paris XI), France, in 1991. His doctoral work focused on luminance uniformity, contrast enhancement, edge detection, and grey-level video analysis. He received the HDR degree in electrical engineering from the Sfax National School of Engineering (ENIS), Tunisia, in 2007. He is currently a professor in the Department of Physics at the Faculty of Sciences of Sfax (FSS), University of Sfax, and a member of the Laboratory of Electronics and Information Technology (LETI), Tunisia. His current research interests include video quality metrics, video compression, video encryption, face and silhouette recognition, and artificial intelligence.

Nouri Masmoudi received his electrical engineering degree from the Faculty of Sciences and Techniques of Sfax, Tunisia, in 1982, and the DEA degree from the National Institute of Applied Sciences of Lyon and University Claude Bernard, Lyon, France, in 1984. From 1986 to 1990, he pursued his Ph.D. at the National School of Engineering of Tunis (ENIT), Tunisia, obtaining the degree in 1990. He is currently a professor in the Electrical Engineering Department, ENIS. Since 2000, he has directed the 'Circuits and Systems' group in the Laboratory of Electronics and Information Technology. Since 2003, he has been responsible for the Electronic Master Program at ENIS. His research activities cover several topics: design, telecommunication, embedded systems, information technology, video coding, and image processing.