Received date May 8, 2024

Accepted date September 2, 2024

Image Object and Scene Recognition Based on Improved Convolutional Neural Network

Authors Guoyan Li, Fei Wang

Keywords #Object recognition #deep learning #scene recognition #convolutional neural network #sliding window fusion

Abstract

In recent years, continuous optimization of network structures and the emergence of large-scale datasets have enabled convolutional neural networks (CNNs) to achieve breakthroughs in a range of computer vision applications. Building on this, this work improves and optimizes the CNN and applies it to image object detection and scene recognition. Object detection is performed by combining sliding window fusion with a CNN, and the scene recognition model is built from potential object area recognition and CNN transfer learning. The algorithms are verified on different datasets. In Group1 and Group2, the error rate of the multi-column CNN fused by sliding window is about 25% lower than that of the single-column CNN. Group3, the group with the smallest decrease, still achieves a 9% reduction in error rate. The fitness rate of the object detection algorithm stabilizes after 7 runs at about 9.8%, and the algorithm performs markedly better than the other algorithms. The multi-column CNN fused by sliding window adapts better to the training dataset and achieves better recognition results. In addition, the image scene recognition model based on the potential object area recognition algorithm and the CNN converges well. The average recognition time for image scenes is 1.5356 s. Recognition is fast and stable, which effectively addresses the problem of multi-scale image scene recognition.
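
The abstract does not state how the sliding-window outputs of the columns are fused, so the following is only a minimal sketch under assumptions: each "column" is a hypothetical stand-in function that maps an image patch to a class-probability vector, and fusion is taken to be a plain average over all windows and all columns. The names `sliding_windows`, `fuse_columns`, and `make_toy_column` are illustrative and do not come from the paper.

```python
import numpy as np

def sliding_windows(image, size, stride):
    """Yield (row, col, patch) for every full window inside a 2-D image."""
    h, w = image.shape[:2]
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            yield r, c, image[r:r + size, c:c + size]

def make_toy_column(scale):
    """Hypothetical stand-in for one trained CNN column (assumption)."""
    def column(patch):
        # Two-class softmax driven by mean patch intensity.
        logits = scale * np.array([patch.mean(), 1.0 - patch.mean()])
        e = np.exp(logits - logits.max())
        return e / e.sum()
    return column

def fuse_columns(image, columns, size, stride):
    """Assumed fusion rule: average probabilities over windows and columns."""
    probs = [
        col(patch)
        for _, _, patch in sliding_windows(image, size, stride)
        for col in columns
    ]
    return np.mean(probs, axis=0)

image = np.ones((8, 8))                      # toy bright image
columns = [make_toy_column(1.0), make_toy_column(3.0)]
fused = fuse_columns(image, columns, size=4, stride=2)
print(fused, int(fused.argmax()))
```

Averaging is only one possible fusion rule; weighted voting or taking the maximum response per class would fit the same interface, and the paper's own rule may differ.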
