The International Arab Journal of Information Technology (IAJIT), Vol. 22, No. 5, September 2025


Hybrid CNN Xception and Long Short-Term Memory Model for the Detection of Interpersonal Violence in Videos

Interpersonal violence recurs in public spaces in forms such as punching, slapping, kicking, and pushing, and it is often recorded by video surveillance cameras. Algorithms that process this footage can already detect interpersonal violence, but their performance still needs to improve. This paper proposes a hybrid model that combines the Xception Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) to detect violence in videos. We evaluate the proposal on two datasets: the Hockey Fight dataset and the Real Life Violence Situations dataset. The model achieved accuracies of 93.90% and 98.50%, respectively, with the best performance on the Real Life Violence Situations dataset. Compared with models proposed in related work, the proposed hybrid model shows better performance.
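
The abstract names the two building blocks but not how they are wired together. The following is a minimal sketch of one common way to combine them, assuming a TensorFlow/Keras implementation: Xception extracts a feature vector from each frame, and an LSTM summarizes the sequence of frame features into a single violence probability. The function name `build_violence_classifier`, the parameters `num_frames`, `frame_size`, and `lstm_units`, and all default values are hypothetical choices for illustration, not settings reported by the authors.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_violence_classifier(num_frames=16, frame_size=299, lstm_units=64):
    """Sketch of an Xception + LSTM video classifier (assumed layout).

    Input:  a clip of shape (num_frames, frame_size, frame_size, 3).
    Output: one sigmoid score per clip (1 = violence, 0 = no violence).
    """
    # Per-frame feature extractor. weights=None avoids a download here;
    # weights="imagenet" would load pretrained weights instead.
    # Xception requires frames of at least 71 x 71 pixels.
    backbone = tf.keras.applications.Xception(
        include_top=False,
        weights=None,
        input_shape=(frame_size, frame_size, 3),
        pooling="avg",  # global average pooling -> one 2048-d vector per frame
    )

    clip = layers.Input(shape=(num_frames, frame_size, frame_size, 3))

    # Apply the same CNN to every frame: (batch, num_frames, 2048).
    frame_features = layers.TimeDistributed(backbone)(clip)

    # The LSTM reads the frame features in temporal order and keeps
    # only its final hidden state: (batch, lstm_units).
    temporal_summary = layers.LSTM(lstm_units)(frame_features)

    # Binary decision: violent vs. non-violent clip.
    score = layers.Dense(1, activation="sigmoid")(temporal_summary)

    return models.Model(inputs=clip, outputs=score)
```

Calling `build_violence_classifier()` returns an uncompiled Keras model. Under the same assumptions, it could be compiled with a binary cross-entropy loss and trained on fixed-length clips sampled from each video.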
