The International Arab Journal of Information Technology (IAJIT)


A Novel Recurrent Neural Networks Architecture for Behavior Analysis

Behavior analysis is an important yet challenging task in computer vision, and the automatic analysis of human behavior is needed in several sectors. With the rise in crime, video surveillance is widely used to keep belongings safe and to detect events automatically, collecting important information that assists security guards. Moreover, the surveillance of human behavior has recently been used in medicine to quickly detect physical and mental health problems in patients. The complexity and variety of human features in video sequences drive researchers to seek an effective representation, which is the most challenging part: it must be invariant to changes of viewpoint, robust to noise, and efficient, with low computation time. In this paper, we propose a new model for human behavior analysis that combines a transfer learning model with a Recurrent Neural Network (RNN). Our model extracts human features from frames using a pre-trained Convolutional Neural Network (CNN), Inception V3. The features obtained are then trained using an RNN with Gated Recurrent Units (GRU). The performance of the proposed architecture is evaluated on three human action datasets, UCF Sports, UCF101, and KTH, and achieves good classification accuracy.
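The pipeline the abstract describes, per-frame CNN features fed to a GRU whose final hidden state is classified with a softmax layer, can be sketched in NumPy as below. This is an illustrative sketch, not the authors' exact configuration: the 2048-dimensional feature size (InceptionV3's pooled output), the hidden size, the 101-class output (as in UCF101), and the random weights standing in for trained parameters are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_classify(frame_feats, h_dim=64, n_classes=101):
    """Run a single GRU layer over a (T, d) frame-feature sequence,
    then classify the final hidden state with a softmax layer."""
    T, d = frame_feats.shape
    # Randomly initialised weights stand in for trained parameters.
    Wz, Wr, Wc = (0.01 * rng.standard_normal((h_dim, d)) for _ in range(3))
    Uz, Ur, Uc = (0.01 * rng.standard_normal((h_dim, h_dim)) for _ in range(3))
    Wo = 0.01 * rng.standard_normal((n_classes, h_dim))

    h = np.zeros(h_dim)
    for x in frame_feats:
        z = sigmoid(Wz @ x + Uz @ h)          # update gate
        r = sigmoid(Wr @ x + Ur @ h)          # reset gate
        c = np.tanh(Wc @ x + Uc @ (r * h))    # candidate hidden state
        h = (1.0 - z) * h + z * c             # blend old state and candidate
    logits = Wo @ h
    p = np.exp(logits - logits.max())         # numerically stable softmax
    return p / p.sum()

# 16 frames, each a hypothetical 2048-d InceptionV3 feature vector.
feats = rng.standard_normal((16, 2048))
probs = gru_classify(feats)
```

In practice the frame features would come from a frozen, pre-trained InceptionV3 (transfer learning), and only the GRU and classifier weights would be learned.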
