The International Arab Journal of Information Technology (IAJIT)


Human Activity Recognition Based on Transfer Learning with Spatio-Temporal ...

A Gait History Image (GHI) is a spatial template that accumulates regions of motion into a single image in which moving pixels are brighter than static ones. A new descriptor, the Time-Sliced Averaged Gradient Boundary Magnitude (TAGBM), is also designed to capture the variation of motion over time. Together, these templates condense the spatial and temporal information of each video. Based on this idea, a new method is proposed in this paper. Each video is split into N and M groups of consecutive frames, and the GHI and TAGBM are computed for each group, resulting in spatial and temporal templates. These templates are then classified using transfer learning with fine-tuning. The proposed method achieves recognition accuracies of 96.50%, 92.30% and 97.12% on the KTH, UCF Sport and UCF-11 action datasets, respectively. It is also compared with state-of-the-art approaches, and the results show that it achieves the best performance among them.
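The grouping-and-template idea described above can be illustrated with a minimal sketch. It assumes grayscale frames stored as a NumPy array of shape (T, H, W). The function `motion_template` is a simplified frame-differencing stand-in and is not the published GHI or TAGBM formula. The function names and the `threshold` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def split_into_groups(frames, n_groups):
    """Split a (T, H, W) frame array into n_groups runs of consecutive frames."""
    return np.array_split(frames, n_groups, axis=0)


def motion_template(group, threshold=0.1):
    """Accumulate motion in one group of frames into a single image.

    Assumption: a simplified stand-in for a GHI-style template, built by
    counting how often each pixel changes between consecutive frames.
    Pixels that move more often end up brighter.
    """
    # Absolute difference between consecutive frames: shape (T-1, H, W)
    diffs = np.abs(np.diff(group.astype(np.float64), axis=0))
    # Count, per pixel, the frame transitions whose change exceeds the threshold
    template = (diffs > threshold).sum(axis=0).astype(np.float64)
    peak = template.max()
    # Normalise to [0, 1]; a group without motion stays all zeros
    return template / peak if peak > 0 else template


def video_to_templates(frames, n_groups):
    """Return one motion template per group of consecutive frames."""
    return [motion_template(g) for g in split_into_groups(frames, n_groups)]


# Synthetic example: one bright pixel sweeping along the top row
frames = np.zeros((8, 4, 4))
for t in range(8):
    frames[t, 0, t % 4] = 1.0

templates = video_to_templates(frames, n_groups=2)
print(len(templates), templates[0].shape)
```

In the setting the abstract describes, each resulting template would be passed as an image to a pretrained network for fine-tuning. That step is omitted here because the abstract does not specify the network.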

Saeedeh Zebhi received the B.S. and M.S. degrees from the Department of Electrical Engineering, Yazd University of Iran, in 2009 and 2012, respectively. She is currently a Ph.D. candidate at Yazd University. Her research interests include deep learning and video action recognition.

SMT Almodarresi obtained his B.S. degree in Electronics Engineering and M.S. degree in Communication Systems, both from the Isfahan University of Technology, Isfahan, Iran. He also holds a Ph.D. in Electronics (Intelligent Signal Processing) from the University of Southampton, UK (Department of Electrical and Computer Science: ECS). He works at the Department of Electrical and Computer Engineering at Yazd University, where he pursues his research interests in: 1) Networked Control Systems (NCS) 2) Neuro-Fuzzy Networks 3) Wireless Networks.

Vahid Abootalebi received the B.S. and M.S. degrees in Electrical Engineering from Sharif University of Technology, Tehran, Iran, in 1997 and 2000, respectively. He also received his Ph.D. degree in biomedical engineering from Amirkabir University of Technology, Tehran, Iran, in 2006. Since 2007, he has been working as a faculty member of the Electrical Eng. Department of Yazd University, where he is currently an Associate Professor. His main research interests include biomedical signal processing and pattern recognition.