The International Arab Journal of Information Technology (IAJIT)


Connectionist Temporal Classification Model for Dynamic Hand Gesture Recognition using RGB

Automatic classification of dynamic hand gestures is challenging because of the large diversity among gesture classes, low resolution, and the fact that gestures are performed with the fingers. These challenges have drawn many researchers to the area. Recently, deep neural networks have been used for implicit feature extraction, with a softmax layer for classification. In this paper, we propose a method based on a two-dimensional convolutional neural network that performs detection and classification of hand gestures simultaneously from multimodal Red, Green, Blue, Depth (RGBD) and optical flow data, and passes the extracted features to a Long Short-Term Memory (LSTM) recurrent network for frame-to-frame probability generation, with a Connectionist Temporal Classification (CTC) network for loss calculation. We compute optical flow from the Red, Green, Blue (RGB) data to capture the motion information present in the video. The CTC model efficiently evaluates all possible alignments of a hand gesture via dynamic programming and checks frame-to-frame consistency of the visual similarity of hand gestures in the unsegmented input stream. The CTC network finds the most probable sequence of frames for a gesture class. The frame with the highest probability is selected from the CTC network by max decoding. The entire CTC network is trained end-to-end by computing the CTC loss for gesture recognition. We use the challenging Vision for Intelligent Vehicles and Applications (VIVA) dataset for dynamic hand gesture recognition, captured with RGB and depth data. On the VIVA dataset, the proposed hand gesture recognition technique outperforms competing state-of-the-art algorithms and achieves an accuracy of 86%.
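The two CTC steps mentioned in the abstract, summing over all alignments by dynamic programming and max decoding, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the LSTM outputs one probability vector per frame, that index 0 is the CTC blank symbol, and that "max decoding" means standard best-path decoding; the function names `ctc_greedy_decode` and `ctc_label_probability` are hypothetical.

```python
import math

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs, blank=BLANK):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def ctc_label_probability(frame_probs, labels, blank=BLANK):
    """Total probability of all alignments of `labels` (CTC forward pass)."""
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    # alpha[s] = probability of all partial alignments ending in ext[s]
    alpha = [0.0] * size
    alpha[0] = frame_probs[0][ext[0]]
    if size > 1:
        alpha[1] = frame_probs[0][ext[1]]

    for probs in frame_probs[1:]:
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


# Toy input: 4 frames, 3 symbols (blank, gesture 1, gesture 2).
demo_probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.1, 0.7],
]
demo_decoded = ctc_greedy_decode(demo_probs)
demo_loss = -math.log(ctc_label_probability(demo_probs, demo_decoded))
```

Here `demo_loss` is the negative log-likelihood of the decoded label sequence, which is the quantity a CTC loss minimizes for the ground-truth labels during training. A real system would normally compute it in log space for numerical stability.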

The International Arab Journal of Information Technology, Vol. 17, No. 4, July 2020

Sunil Patel is a Research Scholar at the Gujarat Technological University, Ahmedabad. He is currently working as an Assistant Professor in Government Engineering College, Patan, Gujarat, India. He received his master's degree from S. P. University, Vallabh Vidyanagar in 2008. He is a computer vision researcher, and his research interests include visual representation learning, object recognition, action recognition, video analysis, and deep learning.

Ramji Makwana is a Managing Director of AIIVINE PXL PVT. LTD. He received his Ph.D. degree from S. P. University, Vallabh Vidyanagar in 2011. He has authored several papers in major computer vision and multimedia conferences and journals. His research interests include data mining, soft computing, and deep learning with applications to computer vision tasks such as object recognition, action recognition, and object tracking.