The International Arab Journal of Information Technology (IAJIT)


An Optimized Model for Visual Speech Recognition Using HMM

Visual Speech Recognition (VSR) identifies spoken words from visual data alone, without the corresponding acoustic signal. It is useful where conventional audio processing is ineffective, as in very noisy environments, or impossible, as when no audio signal is available. This paper introduces an optimized model for VSR that uses a simple geometric projection method for mouth localization, which reduces computation time. A 16-point distance method and a chain code method are used to extract visual features, and their recognition performance is compared using a Hidden Markov Model (HMM) classifier. To optimize the model, the more prominent features are selected from the large set of extracted visual attributes using the Discrete Cosine Transform (DCT). Experiments were conducted on an in-house database of 10 digits (1 to 10) spoken by 10 subjects and evaluated with 10-fold cross-validation. The model is assessed in terms of specificity, sensitivity, and accuracy. Compared with other models in the literature, the proposed method is more robust to subject variation, with high sensitivity and specificity for the digits 1 to 10. The results show that combining the 16-point distance method with DCT performs better than either the 16-point distance method or the chain code method alone.
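The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes a one-vs-rest reading of sensitivity, specificity, and accuracy for each digit class, computed from a confusion matrix; the abstract does not state how the authors computed them. The function name `one_vs_rest_metrics` and the counts below are hypothetical and are not results from the paper.

```python
def one_vs_rest_metrics(confusion, labels):
    """Per-class sensitivity, specificity and accuracy (one-vs-rest).

    confusion[i][j] counts samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                        # class k recognised as k
        fn = sum(confusion[k]) - tp                 # class k recognised as another class
        fp = sum(row[k] for row in confusion) - tp  # other classes recognised as k
        tn = total - tp - fn - fp                   # everything else
        metrics[label] = {
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "accuracy": (tp + tn) / total if total else 0.0,
        }
    return metrics


# Hypothetical counts for three digit classes (illustration only).
labels = ["1", "2", "3"]
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
metrics = one_vs_rest_metrics(confusion, labels)
for label in labels:
    print(label, metrics[label])
```

In a 10-fold cross-validation setting, one option would be to sum the confusion matrices of the ten folds before calling the function, so that each digit's metrics reflect all test samples.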

Sujatha Paramasivam is an Associate Professor in the Department of Computer Science and Engineering at Sudharsan Engineering College, India. Her B.E. and M.E. degrees, both specializing in Computer Science and Engineering, are from Anna University and Annamalai University, India. She is currently pursuing her PhD at Anna University, India. She has 10 years of teaching experience and 4 years of research experience. Her areas of interest are image processing, computer vision, and data mining.

Radhakrishnan Murugesanadar is currently a Professor in the Department of Civil Engineering, Sethu Institute of Technology, India. He has more than 43 years of teaching experience. His fields of interest include Computer Aided Structural Analysis, Computer Networks, Image Processing, and Effort Estimation.