The International Arab Journal of Information Technology (IAJIT)


Multiclass SVM based Spoken Hindi Numerals Recognition

This paper presents the recognition of isolated Hindi numerals using a multiclass Support Vector Machine (SVM). Three sets of acoustic features are considered as inputs to the recognition process: Linear Predictive Coding (LPC) coefficients, Mel-Frequency Cepstral Coefficients (MFCC), and a combination of LPC and MFCC. The extracted acoustic features are given as input to the SVM. Classification is performed in two steps. In the first step, a one-versus-all SVM classifier is used to identify the Hindi language. In the second step, ten one-versus-all classifiers are used to recognize the numerals. Linear, polynomial, and RBF kernels are used to construct the SVMs. In the first phase, the best kernel strategy was explored for a fixed number of frames of the speech signal; the highest recognition rate was achieved with the linear kernel. Next, the number of frames used to calculate the LPCs and MFCCs was varied and the recognition accuracy was calculated. The highest recognition accuracy achieved in this study is 96.8%.
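The two-step decision rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional feature vectors, the single-support-vector models, and the function names (`decision_value`, `one_vs_all_predict`, `two_step_recognize`) are hypothetical stand-ins chosen for illustration. In the paper's setting, the models would be trained on LPC and MFCC features. The sketch shows only the three kernel functions, the language gate, and the one-versus-all selection of the numeral with the largest decision value.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """K(x, z) = (x . z + coef0) ** degree"""
    return (linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision_value(x, model, kernel):
    """SVM decision function f(x) = sum_i c_i * K(s_i, x) + b,
    where c_i = alpha_i * y_i for support vector s_i."""
    support_vectors, coeffs, bias = model
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, coeffs)) + bias

def one_vs_all_predict(x, models, kernel):
    """Score x with every per-class binary model and pick the largest score."""
    scores = {label: decision_value(x, m, kernel) for label, m in models.items()}
    return max(scores, key=scores.get)

def two_step_recognize(x, language_model, numeral_models, kernel):
    """Step 1: accept or reject the utterance as Hindi.
    Step 2: choose among the ten numeral classifiers."""
    if decision_value(x, language_model, kernel) <= 0:
        return None  # rejected by the language classifier
    return one_vs_all_predict(x, numeral_models, kernel)

# Hypothetical toy models (assumption, not trained on speech data).
# The third feature acts as a made-up "language cue"; each numeral has one
# prototype support vector placed on the unit circle.
language_model = ([(0.0, 0.0, 1.0)], [1.0], -0.5)
numeral_models = {
    d: ([(math.cos(2 * math.pi * d / 10), math.sin(2 * math.pi * d / 10), 0.0)],
        [1.0], 0.0)
    for d in range(10)
}

angle = 2 * math.pi * 3 / 10
hindi_three = (math.cos(angle), math.sin(angle), 1.0)
other_language = (math.cos(angle), math.sin(angle), 0.0)

print(two_step_recognize(hindi_three, language_model, numeral_models, linear_kernel))     # 3
print(two_step_recognize(other_language, language_model, numeral_models, linear_kernel))  # None
```

Passing `polynomial_kernel` or `rbf_kernel` in place of `linear_kernel` changes only how each decision value is computed; the two-step control flow stays the same.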

Rajendra Kumar Sharma received his PhD degree in 1993 from Indian Institute of Technology, Roorkee, India. Currently, he is a Professor in the School of Mathematics and Computer Applications, Thapar University, Patiala, India. His research interests include soft computing, neural networks, and statistical methods in NLP.

Teena Mittal received her MS degree in engineering from M.I.T.S. Gwalior in 2006. Currently, she is pursuing her PhD degree at Thapar University, Patiala, India. Her research interests include natural language processing, speech recognition, and machine learning.