Recognition of Spoken Bengali Numerals Using

Author Avisek Gupta and Kamal Sarkar

Keywords #Speech recognition #isolated digits #principal component analysis #support vector machines #multi-layered perceptrons #random forests

Abstract This paper presents a method of automatic recognition of Bengali numerals spoken in noise-free and noisy environments by multiple speakers with different dialects. Mel Frequency Cepstral Coefficients (MFCC) are used for feature extraction, and Principal Component Analysis is used as a feature summarizer to form the feature vector from the MFCC data for each digit utterance. Finally, we use Support Vector Machines, Multi-Layer Perceptrons, and Random Forests to recognize the Bengali digits and compare their performance. In our approach, we treat each digit utterance as a single indivisible entity, and we attempt to recognize it using features of the digit utterance as a whole. This approach can therefore be easily applied to spoken digit recognition tasks for other languages as well.
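The feature-summarization step can be illustrated with a minimal sketch. The abstract does not say exactly how PCA turns the MFCC data into a feature vector, so the code below assumes one plausible reading: the leading principal direction of an utterance's frame-by-coefficient MFCC matrix serves as a fixed-length summary. The function name `summarize_with_pca`, the choice of 13 coefficients, and the random stand-in data are hypothetical and are not taken from the paper.

```python
import numpy as np


def summarize_with_pca(mfcc, n_components=1):
    """Collapse a (frames x coefficients) MFCC matrix into one vector.

    Assumption (not from the paper): the summary is the top principal
    direction(s) of the per-frame MFCC vectors, flattened.
    Requires at least two frames.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    # Covariance between coefficients, estimated over the frames.
    cov = np.cov(mfcc, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T  # shape: (n_components, coefficients)
    # Eigenvector signs are arbitrary; make the largest-magnitude entry
    # positive so that the summary is reproducible.
    for row in components:
        if row[np.argmax(np.abs(row))] < 0:
            row *= -1
    return components.ravel()


# Stand-in data: two "utterances" of different durations, 13 coefficients each.
rng = np.random.default_rng(0)
short_utterance = rng.normal(size=(40, 13))
long_utterance = rng.normal(size=(75, 13))

vec_short = summarize_with_pca(short_utterance)
vec_long = summarize_with_pca(long_utterance)
print(vec_short.shape, vec_long.shape)
```

Under this assumption, utterances of any duration map to vectors of the same length, which is what allows each digit utterance to be treated as a single entity and passed to a standard classifier such as an SVM, MLP, or random forest.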

Avisek Gupta obtained an M.E. degree from Jadavpur University, Kolkata, India, and previously obtained a B.Tech. degree from Future Institute of Engineering and Management, Kolkata, India. His research interests include Speech Recognition, Information Retrieval, and Machine Learning.

Kamal Sarkar received his B.E. degree in Computer Science and Engineering from the Faculty of Engineering, Jadavpur University, in 1996. He received the M.E. degree and Ph.D. (Engg.) in Computer Science and Engineering from the same university in 1999 and 2011, respectively. In 2001, he joined the Department of Computer Science & Engineering, Jadavpur University, Kolkata, as a lecturer, where he is currently a professor. His research interests include Natural Language Processing, Machine Learning, Text Summarization, Text Mining, and Speech Recognition.