The International Arab Journal of Information Technology (IAJIT)

Gammachirp Filter Banks Applied in Robust Speaker Recognition Based on GMM-UBM Classifier

In this paper, the authors propose an auditory feature extraction algorithm to improve the performance of speaker recognition systems in noisy environments. In this algorithm, a Gammachirp filter bank is adopted to simulate the auditory model of the human cochlea. In addition, three techniques are applied: cube-root compression, the Relative Spectral filtering technique (RASTA), and the Cepstral Mean and Variance Normalization (CMVN) algorithm. Subsequently, simulated experiments were conducted based on a Gaussian Mixture Model-Universal Background Model (GMM-UBM) classifier. The experimental results show that the speaker recognition system with the new auditory features offers better robustness and recognition performance than Mel-Frequency Cepstral Coefficients (MFCC), Relative Spectral-Perceptual Linear Predictive (RASTA-PLP), Cochlear Filter Cepstral Coefficients (CFCC), and Gammatone Frequency Cepstral Coefficients (GFCC).
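As a rough illustration of the post-filter-bank processing chain named in the abstract (cube-root compression, RASTA filtering, and CMVN), the Python sketch below operates on a matrix of per-frame Gammachirp filter-bank energies. It is a minimal sketch, not the authors' implementation: the Gammachirp front end is assumed to be computed elsewhere, the RASTA coefficients follow the commonly used band-pass form, the ordering of the steps is one plausible arrangement, and the choice of 13 cepstral coefficients is illustrative.

```python
import numpy as np
from scipy.fftpack import dct
from scipy.signal import lfilter

def cube_root_compress(energies):
    """Cube-root amplitude compression of filter-bank energies,
    approximating the nonlinear loudness perception of the ear."""
    return np.cbrt(energies)

def rasta_filter(trajectories):
    """Band-pass filter each channel trajectory over time to suppress
    slowly varying (convolutional) and rapidly varying components.
    Coefficients follow the commonly used RASTA filter (an assumption here)."""
    numerator = np.array([0.2, 0.1, 0.0, -0.1, -0.2])
    denominator = np.array([1.0, -0.98])
    return lfilter(numerator, denominator, trajectories, axis=0)

def cmvn(cepstra):
    """Cepstral Mean and Variance Normalization: normalize each cepstral
    dimension to zero mean and unit variance over the utterance."""
    mean = cepstra.mean(axis=0)
    std = cepstra.std(axis=0) + 1e-8   # guard against division by zero
    return (cepstra - mean) / std

def gammachirp_cepstral_features(filterbank_energies, num_ceps=13):
    """filterbank_energies: (num_frames, num_channels) array of per-frame
    Gammachirp filter-bank energies (front end assumed computed elsewhere).
    Returns a (num_frames, num_ceps) array of normalized cepstral features."""
    compressed = cube_root_compress(filterbank_energies)
    filtered = rasta_filter(compressed)
    # A DCT over channels decorrelates the compressed energies into cepstra,
    # analogous to the final step of MFCC extraction.
    cepstra = dct(filtered, type=2, axis=1, norm='ortho')[:, :num_ceps]
    return cmvn(cepstra)
```

The resulting normalized cepstral vectors would then be scored by a GMM-UBM back end, in which a universal background model is adapted to each enrolled speaker.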


Lei Deng was born in Sichuan, China, in 1993. She received the B.S. degree from the College of Information Science and Technology, Chengdu University of Technology, Chengdu, China, in 2015. She is currently pursuing the M.S. degree at the College of Electronics and Information Engineering, Sichuan University, Chengdu, China. Her research interests mainly include speaker recognition, language identification, and speech signal processing.

Yong Gao (corresponding author: gaoyong@scu.edu.cn) was born in Xi’an, China, in 1969. He received the M.S. and Ph.D. degrees from the School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, in 1997 and 2000, respectively. He is a professor at the College of Electronics and Information Engineering, Sichuan University. His research interests mainly include speech signal processing, anti-interference and anti-interception technology in communication, modulation recognition, emergency communication, array signal processing, and blind signal analysis.