The International Arab Journal of Information Technology (IAJIT)



VoxCeleb1: Speaker Age-Group Classification using Probabilistic Neural Network

Human speech inherently carries paralinguistic information that is exploited in many voice-recognition applications. Classifying speakers by age-group is a valuable tool in various settings, such as issuing different permission levels to different age-groups. This paper proposes a text-independent automatic system for classifying speaker age-groups. The Fundamental Frequency (F0), jitter, shimmer, and Spectral Sub-Band Centroids (SSCs) are used as features, while a Probabilistic Neural Network (PNN) classifies speaker utterances into eight age-groups. Experiments are carried out on the VoxCeleb1 dataset to demonstrate the proposed system's performance, which, to the best of our knowledge, is the first effort of its kind. The proposed system achieves an overall accuracy of roughly 90.25%, and the results show that it clearly outperforms a variety of base classifiers in terms of overall accuracy.
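As a minimal sketch of the classification stage, the following Python snippet implements the core PNN decision rule (a Gaussian Parzen-window density estimate per class, with the predicted label taken as the class of maximum response). The feature vectors here are synthetic stand-ins for the F0/jitter/shimmer/SSC features, and the smoothing parameter `sigma` is an assumed value, not one taken from the paper:

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.5):
    """Classify each test vector with a Gaussian-kernel PNN.

    For every class, the score is the mean Gaussian kernel response
    of the test vector against that class's training vectors (a
    Parzen-window density estimate); the predicted label is the
    class with the highest score.
    """
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        scores = []
        for c in classes:
            Xc = X_train[y_train == c]
            d2 = np.sum((Xc - x) ** 2, axis=1)  # squared Euclidean distances
            scores.append(np.mean(np.exp(-d2 / (2 * sigma ** 2))))
        preds.append(classes[np.argmax(scores)])
    return np.array(preds)

# Toy example: two well-separated "age-group" clusters in 2-D feature space.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.array([[0.1, 0.0], [3.1, 2.9]])
print(pnn_predict(X_train, y_train, X_test))  # → [0 1]
```

In a full system, each row of `X_train` would be the concatenated F0, jitter, shimmer, and SSC measurements for one utterance, and `sigma` would be tuned on held-out data.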


[1] Amich H., Mohamed M., and Zrihui M., “Multi-Level Improvement for a Transcription Generated by Automatic Speech Recognition System for Arabic,” The International Arab Journal of Information Technology, vol. 16, no. 3, pp. 460-466, 2019.

[Figure: overall accuracy (%) of the compared classifiers — Adaboost 39.25, LR 64, Decision tree 64.25, GNB 65, SVM 67, Bagging 78, RF 85.25, KNN 87.5, PNN 90.25]

[2] Bahari M. and Van hamme H., “Speaker Age Estimation Using Hidden Markov Model Weight Supervectors,” in Proceedings of 11th International Conference on Information Science, Signal Processing and their Applications, Montreal, pp. 517-521, 2012.

[3] Bolat B. and Sert S., “Classification of Parkinson’s Disease by Using Probabilistic Neural Networks,” in Proceedings of International Symposium on Innovations in Intelligent Systems and Applications, Trabzon, 2009.

[4] de Cheveigné A. and Kawahara H., “YIN, a Fundamental Frequency Estimator for Speech and Music,” The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917-1930, 2002.

[5] Chougule S. and Chavan M., “Speaker Recognition in Mismatch Conditions: A Feature Level Approach,” International Journal of Image, Graphics and Signal Processing, vol. 4, pp. 37-43, 2017.

[6] Eskenazi M., Mostow J., and Graff D., “The CMU Kids Corpus,” in Linguistic Data Consortium, Philadelphia, USA, 1997.

[7] Faek F., “Objective Gender and Age Recognition from Speech Sentences,” The Scientific Journal of Koya University, vol. 3, no. 2, pp. 24-29, 2015.

[8] Farrús M., Hernando J., and Ejarque P., “Jitter and Shimmer Measurements for Speaker Recognition,” in Proceedings of 8th Annual Conference of the International Speech Communication Association, Antwerp, pp. 778-781, 2007.

[9] Feinberg R., “Parselmouth Praat Scripts in Python,” OSF, 2019.

[10] Ghahremani P., Nidadavolu P., Chen N., Villalba J., Povey D., Khudanpur S., and Dehak N., “End-to-End Deep Neural Network Age Estimation,” in Proceedings of Interspeech, Hyderabad, pp. 277-281, 2018.

[11] Grzybowska J. and Kacprzak S., “Speaker Age Classification and Regression Using I-Vectors,” in Proceedings of Interspeech, San Francisco, 2016.

[12] Kinnunen T., Zhang B., Zhu J., and Wang Y., “Speaker Verification with Adaptive Spectral Subband Centroids,” in Proceedings of International Conference on Biometrics, Seoul, Korea, pp. 58-66, 2007.

[13] Minematsu N., Sekiguchi M., and Hirose K., “Automatic Estimation of One’s Age with His/her Speech Based Upon Acoustic Modeling Techniques of Speakers,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, pp. 137-140, 2002.

[14] Mohebali B., Tahmassebi A., Meyer-Baese A., and Gandomi A., “Probabilistic Neural Networks: A Brief Overview of Theory, Implementation, and Application,” Handbook of Probabilistic Models, pp. 347-367, 2020.

[15] Müller C. and Burkhardt F., “Combining Short-Term Cepstral and Long-Term Pitch Features for Automatic Recognition of Speaker Age,” in Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech), Antwerp, pp. 2277-2280, 2007.

[16] Nagrani A., Chung J., and Zisserman A., “VoxCeleb: A Large-Scale Speaker Identification Dataset,” in Proceedings of Interspeech, Stockholm, 2017.

[17] Naini A. and Homayounpour M., “Speaker Age Interval and Sex Identification Based on Jitters, Shimmers and Mean MFCC using Supervised and Unsupervised Discriminative Classification Methods,” in Proceedings of 8th International Conference on Signal Processing, Guilin, 2006.

[18] Paliwal K., “Spectral Subband Centroid Features for Speech Recognition,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), Seattle, pp. 617-620, 1998.

[19] Pribil J., Pribilova A., and Matousek J., “GMM-based Speaker Age and Gender Classification in Czech and Slovak,” Journal of Electrical Engineering, vol. 68, no. 1, pp. 3-12, 2017.

[20] Sedaaghi M., “A Comparative Study of Gender and Age Classification in Speech Signals,” Iranian Journal of Electrical and Electronic Engineering, vol. 5, no. 1, pp. 1-12, 2009.

[21] Sharma G., Umapathy K., and Krishnan S., “Trends in Audio Signal Feature Extraction Methods,” Applied Acoustics, vol. 158, 107020, 2020.

[22] Yücesoy E. and Nabiyev V., “A New Approach with Score-Level Fusion for the Classification of A Speaker Age and Gender,” Computers and Electrical Engineering, vol. 53, pp. 29-39, 2016.

[23] Zazo R., Nidadavolu P., Chen N., Rodriguez J., and Dehak N., “Age Estimation in Short Speech Utterances Based on LSTM Recurrent Neural Networks,” IEEE Access, vol. 6, pp. 22524-22530, 2018.

The International Arab Journal of Information Technology, Vol. 19, No. 6, November 2022

Ameer Badr received the B.Sc. and M.Sc. degrees in System Software from the Computer Science Department, University of Technology, Baghdad, Iraq, in 2014 and 2018 respectively, and the Ph.D. degree in AI from the same department in 2021. He is currently a lecturer at Imam Ja’afar Al-Sadiq University, Salahaddin, Iraq. He has authored or co-authored more than 10 refereed journal and conference papers. His research interests include AI, machine learning, speech processing, speech enhancement, speech recognition, speaker recognition and verification, and voice-based HRI.

Alia Abdul-Hassan received the B.Sc., M.Sc., and Ph.D. degrees from the Computer Science Department, University of Technology, Baghdad, Iraq, in 1993, 1999, and 2004 respectively. She has been the Dean of the Computer Science Department since February 2019 and has supervised more than 30 M.Sc. and Ph.D. theses in Computer Science since 2007. Her research interests include soft computing, green computing, AI, data mining, software engineering, electronic management, and computer security.