The International Arab Journal of Information Technology (IAJIT)


Robust Hearing-Impaired Speaker Recognition from Speech using Deep Learning Networks in Native Language

Research in speaker recognition has grown rapidly in recent years owing to its wide applications in security, criminal investigation, and other major fields. A speaker's identity is conveyed by the way they speak rather than by the words spoken. Identifying hearing-impaired speakers from their speech is therefore a challenging task, since their speech is highly distorted. This paper introduces a new task: recognizing Hearing-Impaired (HI) speakers using speech as a biometric in the native language Tamil. Although their speech is very hard to recognize, even for their parents and teachers, the proposed system identifies them accurately by first applying speech enhancement. Because of the large variability in their utterances, Mel Frequency Cepstral Coefficient (MFCC) features are derived from the speech and applied as a spectrogram to a Convolutional Neural Network (CNN), instead of using the spectrogram of the raw speech as is sufficient for ordinary speakers. With CNN as the modelling technique, the proposed system achieves 80% accuracy while remaining less complex. When an Auto Associative Neural Network (AANN) is used as the modelling technique instead, the accuracy is only 9%, showing that CNN performs better than AANN for recognizing HI speakers. The system is therefore well suited to biometric and other security-related applications for hearing-impaired speakers.
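The paper does not publish its implementation; the following is a minimal NumPy sketch of the MFCC extraction step the abstract describes (framing, windowing, power spectrum, mel filterbank, log, DCT-II), whose per-frame coefficients can be stacked into the spectrogram-like image fed to the CNN. All parameter values (16 kHz sampling, 25 ms frames, 26 filters, 13 coefficients) are common defaults, not values taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):                 # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Frame the signal with a Hamming window (25 ms frames, 10 ms hop at 16 kHz)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel-filterbank energies
    fbank = mel_filterbank(n_filters, n_fft, sr)
    energies = np.log(power @ fbank.T + 1e-10)
    # DCT-II to decorrelate the filterbank energies -> cepstral coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_filters)))
    return energies @ basis.T            # shape: (n_frames, n_ceps)
```

For a one-second 16 kHz utterance this yields a (98, 13) coefficient matrix; transposing and treating it as a single-channel image is one straightforward way to present MFCCs to a CNN input layer.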

Jeyalakshmi Chelliah received the B.E. degree in Electronics and Communication Engineering from Bharathidasan University in 2002 and the M.E. degree in Communication Systems from Anna University, Chennai, in 2008. She served for 11 years as a faculty member in the Department of ECE, Trichy Engineering College, Tamilnadu. Since 2016 she has been with K. Ramakrishnan College of Engineering, where she is a Professor in the ECE department. She obtained her Ph.D. from Anna University, Chennai, in 2015 in the field of speech recognition for hearing-impaired people. Her research interests also include speech processing, image processing, and machine learning. She has published 35 papers in reputed international journals and has presented papers at more than 10 international conferences.

KiranBala Benny is presently Head of the Department of Artificial Intelligence and Data Science, K. Ramakrishnan College of Engineering (Autonomous), Trichy, Tamilnadu, India. He received his B.Tech. degree in Information Technology, M.E. degree in Computer and Communication Engineering, M.B.A. degree in Human Resource Management, and Ph.D. degree in Computer Science and Engineering (in the field of image processing). He has 10 years of teaching and research experience and has published more than 50 papers in peer-reviewed journals.

Revathi Arunachalam obtained the B.E. (ECE), M.E. (Communication Systems), and Ph.D. (Speech Processing) degrees from the National Institute of Technology, Tiruchirappalli, Tamilnadu, India, in 1988, 1993, and 2009, respectively.
She has served on the faculty of Electronics and Communication Engineering for 30 years and is currently a Professor in the Department of ECE, SASTRA Deemed University, Thanjavur, India. She has published 40 papers in reputed international journals and has presented papers at more than 50 international conferences. Her areas of interest include speech processing, signal processing, image processing, biometrics and security, communication systems, embedded systems, and computer networks.

Viswanathan Balasubramanian is currently an Associate Professor in the Department of ECE, K. Ramakrishnan College of Engineering. He completed his B.Tech. degree in Electronics Engineering at the Madras Institute of Technology, Anna University, and his M.S. and Ph.D. at the Institute of Microtechnology and the Swiss Federal Institute of Technology (EPFL), respectively. After completing his Ph.D., he worked as a Senior Analog Design Engineer at microelectronics companies (Semtech and Kandou) on sigma-delta ADCs, linear regulators, and SAR ADCs. His broad research interests are analog/digital integrated circuit and system design for biomedical and wireless transceiver applications. In total, he has more than 10 years of industrial and research experience in IC design and 3 years of academic experience.