The International Arab Journal of Information Technology (IAJIT)


Robust Hearing-Impaired Speaker Recognition from Speech using Deep Learning Networks in Native Language

Research in speaker recognition has grown rapidly in recent years owing to its wide applications in security, criminal investigation, and other major fields. A speaker is identified by the way they speak, not by the words spoken. Identifying hearing-impaired speakers from their speech is therefore a challenging task, since their speech is highly distorted. In this paper, a new task is introduced: recognizing Hearing-Impaired (HI) speakers using speech as a biometric in their native language, Tamil. Although their speech is very hard to recognize even for their parents and teachers, the proposed system identifies them accurately by applying speech enhancement. Owing to the wide variability in their utterances, instead of applying the spectrogram of the raw speech, Mel-Frequency Cepstral Coefficient (MFCC) features are derived from the speech and applied as a spectrogram to a Convolutional Neural Network (CNN), a step that is not necessary for ordinary speakers. In the proposed system for recognizing HI speakers, CNN is used as the modelling technique to assess the performance of the system; this deep learning network provides 80% accuracy, and the system is less complex. An Auto-Associative Neural Network (AANN) is also used as a modelling technique, and the performance of AANN is only 9% accurate, showing that CNN performs better than AANN for recognizing HI speakers. Hence, this system is very useful for biometric systems and other security-related applications for hearing-impaired speakers.
