The International Arab Journal of Information Technology (IAJIT)



Environmental Noise Adaptable Hearing Aid using Deep Learning

Speech de-noising is one of the essential processes performed inside hearing aids, and it has recently shown great improvement when implemented with deep learning. However, when performing speech de-noising for hearing aids, adding a noise frequency classification stage is of great importance because of the different types of hearing loss. Patients who suffer from sensorineural hearing loss have a reduced ability to hear specific frequency ranges, so treating all noise environments identically yields unsatisfactory performance. This paper introduces the idea of an environment-adaptable hearing aid: one that can be programmed to multiply the background noise by a weight based on its frequency and its importance to the wearer, matching the condition and needs of each patient. Furthermore, a more generalized Deep Neural Network (DNN) for speech enhancement is presented, trained on a diversity of languages instead of only the target language. The results show that the DNN learns speech enhancement more efficiently when trained on a diversity of languages. Moreover, the adaptable hearing aid is shown to be promising, achieving 70% overall classification accuracy, which could be improved with a larger environmental noise dataset.
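The adaptation idea described above reduces to two steps: classify the background noise environment, then scale the estimated noise by a patient-specific weight before producing the output. The Python sketch below illustrates this on magnitude spectra; the two class labels, the weight values, and the spectral-centroid rule standing in for the paper's trained noise classifier are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# Illustrative per-patient weights for two coarse noise classes. The class
# names and values are hypothetical, not taken from the paper; in practice
# they would be programmed to match the patient's audiogram.
PATIENT_NOISE_WEIGHTS = {
    "low_frequency": 0.7,   # e.g., engine hum
    "high_frequency": 0.1,  # e.g., whistling, birdsong
}

def classify_noise(noise_mag: np.ndarray, sample_rate: int = 16000) -> str:
    """Toy stand-in for a trained noise classifier: label the noise by its
    spectral centroid (energy-weighted mean frequency)."""
    freqs = np.linspace(0.0, sample_rate / 2, noise_mag.shape[0])
    centroid = np.sum(freqs * noise_mag) / (np.sum(noise_mag) + 1e-12)
    return "low_frequency" if centroid < 1000.0 else "high_frequency"

def adapt_output(speech_mag: np.ndarray, noise_mag: np.ndarray) -> np.ndarray:
    """Recombine enhanced speech with a patient-weighted share of the
    estimated background noise, instead of suppressing every noise
    environment equally."""
    weight = PATIENT_NOISE_WEIGHTS[classify_noise(noise_mag)]
    return speech_mag + weight * noise_mag
```

In the paper's system the classification is performed by a network trained on environmental noise recordings; the centroid rule here only keeps the example self-contained.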


Soha A. Nossier is currently a PhD student at the University of East London, London, UK, and an Assistant Lecturer of Biomedical Engineering at the Medical Research Institute, Alexandria University, Alexandria, Egypt. She received a B.Sc. degree in Electrical Engineering in 2014 and an M.Sc. degree in Biomedical Devices in 2019, both from Alexandria University, Alexandria, Egypt. She is interested in speech enhancement and deep learning.

M. R. M. Rizk is an Associate Professor of Electrical Engineering at the Faculty of Engineering, Alexandria University, Alexandria, Egypt. He received a B.Sc. degree in Electrical Engineering in 1971 from Alexandria University, Alexandria, Egypt, and M.Sc. and PhD degrees in Electrical Engineering in 1975 and 1979, both from McMaster University, Ontario, Canada. His areas of expertise include signal, image, and video processing and neural networks, and he has more than 100 publications.

Saleh El Shehaby is the Head of the Biomedical Engineering Department, Medical Research Institute, Alexandria University, Alexandria, Egypt. He received a B.Sc. degree in Electrical Engineering in 1973, and M.Sc. and PhD degrees in Computer Engineering, all from the Faculty of Engineering, Alexandria University, Alexandria, Egypt. His areas of expertise include pattern recognition and artificial intelligence, and he has many publications in this area.

Nancy Diaa Moussa is an Associate Professor of Biomedical Engineering at the Medical Research Institute, Alexandria University, Alexandria, Egypt. She received a B.Sc. degree in 2002, an M.Sc. degree in 2007, and a PhD degree in 2013, all in Electrical Engineering from the Faculty of Engineering, Alexandria University, Alexandria, Egypt. Her areas of expertise include signal processing and machine learning, and she has many publications in this area.