The International Arab Journal of Information Technology (IAJIT)


Recognition of Spoken Arabic Digits Using Neural Predictive Hidden Markov Models

In this study, we propose an algorithm for isolated Arabic digit recognition. The algorithm extracts acoustic features from the speech signal and uses them as input to multi-layer perceptron neural networks. Each digit in the vocabulary (0 to 9) is associated with its own network. The networks are implemented as predictors of the speech samples over a certain duration of time and are trained with the back-propagation algorithm. A hidden Markov model (HMM) is used to extract temporal features (states) from the speech signal. The input vector to the networks consists of twelve mel-frequency cepstral coefficients, the log of the energy, and five elements representing the state. Our results show that the approach reduces the word error rate compared with an HMM word recognition system.
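The input layout and the one-network-per-digit arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the five state elements are assumed to be a one-hot code over five HMM states, each network is assumed to predict the next acoustic frame (twelve coefficients plus log energy), and the recognized digit is assumed to be the one whose network gives the smallest squared prediction error. The names `build_input`, `Predictor`, `prediction_error` and `recognize` are hypothetical, the weights are random placeholders, and back-propagation training is omitted.

```python
import math
import random

N_MFCC = 12                       # mel-frequency cepstral coefficients per frame
N_STATES = 5                      # assumption: one-hot code over five HMM states
N_ACOUSTIC = N_MFCC + 1           # MFCCs plus log energy
N_INPUT = N_ACOUSTIC + N_STATES   # 18 inputs per network


def build_input(mfcc, log_energy, state):
    """Concatenate 12 MFCCs, log energy and a one-hot state code (assumed encoding)."""
    if len(mfcc) != N_MFCC:
        raise ValueError("expected 12 MFCCs")
    if not 0 <= state < N_STATES:
        raise ValueError("state index out of range")
    one_hot = [1.0 if i == state else 0.0 for i in range(N_STATES)]
    return list(mfcc) + [log_energy] + one_hot


class Predictor:
    """Small one-hidden-layer perceptron: 18 inputs -> predicted next 13-dim frame."""

    def __init__(self, n_hidden=8, seed=0):
        rng = random.Random(seed)  # random placeholder weights; training is omitted
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUT)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(N_ACOUSTIC)]

    def predict(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


def prediction_error(net, frames, states):
    """Sum of squared errors when predicting frame t+1 from frame t and its state."""
    total = 0.0
    for t in range(len(frames) - 1):
        x = build_input(frames[t][:N_MFCC], frames[t][N_MFCC], states[t])
        y = net.predict(x)
        total += sum((a - b) ** 2 for a, b in zip(y, frames[t + 1]))
    return total


def recognize(nets, frames, states_per_digit):
    """Pick the digit whose network has the lowest prediction error (assumed rule)."""
    errors = {d: prediction_error(net, frames, states_per_digit[d])
              for d, net in nets.items()}
    return min(errors, key=errors.get)
```

In this sketch the per-digit state sequences are passed in directly; in a full system they would presumably come from aligning the utterance against each digit's HMM.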


Rafik Djemili received the engineering and MSc degrees in 1993 and 2001, respectively, both from Badji Mokhtar Annaba University. In 2001, he joined the Automatic and Signals Laboratory of Annaba, where he worked on Arabic speech recognition, statistical methods and neural networks. He has been an assistant professor at Djelfa University, Algeria, since December 2002.

Mouldi Bedda obtained the high studies degree in physics in 1981 from Houari Boumediene Algiers University, and the PhD in electrical engineering in 1985 from Nancy University, France. In 1990 he was a professor at Badji Mokhtar Annaba University. His interests are in the areas of signal processing, speech recognition, text to speech conversion and character recognition. He has been the director of the Automatic and Signals Laboratory of Annaba since 2001.

Hocine Bourouba received the engineering and MSc degrees from Badji Mokhtar Annaba University in 1998 and 2001, respectively. Since 2001, he has been with the Automatic and Signals Laboratory of Annaba, doing research on speech recognition and signal processing algorithms.