
Fuzzy Speech Recognition Algorithm Based on Continuous Density Hidden Markov Model and Self Organizing Feature Map
Speech recognition is the process by which a computer receives and interprets human speech input and converts it into readable text or instructions. To improve both the denoising and the recognition of fuzzy speech, a fuzzy speech recognition algorithm based on a continuous density hidden Markov model and a self-organizing feature map is proposed. Firstly, the conventional Wiener filtering algorithm is improved with a dynamic estimation algorithm for the noise power spectrum: spectral entropy is used for endpoint detection of the noisy speech signal, and the noise power spectrum of the silent segments is dynamically updated according to the detection results to obtain a better a priori signal-to-noise ratio. Secondly, the fuzzy speech is passed through the Wiener filter to remove the noise from the speech signal. Then, the Mel-Frequency Cepstral Coefficients (MFCC) of the speech signal are extracted as speech features. Finally, the continuous hidden Markov model is combined with the self-organizing feature neural network, and speech is classified and recognized from its features through parameter adjustment, Viterbi decoding, and time adjustment of the speech signal within the same state. In the experiments, the proposed algorithm was compared on the LibriSpeech dataset with speech recognition algorithms based on convolutional and recurrent neural networks, on residual networks and gated convolutional networks, and on multi-scale Mel domain feature map extraction. The experimental results show that the algorithm has good denoising performance.
As the intensity of the added environmental noise increases, the algorithm keeps the Signal-to-Noise Ratio (SNR) of the speech signal between 88 dB and 98 dB. It accurately detects the speech segments of the signal, giving high endpoint detection accuracy. The accuracy and recall of the Continuous Density Hidden Markov Model-Self-Organizing Feature Neural Network (CDHMM-SOFM) designed in the algorithm increase with the number of iterations, reaching a maximum of 0.89. The minimum recognition time of the algorithm is only 8.2 seconds, and the highest recognition rate reaches 98.7%. After the algorithm is applied, the user's error rate ranges from 0.0031 to 0.0084. These results indicate that the algorithm performs well in application.
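The Viterbi decoding step over a continuous density HMM can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`log_gauss`, `log_emission`, `viterbi`), the diagonal-covariance Gaussian mixture emissions, and the toy two-state left-to-right model are assumptions made for illustration, and the SOFM stage is not shown.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_emission(x, mixture):
    """Log density of a Gaussian mixture given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in mixture]
    top = max(terms)
    # Log-sum-exp keeps the sum of tiny densities numerically stable.
    return top + math.log(sum(math.exp(t - top) for t in terms))


def viterbi(obs, log_start, log_trans, mixtures):
    """Return (best log-probability, most likely state path) for obs.

    obs       : list of feature vectors (e.g. MFCC frames)
    log_start : log initial probability per state
    log_trans : log_trans[r][s] = log P(state s at t | state r at t-1)
    mixtures  : one Gaussian mixture per state (hypothetical toy format)
    """
    n = len(log_start)
    score = [log_start[s] + log_emission(obs[0], mixtures[s]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at frame t+1
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emission(x, mixtures[s]))
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path


# Toy left-to-right model with 1-D features (all values are illustrative).
NEG_INF = -math.inf
toy_mixtures = [[(1.0, [0.0], [1.0])], [(1.0, [5.0], [1.0])]]
toy_start = [0.0, NEG_INF]
toy_trans = [[math.log(0.7), math.log(0.3)], [NEG_INF, 0.0]]
toy_obs = [[0.1], [-0.2], [0.3], [4.8], [5.1]]
best_logp, best_path = viterbi(toy_obs, toy_start, toy_trans, toy_mixtures)
print(best_path)  # [0, 0, 0, 1, 1]
```

In an isolated-word setting, one such model would typically be trained per word, and the word whose model yields the highest `best_logp` for an utterance would be chosen. Working in the log domain avoids numerical underflow on long frame sequences.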
The International Arab Journal of Information Technology, Vol. 22, No. 2, March 2025