The International Arab Journal of Information Technology (IAJIT)


Audiovisual Speaker Identification Based on Lip and Speech Modalities

In this article, we present a bimodal speaker identification method that integrates acoustic and visual features, with the two audiovisual modality streams processed in parallel. We also propose a fusion technique that combines the two modalities to make the final recognition decision. Experiments are conducted on an audiovisual dataset containing the 28 Arabic syllables pronounced by ten speakers. The results show the importance of the visual information provided by the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT), in addition to the audio information carried by the Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) coefficients. Furthermore, artificial neural networks such as the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network were investigated and tested successfully on this dataset, giving good recognition performance with serial concatenation of the acoustic and visual vectors.
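To make the idea of serial concatenation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a small square grayscale lip-region block, computes an orthonormal 2D DCT-II, keeps the top-left low-frequency coefficients (the number and choice of retained coefficients is an assumption; the abstract does not state them), and appends the result to an acoustic vector. The function names `dct2`, `visual_features`, and `fuse_serial` are hypothetical, and the acoustic values are placeholders standing in for MFCC or PLP coefficients.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        # Scaling factors that make the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    )
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def visual_features(block, keep=2):
    """Return the top-left keep x keep DCT coefficients as a flat list.

    Keeping the low-frequency corner is an illustrative assumption.
    """
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


def fuse_serial(acoustic, visual):
    """Serial concatenation: the acoustic vector followed by the visual vector."""
    return list(acoustic) + list(visual)


# Placeholder inputs, for illustration only (not data from the paper).
lip_block = [[1.0] * 4 for _ in range(4)]   # constant 4x4 "image"
acoustic_vector = [0.5, -0.2, 0.1]          # stand-in for MFCC/PLP values
fused = fuse_serial(acoustic_vector, visual_features(lip_block))
```

The fused vector would then be passed to a classifier such as an MLP or RBF network. Feature-level concatenation of this kind lets a single classifier see both modalities at once, at the cost of a larger input dimension.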

The International Arab Journal of Information Technology, Vol. 14, No. 1, January 2017

Fatma Zohra Chelali received her engineering degree in electronic engineering from the University of Science and Technology Houari Boumedienne (USTHB), Algiers, Algeria, in 1994. She worked as an assistant teacher at the High School of Aeronautical Technicians (École supérieure des techniciens de l'aéronautique, ESTA) from 1997 to 2008, and received an academic teaching certificate from the Algerian institute of management (Institut international de management d'Alger) in 1999. She completed a year of postgraduate study from 2002 to 2003. She then received a magister degree in speech communication in 2006 and a doctorate degree from the Speech Communication and Signal Processing Laboratory (LCPTS, USTHB, Algiers) in 2012; her thesis addresses audiovisual speaker recognition applied to Arabic phonemes. Since 2007, she has taught courses on electromagnetic waves, transmission lines, and digital electronics in the telecommunications department of the Faculty of Electronic Engineering and Computer Science, USTHB. Her interests include audiovisual analysis and recognition, pattern recognition and classification, and speech and image processing.

Amar Djeradi received his engineering degree in electronics in 1984, his magister degree in applied electronics, and his doctorate degree in 1992. Since 1985, he has taught undergraduate and postgraduate modules including electronics, television, digital electronics, principal functions of electronics, pattern recognition, and human-machine communication. His current research interests are in the areas of speech communication, human-machine communication, multimodal interfaces, and signal analysis.