The International Arab Journal of Information Technology (IAJIT)


The Evaluation of Spoken Dialog Management

The implementation of voice dialogs enables some of the aims of modern Human Computer Interaction (HCI) services to be realized more successfully and efficiently. Unfortunately, multimodal Lithuanian HCIs based on speech, the most natural form of communication, are still in the prototype stage, and no services were available to end users at the time of writing. This paper describes an experimental evaluation of the feasibility of using spoken language dialogs as the main modality for controlling modern applications. The recognition accuracy of the three main types of spoken dialogs (dictation, keyword spotting, isolated utterances) was evaluated, and a user preference survey was conducted on the proposed multimodal HCIs. The goal of this research was to gather results from prospective everyday users who are not familiar with such systems.

The International Arab Journal of Information Technology, Vol. 11, No. 1, January 2014
Rytis Maskeliunas received his PhD degree in computer science in 2009 from Kaunas University of Technology, Lithuania. He is a senior scientific researcher and project manager in the computer science field at the Information Technology Development and Automation and Control Systems Institutes of Kaunas University of Technology, with expertise in the development and analysis of multimodal interfaces and automatic speech recognizers. He has won various awards and honours, including the National Science Academy Award for Young Scholars of Lithuania in 2010, the Postdoctoral Research Fellowship in 2010, the Best Master in 2004, and Master Work in 2006. He has coordinated or participated in several research projects in the computer science domain, was involved in the EU COST actions 278 and 2102, and is an MC member (Lithuania) of the currently running COST IC1002. He is a member of the IEEE, author or co-author of over 30 refereed scientific articles, and serves as a reviewer for a number of refereed journals. His research interests include modelling, development and analysis of multimodal interfaces, engineering of virtualization systems, and programming web and telephony servers and applications.