Voice Versus Keyboard and Mouse for Text Creation on Arabic User Interfaces

Author Khalid Majrashi

Keywords #Voice #speech #recognition #input modal #user interface #user performance #Arabic #text entry #keyboard #mouse

Abstract

Voice User Interfaces (VUIs) are increasingly popular owing to improvements in automatic speech recognition. However, the understanding of user interaction with VUIs, particularly Arabic VUIs, remains limited. Hence, this research compared user performance, learnability, and satisfaction when using the voice and keyboard-and-mouse input modalities for text creation on Arabic user interfaces. A Voice-enabled Email Interface (VEI) and a Traditional Email Interface (TEI) were developed. Forty participants attempted pre-prepared and self-generated message creation tasks using voice on the VEI and the keyboard-and-mouse modality on the TEI. The results showed that participants were faster (by 1.76 to 2.67 minutes) in pre-prepared message creation using voice than using the keyboard and mouse. Participants were also faster (by 1.72 to 2.49 minutes) in self-generated message creation using voice than using the keyboard and mouse. Although the learning curves were more efficient with the VEI, more participants were satisfied with the TEI. With the VEI, participants reported problems such as misrecognitions and misspellings, but were satisfied with the visibility of possible executable commands and with the overall accuracy of voice recognition.

The International Arab Journal of Information Technology, Vol. 19, No. 1, January 2022