Analysis of QA System Behavior against Context and Question Changes

Authors Rachid Karra, Abdelali Lasfar

Keywords #Adversarial attacks #BERT #data quality #question answering #simplification

Abstract

Data quality has gained increasing attention across research domains, including pattern recognition, image processing, and Natural Language Processing (NLP). This paper explores the impact of data quality, in both questions and context, on the performance of Question-Answering (QA) systems. We introduce an approach that improves QA results through context simplification. The strength of the methodology lies in its use of human-scale NLP models: it combines multiple specialized models within the workflow rather than relying solely on a resource-intensive Large Language Model (LLM). We demonstrate that this method improves the correct response rate of the QA system without modifying or further training the model. In addition, we conduct a cross-disciplinary study involving NLP and linguistics, analyzing QA system results to show their correlation with readability and text complexity metrics computed with Coh-Metrix. Lastly, we explore the robustness of Bidirectional Encoder Representations from Transformers (BERT) and R-NET models when confronted with noisy questions.
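The abstract does not say how the noisy questions were generated. As a minimal sketch of one way such a robustness probe could be set up, the snippet below injects misspellings by swapping two adjacent inner letters in a fraction of the words. The function name `add_typo_noise` and the parameters `rate` and `seed` are hypothetical and chosen only for illustration; this is not the authors' procedure.

```python
import random


def add_typo_noise(question: str, rate: float = 0.2, seed: int = 0) -> str:
    """Return the question with adjacent inner letters swapped in some words.

    Hypothetical helper for illustration only; the noise model is an assumption.
    """
    rng = random.Random(seed)  # local generator keeps the noise reproducible
    noisy_words = []
    for word in question.split():
        # Perturb only alphabetic words with at least four letters,
        # so the first and last letters can stay in place.
        if word.isalpha() and len(word) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        noisy_words.append(word)
    return " ".join(noisy_words)


if __name__ == "__main__":
    clean = "When was the university founded"
    print(add_typo_noise(clean, rate=1.0, seed=1))
```

Feeding both the clean and the perturbed question to the same QA model, with the context unchanged, and comparing the answers at several values of `rate` gives a rough picture of how quickly accuracy degrades as noise increases.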

Journal The International Arab Journal of Information Technology, Vol. 21, No. 2, March 2024