The International Arab Journal of Information Technology (IAJIT), Vol. 19, No. 6, November 2022


Text Mining Approaches for Dependent Bug Report Assembly and Severity Prediction

In general, most existing bug report studies focus on solving only a single specific issue. Considering multiple issues at once is required for a more complete and comprehensive bug-fixing process. We took up this challenge and propose a method that analyzes two bug report issues using text mining techniques. First, dependent bug reports are assembled into individual clusters, and then the bug reports in each cluster are analyzed for their severity. Dependent bug report assembly is performed with threshold-based similarity analysis. Cosine similarity and BM25 are compared with term frequency (tf) weighting to identify the most appropriate method. Meanwhile, four classification algorithms, namely Random Forest (RF), Support Vector Machines (SVM) with the RBF kernel function, Multinomial Naïve Bayes (MNB), and k-Nearest Neighbor (k-NN), are used to model the bug severity predictor with four term weighting schemes: tf, term frequency-inverse document frequency (tf-idf), term frequency-inverse class frequency (tf-icf), and term frequency-inverse gravity moment (tf-igm). In the experiments, BM25 was found to be the most appropriate for dependent bug report assembly, while for severity prediction, tf-icf weighting with the RF classifier yielded the best performance.
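As an illustration of threshold-based similarity analysis, the following is a minimal sketch that groups reports by cosine similarity over raw tf vectors. It is not the procedure evaluated in the paper: the whitespace tokenizer, the greedy comparison against each cluster's first report, the sample reports, and the threshold of 0.5 are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector; lowercased whitespace tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse tf vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assemble(reports, threshold):
    """Greedy grouping (an assumed strategy): a report joins the first cluster
    whose seed report it matches at or above the threshold, otherwise it
    starts a new cluster. Returns lists of report indices."""
    vectors = [tf_vector(r) for r in reports]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical bug report summaries, for illustration only.
reports = [
    "crash when saving file",
    "editor crash when saving large file",
    "toolbar icon missing",
]
clusters = assemble(reports, threshold=0.5)
print(clusters)
```

In this sketch the first two reports share four terms and fall into one cluster, while the third shares none and forms its own. Replacing `cosine` with a BM25 scoring function would keep the same thresholding structure.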

Bancha Luaphol received a Ph.D. degree in Computer Science from Mahasarakham University. He currently works for the Department of Digital Technology, Faculty of Administrative Science, Kalasin University, Thailand. He is currently engaged in the study of applications of natural language processing, and machine learning and deep learning approaches.

Jantima Polpinij received a Ph.D. degree in Computer Science from the University of Wollongong, Australia. She is an associate professor of computer science at Mahasarakham University, Thailand. Her research interests include data science, natural language processing, text mining, and machine learning and deep learning approaches.

Manasawee Kaenampornpan received a Ph.D. degree in Computer Science from the University of Bath, UK. She is an assistant professor of computer science at Mahasarakham University, Thailand. Her research interests are user experience design, context awareness, and mobile and ubiquitous computing.