The International Arab Journal of Information Technology (IAJIT), Vol. 21, No. 3, May 2024


Improvised Software Code Comprehension Using Data Mining

Modern software applications comprise millions of lines of code and are increasingly complex in their structure, behaviour, and functionality. Their development life cycles also tend to become shorter, partly because the technologies that support and enable them advance rapidly. As a result, a growing share of software development cost shifts from creating new artefacts to adapting existing ones. This adaptation depends on understanding the layout, functionality, and behaviour of existing code artefacts, which makes code comprehension a crucial part of software maintenance. We employed data mining techniques, including clustering, classification, and association rules, to improve software code comprehension.
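Association rule mining, one of the techniques listed above, can be illustrated with a minimal sketch. It assumes that each method has already been reduced to the set of classes it references; the class names, the transaction data, and the `min_support` / `min_confidence` thresholds are hypothetical and not taken from the study. The function `mine_pair_rules` is an illustrative, simplified Apriori-style pass limited to single-item rules (A -> B), not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: for each method, the set of classes it references.
# In practice these sets would be extracted from source code by a parser
# or static-analysis tool; the names here are invented for illustration.
TRANSACTIONS = [
    {"Parser", "Lexer", "Token"},
    {"Parser", "Lexer"},
    {"Parser", "AstNode"},
    {"Lexer", "Token"},
    {"Parser", "Lexer", "AstNode"},
]


def mine_pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence).

    Apriori-style pruning: a pair is only counted if both of its items
    are individually frequent. Confidence of A -> B is
    support(A and B) / support(A).
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}

    # Count co-occurring pairs of frequent items (sorted for a stable key).
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t & frequent), 2):
            pair_counts[pair] += 1

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest confidence first, ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


if __name__ == "__main__":
    for lhs, rhs, sup, conf in mine_pair_rules(TRANSACTIONS):
        print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

On this toy data the sketch keeps four rules. A high-confidence rule such as `AstNode -> Parser` (every method that uses `AstNode` also uses `Parser`) hints that the two classes are read together and may belong to the same conceptual module, which is the kind of relationship a maintainer would want surfaced.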
