The International Arab Journal of Information Technology (IAJIT)


EncCD: A Framework for Efficient Detection of Code Clones

Minhaj Khan

Code clones are similar snippets of code within an application. Detecting them is essential for software maintenance, because fixing the same bug in multiple similar snippets becomes cumbersome in a large code base. Clone detection techniques typically perform conventional parsing before the final match detection, so an inefficient parsing mechanism degrades the performance of the entire detection process. In this paper, we propose a framework called Encoded Clone Detector (EncCD), which uses encoded pipeline processing to detect clones efficiently. The framework applies efficient labelled encoding, followed by tokenization and match detection. Experiments on Intel Core i7 and Intel Xeon processor based systems show that EncCD outperforms the widely used JCCD and CCFinder frameworks, yielding a significant performance improvement.
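The three stages named in the abstract (encoding, tokenization, match detection) can be illustrated with a minimal sketch. This is not the EncCD implementation: the abstract does not specify the encoding scheme, so the sketch assumes a common normalization in which identifiers and literals are replaced by placeholder labels (`ID`, `NUM`, `STR`), and it assumes matching by shared fixed-size token windows. The keyword set, the function names, and the `window` parameter are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny keyword set; a real tool would use the
# complete keyword list of the target language.
KEYWORDS = {"int", "for", "if", "else", "while", "return"}
LABELS = {"ID", "NUM", "STR"}


def encode(source):
    """Stage 1 (assumed scheme): replace literals and identifiers with labels."""
    source = re.sub(r'"[^"\n]*"', " STR ", source)   # string literals
    source = re.sub(r"\b\d+\b", " NUM ", source)     # integer literals

    def label(match):
        word = match.group(0)
        # Keep keywords and already-inserted labels; everything else is an ID.
        return word if word in KEYWORDS or word in LABELS else "ID"

    return re.sub(r"\b[A-Za-z_]\w*\b", label, source)


def tokenize(encoded):
    """Stage 2: split encoded text into words and single-character symbols."""
    return re.findall(r"\w+|[^\w\s]", encoded)


def find_clones(snippets, window=6):
    """Stage 3: report snippet pairs that share a window of encoded tokens."""
    index = defaultdict(set)
    for name, source in snippets.items():
        tokens = tokenize(encode(source))
        for i in range(len(tokens) - window + 1):
            index[tuple(tokens[i:i + window])].add(name)

    pairs = set()
    for names in index.values():
        ordered = sorted(names)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return sorted(pairs)


if __name__ == "__main__":
    snippets = {
        "a": "int total = 0; for (int i = 0; i < n; i++) { total += values[i]; }",
        "b": "int sum = 0; for (int k = 0; k < len; k++) { sum += data[k]; }",
        "c": "return name;",
    }
    print(find_clones(snippets))
```

In this sketch, snippets `a` and `b` differ only in variable names, so they encode to the same token sequence and are reported as a clone pair, while `c` is too short to produce any window. Encoding before matching is what allows renamed copies to be detected by a plain equality comparison on token windows.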


[1] Baker B., “On Finding Duplication and Near-duplication in Large Software Systems,” in Proceedings of the 2nd Working Conference on Reverse Engineering, Washington, pp. 86-95, 1995.

[2] Baxter I., Yahin A., Moura L., Santanna M., and Bier L., “Clone Detection Using Abstract Syntax Trees,” in Proceedings of the International Conference on Software Maintenance, Bethesda, pp. 368-377, 1998.

[3] Behlendorf B., “Apache HTTP Server Project,” Apache, Available at: https://httpd.apache.org/, Last Visited, 2016.

[4] Biegel B. and Diehl S., “JCCD: A Flexible and Extensible API for Implementing Custom Code Clone Detectors,” in Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, New York, pp. 167-168, 2010.

[5] Bruggen D., “JavaParser,” Available at: http://javaparser.org, Last Visited, 2016.

[6] Cagnon E., “SableCC,” SableCC.org, Available at: http://www.sablecc.org/, Last Visited, 2016.

[7] Ducasse S., Rieger M., and Demeyer S., “A Language Independent Approach for Detecting Duplicated Code,” in Proceedings of the IEEE International Conference on Software Maintenance, Oxford, pp. 109-118, 1999.

[8] Dunwiddie B., “Java CSV,” Csvreader.com, Available at: https://www.csvreader.com/, Last Visited, 2017.

[9] Fisher J., “OWASP DirBuster Project,” Owasp.org, Available at: https://www.owasp.org/index.php/Category:OWASP_DirBuster_Project, Last Visited, 2017.

[10] Gauci R., “Smelling out Code Clones: Clone Detection Tool Evaluation and Corresponding Challenges,” CoRR, vol. abs/1503.00711, 2015.

[11] Gamma E. and Eggenschwiler T., “JHotDraw as Open-Source Project,” JHotDraw.org, Available at: http://www.jhotdraw.org/, Last Visited, 2016.

The International Arab Journal of Information Technology, Vol. 16, No. 5, September 2019

[12] Javacardos F., “Java Card PKI Applet,” Sourceforge, Available at: https://sourceforge.net/projects/java-card-pkiapplet/, Last Visited, 2016.

[13] Johnson J., “Identifying Redundancy in Source Code Using Fingerprints,” in Proceedings of the Conference of the Centre for Advanced Studies on Collaborative Research, Toronto, pp. 171-183, 1993.

[14] Kamiya T., Kusumoto S., and Inoue K., “CCFinder: A Multilinguistic Token-based Code Clone Detection System for Large Scale Source Code,” IEEE Transactions on Software Engineering, vol. 28, pp. 654-670, 2002.

[15] Lewis L., “Open Visual Traceroute,” Visualtraceroute, Available at: http://visualtraceroute.net/, Last Visited, 2016.

[16] Mubarak-Ali A., Syed-Mohamad S., and Sulaiman S., “Enhancing Generic Pipeline Model for Code Clone Detection using Divide and Conquer Approach,” The International Arab Journal of Information Technology, vol. 12, no. 5, pp. 510-517, 2015.

[17] Mohapatra T., “Java Class File Editor,” Sourceforge, Available at: http://classeditor.sourceforge.net/, Last Visited, 2016.

[18] Roy C., Cordy J., and Koschke R., “Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach,” Science of Computer Programming, vol. 74, pp. 470-495, 2009.

[19] Sajnani H., Saini V., Svajlenko J., Roy C., and Lopes C., “SourcererCC: Scaling Code Clone Detection to Big Code,” in Proceedings of the 38th International Conference on Software Engineering, Texas, pp. 1157-1168, 2016.

[20] Smith R. and Horwitz S., “Detecting and Measuring Similarity in Code Clones,” in Proceedings of the 13th European Conference on Software Maintenance and Reengineering, USA, pp. 28-34, 2009.

[21] Svajlenko J., Keivanloo I., and Roy C., “Big Data Clone Detection Using Classical Detectors: an Exploratory Study,” Journal of Software: Evolution and Process, vol. 27, no. 6, pp. 430-464, 2015.

[22] Triemax S., “Jalopy Java Source Code Formatter Beautifier Pretty Printer,” TrieMax, Available at: https://www.triemax.com/, Last Visited, 2016.

[23] Wahler V., Seipel D., Gudenberg J., and Fischer G., “Clone Detection in Source Code by Frequent Itemset Techniques,” in Proceedings of the Source Code Analysis and Manipulation, 4th IEEE International Workshop, Washington, pp. 128-135, 2004.

[24] White M., Tufano M., Vendome C., and Poshyvanyk D., “Deep Learning Code Fragments for Code Clone Detection,” in Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, New York, pp. 87-98, 2016.

[25] Yang W., “Identifying Syntactic Differences Between Two Programs,” Software-Practice and Experience, vol. 21, no. 7, pp. 739-755, 1991.

[26] Zamudio E., “Introduction to ISO8583,” Sourceforge, Available at: http://j8583.sourceforge.net/iso8583.html, Last Visited, 2016.

Minhaj Khan obtained his MS and Ph.D. degrees from the University of Versailles, France. He is currently working as an Associate Professor at Bahauddin Zakariya University, Multan. His research interests include code optimization and high performance computing.