The International Arab Journal of Information Technology (IAJIT)



Enhancing Generic Pipeline Model for Code Clone Detection using Divide and Conquer Approach

Code clones are identical copies of the same instances or fragments of source code in software. Current code clone research focuses on the detection and analysis of code clones in order to help software developers identify code clones in source code and reuse the source code in order to decrease maintenance cost. Many approaches, such as textual-based, token-based, and tree-based comparison, have been used to detect code clones. As software grows and becomes a legacy system, the complexity of these approaches in detecting code clones increases, which makes code clones more difficult to detect. The generic pipeline model is the most recent code clone detection model; it comprises five processes to detect code clones: parsing, pre-processing, pooling, comparing, and filtering. This research highlights an enhancement of the generic pipeline model using a divide and conquer approach that involves a concatenation process. The aim of this approach is to produce a better input for the generic pipeline model by processing smaller parts of source code files before focusing on the large chunk of source code in a single pipeline. We implement and apply the proposed approach with the support of a tool called Java Code Clone Detector (JCCD). The results obtained show an improvement in the rate of code clone detection and overall runtime performance compared to the existing generic pipeline model.
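To make the five processes and the divide-and-concatenate idea concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the JCCD implementation: all function names, the line-based "parsing", the fixed window size, and the length-based filter are hypothetical choices made for this example. Parsing and pre-processing run on each file separately (the divide step), the per-file results are concatenated into one input, and pooling, comparing, and filtering then run once over that input.

```python
from collections import defaultdict


def parse(source):
    """Parsing (toy version): split one source text into (line_number, text) units."""
    return [(i, line) for i, line in enumerate(source.splitlines(), start=1)]


def preprocess(units):
    """Pre-processing: strip // comments, collapse whitespace, drop empty units.

    Assumption: a naive comment rule that would also cut a "//" inside a string.
    """
    cleaned = []
    for number, text in units:
        text = " ".join(text.split("//")[0].split())
        if text:
            cleaned.append((number, text))
    return cleaned


def pool(units_by_file, window):
    """Pooling: group every window of consecutive units by its normalized text."""
    pools = defaultdict(list)
    for name, units in units_by_file:
        for start in range(len(units) - window + 1):
            chunk = units[start:start + window]
            key = "\n".join(text for _, text in chunk)
            pools[key].append((name, chunk[0][0]))
    return pools


def compare(pools):
    """Comparing: a pool with more than one member is a clone candidate."""
    return [(key, members) for key, members in pools.items() if len(members) > 1]


def filter_clones(candidates, min_chars):
    """Filtering (hypothetical rule): discard very short fragments."""
    return [(key, members) for key, members in candidates if len(key) >= min_chars]


def detect_clones(files, window=2, min_chars=10):
    # Divide: run the early stages on each small file on its own.
    parts = [(name, preprocess(parse(src))) for name, src in files.items()]
    # Concatenate: 'parts' is the single combined input for the remaining stages.
    return filter_clones(compare(pool(parts, window)), min_chars)


files = {
    "A.java": "int a = 1;\nint b = a + 2;  // sum\nreturn b;\n",
    "B.java": "int x = 0;\nint a = 1;\nint b = a + 2;\n",
}
clones = detect_clones(files)
```

In this sketch `clones` holds one entry per cloned fragment, pairing the normalized text with the file name and starting line of each occurrence. A real detector would pool syntax-tree nodes rather than raw lines; the line-based units here only keep the example short.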


Al-Fahim Mubarak-Ali received his BS degree in computer science (software engineering) from University Malaysia Pahang, Malaysia in 2009 and his MS degree in science (computer science) from University Sains Malaysia, Malaysia in 2012. Currently, he is pursuing his PhD in the area of software engineering at University Teknologi Malaysia, Malaysia.

Sharifah Syed-Mohamad is a senior lecturer at the School of Computer Sciences, University Sains Malaysia. She received her PhD degree in software engineering from the University of Technology, Australia in 2012. Her research interests include software reliability, software testing, software maintenance and agile development.

Shahida Sulaiman is an associate professor at the Faculty of Computing, University Teknologi Malaysia. She holds a PhD degree in computer science and an MS degree in computer science (software engineering in real time systems). Her expertise includes software design, software maintenance, software visualisation and documentation, and knowledge management.