The International Arab Journal of Information Technology (IAJIT)


Cuckoo Search with Mutation for Biclustering of

DNA microarrays have been applied successfully in diverse research fields such as gene discovery, disease diagnosis and drug discovery. Microarrays make it possible to identify the roles of genes and the mechanisms underlying diseases. Biclustering is a two-dimensional clustering problem in which genes and samples are grouped simultaneously. It has great potential for detecting marker genes that are associated with certain tissues or diseases. The proposed work finds significant biclusters in large expression data using Cuckoo Search with Mutation (CSM). Using a mutation operator, the cuckoo makes its egg resemble the host bird’s egg. Mutation is used to explore the search space, more precisely to allow candidate solutions to escape from local minima. The method focuses on finding maximum biclusters with lower Mean Squared Residue (MSR) and higher gene variance. A qualitative measurement of the formed biclusters, with a comparative assessment of results, is provided on four benchmark gene expression datasets. To demonstrate the effectiveness of the proposed method, the results are compared with the swarm intelligence techniques Binary Particle Swarm Optimization (BPSO), Shuffled Frog Leaping (SFL), and Cuckoo Search with Lévy flight (CS). The results show a significant improvement in the fitness value.
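The two quality measures named in the abstract, and the role of mutation, can be illustrated with a minimal Python sketch. It assumes the widely used Cheng and Church definition of MSR and a row-wise definition of gene variance. The binary encoding of a bicluster and the bit-flip mutation are assumptions made for illustration only, since the abstract does not specify the operator used in CSM. The function names `mean_squared_residue`, `gene_variance` and `mutate` are hypothetical.

```python
import random


def mean_squared_residue(matrix, rows, cols):
    """MSR of the submatrix given by `rows` and `cols`.

    Assumes the Cheng and Church definition: the residue of an entry is
    a_ij - rowmean_i - colmean_j + overall_mean, and MSR is the mean of
    the squared residues.
    """
    n_r, n_c = len(rows), len(cols)
    row_mean = {i: sum(matrix[i][j] for j in cols) / n_c for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / n_r for j in cols}
    all_mean = sum(row_mean[i] for i in rows) / n_r
    total = 0.0
    for i in rows:
        for j in cols:
            residue = matrix[i][j] - row_mean[i] - col_mean[j] + all_mean
            total += residue ** 2
    return total / (n_r * n_c)


def gene_variance(matrix, rows, cols):
    """Mean squared deviation of each entry from its gene (row) mean."""
    n_r, n_c = len(rows), len(cols)
    total = 0.0
    for i in rows:
        mean_i = sum(matrix[i][j] for j in cols) / n_c
        total += sum((matrix[i][j] - mean_i) ** 2 for j in cols)
    return total / (n_r * n_c)


def mutate(bits, rate, rng):
    """Generic bit-flip mutation (an assumption, not the paper's operator).

    `bits` is a hypothetical binary encoding of a bicluster: one bit per
    gene followed by one bit per sample. Each bit flips with probability
    `rate`.
    """
    return [1 - b if rng.random() < rate else b for b in bits]


if __name__ == "__main__":
    # Toy expression matrix whose rows differ only by a constant shift,
    # so the full matrix is a perfectly coherent (additive) bicluster.
    data = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
    genes, samples = [0, 1, 2], [0, 1, 2]
    print("MSR:", mean_squared_residue(data, genes, samples))
    print("gene variance:", gene_variance(data, genes, samples))
    print("mutated:", mutate([1, 1, 0, 1, 0, 1], 0.2, random.Random(0)))
```

A low MSR indicates that the selected genes vary coherently across the selected samples, while a high gene variance rules out trivial biclusters made of nearly constant rows. How the two measures and the bicluster size are combined into a single fitness value is not stated in the abstract and is left out of this sketch.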

The International Arab Journal of Information Technology, Vol. 14, No. 3, May 2017

Balamurugan Rengeswaran is currently working as a Senior Research Fellow for the DBT sponsored project at Bannari Amman Institute of Technology, Erode, Tamil Nadu, India. He received his BE and ME degrees in Computer Science and Engineering (CSE) from Anna University, Chennai. His areas of interest include data mining and optimization techniques.

Natarajan Mathaiyan is currently working as Chief Executive at Bannari Amman Institute of Technology, Erode, Tamil Nadu, India. He received BE, MSc and PhD degrees from the PSG College of Technology, Coimbatore, India. He has more than 40 years of experience in academic teaching, research and administration. He has published more than 110 papers in national and international journals and has authored and published 10 books. His research areas of interest include data mining, image processing and soft computing.

Premalatha Kandasamy is currently working as a Professor in the Department of Computer Science and Engineering at Bannari Amman Institute of Technology, Erode, Tamil Nadu, India. She completed her PhD in Computer Science and Engineering (CSE) at Anna University, Chennai, India. She received her ME and BE degrees in CSE from Bharathiar University, Coimbatore, Tamil Nadu, India. Her research interests include data mining, image processing, information retrieval and soft computing.