The International Arab Journal of Information Technology (IAJIT)



Improving Classification Performance Using Genetic Programming to Evolve String Kernels

The objective of this work is to present a novel evolutionary approach that uses Genetic Programming to create and optimize powerful string kernels. The proposed model evolves a kernel expressed as a combination of string kernels, their parameters, and corresponding weights. As a proof of concept, the classification performance of the evolved kernel was compared with that of a group of conventional single string kernels on a challenging classification problem from the biology domain: the classification of binder and non-binder peptides to Major Histocompatibility Complex Class II. Using 4794 strings containing 3346 binder and 1448 non-binder peptides, the present approach achieved an Area Under Curve (AUC) of 0.80, while the 11 tested conventional string kernels had AUC values ranging from 0.59 to 0.75. This significant improvement of the optimized evolved kernel over all other tested string kernels demonstrates the validity of this approach for enhancing Support Vector Machine classification. The presented approach is not exclusive to biological strings; it can be applied to pattern recognition problems involving other types of strings, as well as to natural language processing.
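To make the idea of "a combination of string kernels, their parameters, and corresponding weights" concrete, the following minimal Python sketch builds one such combination by hand. It is an illustration under stated assumptions, not the evolved kernel from this work: the k-spectrum kernel stands in for the string kernels, the weighted sum is only one of the ways valid kernels can be combined, and the names `spectrum_kernel`, `combined_kernel`, and `terms`, as well as the weights and example peptides, are hypothetical.

```python
from collections import Counter


def spectrum_kernel(s, t, k):
    """k-spectrum kernel: inner product of the k-mer count vectors of s and t."""
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(counts_s[m] * counts_t[m] for m in counts_s if m in counts_t)


def combined_kernel(s, t, terms):
    """Weighted sum of spectrum kernels.

    `terms` is a list of (weight, k) pairs. With non-negative weights,
    a sum of valid kernels is again a valid kernel.
    """
    return sum(w * spectrum_kernel(s, t, k) for w, k in terms)


# Hypothetical candidate: weights and k values chosen by hand for illustration.
# In an evolutionary search, these would be the quantities being optimized.
terms = [(0.7, 1), (0.3, 2)]

# Hypothetical toy strings, not taken from the data set used in this work.
peptides = ["AKLV", "AKLA", "GGGG"]

# Gram matrix that could be handed to an SVM accepting precomputed kernels.
gram = [[combined_kernel(a, b, terms) for b in peptides] for a in peptides]

for row in gram:
    print(row)
```

In a search of the kind described above, each candidate combination would be scored by the classification performance (for example, the AUC) of a Support Vector Machine trained with the resulting Gram matrix.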


Ruba Sultan received her M.Sc. in Informatics in 2012 from Palestine Polytechnic University, Palestine. She is currently working as a teaching and research assistant in the College of Information Technology and Computer Engineering at Palestine Polytechnic University. Her recent research focuses on developing computerized bioinformatic tools.

Yaqoub Ashhab is an associate professor of molecular biology and bioinformatics in the Biotechnology Research Center at Palestine Polytechnic University and a visiting professor at the Autonomous University of Barcelona. His recent research focuses mainly on developing bioinformatic tools to improve classification of genomic and immunomic data related to host-pathogen interaction.

Hashem Tamimi received his Ph.D. in computer science from the University of Tübingen, Germany, in 2006. Currently, he is an assistant professor at the College of Information Technology and Computer Engineering, Palestine Polytechnic University, Hebron, Palestine. His research interests include machine learning and bioinformatics.