The International Arab Journal of Information Technology (IAJIT)


An Additive Sparse Logistic Regularization Method for Cancer Classification in Microarray Data

Cancer, which arises from abnormal cell growth, has become a deadly disease, and many researchers are working on its early prediction. Accurate classification of cancer data requires identifying the proper set of genes by analyzing genomic data. Most researchers use microarrays to identify cancerous genomes. However, such data is high-dimensional: the number of genes is much larger than the number of samples. The data also contains many irrelevant features and noise. How a classification technique deals with such data influences the performance of the algorithm. This work considers a popular classification algorithm, Logistic Regression, for gene classification. Regularization techniques such as Lasso with an L1 penalty, Ridge with an L2 penalty, and hybrid Lasso with an L1/2+2 penalty are used to reduce the influence of irrelevant features and avoid overfitting. However, these are sparse parametric methods and are limited to linear data. They have also not produced promising performance when applied to high-dimensional genome data. To solve these problems, this paper presents an Additive Sparse Logistic Regression with Additive Regularization (ASLR) method that discriminates between linear and non-linear variables in gene classification. The results show that the proposed method is the best regularization method for classifying microarray data among the standard methods compared.
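
The contrast between the L1 and L2 penalties named in the abstract can be illustrated with a small sketch. This is not the ASLR method of the paper; it is a minimal NumPy illustration of penalized logistic regression fitted by proximal gradient descent. The synthetic data (40 samples, 200 features, 5 of them informative), the penalty strength `lam`, and the function names are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    # Clip the argument to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logistic(X, y, lam, penalty, n_iter=500):
    """Penalized logistic regression via proximal gradient descent.

    penalty="l1" uses lam * ||w||_1 (Lasso-style),
    penalty="l2" uses (lam / 2) * ||w||_2^2 (Ridge-style).
    """
    n, p = X.shape
    # The mean logistic loss has a gradient Lipschitz constant of at most
    # ||X||_2^2 / (4n), so its inverse is a safe step size.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / n
        v = w - step * grad
        if penalty == "l1":
            # Soft-thresholding: sets small coefficients exactly to zero.
            w = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
        elif penalty == "l2":
            # Ridge proximal step: shrinks all coefficients, none to zero.
            w = v / (1.0 + step * lam)
        else:
            raise ValueError("penalty must be 'l1' or 'l2'")
    return w

# Synthetic stand-in for microarray data: far more features than samples.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 200
X = rng.standard_normal((n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:5] = 2.0  # only the first five "genes" carry signal (assumption)
y = (X @ true_w > 0).astype(float)

w_l1 = fit_logistic(X, y, lam=0.1, penalty="l1")
w_l2 = fit_logistic(X, y, lam=0.1, penalty="l2")

nnz_l1 = int(np.count_nonzero(w_l1))
nnz_l2 = int(np.count_nonzero(w_l2))
print("non-zero coefficients with L1 penalty:", nnz_l1)
print("non-zero coefficients with L2 penalty:", nnz_l2)
```

In this sketch the L1 penalty drives many coefficients exactly to zero and so acts as a feature selector, while the L2 penalty only shrinks them. Both models remain linear in the features, which is the limitation the abstract attributes to these standard methods.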

The International Arab Journal of Information Technology, Vol. 18, No. 2, March 2021