The International Arab Journal of Information Technology (IAJIT)



Hybrid Feature Selection based on BTLBO and RNCA to Diagnose the Breast Cancer

Feature selection is a practical way to improve the speed and performance of machine learning models. Optimization algorithms play a significant role in searching the feature space for optimal variables. Recent feature selection methods depend purely on metaheuristic algorithms to search for a good combination of features, without considering the importance of individual features, which leaves classification models prone to local optima or overfitting. In this paper, a novel hybrid feature subset selection technique based on Regularized Neighborhood Component Analysis (RNCA) and Binary Teaching Learning Based Optimization (BTLBO) is introduced to overcome these problems. The RNCA algorithm assigns weights to the attributes based on their contribution to building the learning models for classification. The BTLBO algorithm computes the fitness of individuals with respect to the feature weights and selects the best ones. The proposed hybrid model is compared with similar feature selection methods and shows better performance in terms of classification accuracy, recall, and AUC on breast cancer datasets.
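The interplay between feature weights and a binary population search can be illustrated with a small sketch. It is not the authors' implementation. The weights are hypothetical values standing in for RNCA output. The fitness function (sum of selected weights minus a per-feature penalty), the 0.5 bit-copy probability, and the greedy acceptance rule are all assumptions chosen to keep the example short. Only the Python standard library is used.

```python
import random

def fitness(mask, weights, penalty=0.1):
    """Assumed fitness: reward the weights of selected features, penalise subset size."""
    return sum(w for bit, w in zip(mask, weights) if bit) - penalty * sum(mask)

def mix(learner, guide, rng):
    """Simplified binary update: copy each bit from the guide with probability 0.5."""
    return [g if rng.random() < 0.5 else b for b, g in zip(learner, guide)]

def btlbo_sketch(weights, pop_size=10, iterations=30, seed=0):
    """Toy binary teaching-learning search over feature masks (illustrative only)."""
    rng = random.Random(seed)
    n = len(weights)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness in the population after each iteration
    for _ in range(iterations):
        # Teacher phase: every learner moves toward the current best individual.
        teacher = max(pop, key=lambda m: fitness(m, weights))
        for i, learner in enumerate(pop):
            candidate = mix(learner, teacher, rng)
            if fitness(candidate, weights) > fitness(learner, weights):
                pop[i] = candidate  # greedy acceptance: keep only improvements
        # Learner phase: each learner learns from a randomly chosen fitter peer.
        for i, learner in enumerate(pop):
            peer = pop[rng.randrange(pop_size)]
            if fitness(peer, weights) > fitness(learner, weights):
                candidate = mix(learner, peer, rng)
                if fitness(candidate, weights) > fitness(learner, weights):
                    pop[i] = candidate
        history.append(max(fitness(m, weights) for m in pop))
    best = max(pop, key=lambda m: fitness(m, weights))
    return best, history

# Hypothetical per-feature weights, standing in for what RNCA would assign.
example_weights = [0.9, 0.05, 0.6, 0.0, 0.3, 0.02]
best_mask, best_history = btlbo_sketch(example_weights)
print(best_mask, best_history[-1])
```

Because a candidate replaces a learner only when it scores higher, the best fitness in the population never decreases between iterations. In a realistic pipeline the fitness would more plausibly combine the feature weights with a classifier's validation performance; that choice is left open here.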
