
Received date October 13, 2020

Accepted date December 13, 2021

MiNB: Minority Sensitive Naïve Bayesian Algorithm for Multi-Class Classification of Unbalanced Data

Authors: Pratikkumar Barot, Harikrishna Jethva

Keywords: Imbalanced data learning, weighted naïve Bayesian, cost-sensitive learning, multi-class unbalanced data

Abstract: The unbalanced nature of data makes it hard for classification algorithms to reach the desired performance. A sub-optimal prediction system is not a viable solution because of the high misclassification cost of minority events. Accurate classification of imbalanced data could therefore be a game changer for prediction in domains such as medical diagnosis, the judiciary, and disaster management. To date, most existing studies of imbalanced data address binary-class datasets and rely on data sampling techniques, which suffer from information loss and over-fitting. In this paper, we present a modified naïve Bayesian algorithm for unbalanced data classification that removes the need for data-level sampling. We compare the proposed model with data sampling and cost-sensitive techniques, using the minority-sensitive TP rate, the class-specific misclassification rate, and overall performance measures such as accuracy, F-measure, and G-mean. The results show that the proposed algorithm yields better results for unbalanced data classification: it reduces the misclassification rate and improves predictive performance for the minority class.
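The abstract evaluates models with a minority-sensitive TP rate, a class-specific misclassification rate, accuracy, F-measure, and G-mean. The sketch below shows one common way to compute these for a multi-class problem. The exact definitions are assumptions and may differ from the formulas used in the paper: the misclassification rate of a class is taken as 1 minus its TP rate, the F-measure is the unweighted (macro) mean of per-class F1 scores, and G-mean is the geometric mean of the per-class TP rates. The helper name `class_metrics` is hypothetical.

```python
import math
from collections import Counter


def class_metrics(y_true, y_pred, labels):
    """Per-class and overall metrics for a multi-class problem.

    Assumed definitions (not necessarily the paper's exact formulas):
      TP rate of class c = correct predictions of c / true instances of c
      misclassification rate of c = 1 - TP rate of c
      F-measure = unweighted (macro) mean of per-class F1 scores
      G-mean = geometric mean of the per-class TP rates
    """
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)                     # true instances per class
    predicted = Counter(y_pred)                   # predicted instances per class
    tp = Counter(t for t, p in pairs if t == p)   # correct predictions per class

    tp_rate, misclass, f1 = {}, {}, {}
    for c in labels:
        recall = tp[c] / support[c] if support[c] else 0.0
        precision = tp[c] / predicted[c] if predicted[c] else 0.0
        tp_rate[c] = recall
        misclass[c] = 1.0 - recall
        f1[c] = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)

    accuracy = sum(tp.values()) / len(pairs)
    f_measure = sum(f1.values()) / len(labels)
    g_mean = math.prod(tp_rate.values()) ** (1.0 / len(labels))
    return {"tp_rate": tp_rate, "misclassification": misclass,
            "accuracy": accuracy, "f_measure": f_measure, "g_mean": g_mean}


# Toy data: "a" is the majority class, "b" and "c" are minority classes.
y_true = ["a"] * 6 + ["b"] * 2 + ["c"] * 2
y_pred = ["a"] * 6 + ["a", "b"] + ["a", "a"]
metrics = class_metrics(y_true, y_pred, labels=["a", "b", "c"])
print(metrics["accuracy"])  # 0.7
print(metrics["g_mean"])    # 0.0, because class "c" is never predicted correctly
```

In this toy example accuracy is 0.7 even though class "c" is never recognized; its misclassification rate is 1.0 and the G-mean collapses to 0. This is why minority-sensitive measures are reported alongside overall accuracy.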

The International Arab Journal of Information Technology, Vol. 19, No. 4, July 2022

Pratikkumar Barot received the B.E. degree from H.N.G.U. and the M.E. degree in computer engineering from Gujarat Technological University, India. He received his Ph.D. from Gujarat Technological University, India. His research interests include unbalanced data classification, machine learning, data mining, AI, data science and algorithm design.

Harikrishna Jethva is currently Head of the Department of Computer Engineering, Government Engineering College, Patan, Gujarat, India. His research interests include machine learning, neural networks, theory of computation, compiler design, soft computing and algorithms. He is a Ph.D. guide at Gujarat Technological University and a member of the Board of Studies at many universities.