
Enhanced Bagging (eBagging): A Novel Approach for Ensemble Learning

Bagging is a well-known ensemble learning method that combines several classifiers trained on different subsamples of the dataset. A drawback of bagging, however, is its purely random sampling: classification performance depends on chance selecting a suitable subset of training objects. This paper proposes a modified version of bagging, named enhanced Bagging (eBagging), which uses a new mechanism (error-based bootstrapping) when constructing training sets in order to address this problem. In the experimental setting, the proposed eBagging technique was tested on 33 well-known benchmark datasets and compared with the bagging, random forest, and boosting techniques using four well-known classification algorithms: Support Vector Machines (SVM), decision trees (C4.5), k-Nearest Neighbour (kNN), and Naive Bayes (NB). The results show that eBagging outperforms its counterparts, classifying data points more accurately while reducing the training error.
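The abstract names error-based bootstrapping as eBagging's core mechanism but does not spell out the sampling rule in this excerpt. The sketch below is a minimal illustration, assuming (hypothetically) that training points misclassified by the most recently trained base learner receive a doubled sampling probability in the next bootstrap draw; the class name ErrorBasedBagging, the doubling factor, and the decision-tree base learner are illustrative assumptions, not the authors' exact method.

import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

class ErrorBasedBagging:
    """Bagging with bootstrap draws biased toward previously misclassified points."""

    def __init__(self, base_estimator=None, n_estimators=10, random_state=None):
        self.base_estimator = base_estimator or DecisionTreeClassifier()
        self.n_estimators = n_estimators
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        n = len(y)
        weights = np.full(n, 1.0 / n)  # uniform bootstrap to start, as in plain bagging
        self.estimators_ = []
        for _ in range(self.n_estimators):
            # draw a bootstrap sample of size n, biased toward hard examples
            idx = self.rng.choice(n, size=n, replace=True, p=weights)
            est = clone(self.base_estimator).fit(X[idx], y[idx])
            self.estimators_.append(est)
            # hypothetical update rule: double the weight of misclassified points
            miss = est.predict(X) != y
            weights = np.where(miss, weights * 2.0, weights)
            weights /= weights.sum()  # renormalise to a probability distribution
        return self

    def predict(self, X):
        # plurality vote over the ensemble (assumes integer class labels)
        votes = np.stack([est.predict(X) for est in self.estimators_])
        return np.array([np.bincount(col).argmax() for col in votes.T])

With the weights left uniform and never updated, this reduces to standard bagging; the error-driven reweighting of the bootstrap distribution is what distinguishes the approach described in the abstract (e.g., ErrorBasedBagging(n_estimators=25, random_state=0).fit(X_train, y_train)).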



[38] Zhou Z. and Yu Y., “Adapt Bagging to Nearest Neighbor Classifiers,” Journal of Computer Science and Technology, vol. 20, no.1, pp. 48- 54, 2005. 528 The International Arab Journal of Information Technology, Vol. 17, No. 4, July 2020 Goksu Tuysuzoglu received her BS in Information Systems Engineering in 2013 at Dogus University, Turkey. In the same year, she also received her double major in Industrial Engineering. Then, she received her MS in the department of computer engineering from Istanbul Technical University, Turkey in 2016. At the same time, she worked as a research and teaching assistant in there between the years 2014 and 2016. She is currently a PhD student in the department of computer engineering at Dokuz Eylul University, Turkey. She has also been working as a research and teaching assistant in the same department since 2016. She has BS graduation awards with ranking 1st in the Department and ranking 3rd in the Faculty and University. Her research interests include data mining and machine learning. Derya Birant received her B.S., M.S. and Ph.D. degrees in Computer Engineering from Dokuz Eylul University, Turkey in 2000, 2002 and 2006 respectively. Since 2017 she has been an Associate Professor at the Computer Engineering Department of Dokuz Eylul University. She is the vice-chair of the Computer Engineering Department. She was a Visiting Lecturer at the South East European University in 2006 and Ege University between 2010 and 2012. She is the author of 5 book chapters and more than 70 publications (i.e. journal articles, conference papers). Dr. Birant has also served as an organizing committee member in several conferences. She has been involved in more than 20 long-term interdisciplinary R&D projects. Dr. Birant has several Most Downloaded Article certifications and has graduation awards, ranking 2nd in the Faculty and the Department. She was also the recipient of the Outstanding Achievement Award in 2010.