The International Arab Journal of Information Technology (IAJIT)



Ensemble based on Accuracy and Diversity Weighting for Evolving Data Streams

Ensemble classification is an actively researched paradigm that has received much attention due to its growing range of real-world applications. The crucial issue in ensemble learning is to construct a pool of base classifiers that are both accurate and diverse. In this paper, unlike conventional ensemble methods for data streams, we propose a novel Measure based on both Accuracy and Diversity (MAD), rather than either criterion alone, to supervise ensemble learning. Based on MAD, a novel online ensemble method called the Accuracy and Diversity weighted Ensemble (ADE) effectively handles concept drift in data streams. ADE constructs a concept-drift-oriented ensemble in three main steps over the current data window: 1) when drift is detected, a new base classifier is built on the current concept, 2) MAD is used to measure the performance of the ensemble members, and 3) the newly built classifier replaces the worst base classifier; if the newly built classifier is itself the worst, no replacement occurs. Compared with state-of-the-art algorithms, ADE exceeds the best related algorithm by 2.38% in average classification accuracy. Experimental results show that the proposed method can effectively adapt to different types of drift.
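
The three-step update described above can be sketched in code as follows. This is a minimal illustration only, assuming a scikit-learn-style base learner with fit()/predict() and an external drift detector; the class name MADWeightedEnsemble, the linear accuracy-diversity mix in mad_score(), and the disagreement-based diversity measure are assumptions made for illustration, not the authors' exact MAD definition or implementation.

```python
# Minimal sketch of an ADE-style window update (illustrative assumptions,
# not the authors' exact method).

class MADWeightedEnsemble:
    def __init__(self, base_learner_factory, max_members=10, alpha=0.5):
        self.new_learner = base_learner_factory  # callable returning a fresh base classifier
        self.max_members = max_members
        self.alpha = alpha                       # assumed accuracy/diversity trade-off
        self.members = []                        # list of (classifier, mad_weight) pairs

    def _accuracy(self, clf, X, y):
        preds = clf.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

    def _diversity(self, clf, X):
        """Fraction of instances where clf disagrees with the current majority vote."""
        if not self.members:
            return 0.0
        own = clf.predict(X)
        all_votes = [m.predict(X) for m, _ in self.members]
        disagree = 0
        for i, votes in enumerate(zip(*all_votes)):
            majority = max(set(votes), key=votes.count)
            disagree += own[i] != majority
        return disagree / len(X)

    def mad_score(self, clf, X, y):
        # Combine accuracy and diversity into a single weight (assumed linear mix).
        return self.alpha * self._accuracy(clf, X, y) + (1 - self.alpha) * self._diversity(clf, X)

    def update_on_window(self, X, y, drift_detected):
        # Step 1: when drift is detected, train a new classifier on the current window.
        if drift_detected:
            candidate = self.new_learner()
            candidate.fit(X, y)
            # Step 2: score existing members and the candidate with MAD on this window.
            scores = [self.mad_score(m, X, y) for m, _ in self.members]
            cand_score = self.mad_score(candidate, X, y)
            # Step 3: replace the worst member, unless the candidate itself scores worst.
            if len(self.members) < self.max_members:
                self.members.append((candidate, cand_score))
            elif scores and cand_score > min(scores):
                worst = min(range(len(scores)), key=scores.__getitem__)
                self.members[worst] = (candidate, cand_score)
        # Keep MAD-based weights fresh for weighted majority voting.
        self.members = [(m, self.mad_score(m, X, y)) for m, _ in self.members]

    def predict_one(self, x):
        """Weighted majority vote over the ensemble for a single instance."""
        votes = {}
        for m, w in self.members:
            label = m.predict([x])[0]
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get) if votes else None
```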


Yange Sun is an associate professor at Xinyang Normal University, Xinyang, China. She received the Ph.D. degree in computer science and technology from Beijing Jiaotong University, Beijing, China, in 2019, and the M.S. degree in computer science from Central China Normal University in 2007. Her research interests include data mining and machine learning.

Han Shao is a lecturer at Xinyang Normal University, Xinyang, China. His research interests include big data mining and machine learning.

Bencai Zhang was born in 1994 and holds a master's degree. His current research interests include data mining and machine learning.