The International Arab Journal of Information Technology (IAJIT)


Ensemble based on Accuracy and Diversity Weighting for Evolving Data Streams

Ensemble classification is an actively researched paradigm that has received much attention owing to its growing number of real-world applications. The crucial issue in ensemble learning is constructing a pool of base classifiers that are both accurate and diverse. In this paper, unlike conventional ensemble methods for data streams, we propose a novel Measure via both Accuracy and Diversity (MAD), rather than only one of the two, to supervise ensemble learning. Based on MAD, we present a novel online ensemble method called Accuracy and Diversity weighted Ensemble (ADE) that effectively handles concept drift in data streams. For the current data window, ADE constructs a concept-drift-oriented ensemble in three steps: 1) when drift is detected, a new base classifier is built on the current concept, 2) MAD is used to measure the performance of the ensemble members, and 3) the newly built classifier replaces the worst base classifier. If the newly built classifier is itself the worst one, no replacement takes place. Compared with state-of-the-art algorithms, ADE exceeds the best related algorithm by 2.38% in average classification accuracy. Experimental results show that the proposed method can effectively adapt to different types of drift.
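
The replacement rule described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the MAD formula, so the score below is only an assumed stand-in: a weighted sum of window accuracy and mean pairwise disagreement, with a hypothetical weight `alpha`. The names `mad_score` and `update_ensemble` are likewise illustrative and not taken from the paper. Drift detection and training of the new classifier are left out; the candidate is assumed to be already built.

```python
from typing import Callable, List, Sequence

# A classifier is modelled as any function mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], int]


def accuracy(clf: Classifier, X, y) -> float:
    """Fraction of window instances that clf labels correctly."""
    return sum(clf(x) == t for x, t in zip(X, y)) / len(y)


def disagreement(clf: Classifier, others: List[Classifier], X) -> float:
    """Mean fraction of instances on which clf differs from each other member."""
    if not others:
        return 0.0
    total = 0.0
    for other in others:
        total += sum(clf(x) != other(x) for x in X) / len(X)
    return total / len(others)


def mad_score(clf, pool, X, y, alpha: float = 0.5) -> float:
    """ASSUMED stand-in for MAD: alpha * accuracy + (1 - alpha) * diversity."""
    others = [c for c in pool if c is not clf]
    return alpha * accuracy(clf, X, y) + (1 - alpha) * disagreement(clf, others, X)


def update_ensemble(ensemble, candidate, X, y, alpha: float = 0.5):
    """Score members and candidate on the window; swap out the worst member.

    If the candidate itself scores worst, the ensemble is returned unchanged.
    """
    pool = list(ensemble) + [candidate]
    scores = [mad_score(c, pool, X, y, alpha) for c in pool]
    worst = min(range(len(pool)), key=scores.__getitem__)
    if worst == len(pool) - 1:  # candidate is the worst: no replacement
        return list(ensemble)
    updated = list(ensemble)
    updated[worst] = candidate
    return updated
```

Scoring the candidate together with the current members keeps the comparison on the same data window, which is one plausible reading of steps 2 and 3. How the paper actually combines accuracy and diversity may differ.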
