The International Arab Journal of Information Technology (IAJIT)



Tracking Recurring Concepts from Evolving Data Streams using Ensemble Method

Ensemble models are the most widely used methods for classifying evolving data streams. However, most existing data stream ensemble classification algorithms do not consider recurring concepts, which commonly occur in real-world applications. Motivated by this challenge, an Ensemble with internal Change Detection (ECD) is proposed to improve performance by exploiting recurring concepts. ECD maintains a pool of classifiers and dynamically adds and removes classifiers in response to a change detector. The algorithm adopts a two-window change detection model that uses the Jensen-Shannon divergence to measure the distance between the distributions of old and recent data. When a change is detected, the repository of stored historical concepts is checked for reuse. Experimental results on both synthetic and real-world data streams demonstrate that the proposed algorithm not only outperforms state-of-the-art methods on standard evaluation metrics, but also adapts well to different types of concept drift, especially when concepts reappear.
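
The two-window change test and the repository lookup described above can be illustrated with a minimal Python sketch. It is not the ECD implementation: the function names, the use of simple value histograms as the distribution estimate, and both threshold values are assumptions made only for illustration. The sketch computes the Jensen-Shannon divergence (base 2, so it lies in [0, 1]) between an old and a recent window, flags a change when the divergence exceeds a threshold, and then looks for the closest stored concept.

```python
import math
from collections import Counter


def histogram(window):
    """Empirical distribution of the discrete values in a window."""
    counts = Counter(window)
    n = len(window)
    return {value: count / n for value, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence KL(a || m); terms with a(k) = 0 contribute 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def change_detected(old_window, recent_window, threshold=0.1):
    """Hypothetical two-window test: flag a change when the divergence is large."""
    return js_divergence(histogram(old_window), histogram(recent_window)) > threshold


def find_reusable_concept(repository, recent_window, reuse_threshold=0.05):
    """Return the id of the closest stored concept within reuse_threshold, else None.

    `repository` maps a concept id to a stored distribution (an assumption of
    this sketch; the paper's repository structure is not described here).
    """
    recent = histogram(recent_window)
    best_id, best_dist = None, reuse_threshold
    for concept_id, stored in repository.items():
        dist = js_divergence(stored, recent)
        if dist <= best_dist:
            best_id, best_dist = concept_id, dist
    return best_id
```

In this sketch, a detected change followed by a `None` result from `find_reusable_concept` would correspond to a genuinely new concept, for which a new classifier would have to be trained and added to the pool.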


The International Arab Journal of Information Technology, Vol. 16, No. 6, November 2019

Yange Sun received the M.S. degree in computer science from Central China Normal University in 2007. She is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Beijing Jiaotong University. Her research interests include data mining and machine learning.

Zhihai Wang received the Doctor's Degree in Computer Application from Hefei University of Technology in 1998. He is now a Professor in the School of Computer and Information Technology, Beijing Jiaotong University. He has published dozens of papers in international conferences and journals. His research interests include data mining and artificial intelligence.

Jidong Yuan received the M.S. degree and Ph.D. degree in Computer Science and Technology from Beijing Jiaotong University, in 2012 and 2016, respectively. He is currently a lecturer in the School of Computer and Information Technology, Beijing Jiaotong University. His research interests include data mining and pattern recognition.

Wei Zhang received the M.S. degree in Computational Mathematics from Guilin University of Electronic Technology in 2015. He is currently a Ph.D. candidate in the School of Computer and Information Technology, Beijing Jiaotong University. His research interests include machine learning and data mining.