Online Approach to Handle Concept Drifting Data Streams using Diversity
Concept drift occurs in almost all real-time applications. Many online and offline algorithms have been
developed to analyze this drift and train learners accordingly. Different levels of diversity are required before and after a
drift to obtain the best generalization accuracy. In this paper, we present a new online approach, Extended Dynamic Weighted
Majority with diversity (EDWM), to handle various types of drift, from slow gradual to abrupt. Our approach is based
on the Weighted Majority (WM) vote of ensembles with different diversity levels. Experiments on various
artificial and real datasets showed that the proposed ensemble approach learns drifting concepts better than existing
online approaches in a resource-constrained environment.
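As a rough illustration of the Weighted Majority vote mentioned above, the following minimal Python sketch shows the classic scheme in which each expert votes with its current weight and the weights of wrong experts are multiplied by a factor beta. It is not the EDWM algorithm itself; the function names, the value beta = 0.5, and the three toy experts are assumptions made only for this example.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose voters carry the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def update_weights(predictions, weights, true_label, beta=0.5):
    """Multiply the weight of every expert that predicted wrongly by beta.

    beta is an assumed penalty factor with 0 <= beta < 1.
    """
    return [w * beta if p != true_label else w
            for p, w in zip(predictions, weights)]


# Three hypothetical experts vote on one example whose true label is 1.
weights = [1.0, 1.0, 1.0]
predictions = [0, 0, 1]

# Label 0 collects weight 2.0 and label 1 collects 1.0, so the vote is 0.
vote = weighted_majority_vote(predictions, weights)

# The two wrong experts are penalized, so their influence shrinks next time.
weights = update_weights(predictions, weights, true_label=1)
```

After the update, the expert that was correct holds more weight than the other two, so a later vote in which it is joined by even one of them would go its way.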
Parneeta Sidhu received her B.Tech degree in Computer Science from Punjab Technical University in 2002. She received her M.Tech degree in Information Systems from University of Delhi in 2009. Ms. Sidhu is a Teaching cum Research Faculty in the Division of COE at the Netaji Subhas Institute of Technology, affiliated to University of Delhi. She is presently pursuing her PhD in Computer Science from University of Delhi under the guidance of Dr. MPS Bhatia. Her research interests include data mining, concept drift, and outlier analysis in data streams. She is an author or coauthor of 8 research papers in various international journals and conferences of high repute. She is a member of CSI (Computer Society of India).

Mohinder Bhatia received his PhD in Computer Science from University of Delhi. Dr. Bhatia is a Professor in the Division of COE at the Netaji Subhas Institute of Technology, affiliated to University of Delhi. He is also serving the Institute as Dean, Student Welfare, and Head, Placement Cell. He has guided many M.Tech and PhD students in their research work. His research interests include data mining, cyber security, semantic web, machine learning, social network analysis and sentiment analysis. He is an author or coauthor of many research papers in international journals and conferences. Dr. Bhatia is a member of IEEE (Institute of Electrical and Electronics Engineers) and CSI (Computer Society of India).