The International Arab Journal of Information Technology (IAJIT)

..............................
..............................
..............................


PatHT: An Efficient Method of Classification over Evolving Data Streams

#
Some existing classifications need frequent update to adapt to the change of concept in data streams. To solve this problem, an adaptive method Pattern-based Hoeffding Tree (PatHT) is proposed to process evolving data streams. A key technology of a training classification decision tree is to improve the efficiency of choosing an optimal splitting attribute. Therefore, frequent patterns are used. Algorithm PatHT discovers constraint-based closed frequent patterns incremental updated. It builds an adaptive and incremental updated tree based on the frequent pattern set. It uses sliding window to avoid concept drift in mining patterns and uses concept drift detector to deal with concept change problem in procedure of training examples. We tested the performance of PatHT against some known algorithms using real data streams and synthetic


[1] Assunção M., Veith A., and Buyya R., “Distributed Data Stream Processing and Edge Computing: A Survey on Resource Elasticity and Future Directions,” Journal of Network and Computer Applications, vol. 103, pp. 1-17, 2018.

[2] Bifet A. and Gavaldá R., “Adaptive Learning from Evolving Data Streams,” in Proceedings of the 8th International Symposium on Intelligent Data Analysis, Lyon, pp. 249-260, 2009.

[3] Bifet A. and Gavaldá R., “Learning from Time- Changing Data with Adaptive Windowing,” in Proceedings of the 7th SIAM International Conference on Data Mining, Minnesota, pp. 443- 448, 2007.

[4] Bifet A., Gavaldà R., Holmes G., and Pfahringer B., Machine Learning for Data Streams with Practical Examples in MOA, MIT Press, 2018.

[5] Bifet A., Holmes G., and Pfahringer B., “New Ensemble Methods for Evolving Data Streams,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, pp. 139-148, 2009.

[6] Bifet A., Zhang J., and Fan W., He C., Zhang J., Qian J., Holmes G., and Pfahringer B., “Extremely Fast Decision Tree Mining for Evolving Data Streams,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, pp. 1733-1742, 2017.

[7] Bustio-Martínez L., Cumplido R., Hernández- León R., Bande-Serrano J., and Feregrino-Uribe C., “On the Design of Hardware Software Architectures for Frequent Itemsets Mining on Data Streams,” Journal of Intelligent Information Systems, vol. 50, no. 3, pp. 415-440, 2018.

[8] Chen H., Shu L., Xia J., and Deng Q., “Mining Frequent Patterns in A Varying-Size Sliding Window of Online Transactional Data Streams,” PatHT: An Efficient Method of Classification over Evolving Data Streams 1105 Information Sciences, vol. 215, no. 12, pp. 15-36, 2012.

[9] Chi Y., Wang H., Yu P., and Muntz R., “Catch the Moment: Maintaining Closed Frequent Itemsets over A Data Stream Sliding Window,” Knowledge and Information Systems, vol. 10, no. 3, pp. 265-294, 2006.

[10] Domingos P. and Hulten G., “Mining High- Speed Data Streams,” in Proceeding of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, pp. 71-80, 2000.

[11] Farhat A., Gouider M., and Said L., “New Algorithm for Frequent Itemsets Mining From Evidential Data Streams,” Procedia Computer Science, vol. 96, pp. 645-653, 2016.

[12] Farid D., Zhang L., Hossain A., Rahman C., Strachan R., Sexton G., and Dahal K., “An Adaptive Ensemble Classifier for Mining Concept Drifting Data Streams,” Expert Systems with Applications, vol. 40, no. 15, pp. 5895-5906, 2013.

[13] Frank A. and Asuncion A., UCI Machine Learning Repository

[EB/OL]. Irvine, CA: University of California, School of Information and Computer Science, http:// archive.ics.uci.edu/ml, Last Visited, 2010.

[14] Gama J., Fernandes R., and Rocha R., “Decision Trees for Mining Data Streams,” Intelligent Data Analysis, vol. 10, no. 1, pp. 23-45, 2006.

[15] Gama J. and Medas P., “Learning Decision Trees from Dynamic Data Streams,” Acm Symposium on Applied Computing, vol. 11, no. 8, pp. 1353- 1366, 2005.

[16] Gomes H., Barddal M., Enembreck F., and Bifet A., “A Survey on Ensemble Learning for Data Stream Classification,” ACM Computing Surveys, vol. 50, no. 2, pp. 1-23, 2017.

[17] Hammer H., Yazidi A., and Oommen B., “On the Classification of Dynamical Data Streams Using Novel “Anti-Bayesian” Technique,” Pattern Recognition, vol. 76, pp. 108-124, 2018.

[18] Han M., Ding J., and Li J., “TDMCS: An Efficient Method for Mining Closed Frequent Patterns Over Data Streams Based on Time Decay Model,” The International Arab Journal of Information Technology, vol. 14, no. 6, pp. 851-860, 2017.

[19] Holmes G., Kirkby R., and Pfahringer B., MOA: Massive Online Analysis. http://sourceforge.net/ projects/moa-datastream. Last Visited, 2014.

[20] Hulten G, Spencer L., and Domingos P., “Mining Time-Changing Data Streams,” in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, pp. 97-106, 2001.

[21] Kourtellis M., Morales G., Bifet A., and Murdopo A., “VHT: Vertical Hoeffding Tree,” in Proceedings of the IEEE International Conference on Big Data, Washington, pp. 915- 922, 2016.

[22] Kusumakumari V., Sherigar D., Chandran R., and Patil N., “Frequent Pattern Mining on Stream Data Using Hadoop Cantreegtree,” Procedia Computer Science, vol. 115, pp. 266-273, 2017.

[23] Lim Y. and Kang U., “Time-Weighted Counting for Recently Frequent Pattern Mining in Data Streams,” Knowledge and Information Systems, vol. 53, no. 2, pp. 391-422, 2017.

[24] Mello R., Vaz Y., Grossi C., and Bifet A., “On Learning Guarantees to Unsupervised Concept Drift Detection on Data Streams,” Expert Systems with Applications, vol. 117, pp. 90-102, 2019.

[25] Oza N. and Russell S., Online Bagging and Boosting. Morgan Kaufmann, 2001.

[26] Reddy V., Rao T., and Govardhan A., “Fast and Lossless Mining of Closed Frequent Itemsets Over Data Streams,” Journal of Advanced Research in Dynamical and Control Systrems, vol. 1, pp. 256-263, 2018.

[27] Wang P., Wu X., Wang C., Wang W., and Shi B., “CAPE-A Classification Algorithm Using Frequent Patterns Over Data Streams,” Journal of Computer Research and Development, vol. 41, no. 10, pp. 1677-1683, 2004.

[28] Witten I. and Frank E., Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2005.

[29] Yen S., Lee Y., Wu C., and Lin C., “An Efficient Algorithm for Maintaining Frequent Closed Itemsets Over Data Stream,” Next-Generation Applied Intelligence, vol. 5579, no. 1, pp. 767- 776, 2009.

[30] Yen S., Wu C., Lee Y., Tseng V., and Hsieh C., “A Fast Algorithm for Mining Frequent Closed Itemsets Over Stream Sliding Window,” in Proceedings of IEEE International Conference on Fuzzy Systems, Taipei, pp. 996-1002, 2011. Han Meng born in 1982, Ph.D., associate professor. Her research interests include data mining and machine learning. Jian Ding born in 1977, M.S., associate professor. His research interests include machine learning and data mining.