The International Arab Journal of Information Technology (IAJIT)



TDMCS: An Efficient Method for Mining Closed

In some data stream applications, the information embedded in recently arrived data is more important than that in historical transactions. Because a data stream changes over time, the concept drift problem may appear in data stream mining. Frequent pattern mining methods tend to generate useless and redundant patterns, so closed patterns are needed to obtain a losslessly compressed result set. This paper proposes a novel method for efficiently mining closed frequent patterns over data streams. The main contributions are as follows: the importance of recent transactions is distinguished from that of historical transactions based on a time decay model and a sliding window model; a minimum support count-maximal support error rate-decay factor (θ-ε-f) framework is designed to avoid concept drift; a closure operator is used to improve the efficiency of the algorithm; and a novel way of setting the decay factor, the average decay factor f_average, is designed to balance high recall and high precision. The performance of the proposed method is evaluated through experiments, and the results show that it is efficient and stable. It is suited to mining data streams with high density and long patterns, works with sliding windows of different sizes, and outperforms other analogous algorithms.
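The abstract combines a time decay model, a sliding window, θ-ε thresholds, and a closure operator. The minimal Python sketch below shows one way these pieces could fit together on a toy window; it is not the TDMCS algorithm. The weighting f**age (newest transaction weight 1), the threshold rule (θ and ε taken as fractions of the window's total decayed weight), and the names `closure`, `decayed_support`, and `closed_frequent` are all illustrative assumptions, and the average decay factor f_average is not reproduced: f is simply passed in. Because every weight is positive, a superset has the same decayed support as an itemset exactly when it occurs in the same transactions, so the usual closure operator (intersection of all transactions containing the itemset) still identifies closed itemsets under decay.

```python
from collections import deque
from itertools import combinations


def closure(itemset, window):
    """Closure operator: the intersection of all transactions containing itemset."""
    covering = [t for t in window if itemset <= t]
    if not covering:
        return None  # itemset does not occur in the window
    return frozenset.intersection(*covering)


def decayed_support(itemset, window, f):
    """Time-decayed support: each containing transaction contributes f**age,
    where age 0 is the newest transaction in the window (assumed weighting)."""
    n = len(window)
    return sum(f ** (n - 1 - i) for i, t in enumerate(window) if itemset <= t)


def closed_frequent(window, theta, eps, f):
    """Return (frequent, sub_frequent) dicts mapping closed itemsets to decayed support.

    Assumed thresholds: frequent if support >= theta * W; kept as a
    sub-frequent candidate if (theta - eps) * W <= support < theta * W,
    where W is the total decayed weight of the window.
    """
    window = [frozenset(t) for t in window]
    total = sum(f ** k for k in range(len(window)))
    frequent, sub_frequent, seen = {}, {}, set()
    # Every closed itemset C satisfies closure(C) == C and lies inside some
    # transaction, so closing every subset of every transaction finds them all.
    # This enumeration is exponential in transaction length: toy sizes only.
    for t in window:
        items = sorted(t)
        for r in range(1, len(items) + 1):
            for combo in combinations(items, r):
                c = closure(frozenset(combo), window)
                if c in seen:
                    continue
                seen.add(c)
                s = decayed_support(c, window, f)
                if s >= theta * total:
                    frequent[c] = s
                elif s >= (theta - eps) * total:
                    sub_frequent[c] = s
    return frequent, sub_frequent


# Sliding window holding the 4 newest transactions; the oldest is evicted.
window = deque(maxlen=4)
for t in [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]:
    window.append(t)

freq, sub = closed_frequent(list(window), theta=0.5, eps=0.1, f=0.9)
print(sorted("".join(sorted(c)) for c in freq))  # ['ac', 'bc', 'c']
print(sorted("".join(sorted(c)) for c in sub))   # ['abc']
```

In this toy run, {a, b, c} falls just below θ·W but above (θ − ε)·W, so it is kept as a candidate rather than discarded; lowering f would shrink the weight of the older transactions that support it.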


[1] Chang T., Mining Frequent User Query Patterns from XML Query Streams, The International Arab Journal of Information Technology, vol. 11, no. 5, pp. 452-458, 2014.

[2] Chen H., Mining Top-K Frequent Patterns over Data Streams Sliding Window, Journal of Intelligent Information Systems, vol. 42, no. 1, pp. 111-131, 2014.

[3] Chen H., Shu L., Xia J., and Deng Q., Mining Frequent Patterns in a Varying-Size Sliding Window of Online Transactional Data Streams, Information Sciences, vol. 215, no. 12, pp. 15-36, 2012.

[4] Cheng J., Ke Y., and Ng W., Maintaining Frequent Closed Itemsets over a Sliding Window, Journal of Intelligent Information Systems, vol. 31, no. 3, pp. 191-215, 2008.

[5] Chi Y., Wang H., Yu P., and Muntz R., Catch the Moment: Maintaining Closed Frequent Itemsets over a Data Stream Sliding Window, Knowledge and Information Systems, vol. 10, no. 3, pp. 265-294, 2006.

[6] Farzanyar Z., Kangavari M., and Cercone N., Max-FISM: Mining (Recently) Maximal Frequent Itemsets over Data Streams Using the Sliding Window Model, Computers and Mathematics with Applications, vol. 64, no. 6, pp. 1706-1718, 2012.

[7] Frank A. and Asuncion A., UCI Machine Learning Repository, http://archive.ics.uci.edu/ml, Last Visited 2010.

[8] HewaNadungodage C., Xia Y., Lee J., and Tu Y., Hyper-Structure Mining of Frequent Patterns in Uncertain Data Streams, Knowledge and Information Systems, vol. 37, no. 1, pp. 219-244, 2013.

[9] Jiang N. and Gruenwald L., CFI-Stream: Mining Closed Frequent Itemsets in Data Streams, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, pp. 592-597, 2006.

[10] Lee G., Yun U., and Ryu K., Sliding Window Based Weighted Maximal Frequent Pattern Mining over Data Streams, Expert Systems with Applications, vol. 41, no. 2, pp. 694-708, 2014.

[11] Li H., Ho C., and Lee S., Incremental Updates of Closed Frequent Itemsets over Continuous Data Streams, Expert Systems with Applications, vol. 36, no. 2, pp. 2451-2458, 2009.

[12] Li G. and Chen H., Mining the Frequent Patterns in an Arbitrary Sliding Window over Online Data Streams, Journal of Software, vol. 19, no. 10, pp. 2585-2596, 2008.

[13] Li H., Zhang N., Zhu J., and Cao H., Frequent Itemset Mining over Time-Sensitive Streams, Chinese Journal of Computers, vol. 35, no. 11, pp. 2283-2293, 2012.

[14] Li H., Ho C., Chen H., and Lee S., A Single-Scan Algorithm for Mining Sequential Patterns from Data Streams, International Journal of Innovative Computing, Information and Control, vol. 8, no. 3A, pp. 1799-1820, 2012.

[15] Manku G. and Motwani R., Approximate Frequency Counts over Streaming Data, in Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, pp. 346-357, 2002.

[16] Nabil H., Eldin A., and Belal M., Mining Frequent Itemsets from Online Data Streams: Comparative Study, International Journal of Advanced Computer Science and Applications, vol. 4, no. 7, pp. 117-125, 2013.

[17] Nori F., Deypir M., and Sadreddini M., A Sliding Window based Algorithm for Frequent Closed Itemset Mining over Data Streams, Journal of Systems and Software, vol. 86, no. 3, pp. 615-623, 2013.

[18] Patnaik D., Laxman S., Chandramouli B., and Ramakrishnan N., A General Streaming Algorithm for Pattern Discovery, Knowledge and Information Systems, vol. 37, no. 3, pp. 585-610, 2013.

[19] Shie B., Yu P., and Tseng V., Efficient Algorithms for Mining Maximal High Utility Itemsets from Data Streams with Different Models, Expert Systems with Applications, vol. 39, no. 17, pp. 12947-12960, 2012.

[20] Tang K., Dai C., and Chen L., A Novel Strategy for Mining Frequent Closed Itemsets in Data Streams, Journal of Computers, vol. 7, no. 7, pp. 1564-1572, 2012.

[21] Tsai P., Mining Top-K Frequent Closed Itemsets over Data Streams Using the Sliding Window Model, Expert Systems with Applications, vol. 37, no. 10, pp. 6968-6973, 2010.

The International Arab Journal of Information Technology, Vol. 14, No. 6, November 2017

[22] Wong R. and Fu A., Mining Top-K Frequent Itemsets from Data Streams, Data Mining and Knowledge Discovery, vol. 13, no. 2, pp. 193-217, 2006.

[23] Yang B. and Huang H., TOPSIL-Miner: An Efficient Algorithm for Mining Top-K Significant Itemsets over Data Streams, Knowledge and Information Systems, vol. 23, no. 2, pp. 225-242, 2010.

[24] Yen S., Lee Y., Wu C., and Lin C., An Efficient Algorithm for Maintaining Frequent Closed Itemsets over Data Stream, Next-Generation Applied Intelligence, vol. 5579, no. 1, pp. 767-776, 2009.

[25] Yen S., Wu C., and Lee Y., A Fast Algorithm for Mining Frequent Closed Itemsets over Stream Sliding Window, in Proceedings of IEEE International Conference on Fuzzy Systems, Taipei, pp. 996-1002, 2011.

[26] Yu J., Chong Z., Lu H., and Zhou A., False Positive or False Negative: Mining Frequent Itemsets from High Speed Transactional Data Streams, in Proceedings of the 30th International Conference on Very Large Data Bases, Toronto, pp. 204-215, 2004.

Han Meng, born in 1982, Ph.D. candidate, associate professor. Her research interests include data mining and machine learning.

Jian Ding, born in 1977, M.S., associate professor. His research interests include machine learning and data mining.

Juan Li, born in 1975, M.S., associate professor. Her research interests include information security and cloud computing.