The International Arab Journal of Information Technology (IAJIT)

..............................
..............................
..............................


Compact Tree Structures for Mining High Utility Itemsets

High Utility Item set Mining (HUIM) from large transaction databases has garnered significant attention as it accounts for the revenue of the items purchased in a transaction. Existing tree-based HUIM algorithms discard unpromising items and require at most two database scans for their construction. Hence, whenever utility threshold is changed, the trees have to be reconstructed from scratch. In this regard, the current study proposes to not only incorporate all the items in the tree structure but compactly represent transaction information. The proposed trees namely- Utility Prime Tree (UPT), Prime Cantor Function Tree (PCFT), and String based Utility Prime Tree (SUPT) store transaction-level information in a node unlike item-based prefix trees. Experiments conducted on both real and synthetic datasets compare the execution time and memory of these tree structures with a proposed Utility Count Tree (UCT) and existing IHUP, UP-Growth trees. Due to transaction-level encoding, these structures consume significantly less memory when compared to the tree structures in the literature.

[1] Aggarwal C., Frequent Pattern Mining, Springer International Publishing, 2014.

[2] Agrawal R. and Srikant R., “Fast Algorithms for Mining Association Rules,” in Proceedings of the 20th VLDB Conference, Santiago, pp. 487-499, 1994.

[3] Ahmed C., Tanbeer S., and Jeong B., and Lee Y., “HUC-Prune: An Efficient Candidate Pruning Technique to Mine High Utility Patterns,” Applied Intelligence, vol. 34, no. 2, pp. 181-198, 2011.

[4] Ahmed C., Tanbeer S., Jeong B., and Lee Y., “Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 12, pp. 1708-1721, 2009.

[5] Dawar S., Bera D., and Goyal V., “High-Utility Itemset Mining for Subadditive Monotone Utility Functions,” CoRR abs/1812.07208, 2018.

[6] Duan Y., Fu X., Luo B., Wang Z., Shi J., and Du X., “Detective: Automatically Identify and Analyze Malware Processes in Forensic Scenarios Via Dlls,” in Proceedings of IEEE International Conference on Communications, London, pp. 5691-5696, 2015.

[7] Fournier-Viger P., SPMF An Open-Source Data Mining Library, Datasets, https://www.philippe- fournier- viger.com/spmf/index.php?link=datasets.php, Last Visited, 2019.

[8] Fournier-Viger P., SPMF An Open-Source Data Mining Library, Developer's Guide, https://www.philippe-fournier- viger.com/spmf/index.php?link=developers.php, Last Visited, 2019.

[9] Han J., Pei J., Yin Y., and Mao R., “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53-87, 2004.

[10] Harpaz R., Chase H., and Friedman C., “Mining Multi-Item Drug Adverse Effect Associations in Spontaneous Reporting Systems,” BMC Bioinformatics, vol. 11, no. 9, pp. 1-8, 2010.

[11] Hopcroft J., Motwani R., and Ullman J., Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 2006.

[12] Ivancsy R. and Vajk I., “Frequent Pattern Mining in Web Log Data,” Acta Polytechnica Hungarica, vol. 3, no. 1, pp. 77-90, 2006.

[13] Krishna S. and Bhavani S., “An Efficient Approach for Text Clustering Based on Frequent Itemsets,” European Journal of Scientific Research, vol. 42, no. 3, pp. 385-396, 2010.

[14] Krishnamoorthy S., “Pruning Strategies for Mining High Utility Itemsets,” Expert Systems with Applications, vol. 42, no. 5, pp. 2371-2381, 2015.

[15] Lan G., Hong T., and Tseng V., “An Efficient Gradual Pruning Technique for Utility Mining,” International Journal of Innovative Computing Compact Tree Structures for Mining High Utility Itemsets 159 Information and Control, vol. 8, no. 7B, pp. 5165- 5178, 2012.

[16] Li Y., Yeh J., and Chang C., “Isolated Items Discarding Strategy for Discovering Highutility Itemsets,” Data and Knowledge Engineering, vol. 64, no. 1, pp. 198-217, 2008.

[17] Lin C., Hong T., and Lu W., “An Effective Tree Structure for Mining High Utility Itemsets,” Expert Systems with Applications, vol. 38, no. 6, pp. 7419-7424, 2011.

[18] Liu J., Wang K., and Fung B., “Direct Discovery of High Utility Itemsets without Candidate Generation,” in Proceedings of IEEE 12th International Conference on Data Mining, Brussels, pp. 984-989, 2012.

[19] Liu J., Wang K., and Fung B., “Mining High Utility Patterns in one Phase Without Generating Candidates,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 5, pp. 1245- 1257, 2016.

[20] Liu M. and Qu J., “Mining High Utility Itemsets without Candidate Generation,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, pp. 55-64, 2012.

[21] Liu Y., Liao W., and Choudhary A., “A Two- Phase Algorithm for Fast Discovery of High Utility Itemsets,” in Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, Hanoi, pp. 689-695, 2005.

[22] Maiya G. and D'Souza R., “An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent Itemsets,” IJCSNS International Journal of Computer Science and Network Security, vol. 8, no. 8, pp. 121-126, 2008.

[23] Mobasher B., Cooley R., and Srivastava J., “Automatic Personalization Based on Web Usage Mining,” Communications of the ACM, vol. 43, no. 8, pp. 142-151, 2000.

[24] Naulaerts S., Meysman P., Bittremieux W., Vu T., VandenBerghe W., Goethals B., and Laukens K., “A Primer to Frequent Itemset Mining for Bioinformatics,” Briefings in Bioinformatics, vol. 16, no. 2, pp. 216-231, 2013.

[25] Poongodi K. and Kumar D., “Support Vector Machine with Information Gain Based Classification for Credit Card Fraud Detection System,” The International Arab Journal Information Technology, vol. 18, no. 2, pp. 199- 207, 2021.

[26] Slashdot Media: java. size of. https://sourceforge.net/projects/sizeof/_les/, Last Visited, 2019.

[27] Tseng V., Shie B., Wu C., and Yu P., “Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 1, pp. 54-67, 2016.

[28] Tseng V., Wu C., Shie B., and Yu P., “Up- Growth: An Efficient Algorithm for High Utility Itemset Mining,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, pp. 253-262, 2010.