The International Arab Journal of Information Technology (IAJIT)

..............................
..............................
..............................


Compact Tree Structures for Mining High Utility Itemsets

High Utility Item set Mining (HUIM) from large transaction databases has garnered significant attention as it accounts for the revenue of the items purchased in a transaction. Existing tree-based HUIM algorithms discard unpromising items and require at most two database scans for their construction. Hence, whenever utility threshold is changed, the trees have to be reconstructed from scratch. In this regard, the current study proposes to not only incorporate all the items in the tree structure but compactly represent transaction information. The proposed trees namely- Utility Prime Tree (UPT), Prime Cantor Function Tree (PCFT), and String based Utility Prime Tree (SUPT) store transaction-level information in a node unlike item-based prefix trees. Experiments conducted on both real and synthetic datasets compare the execution time and memory of these tree structures with a proposed Utility Count Tree (UCT) and existing IHUP, UP-Growth trees. Due to transaction-level encoding, these structures consume significantly less memory when compared to the tree structures in the literature.


[1] Aggarwal C., Frequent Pattern Mining, Springer International Publishing, 2014.

[2] Agrawal R. and Srikant R., “Fast Algorithms for Mining Association Rules,” in Proceedings of the 20th VLDB Conference, Santiago, pp. 487-499, 1994.

[3] Ahmed C., Tanbeer S., and Jeong B., and Lee Y., “HUC-Prune: An Efficient Candidate Pruning Technique to Mine High Utility Patterns,” Applied Intelligence, vol. 34, no. 2, pp. 181-198, 2011.

[4] Ahmed C., Tanbeer S., Jeong B., and Lee Y., “Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 12, pp. 1708-1721, 2009.

[5] Dawar S., Bera D., and Goyal V., “High-Utility Itemset Mining for Subadditive Monotone Utility Functions,” CoRR abs/1812.07208, 2018.

[6] Duan Y., Fu X., Luo B., Wang Z., Shi J., and Du X., “Detective: Automatically Identify and Analyze Malware Processes in Forensic Scenarios Via Dlls,” in Proceedings of IEEE International Conference on Communications, London, pp. 5691-5696, 2015.

[7] Fournier-Viger P., SPMF An Open-Source Data Mining Library, Datasets, https://www.philippe- fournier- viger.com/spmf/index.php?link=datasets.php, Last Visited, 2019.

[8] Fournier-Viger P., SPMF An Open-Source Data Mining Library, Developer's Guide, https://www.philippe-fournier- viger.com/spmf/index.php?link=developers.php, Last Visited, 2019.

[9] Han J., Pei J., Yin Y., and Mao R., “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53-87, 2004.

[10] Harpaz R., Chase H., and Friedman C., “Mining Multi-Item Drug Adverse Effect Associations in Spontaneous Reporting Systems,” BMC Bioinformatics, vol. 11, no. 9, pp. 1-8, 2010.

[11] Hopcroft J., Motwani R., and Ullman J., Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 2006.

[12] Ivancsy R. and Vajk I., “Frequent Pattern Mining in Web Log Data,” Acta Polytechnica Hungarica, vol. 3, no. 1, pp. 77-90, 2006.

[13] Krishna S. and Bhavani S., “An Efficient Approach for Text Clustering Based on Frequent Itemsets,” European Journal of Scientific Research, vol. 42, no. 3, pp. 385-396, 2010.

[14] Krishnamoorthy S., “Pruning Strategies for Mining High Utility Itemsets,” Expert Systems with Applications, vol. 42, no. 5, pp. 2371-2381, 2015.

[15] Lan G., Hong T., and Tseng V., “An Efficient Gradual Pruning Technique for Utility Mining,” International Journal of Innovative Computing Compact Tree Structures for Mining High Utility Itemsets 159 Information and Control, vol. 8, no. 7B, pp. 5165- 5178, 2012.

[16] Li Y., Yeh J., and Chang C., “Isolated Items Discarding Strategy for Discovering Highutility Itemsets,” Data and Knowledge Engineering, vol. 64, no. 1, pp. 198-217, 2008.

[17] Lin C., Hong T., and Lu W., “An Effective Tree Structure for Mining High Utility Itemsets,” Expert Systems with Applications, vol. 38, no. 6, pp. 7419-7424, 2011.

[18] Liu J., Wang K., and Fung B., “Direct Discovery of High Utility Itemsets without Candidate Generation,” in Proceedings of IEEE 12th International Conference on Data Mining, Brussels, pp. 984-989, 2012.

[19] Liu J., Wang K., and Fung B., “Mining High Utility Patterns in one Phase Without Generating Candidates,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 5, pp. 1245- 1257, 2016.

[20] Liu M. and Qu J., “Mining High Utility Itemsets without Candidate Generation,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, pp. 55-64, 2012.

[21] Liu Y., Liao W., and Choudhary A., “A Two- Phase Algorithm for Fast Discovery of High Utility Itemsets,” in Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, Hanoi, pp. 689-695, 2005.

[22] Maiya G. and D'Souza R., “An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent Itemsets,” IJCSNS International Journal of Computer Science and Network Security, vol. 8, no. 8, pp. 121-126, 2008.

[23] Mobasher B., Cooley R., and Srivastava J., “Automatic Personalization Based on Web Usage Mining,” Communications of the ACM, vol. 43, no. 8, pp. 142-151, 2000.

[24] Naulaerts S., Meysman P., Bittremieux W., Vu T., VandenBerghe W., Goethals B., and Laukens K., “A Primer to Frequent Itemset Mining for Bioinformatics,” Briefings in Bioinformatics, vol. 16, no. 2, pp. 216-231, 2013.

[25] Poongodi K. and Kumar D., “Support Vector Machine with Information Gain Based Classification for Credit Card Fraud Detection System,” The International Arab Journal Information Technology, vol. 18, no. 2, pp. 199- 207, 2021.

[26] Slashdot Media: java. size of. https://sourceforge.net/projects/sizeof/_les/, Last Visited, 2019.

[27] Tseng V., Shie B., Wu C., and Yu P., “Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 1, pp. 54-67, 2016.

[28] Tseng V., Wu C., Shie B., and Yu P., “Up- Growth: An Efficient Algorithm for High Utility Itemset Mining,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, pp. 253-262, 2010. Anup Bhat was born in Manipal, Udupi, Karnataka, India in 1991. He received his BE degree in information and communication technology (2013) and MTech degree in computer science and engineering (2017) from Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India in 2013 and 2017, respectively. He is currently pursuing PhD in the same institute. His research interests include Data Mining and Machine Learning. Harish Venkatarama received his PhD degree from National Institute of Technology Karnataka in 2011 from the Department of Computer Science and Engineering. He is currently serving as Professor in the Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India. His work has been published in journals of international repute. He also has two book chapters to his credit. His research interests include Algorithms, Machine Learning and Data Mining. Geetha Maiya received her PhD degree from the Department of Mathematical and Computational Sciences, National Institute of Technology Karnataka in 2010.She is currently a Professor with the Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India. She has presented several articles in national and international conferences. Her work has also been published in several international journals. Her current research interests include Algorithms, Data Mining, Text mining in healthcare, and financial sectors.