The International Arab Journal of Information Technology (IAJIT), Vol. 16, No. 2, March 2019


An Efficient Algorithm for Extracting Infrequent Itemsets from Weblog

Weblog data contains unstructured information, which makes extracting frequent patterns from weblog databases a very challenging task. A power set lattice strategy is commonly adopted to handle this kind of problem. In this lattice, the top level contains the full set and the bottom level contains the empty set. Most algorithms follow a bottom-up strategy, i.e., they combine smaller sets into larger ones. Efficient lattice traversal techniques have been presented that quickly identify all the long frequent itemsets and, if required, their subsets. This strategy is suitable for discovering frequent itemsets, but it may not be worthwhile for infrequent itemsets. In this paper, we propose the Infrequent Itemset Mining for Weblog (IIMW) algorithm, a top-down, breadth-first, level-wise algorithm for discovering infrequent itemsets. We compare IIMW with Apriori-Rare and Apriori-Inverse and report results for different parameters such as candidate itemsets, frequent itemsets, execution time, transaction database and support threshold.
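The top-down, breadth-first, level-wise idea described above can be illustrated with a minimal Python sketch. This is not the IIMW algorithm itself, whose details are not given in the abstract; it is only an assumed, simplified traversal of the power set lattice. The function names `support` and `top_down_infrequent`, the relative support measure, and the sample sessions are all hypothetical. The sketch relies on the fact that support never decreases when items are removed: once an itemset is found to be frequent, all of its subsets are frequent too, so the search does not descend below it.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def top_down_infrequent(transactions, min_support):
    """Return all non-empty itemsets with support below `min_support`.

    Assumed illustration only: starts at the top of the lattice (the set
    of all items) and moves down one level at a time.
    """
    transactions = [frozenset(t) for t in transactions]
    if not transactions:
        return []
    top = frozenset().union(*transactions)
    if not top:
        return []

    infrequent = []
    level = {top}  # current lattice level; all itemsets have the same size
    while level:
        next_level = set()
        for itemset in level:
            if support(itemset, transactions) < min_support:
                infrequent.append(itemset)
                # Descend only below infrequent itemsets.
                if len(itemset) > 1:
                    for sub in combinations(itemset, len(itemset) - 1):
                        next_level.add(frozenset(sub))
            # A frequent itemset has only frequent subsets: prune here.
        level = next_level
    return infrequent


# Hypothetical web sessions; A, B and C stand in for requested pages.
sessions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"A"}, {"B", "C"}]
rare = top_down_infrequent(sessions, min_support=0.5)
for itemset in sorted(rare, key=lambda s: (-len(s), sorted(s))):
    print(sorted(itemset), support(itemset, [frozenset(t) for t in sessions]))
```

With these sample sessions and a threshold of 0.5, the sketch reports {A, B, C}, {A, B}, {A, C} and {B, C} as infrequent and stops at the single pages, which are all frequent. A bottom-up search would have had to count every single page and every pair before reaching the same answer, which is the motivation the abstract gives for traversing from the top.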


[1] Adda M., Wu L., White S., and Feng Y., “Pattern Detection with Rare Itemset Mining,” International Journal on Soft Computing, Artificial Intelligence and Applications, vol. 1, no. 1, pp. 1-17, 2012.

[2] Agrawal R. and Srikant R., “Fast Algorithms for Mining Association Rules,” in Proceedings of 20th International Conference on Very Large Data Bases, San Francisco, pp. 487-499, 1994.

[3] Agrawal R., Imielinski T., and Swami A., “Mining Association Rules between Sets of Items in Large Databases,” in Proceedings of ACM SIGMOD International Conference on Management of Data, New York, pp. 207-216, 1993.

[4] Bakariya B., Mohbey K., and Thakur G., “An Inclusive Survey on Data Preprocessing Methods Used in Web Usage Mining,” in Proceedings of 7th International Conference on Bio-Inspired Computing: Theories and Applications, India, pp. 407-416, 2013.

[5] Bakariya B., Mohbey K., and Thakur G., “An Inclusive Survey on Data Preprocessing Methods Used in Web Usage Mining,” in Proceedings of 7th International Conference on Bio-Inspired Computing: Theories and Applications, Gwalior, pp. 407-416, 2013.

[6] Han J., Pei J., and Yin Y., “Mining Frequent Patterns without Candidate Generation,” in Proceedings of ACM SIGMOD International Conference on Management of Data, Texas, pp. 1-12, 2000.

[7] Han J., Pei J., Yin Y., and Mao R., “Mining Frequent Patterns without Candidate Generation: a Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53-87, 2004.

[8] Huang D., Koh Y., and Dobbie G., “Infrequent Pattern Mining on Data Streams,” Data Warehousing and Knowledge Discovery Lecture Notes in Computer Science, Vienna, pp. 303-314, 2012.

[9] Iwanuma K., Takano Y., and Nabeshima H., “On Anti-Monotone Frequency Measures for Extracting Sequential Patterns from a Single Very Long Data Sequence,” IEEE Conference on Cybernetics and Intelligent Systems, Singapore, pp. 213-217, 2004.

[10] Koh Y. and Rountree N., “Finding Sporadic Rules Using Apriori-Inverse,” Advances in Knowledge Discovery and Data Mining Lecture Notes in Computer Science, Nanjing, pp. 97-106, 2007.

[11] Liu B., Hsu W., and Ma Y., “Mining Association Rules with Multiple Minimum Supports,” in Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, pp. 337-341, 1999.

[12] Pei J., Han J., Lu H., Nishio S., Tang S., and Yang D., “H-Mine: Fast and Space-preserving Frequent Pattern Mining in Large Databases,” IEEE Transactions, vol. 39, no. 6, pp. 593-605, 2007.

[13] Prati R., Monard M., Andre C., and Carvalho L., “A Method for Refining Knowledge Rules Using Exceptions,” Electronic Journal of Informatics and Operations Research, vol. 27, no. 4, pp. 53-65, 2004.

[14] Song M. and Rajasekaran S., “A Transaction Mapping Algorithm for Frequent Itemsets Mining,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 4, pp. 472-481, 2006.

[Figure: Transactions versus Execution Time. Axes: Number of Web Transactions, Execution Time (Sec). Series: Apriori-Rare, Apriori-Inverse, IIMW.]

[15] Szathmary L., Napoli A., and Valtchev P., “Towards Infrequent Itemset Mining,” in Proceedings of 19th IEEE International Conference on Tools with Artificial Intelligence, Patras, pp. 305-312, 2007.

[16] The Internet Traffic Archive, available at: http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html, Last Visited, 2013.

[17] Troiano L. and Scibelli G., “A Time-Efficient Breadth-First Level-Wise Lattice-Traversal Algorithm To Discover Infrequent Itemsets,” Data Mining and Knowledge Discovery, vol. 28, no. 3, pp. 773-807, 2014.

[18] Tsang S., Koh Y., and Dobbie G., “Finding Interesting Infrequent Association Rules Using Infrequent Pattern Tree,” Transactions on Large-Scale Data- and Knowledge-Centered Systems VIII Lecture Notes in Computer Science, pp. 157-173, 2013.

Brijesh Bakariya received his graduation degree from Barkatullah University, Bhopal, M.P. in 2005 and his postgraduate degree in Computer Applications from Devi Ahilya Vishwavidyalaya, Indore, M.P. in 2009. He received his Ph.D. degree from the Department of Computer Applications, Maulana Azad National Institute of Technology, Bhopal, M.P. in 2016. He is an Assistant Professor in the Department of Computer Science and Engineering, I.K. Gujral Punjab Technical University (IKGPTU), Jalandhar, Punjab. He has been teaching since 2009 and guides M.Tech and Ph.D. students. He has published many research papers in SCI publications in the areas of Data Mining, Image Processing, and Social Networking. He has attended various short-term training programs, refresher courses, workshops and seminars. He is a member of the IACSIT, APCBEES and UACEE.

Ghanshyam Thakur received his BSc degree from Dr. Hari Singh Gour University, Sagar, M.P. in 2000. He received his MCA degree from Pt. Ravishankar Shukla University, Raipur, C.G. in 2003 and his PhD degree from Barkatullah University, Bhopal, M.P. in 2009. He is an Assistant Professor in the Department of Computer Applications, Maulana Azad National Institute of Technology, Bhopal, M.P., India. He has eight years of teaching and research experience and 26 publications in national and international journals. His research interests include Text Mining, Document Clustering, Information Retrieval, and Data Warehousing. He is a member of the CSI, IAENG, and IACSIT.