Parallelization of Frequent Itemset Mining Methods with FP-tree: An Experiment with PrePost+ Algorithm

Author Olakara Jamsheela, Raju Gopalakrishna,

Keywords #Data Mining algorithm #parallelization of PrePost+ #parallel processing #multicore

Abstract

Parallel processing has turn to be a common programming practice because of its efficiency and thus becomes an interesting field for researchers. With the introduction of multi- core processors as well as general purpose graphics processing units, parallel programming has become affordable. This leads to the parallelization of many of the complex data processing algorithms including algorithms in data mining. In this paper, a study on parallel PrePost+ is presented. PrePost+ is an efficient frequent itemset mining algorithm. The algorithm has been modified as a parallel algorithm and the obtained result is compared with the result of sequential PrePost+ algorithm.

References

[1] Agapito G., Guzzi P., and Cannataro M., “Parallel Extraction of Association Rules from Genomics Data,” Applied Mathematics and Computation, vol. 350, pp. 434-446, 2019.

[2] Al-Fayoumi M., Alwidian J., and Abusaif M., “Intelligent Association Classification Technique for Phishing Website Detection,” The International Arab Journal of Information Technology, vol. 17, no. 4, pp. 488-496, 2019.

[3] Athavale A., Randive P., and Kambale A., “Automatic Parallelization of Sequential Codes Using S2p Tool and Benchmarking of The Generated Parallel Codes,” URL http://www. kpit. com/downloads/research-papers/automatic- parallelization-sequential-codes. pdf. Last Visited, 2020.

[4] Choy R. and Edelman A., “Parallel MATLAB: Doing it RIght,” Proceedings of the IEEE, vol. 93, no. 2, pp. 331-341, 2005.

[5] Deng Z. and Lv S.,” PrePost+: An Efficient N- Lists-Based Algorithm for Mining Frequent Itemsets via Children-Parent Equivalence Pruning,” Expert Systems with Applications, vol. 42, no. 13, pp.5424-5432, 2015.

[6] Gandomi A. and Haider M., “Beyond The Hype: Big Data Concepts, Methods, and Analytics,” International Journal of Information Management, vol. 35, no. 2, pp. 137-144, 2015. 0 5 10 15 20 25 90807570656055504540353020 time (sec) minimum support(%) PrePost+ PRL-PrePost+ 0 10 20 30 40 765432 time (sec) minimum support(%) PrePost+ PRL-PrePost+ 0 20 40 60 8075706560 time (sec ) minimum support(%) PrePost+ PRL-PrePost+ 0 5 10 15 0.020.040.060.080.10.2 time(sec) minimum support(%) PrePost+ PRL-PrePost+ Parallelization of Frequent Itemset Mining Methods with FP-tree: An Experiment ... 213

[7] Goethals B., Fimi repository website. http://fimi.ua.ac.be/data/., Last Visited, 2020.

[8] Guo R., “Variance-Covariance Matrix Estimation with LSQR in A Parallel Programming Environ- Ment,” Master's Thesis, 2008.

[9] Hadoop A., Welcome to Apache TM Hadoop®. URL http://hadoop.apache.org/, Last Visited, 2020.

[10] Husbands P., Isbell C., and Edelman A., “MITMatlab: A Tool for Interactive Supercomputing, in Proceedings the 9th SIAM Confrence, San Antonio, 1999.

[11] Jamsheela O. and Gopalakrishna R., “Frequent Itemset Mining Algorithms: A Literature Survey,” in Proceedings of IEEE International Advance Computing Conference Banglore, pp. 1099-1104, 2015.

[12] Kepner J. and Ahalt S., “MatlabMPI,” Journal of Parallel and Distributed Computing, vol. 64, no. 8, pp. 997-1005, 2004.

[13] Letras M., Bustio-Martínez L., Cumplido R., Hernández-León R., and Feregrino-Uribe C., “On the Design of Hardware Architectures for Parallel Frequent Itemsets Mining,” Expert Systems with Applications, vol. 157, 2020.

[14] Luebke D., “CUDA: Scalable Parallel Programming for High-Performance Scientific Computing,” in Proceedings of 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Paris, pp. 836-838, 2008.

[15] Lutke bohle I., Search Data Center, http://searchdatacenter.techtarget.com/definition/p arallelprocessing/, Last Visited, 2020.

[16] Mathworks.Com: Parallel Computing Toolbox, MATLAB, MathWorks India, http://in.mathworks.com/products/parallel- computing/ Last Visited, 2020.

[17] Matloff N., Programming on Parallel Machines. University of California, 2011.

[18] Mohamed M. and Refaat H., “A Fast Parallel Association Rule Mining Algorithm Based on the Probability of Frequent Itemsets,” International Journal of Computer Science and Network Security, vol. 11, no. 5, pp. 152-162, 2011.

[19] Patel A., Birl M., and Nair U., “Addressing Big Data Problem Using Hadoop and Map Reduce,” in Proceedings of Nirma University International Conference on Engineerings, Ahmedabad, pp. 1- 5, 2012.

[20] Shafi A., Carpenter B., and Baker M., “Nested Parallelism for Multi-Core HPC Systems Using Java,” Journal of Parallel and Distributed Computing, vol. 69, no. 6, pp. 532-545, 2009.

[21] TECH, OF: MATLAB* P 2.0: Interactive Supercomputing, Massachusetts Institute of Technology, Dissertation, 2002.

[22] Travinin B. and Kepner J., “pMATLAB Parallel MATLAB Librar,” The International Journal of High Performance Computing Applications, vol. vol. 21, no. 3, pp. 336-359, 2007.

[23] Tsagkli S., Ilias:Java Fork/Join for Parallel Programming Java Code Geeks-2016, https://www.javacodegeeks.com/2011/02/ java- forkjoin-parallel-programming.html, Last Visited, 2020.

[24] Xia D., Zhou Y., Rong Z., and Zhang Z., “IPFP: An Improved Parallel FP-Growth Algorithm for Frequent Itemsets Mining,” in Proceedings 59th ISI World Statistics Congress, Hong Kong, pp. 4034-4039, 2013.

[25] Xun Y., Zhang J., and Qin X., “Fidoop: Parallel Mining of Frequent Itemsets Using Mapreduce,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 3, pp. 313- 325, 2015.

[26] Yan D., Zhao X., Lin R., and Bai D., “PPQAR: Parallel PSO for Quantitative Association Rulemining,” Peer-to-Peer Networking and Applications, vol. 12, no. 5, pp.1433-1444, 2019.

[27] Yong W., Zhe Z., and Fang W., “A Parallel Algorithm of Association Rules Based on Cloud Computing,” in Proceedings 8th International Conference on Communications and Networking in China, Guilin, pp. 415-419, 2013.