The International Arab Journal of Information Technology (IAJIT)



A New Approach to Improve Association Rules for Big Data in Cloud Environment

Association rule mining is a very useful Data Mining technique, but it generates a huge number of rules, so manual post-processing is required to isolate the interesting ones. Several researchers suggest integrating users' knowledge through an ontology and rule patterns, and then automatically selecting the interesting rules after all possible rules have been generated. However, business data are now growing rapidly, and many companies have already opted for Big Data systems deployed in cloud environments, which makes generating all association rules very costly. To deal with this issue, we propose an approach that uses an ontology with rule patterns to integrate users' knowledge early, in the preprocessing step, before any rule is searched for or generated. As a result, only the interesting rules that respect the rule patterns are generated. This approach reduces execution time and minimizes the cost of post-processing, especially for Big Data. To confirm the performance results, experiments are carried out on Not Only Structured Query Language (NoSQL) databases distributed in a cloud environment.
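
To make the idea of pattern-guided preprocessing concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the toy ontology, the single rule pattern, the sample transactions, and the function names `prune_transactions` and `mine_rules` are all hypothetical, and the sketch only handles single-item rules on an in-memory list rather than a distributed NoSQL store. It illustrates one plausible reading of the approach: items whose concept appears in no rule pattern are discarded before counting, and a rule is emitted only if its concepts match a pattern.

```python
from itertools import combinations

# Hypothetical toy ontology: each item maps to one concept (illustrative names).
ONTOLOGY = {
    "milk": "Dairy",
    "cheese": "Dairy",
    "bread": "Bakery",
    "bagel": "Bakery",
    "soap": "Hygiene",
}

# Hypothetical rule patterns: (antecedent concept, consequent concept).
RULE_PATTERNS = [("Dairy", "Bakery")]

# Hypothetical sample data, one set of items per transaction.
TRANSACTIONS = [
    {"milk", "bread", "soap"},
    {"milk", "bread"},
    {"cheese", "bagel"},
    {"milk", "soap"},
]


def prune_transactions(transactions, ontology, patterns):
    """Preprocessing: keep only items whose concept occurs in some pattern."""
    wanted = {concept for pattern in patterns for concept in pattern}
    return [
        {item for item in t if ontology.get(item) in wanted}
        for t in transactions
    ]


def mine_rules(transactions, ontology, patterns, min_support, min_confidence):
    """Return single-item rules (lhs, rhs, support, confidence) matching a pattern."""
    pruned = prune_transactions(transactions, ontology, patterns)
    n = len(pruned)
    item_count = {}
    pair_count = {}
    for t in pruned:
        for item in t:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in combinations(sorted(t), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Try both directions, but emit only those allowed by a rule pattern.
        for lhs, rhs in ((a, b), (b, a)):
            if (ontology[lhs], ontology[rhs]) not in patterns:
                continue
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for rule in mine_rules(TRANSACTIONS, ONTOLOGY, RULE_PATTERNS, 0.5, 0.6):
        print(rule)
```

In this sketch the item "soap" never reaches the counting step, and the reverse rule from "bread" to "milk" is never produced, because neither fits the pattern. The saving the abstract describes would come from this kind of early filtering, under the assumptions stated above.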


The International Arab Journal of Information Technology, Vol. 16, No. 6, November 2019
Djilali Dahmani graduated from the Department of Computer Science, Faculty of Exact and Applied Sciences, University of Science and Technology MB, USTO, Algeria, where he received his PhD degree in computer science in 2017. His current research interests are Big data, Cloud computing, Data mining, database models, data science, replication, consistency, fault tolerance, resource management, energy consumption, mobile environment, and High Performance Computing.

Sidi Ahmed Rahal has held a doctorate in computer science from Pau University, France, since 1989. He is currently a professor at the Department of Computer Science, University of Science and Technology MB, USTO, Algeria. His current research interests are Data mining, Object-Oriented database, Agents Expert Systems, Big data, Cloud computing, and database models. He is a member of the SSD (Signal, System and Data) laboratory.

Ghalem Belalem graduated from the Department of Computer Science, Faculty of Exact and Applied Sciences, University of Oran1 Ahmed Ben Bella, Algeria, where he received his PhD degree in computer science in 2007. His current research interests are distributed systems, grid computing, cloud computing, replication, consistency, fault tolerance, resource management, economic models, energy consumption, Big data, IoT, mobile environment, image processing, Supply chain optimization, Decision support systems, and High Performance Computing.