The International Arab Journal of Information Technology (IAJIT)



Clustering Based on Correlation Fractal Dimension Over an Evolving Data Stream

Online clustering of evolving high-dimensional data is a significant challenge for data mining applications. Although many clustering strategies have been proposed, the task remains open because published algorithms perform poorly on high-dimensional datasets, at finding arbitrarily shaped clusters, and at handling outliers. Knowing the fractal characteristics of a dataset can help abstract the dataset and provide useful hints for the clustering process. This paper presents a novel strategy, FractStream, for clustering data streams using fractal dimension, basic window technology, and the damped window model. Core fractal-clusters, progressive fractal-clusters, and outlier fractal-clusters are identified, with the aim of reducing search complexity and execution time. Pruning strategies based on the weights associated with each cluster are also employed, which reduces main memory usage. An experimental study over a number of datasets demonstrates the effectiveness and efficiency of the proposed technique.
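To make the central quantity concrete, the following is a minimal sketch of how a correlation fractal dimension can be estimated with the standard box-counting approach: the data space is covered with grid cells of side r, S(r) is the sum of squared cell occupancy fractions, and the dimension is the slope of log S(r) against log r. This is not the FractStream implementation. The function name `correlation_fractal_dimension`, the normalisation of attributes to [0, 1], and the choice of cell sizes are assumptions made only for illustration.

```python
import numpy as np

def correlation_fractal_dimension(points, radii):
    """Estimate the correlation fractal dimension by box counting.

    Hypothetical helper for illustration: returns the slope of
    log S(r) versus log r, where S(r) is the sum of squared
    occupancy fractions of grid cells with side length r.
    """
    points = np.asarray(points, dtype=float)
    # Normalise each attribute to [0, 1] so one grid fits all dimensions.
    mins = points.min(axis=0)
    spans = points.max(axis=0) - mins
    spans[spans == 0] = 1.0  # avoid division by zero for constant attributes
    unit = (points - mins) / spans

    log_r, log_s = [], []
    for r in radii:
        # Assign every point to the grid cell of side r that contains it.
        cells = np.floor(unit / r).astype(np.int64)
        _, counts = np.unique(cells, axis=0, return_counts=True)
        # S(r): sum of squared occupancy fractions over non-empty cells.
        fractions = counts / len(unit)
        log_r.append(np.log(r))
        log_s.append(np.log(np.sum(fractions ** 2)))

    # Least-squares line through (log r, log S(r)); its slope is the estimate.
    slope, _intercept = np.polyfit(log_r, log_s, 1)
    return slope

# Points on a diagonal line embedded in 2-D: the estimate should be close to 1.
t = np.linspace(0.0, 1.0, 4096)
line = np.column_stack([t, t])
print(correlation_fractal_dimension(line, [1/2, 1/4, 1/8, 1/16]))
```

A data set that fills the plane uniformly gives an estimate near 2 with the same function, which is the sense in which the fractal dimension summarises the intrinsic shape of the data independently of the number of attributes.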


The International Arab Journal of Information Technology, Vol. 15, No. 1, January 2018

Anuradha Yarlagadda received her Master's in Computer Science and Engineering from Visvesvaraya Technological University, India, and is pursuing her Doctoral degree at Jawaharlal Nehru Technological University Hyderabad, India. Her research interest is data warehousing and mining.

Murthy Jonnalagedda is currently a Professor in the Department of Computer Science and Engineering, University College of Engineering Kakinada, JNTUK, Andhra Pradesh. He received his B.Tech degree from JNTU College of Engineering, Kakinada, his M.Tech degree from IIT Kharagpur, and his Ph.D. degree from JNTU, Kakinada. His research interests include data warehousing and mining, databases, big data analytics, and high performance computing.

Krishna Munaga is currently an Associate Professor in the Department of Computer Science and Engineering, University College of Engineering Kakinada, JNTUK, Andhra Pradesh. He received his BE degree from Osmania University, Hyderabad, and his M.Tech degree and Ph.D. in Computer Science and Engineering from JNTU, Hyderabad. He successfully completed a two-year MIUR fellowship at the University of Udine, Udine, Italy. His research interests include data mining, big data analytics, and high performance computing.