The International Arab Journal of Information Technology (IAJIT)



Privacy-Preserving Data Mining in Homogeneous Collaborative Clustering

Privacy has become an important concern in data mining. This paper proposes a novel algorithm for privacy preservation in a distributed environment, based on data clustering. Each site clusters its data locally and transfers encrypted aggregated information to the master site. This aggregated information consists of the cluster centroids together with their sizes. From this local information, the master site reconstructs the global centroids and transfers them to all sites, which use them to update their local centroids. Additionally, the proposed algorithm is integrated with the Elliptic Curve Cryptography (ECC) public key cryptosystem and Diffie-Hellman key exchange. The proposed distributed encrypted scheme adds no more than 15% to the running time relative to the distributed non-encrypted scheme, but reduces the running time by at least 48% relative to the centralized scheme on a dataset of the same size. Theoretical and experimental analysis shows that the proposed algorithm effectively solves the privacy-preserving problem of clustering over distributed data.
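The aggregation step described above can be illustrated with a minimal sketch. It assumes a k-means style update in which every site starts from the same centroids, and it omits the ECC encryption and Diffie-Hellman key exchange entirely. The function names `local_summary` and `merge_summaries` are hypothetical and are not taken from the paper; the paper's actual procedure may differ in detail.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Summary = Tuple[List[Point], List[int]]  # (local centroids, cluster sizes)


def local_summary(points: Sequence[Point], centroids: Sequence[Point]) -> Summary:
    """Run at each site: assign local points to the nearest shared centroid
    and return only the aggregated result (centroids and sizes)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    sizes = [0] * k
    for p in points:
        # Index of the nearest centroid by squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        sizes[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    local = [
        tuple(s / sizes[j] for s in sums[j]) if sizes[j] else tuple(centroids[j])
        for j in range(k)
    ]
    return local, sizes


def merge_summaries(summaries: Sequence[Summary],
                    previous: Sequence[Point]) -> List[Point]:
    """Run at the master site: rebuild global centroids as the size-weighted
    mean of the local centroids. Empty clusters keep their previous centroid."""
    k, dim = len(previous), len(previous[0])
    merged: List[Point] = []
    for j in range(k):
        total = sum(sizes[j] for _, sizes in summaries)
        if total == 0:
            merged.append(tuple(previous[j]))
            continue
        merged.append(tuple(
            sum(cents[j][d] * sizes[j] for cents, sizes in summaries) / total
            for d in range(dim)
        ))
    return merged


# Two sites holding horizontally partitioned data (illustrative values only).
initial = [(0.0, 0.0), (10.0, 10.0)]
site_a = [(1.0, 1.0), (9.0, 9.0)]
site_b = [(3.0, 1.0), (2.0, 4.0), (11.0, 11.0)]

summaries = [local_summary(site_a, initial), local_summary(site_b, initial)]
global_centroids = merge_summaries(summaries, initial)
print(global_centroids)
```

Because each local centroid is weighted by its cluster size, the merged centroids in this sketch equal the ones a single centralized update over the pooled points would produce, while the master site only ever sees centroids and counts rather than raw records.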


The International Arab Journal of Information Technology, Vol. 12, No. 6, November 2015
Mohamed Ouda is a PhD student in the Communications and Computer Engineering Department, Helwan University, Egypt. His research interests include machine learning, data mining, and database security.
Sameh Salem graduated with BSc and MSc degrees in communications and electronics engineering, both from Helwan University, Egypt, in 1998 and 2003, respectively. In 2008, he received the PhD degree in engineering from the Department of Electrical Engineering and Electronics, The University of Liverpool, UK. His research interests include clustering algorithms, machine learning, data mining, parallel computing, and cloud computing. In 2008, he was appointed assistant professor in the Department of Electronics, Communication and Computer Engineering, Faculty of Engineering, Helwan University, Egypt. He was also selected as coordinator and academic advisor at the Department of Communication and Information Technology, Uninettuno University (Italy), in cooperation with the Faculty of Engineering, Helwan University (Egypt). Furthermore, he reviews proposals and research projects at the National Telecommunication Regulatory Authority (NTRA), Egypt. In 2014, he was promoted to Associate Professor. Currently, he is an Honorary Research Fellow at the Department of Electrical Engineering and Electronics, The University of Liverpool, UK.
Ihab Ali obtained his BSc, MSc, and PhD degrees in 1985, 1991, and 1997, respectively, all in communications engineering from Helwan University, Egypt. He is a senior member of IEEE. He is currently the head of the Communications Engineering Department, Helwan University, Egypt.
EL-Sayed Saad is Professor of Electronic Circuits, Faculty of Engineering, Helwan University, Egypt. He is an international scientific member of the ECCTD, a member of the national radio science committee, and a member of the European Circuit Society (ECS). He is the inventor of Saad's single amplifier SC structure and an engineering consultant for the Supreme Council of Universities.