A Method for Finding the Appropriate Number of Clusters
        
A drawback of almost all partition-based clustering algorithms is that the number of clusters must be specified in advance.
Identifying the true number of clusters beforehand is a difficult problem. Several studies have addressed this issue, but no
method works well in every case. This paper proposes a method for finding the appropriate number of clusters during the
clustering process by constructing an index that indicates that number. The index is built from an intra-cluster coefficient
and an inter-cluster coefficient. The intra-cluster coefficient reflects the distortion within a cluster, and the
inter-cluster coefficient reflects the distance among clusters. Both coefficients are computed only from the extremely
marginal objects of the clusters. The search for the extremely marginal objects and the construction of the index are
integrated into a weighted Fuzzy C-Means (FCM) algorithm, so the index is calculated while the weighted FCM runs. The extended
weighted FCM algorithm that integrates this index is called Fuzzy C-Means-Extended (FCM-E). FCM-E not only seeks the clusters
but also finds the appropriate number of clusters. The authors test FCM-E on several data sets from the University of
California, Irvine (UCI): Iris, Wine, Breast Cancer Wisconsin, and Glass, and compare the results of the proposed method with
those of other methods. The results obtained with the proposed method are encouraging.
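
The general idea of scoring candidate cluster counts with an index that combines an intra-cluster term and an inter-cluster term can be sketched as follows. This is only an illustration under stated assumptions, not the FCM-E algorithm: it uses plain (unweighted) FCM, a farthest-point initialization chosen here for determinism, and the well-known Xie-Beni index as a stand-in, because the abstract does not define the proposed index or the "extremely marginal objects". All function names (`fcm`, `xie_beni`, `choose_c`) and the toy data are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, c):
    """Deterministic start (an assumption): spread initial centers apart."""
    centers = [list(points[0])]
    while len(centers) < c:
        nxt = max(points, key=lambda p: min(sq_dist(p, v) for v in centers))
        centers.append(list(nxt))
    return centers

def memberships(points, centers, m):
    """Standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1))."""
    power = -1.0 / (m - 1.0)  # applied to squared distances
    u = []
    for p in points:
        inv = [max(sq_dist(p, v), 1e-12) ** power for v in centers]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u

def update_centers(points, u, m):
    """Standard FCM center update: mean of the points weighted by u_ik^m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / total
                        for d in range(dim)])
    return centers

def fcm(points, c, m=2.0, iters=200, tol=1e-9):
    """Plain FCM (not the weighted variant of the paper)."""
    centers = farthest_point_init(points, c)
    u = memberships(points, centers, m)
    for _ in range(iters):
        new_centers = update_centers(points, u, m)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        u = memberships(points, centers, m)
        if shift < tol:
            break
    return centers, u

def xie_beni(points, centers, u, m=2.0):
    """Stand-in index: fuzzy compactness / (n * minimum center separation)."""
    compact = sum(u[i][k] ** m * sq_dist(p, v)
                  for i, p in enumerate(points)
                  for k, v in enumerate(centers))
    sep = min(sq_dist(a, b)
              for j, a in enumerate(centers) for b in centers[j + 1:])
    return math.inf if sep == 0 else compact / (len(points) * sep)

def choose_c(points, c_min=2, c_max=5):
    """Run FCM for each candidate c and keep the c with the lowest index."""
    scores = {}
    for c in range(c_min, c_max + 1):
        centers, u = fcm(points, c)
        scores[c] = xie_beni(points, centers, u)
    return min(scores, key=scores.get), scores

# Toy data (an assumption): three tight 3x3 grids of points, far apart.
offsets = (-0.5, 0.0, 0.5)
demo_points = [(cx + dx, cy + dy)
               for cx, cy in ((0.0, 0.0), (10.0, 0.0), (5.0, 9.0))
               for dx in offsets for dy in offsets]
best_c, scores = choose_c(demo_points)
print(best_c, scores)
```

The toy data contain three well-separated groups, so an index of this kind would be expected to favor c = 3. Note that this sketch reruns the clustering once per candidate c, whereas the abstract describes an index computed while the weighted FCM itself is running.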
The International Arab Journal of Information Technology, Vol. 15, No. 4, July 2018

Huan Doan received his BSc degree in Mathematics from Hue University of Science, Vietnam, in 1988, and his MSc degree in Computer Science from the University of Information Technology (UIT), Vietnam National University Ho Chi Minh City (VNU-HCM), in 2012. He is currently pursuing a PhD degree in Computer Science at the University of Information Technology (UIT), VNU-HCM. He is also the director of EnterSoft Software Solution Joint Stock Company, Ho Chi Minh City, Vietnam. He has published about 8 research papers in the areas of data mining and artificial intelligence, data analysis, and risk analysis in international and national conferences and journals.

Dinh Nguyen Nguyen is an Associate Professor at the Department of Information Systems, University of Information Technology (UIT), Vietnam National University Ho Chi Minh City (VNU-HCM). He received his BSc degree in Mathematics from Dalat University in 1984, his MSc degree in Information Technology from the University of Science (VNU-HCM) in 1997, and his PhD degree in Information Technology from the Institute of Information Technology (IOIT), Vietnamese Academy of Science and Technology (VAST), in 2004. He has published more than 35 research papers in the areas of databases, data mining, and data analysis in international and national conferences and journals. He is currently supervising 3 PhD students in the areas of data mining and data analysis.
