Cohesive Pair-Wises Constrained Deep Embedding for Semi-Supervised Clustering with Very Few Labeled Samples

Authors: Jing Zhang, Guiyan Wei, Yonggong Ren

Keywords: Semi-supervised learning, clustering, auto-encoder network, pair-wise

Abstract

Semi-supervised learning is a powerful paradigm for uncovering the latent structure shared by labeled and unlabeled samples through model construction. Current graph-based models approximate a matrix that directly represents the distribution of samples under a spatial metric. The crux lies in optimizing the connections between samples, which is achieved by imposing must-link or cannot-link constraints. Unfortunately, such links are difficult to find in semi-supervised clustering with very few labeled samples, which significantly impairs robustness and accuracy in this scenario. To address this problem, we propose the Cohesive Pair-wises Constrained deep Embedding model (CPCE), which obtains an optimal embedding that represents the category distribution of samples and avoids a failed graph structure over the global samples. CPCE builds a deep network framework on a CNN autoencoder that minimizes the reconstruction error of samples, and imposes constraints on both the within-class sample distribution and the between-class category distribution to optimize the latent embedding. It then leverages the strong supervisory information obtained from cohesive pair-wise constraints to pull samples toward their own classes, which prevents features of samples from different categories from becoming similar in the high-dimensional space and yields a more compact solution. We evaluate the proposed method on popular datasets and compare it against popular methods, demonstrating its superiority.
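The abstract describes the objective only at a high level. As a rough illustration, and not the authors' formulation, the sketch below shows how a handful of labels can be expanded into must-link and cannot-link pairs and then combined with an autoencoder's reconstruction error into a single loss on the latent embedding. The helper names (`pairs_from_labels`, `constrained_loss`), the squared-hinge cannot-link term, and the `margin`, `lam_ml` and `lam_cl` weights are hypothetical choices made for this example; the CNN encoder and decoder that would produce `z` and `x_hat` are left out.

```python
import itertools

import numpy as np


def pairs_from_labels(labeled_idx, labels):
    """Hypothetical helper: expand a few labels into pairwise constraints.

    Every pair of labeled samples sharing a label becomes a must-link;
    every pair with different labels becomes a cannot-link.
    """
    must, cannot = [], []
    for (i, yi), (j, yj) in itertools.combinations(zip(labeled_idx, labels), 2):
        (must if yi == yj else cannot).append((i, j))
    return must, cannot


def constrained_loss(x, x_hat, z, must, cannot, margin=1.0, lam_ml=1.0, lam_cl=1.0):
    """Reconstruction error plus pairwise terms on the latent embedding z.

    Assumed form for illustration (not CPCE's exact objective):
      recon = mean squared error between inputs x and reconstructions x_hat
      pull  = mean squared latent distance over must-link pairs
      push  = mean of max(0, margin - distance)^2 over cannot-link pairs
    """
    recon = np.mean((x - x_hat) ** 2)

    def dists(pairs):
        # Euclidean distances in latent space for a list of (i, j) index pairs.
        if not pairs:
            return np.zeros(0)
        i, j = map(np.array, zip(*pairs))
        return np.linalg.norm(z[i] - z[j], axis=1)

    d_ml = dists(must)
    d_cl = dists(cannot)
    pull = np.mean(d_ml ** 2) if d_ml.size else 0.0
    push = np.mean(np.maximum(0.0, margin - d_cl) ** 2) if d_cl.size else 0.0
    return recon + lam_ml * pull + lam_cl * push
```

With four labeled samples, `pairs_from_labels([0, 1, 2, 3], ["a", "a", "b", "b"])` yields the must-links `[(0, 1), (2, 3)]` and the cannot-links `[(0, 2), (0, 3), (1, 2), (1, 3)]`, so even very few labels produce several constraints. If the embedding places the two classes at least `margin` apart while keeping each class collapsed, both pairwise terms vanish. If all four embeddings coincide, the cannot-link term is at its maximum of `margin ** 2`, which is the pressure that pushes different categories apart.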
