Vol.64, No.1, 2020, pp.207-216, doi:10.32604/cmc.2020.09861
Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data
  • Yu Jiang1, 2, Dengwen Yu1, Mingzhao Zhao1, 2, Hongtao Bai1, 2, Chong Wang1, 2, 3, Lili He1, 2, *
1 College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
2 A Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun, 130012, China.
3 Department of Engineering Mechanics, State Marine Technical University of St. Petersburg, St. Petersburg, 190008, Russia.
* Corresponding Author: Lili He. Email: helili@jlu.edu.cn.
Received 22 January 2020; Accepted 12 February 2020; Issue published 20 May 2020
Semi-supervised clustering improves learning performance as long as it uses a small number of labeled samples to assist un-tagged samples for learning. This paper implements and compares unsupervised and semi-supervised clustering analysis of BOAArgo ocean text data. Unsupervised K-Means and Affinity Propagation (AP) are two classical clustering algorithms. The Election-AP algorithm is proposed to handle the final cluster number in AP clustering as it has proved to be difficult to control in a suitable range. Semi-supervised samples thermocline data in the BOA-Argo dataset according to the thermocline standard definition, and use this data for semi-supervised cluster analysis. Several semi-supervised clustering algorithms were chosen for comparison of learning performance: Constrained-K-Means, Seeded-K-Means, SAP (Semi-supervised Affinity Propagation), LSAP (Loose Seed AP) and CSAP (Compact Seed AP). In order to adapt the single label, this paper improves the above algorithms to SCKM (improved Constrained-K-Means), SSKM (improved Seeded-K-Means), and SSAP (improved Semi-supervised Affinity Propagationg) to perform semi-supervised clustering analysis on the data. A DSAP (Double Seed AP) semi-supervised clustering algorithm based on compact seeds is proposed as the experimental data shows that DSAP has a better clustering effect. The unsupervised and semi-supervised clustering results are used to analyze the potential patterns of marine data.
Unsupervised learning, semi-supervised learning, text clustering.
Cite This Article
Jiang, Y., Yu, D., Zhao, M., Bai, H., Wang, C. et al. (2020). Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data. CMC-Computers, Materials & Continua, 64(1), 207–216.