Open Access

ARTICLE

Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data

Yu Jiang1, 2, Dengwen Yu1, Mingzhao Zhao1, 2, Hongtao Bai1, 2, Chong Wang1, 2, 3, Lili He1, 2, *

1 College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
2 A Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun, 130012, China.
3 Department of Engineering Mechanics, State Marine Technical University of St. Petersburg, St. Petersburg, 190008, Russia.

* Corresponding Author: Lili He.

Computers, Materials & Continua 2020, 64(1), 207-216. https://doi.org/10.32604/cmc.2020.09861

Abstract

Semi-supervised clustering improves learning performance by using a small number of labeled samples to guide the learning of unlabeled samples. This paper implements and compares unsupervised and semi-supervised clustering analyses of BOA-Argo ocean text data. For the unsupervised case, two classical clustering algorithms are used: K-Means and Affinity Propagation (AP). Because the final number of clusters in AP clustering has proved difficult to keep within a suitable range, the Election-AP algorithm is proposed to control it. For the semi-supervised case, thermocline data are sampled from the BOA-Argo dataset according to the standard definition of the thermocline and used as labeled samples for cluster analysis. Several semi-supervised clustering algorithms were chosen for comparison of learning performance: Constrained-K-Means, Seeded-K-Means, SAP (Semi-supervised Affinity Propagation), LSAP (Loose Seed AP) and CSAP (Compact Seed AP). To adapt these algorithms to single-label data, this paper improves them into SCKM (improved Constrained-K-Means), SSKM (improved Seeded-K-Means), and SSAP (improved Semi-supervised Affinity Propagation). A DSAP (Double Seed AP) semi-supervised clustering algorithm based on compact seeds is also proposed, and the experimental data show that DSAP has a better clustering effect. The unsupervised and semi-supervised clustering results are used to analyze potential patterns in the marine data.
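
The Seeded-K-Means and Constrained-K-Means baselines named in the abstract can be illustrated with a minimal sketch. It follows the commonly described formulation, in which centroids are initialized from the labeled seeds and the constrained variant additionally keeps the seed labels fixed. It is not the SCKM or SSKM variant of this paper, whose details are not given on this page. The function name `seeded_kmeans`, its parameters, and the toy points are assumptions made for illustration only.

```python
import math


def seeded_kmeans(points, seeds, k, constrained=False, max_iter=100):
    """Sketch of Seeded-K-Means (constrained=False) and
    Constrained-K-Means (constrained=True).

    points: list of equal-length numeric tuples
    seeds:  dict mapping a point index to its known cluster label in range(k)
    (Hypothetical interface, chosen for illustration.)
    """
    dim = len(points[0])

    def mean(indices):
        # Component-wise mean of the points at the given indices.
        return tuple(
            sum(points[i][d] for i in indices) / len(indices) for d in range(dim)
        )

    # Initialize each centroid from the labeled seeds of that cluster.
    centroids = []
    for c in range(k):
        members = [i for i, lab in seeds.items() if lab == c]
        if not members:
            raise ValueError(f"cluster {c} has no seed")
        centroids.append(mean(members))

    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = []
        for i, p in enumerate(points):
            if constrained and i in seeds:
                # Constrained variant: a seed never leaves its given cluster.
                new_labels.append(seeds[i])
            else:
                # Otherwise assign the point to the nearest centroid.
                new_labels.append(
                    min(range(k), key=lambda c: math.dist(p, centroids[c]))
                )
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute centroids; an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids
```

With two well-separated groups of toy points and one seed per group, the seeds only determine where the centroids start. With `constrained=True`, a seed keeps its given label even when another centroid is closer.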

Keywords

Unsupervised learning, semi-supervised learning, text clustering.

Cite This Article

APA Style
Jiang, Y., Yu, D., Zhao, M., Bai, H., Wang, C. et al. (2020). Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data. Computers, Materials & Continua, 64(1), 207–216. https://doi.org/10.32604/cmc.2020.09861
Vancouver Style
Jiang Y, Yu D, Zhao M, Bai H, Wang C, He L. Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data. Comput Mater Contin. 2020;64(1):207–216. https://doi.org/10.32604/cmc.2020.09861
IEEE Style
Y. Jiang, D. Yu, M. Zhao, H. Bai, C. Wang, and L. He, “Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data,” Comput. Mater. Contin., vol. 64, no. 1, pp. 207–216, 2020. https://doi.org/10.32604/cmc.2020.09861

Copyright © 2020 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.