A Tradeoff Between Accuracy and Speed for K-Means Seed Determination

Farzaneh Khorasani; Morteza Zanjireh; Mahdi Bahaghighat; Qin Xin

doi:10.32604/csse.2022.016003

Open Access

ARTICLE


Farzaneh Khorasani¹, Morteza Mohammadi Zanjireh¹,*, Mahdi Bahaghighat¹, Qin Xin²

1 Computer Engineering Department, Imam Khomeini International University, Qazvin, Iran
2 Faculty of Science and Technology, University of the Faroe Islands, Torshavn, Faroe Islands

* Corresponding Author: Morteza Mohammadi Zanjireh. Email: email

Computer Systems Science and Engineering 2022, 40(3), 1085-1098. https://doi.org/10.32604/csse.2022.016003

Received 18 December 2020; Accepted 30 April 2021; Issue published 24 September 2021

Abstract

With the sharp increase in the volume of information, analyzing and retrieving this data is more essential than ever. One of the main techniques that can help here is clustering. Clustering aims to group objects so that objects within a cluster have similar features while objects in different clusters are as distinct as possible. One of the most widely used clustering algorithms, with well-established performance in many applications, is the k-means algorithm. The main problem of k-means is that its performance depends directly on the choice of the initial cluster centers (seeds). Neglecting this issue can lead to empty clusters and slower convergence, whereas selecting appropriate initial seeds can reduce cluster inconsistency. In this paper, we present a new method for determining the initial seeds of the k-means algorithm that improves accuracy and reduces the number of iterations. The proposed method determines the initial seeds by considering the average distance between objects, and it aims to strike a proper tradeoff between the accuracy and speed of the clustering algorithm. The experimental results show that our approach outperforms Chithra's method by 1.7% and 2.1% in clustering accuracy on the Wine and Abalone datasets, respectively. Furthermore, compared with the Reverse Nearest Neighbor (RNN) search approach, the proposed method converges faster.
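The abstract says that poorly chosen initial seeds slow k-means down and that the proposed method picks seeds using the average distance between objects, but it does not give the procedure. The Python sketch below is a hypothetical illustration under that reading only. `average_distance_seeds` is a name introduced here, not the paper's. It accepts a point as a new seed only if that point is at least the mean pairwise distance away from every seed chosen so far. `kmeans` is standard Lloyd iteration that reports how many assignment passes it needed. This is not the authors' algorithm, nor Chithra's or the RNN-based method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(points: List[Point]) -> float:
    """Average Euclidean distance over all unordered pairs (needs >= 2 points)."""
    n = len(points)
    total = sum(dist(points[i], points[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def average_distance_seeds(points: List[Point], k: int) -> List[Point]:
    """HYPOTHETICAL seeding rule for illustration; not the paper's algorithm.

    The first data point becomes the first seed. Each further seed is the
    first point (in data order) at least the mean pairwise distance away from
    every seed chosen so far. If none qualifies, the point farthest from its
    nearest seed is taken instead. Assumes k <= number of distinct points.
    """
    threshold = mean_pairwise_distance(points)
    seeds = [points[0]]
    while len(seeds) < k:
        rest = [p for p in points if p not in seeds]
        far = [p for p in rest if all(dist(p, s) >= threshold for s in seeds)]
        if far:
            seeds.append(far[0])
        else:
            seeds.append(max(rest, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds


def kmeans(points: List[Point], seeds: List[Point], max_iter: int = 100):
    """Lloyd's k-means started from `seeds`.

    Returns (centroids, labels, passes), where `passes` counts the assignment
    passes run, including the final pass that found no change. A cluster
    that receives no points keeps its previous centroid.
    """
    centroids = [tuple(s) for s in seeds]
    labels: List[int] = []
    passes = 0
    while passes < max_iter:
        passes += 1
        # Assignment step: nearest centroid, ties go to the lowest index.
        new_labels = [min(range(len(centroids)),
                          key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels, passes


# Toy data: two well-separated groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = average_distance_seeds(data, k=2)           # [(0, 0), (10, 10)]
centroids, labels, passes = kmeans(data, seeds)     # 2 passes
_, _, passes_bad = kmeans(data, [(0, 0), (0, 1)])   # both seeds in one group: 3 passes
print(seeds, centroids, passes, passes_bad)
```

On this toy data, the spread-out seeds converge in two passes. Two seeds placed in the same group need three passes before reaching the same final centroids. This only illustrates the abstract's general point that seed placement affects convergence. It says nothing about the accuracy or speed figures reported in the paper.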

Keywords

Data clustering; k-means algorithm; information retrieval; outlier detection; clustering accuracy; unsupervised learning

Cite This Article

APA Style

Khorasani, F., Zanjireh, M.M., Bahaghighat, M., Xin, Q. (2022). A Tradeoff Between Accuracy and Speed for K-Means Seed Determination. Computer Systems Science and Engineering, 40(3), 1085–1098. https://doi.org/10.32604/csse.2022.016003

Vancouver Style

Khorasani F, Zanjireh MM, Bahaghighat M, Xin Q. A Tradeoff Between Accuracy and Speed for K-Means Seed Determination. Comput Syst Sci Eng. 2022;40(3):1085–1098. https://doi.org/10.32604/csse.2022.016003

IEEE Style

F. Khorasani, M. M. Zanjireh, M. Bahaghighat, and Q. Xin, “A Tradeoff Between Accuracy and Speed for K-Means Seed Determination,” Comput. Syst. Sci. Eng., vol. 40, no. 3, pp. 1085–1098, 2022. https://doi.org/10.32604/csse.2022.016003


Copyright © 2022 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
