Modeling Bacterial Species: Using Sequence Similarity with Clustering Techniques

Miguel-Angel Sicilia; Elena García-Barriocanal; Marçal Mora-Cantallops; Salvador Sánchez-Alonso; Lino González

doi:10.32604/cmc.2021.015874

Open Access

ARTICLE

Miguel-Angel Sicilia¹,*, Elena García-Barriocanal¹, Marçal Mora-Cantallops¹, Salvador Sánchez-Alonso¹, Lino González²

1 University of Alcalá, Alcalá de Henares (Madrid), 28871, Spain
2 Camilo José Cela University, Madrid, 28007, Spain

* Corresponding Author: Miguel-Angel Sicilia. Email: email

Computers, Materials & Continua 2021, 68(2), 1661-1672. https://doi.org/10.32604/cmc.2021.015874

Received 11 December 2020; Accepted 25 January 2021; Issue published 13 April 2021

Abstract

Existing studies have challenged the current definition of named bacterial species, especially in the case of highly recombinogenic bacteria. This has led to considering computational procedures for examining potential bacterial clusters that are not identified by species naming. This paper describes the use of sequence data obtained from multilocus sequence typing (MLST) databases as input for a k-means algorithm extended to use housekeeping gene sequences as the similarity metric for the clustering process. An implementation of the k-means algorithm has been developed based on an existing source code implementation and evaluated against MLST data. Results point to potential bacterial clusters that are close to more than one named species and thus may become candidates for alternative classifications accounting for genotypic information. The use of hierarchical clustering with sequence comparison as the similarity metric has the potential to find clusters different from named species by using a more informed cluster formation strategy than a conventional nominal variant of the algorithm.
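
As an illustration of the idea of clustering on sequence similarity rather than on nominal allele labels, the following Python sketch groups multi-locus profiles with a k-means-style loop. It is not the implementation evaluated in the paper: it assumes pre-aligned, equal-length allele sequences, substitutes a simple per-position mismatch fraction for a full sequence alignment score, and uses medoids (actual profiles) as cluster centers because a mean of sequences is not defined. The names `locus_distance`, `profile_distance`, and `cluster` are hypothetical.

```python
from typing import List, Sequence, Tuple

# One allele sequence per housekeeping locus (assumed layout, for illustration).
Profile = Tuple[str, ...]


def locus_distance(a: str, b: str) -> float:
    """Fraction of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def profile_distance(p: Profile, q: Profile) -> float:
    """Mean per-locus distance between two multi-locus profiles."""
    return sum(locus_distance(a, b) for a, b in zip(p, q)) / len(p)


def cluster(profiles: Sequence[Profile], k: int, max_iter: int = 50) -> List[int]:
    """Return one cluster label per profile using medoid-based updates."""
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    medoids = list(range(k))  # naive initialization: the first k profiles
    labels = [0] * len(profiles)
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest medoid.
        labels = [
            min(range(k), key=lambda c: profile_distance(p, profiles[medoids[c]]))
            for p in profiles
        ]
        # Update step: pick the member with the smallest total distance
        # to the other members of its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            best = min(
                members,
                key=lambda i: sum(
                    profile_distance(profiles[i], profiles[j]) for j in members
                ),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return labels
```

Calling `cluster(profiles, k=2)` on a handful of profiles returns a list of integer labels, one per profile. In contrast to a nominal comparison, where two alleles are either identical or different, profiles whose alleles differ by only a few positions remain close under this distance.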

Keywords

Clustering; bacterial species; k-means; sequence alignment

Copyright © 2021 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
