Open Access

ARTICLE

Big Data Bot with a Special Reference to Bioinformatics

Ahmad M. Al-Omari1,*, Shefa M. Tawalbeh1, Yazan H. Akkam2, Mohammad Al-Tawalbeh3, Shima’a Younis1, Abdullah A. Mustafa4, Jonathan Arnold5

1 Biomedical Systems and Informatics Engineering Department, Yarmouk University, Irbid, 21163, Jordan
2 Department of Medicinal Chemistry and Pharmacognosy, Yarmouk University, Irbid, 21163, Jordan
3 Department of Electrical, Computer and Software Engineering, University of Ontario Institute of Technology, Oshawa, L1H7K4, Canada
4 Department of Mechanical Engineering, University of Mosul, Mosul, 41001, Iraq
5 Genetics Department, University of Georgia, Athens, 30602, GA, USA

* Corresponding Author: Ahmad M. Al-Omari. Email: email

Computers, Materials & Continua 2023, 75(2), 4155-4173. https://doi.org/10.32604/cmc.2023.036956

Abstract

Publicly accessible data banks hold quintillions of data points on deoxyribonucleic acid (DNA) and proteins, and that number is growing exponentially. Many scientific fields, such as bioinformatics and drug discovery, rely on such data; nevertheless, gathering and extracting data from these resources is difficult. The data must pass through several processes, including mining, processing, analysis, and classification. This study proposes software that extracts data from big data repositories automatically and can repeat the data extraction phases as many times as needed without human intervention. The software simulates manual data extraction from web-based (point-and-click) resources or graphical user interfaces that cannot be accessed using command-line tools. The software was evaluated by creating a novel database of 34 physicochemical parameters for 1360 antimicrobial peptide (AMP) sequences (46,240 hits) from various MARVIN software panels, which can later be used to develop novel AMPs. Furthermore, for machine learning research, the program was validated by extracting 10,000 protein tertiary structures from the Protein Data Bank. As a result, data collection from the web becomes faster and less expensive, with no need for manual data extraction. The software is a critical first step in preparing large datasets for subsequent stages of analysis, such as machine learning and deep learning applications.
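As a rough illustration of the "repeat without human intervention" idea described in the abstract, the following Python sketch applies an extraction step to a list of sequences and retries failures. It is only a sketch under stated assumptions: the names `run_batch` and `fake_extract`, the retry count, and the stand-in extractor are hypothetical and are not taken from the paper, whose bot drives graphical interfaces that are not reproduced here.

```python
from typing import Callable, Dict, List, Tuple


def run_batch(
    items: List[str],
    extract: Callable[[str], Dict[str, float]],
    max_retries: int = 3,
) -> Tuple[Dict[str, Dict[str, float]], List[str]]:
    """Apply `extract` to every item, retrying failures up to `max_retries` times.

    Returns (results keyed by item, list of items that never succeeded).
    """
    results: Dict[str, Dict[str, float]] = {}
    failed: List[str] = []
    for item in items:
        for _attempt in range(max_retries):
            try:
                results[item] = extract(item)
                break
            except Exception:
                # Hypothetical policy: simply try again. A real bot might
                # wait or reset the interface before the next attempt.
                continue
        else:
            # Reached only when no attempt succeeded.
            failed.append(item)
    return results, failed


def fake_extract(sequence: str) -> Dict[str, float]:
    """Stand-in for a GUI-driven extraction step (assumption, not the paper's code)."""
    if not sequence:
        raise ValueError("empty sequence")
    return {"length": float(len(sequence))}


if __name__ == "__main__":
    results, failed = run_batch(["KWKLFKKI", "GIGKFLHS", ""], fake_extract)
    print(len(results), "extracted;", len(failed), "failed")
    # 34 parameters for each of 1360 sequences matches the reported hit count.
    print(34 * 1360)
```

The batch loop keeps going after a failure and records the item instead of stopping, which is one plausible way to let a long extraction run finish unattended.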

Cite This Article

APA Style
Al-Omari, A.M., Tawalbeh, S.M., Akkam, Y.H., Al-Tawalbeh, M., Younis, S. et al. (2023). Big data bot with a special reference to bioinformatics. Computers, Materials & Continua, 75(2), 4155-4173. https://doi.org/10.32604/cmc.2023.036956
Vancouver Style
Al-Omari AM, Tawalbeh SM, Akkam YH, Al-Tawalbeh M, Younis S, Mustafa AA, et al. Big data bot with a special reference to bioinformatics. Comput Mater Contin. 2023;75(2):4155-4173. https://doi.org/10.32604/cmc.2023.036956
IEEE Style
A.M. Al-Omari et al., “Big Data Bot with a Special Reference to Bioinformatics,” Comput. Mater. Contin., vol. 75, no. 2, pp. 4155-4173, 2023. https://doi.org/10.32604/cmc.2023.036956



Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.