Big Data Bot with a Special Reference to Bioinformatics

Ahmad Al-Omari; Shefa Tawalbeh; Yazan Akkam; Mohammad Al-Tawalbeh; Shima’a Younis; Abdullah Mustafa; Jonathan Arnold

doi:10.32604/cmc.2023.036956

Open Access

ARTICLE


Ahmad M. Al-Omari¹,*, Shefa M. Tawalbeh¹, Yazan H. Akkam², Mohammad Al-Tawalbeh³, Shima’a Younis¹, Abdullah A. Mustafa⁴, Jonathan Arnold⁵

1 Biomedical Systems and Informatics Engineering Department, Yarmouk University, Irbid, 21163, Jordan
2 Department of Medicinal Chemistry and Pharmacognosy, Yarmouk University, Irbid, 21163, Jordan
3 Department of Electrical, Computer and Software Engineering, University of Ontario Institute of Technology, Oshawa, L1H7K4, Canada
4 Department of Mechanical Engineering, University of Mosul, Mosul, 41001, Iraq
5 Genetics Department, University of Georgia, Athens, 30602, GA, USA

* Corresponding Author: Ahmad M. Al-Omari. Email: email

Computers, Materials & Continua 2023, 75(2), 4155-4173. https://doi.org/10.32604/cmc.2023.036956

Received 18 October 2022; Accepted 08 February 2023; Issue published 31 March 2023

Abstract

Publicly accessible data banks hold quintillions of data points on deoxyribonucleic acid (DNA) and proteins, and that volume is growing exponentially. Many scientific fields, such as bioinformatics and drug discovery, rely on these data; nevertheless, gathering and extracting data from these resources is a difficult task. The data must pass through several stages, including mining, processing, analysis, and classification. This study proposes software that extracts data from big data repositories automatically and can repeat the extraction phases as many times as needed without human intervention. The software emulates a user extracting data from web-based (point-and-click) resources or graphical user interfaces that cannot be accessed with command-line tools. The software was evaluated by creating a novel database of 34 physicochemical parameters for 1360 antimicrobial peptide (AMP) sequences (46,240 hits) from various MARVIN software panels; this database can later be used to develop novel AMPs. Furthermore, for machine-learning research, the program was validated by extracting 10,000 protein tertiary structures from the Protein Data Bank. As a result, web data collection becomes faster and less expensive, with no need for manual extraction. The software is a critical first step in preparing large datasets for subsequent stages of analysis, such as machine-learning and deep-learning applications.
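The abstract describes a bot that repeats extraction phases for every input without human intervention. Below is a minimal Python sketch of that batch-and-retry pattern, offered under stated assumptions rather than as the authors' implementation: `run_batch`, `count_values`, `stub_extract`, and the parameter names are hypothetical, and the stub only stands in for the GUI or web interaction (for example, a MARVIN panel) that the real software automates. It also makes the arithmetic behind the reported hit count explicit: 34 parameters for each of 1360 sequences gives 34 × 1360 = 46,240 values.

```python
import time
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def run_batch(
    items: Sequence[str],
    extract: Callable[[str], Dict[str, float]],
    max_retries: int = 3,
    delay_s: float = 0.0,
) -> List[Row]:
    """Run one extraction phase for every item with no manual intervention.

    Each item is tried up to `max_retries` times. A failure is recorded in
    the item's row instead of stopping the whole batch.
    """
    rows: List[Row] = []
    for item in items:
        for attempt in range(1, max_retries + 1):
            try:
                values = extract(item)
            except Exception as exc:  # GUI/web steps can fail in many ways
                if attempt == max_retries:
                    rows.append({"item": item, "status": f"failed: {exc}"})
                else:
                    time.sleep(delay_s)  # wait before retrying the same item
            else:
                rows.append({"item": item, "status": "ok", "values": values})
                break
    return rows


def count_values(rows: List[Row]) -> int:
    """Count (item, parameter) values collected from successful rows."""
    return sum(len(r["values"]) for r in rows if r["status"] == "ok")


if __name__ == "__main__":
    # Hypothetical parameter names and placeholder inputs, not real data.
    params = [f"param_{i}" for i in range(34)]

    def stub_extract(seq: str) -> Dict[str, float]:
        # Stand-in for driving a GUI panel; returns placeholder values only.
        return {p: 0.0 for p in params}

    sequences = [f"SEQ_{i}" for i in range(1360)]
    rows = run_batch(sequences, stub_extract)
    print(count_values(rows))  # 46240
```

Recording a failure in the item's row instead of raising keeps a long unattended run going; failed items can then be collected from the output and re-queued for another pass.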

Keywords

Bioinformatics; big data; data extraction; bot; drug design

Cite This Article

APA Style

Al-Omari, A.M., Tawalbeh, S.M., Akkam, Y.H., Al-Tawalbeh, M., Younis, S. et al. (2023). Big Data Bot with a Special Reference to Bioinformatics. Computers, Materials & Continua, 75(2), 4155–4173. https://doi.org/10.32604/cmc.2023.036956

Vancouver Style

Al-Omari AM, Tawalbeh SM, Akkam YH, Al-Tawalbeh M, Younis S, Mustafa AA, et al. Big Data Bot with a Special Reference to Bioinformatics. Comput Mater Contin. 2023;75(2):4155–4173. https://doi.org/10.32604/cmc.2023.036956

IEEE Style

A. M. Al-Omari et al., “Big Data Bot with a Special Reference to Bioinformatics,” Comput. Mater. Contin., vol. 75, no. 2, pp. 4155–4173, 2023. https://doi.org/10.32604/cmc.2023.036956


Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
