Open Access

ARTICLE


An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters

by Mohammed Hadwan1,2,*, Hamzah A. Alsayadi3,4, Salah AL-Hagree5

1 Department of Information Technology, College of Computer, Qassim University, Buraydah, 51452, Saudi Arabia
2 Department of Computer Science, College of Applied Sciences, Taiz University, Taiz, 6803, Yemen
3 Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, 11566, Egypt
4 Computer Science Department, Faculty of Sciences, Ibb University, Yemen
5 Department of Computer Sciences & Information, Ibb University, Yemen

* Corresponding Author: Mohammed Hadwan

Computers, Materials & Continua 2023, 74(2), 3471-3487. https://doi.org/10.32604/cmc.2023.033457

Abstract

The attention-based encoder-decoder technique known as the transformer is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying end-to-end transformer-based ASR models to the Arabic language, which has received relatively little attention from the research community. The Holy Qur'an, the book of Muslims, is written in diacritized Arabic text. In this paper, an end-to-end transformer model for building a robust Qur'an verse recognition system is proposed. The acoustic model was built as a transformer-based deep learning model using the PyTorch framework, with a multi-head attention mechanism representing both the encoder and the decoder. A Mel filter bank is used for feature extraction. To build the language model (LM), a Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) were used to train an n-gram word-based LM. As part of this research, a new dataset of Qur'an verses and their associated transcripts was collected and processed for training and evaluating the proposed model, consisting of 10 h of .wav recitations performed by 60 reciters. The experimental results show that the proposed end-to-end transformer-based model achieves a significantly low character error rate (CER) of 1.98% and a word error rate (WER) of 6.16%, representing state-of-the-art end-to-end transformer-based recognition for Qur'an reciters.
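The sketch below is a minimal, hypothetical illustration of the pipeline the abstract describes: Mel filter bank features feeding a multi-head-attention encoder-decoder acoustic model in PyTorch. The class name TransformerASR, the vocabulary size, model dimension, and layer counts are illustrative assumptions, not the configuration reported in the paper, and positional encoding is omitted for brevity.

```python
# Hedged sketch only: dimensions and hyperparameters are assumed, not the paper's.
import torch
import torch.nn as nn
import torchaudio

VOCAB_SIZE = 64   # assumed size of the Arabic character/diacritic inventory
D_MODEL = 256     # assumed transformer model dimension

# 80-dimensional Mel filter bank features, a common ASR front end
melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=80)

class TransformerASR(nn.Module):
    """Encoder-decoder acoustic model built around multi-head attention."""
    def __init__(self):
        super().__init__()
        self.input_proj = nn.Linear(80, D_MODEL)            # project Mel frames
        self.token_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)  # embed target characters
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4,
            num_encoder_layers=6, num_decoder_layers=6,
            batch_first=True,
        )
        self.output_proj = nn.Linear(D_MODEL, VOCAB_SIZE)   # per-step character logits

    def forward(self, mel_frames, target_tokens):
        # mel_frames: (batch, time, 80); target_tokens: (batch, text_len)
        src = self.input_proj(mel_frames)
        tgt = self.token_emb(target_tokens)
        # causal mask so the decoder attends only to previously emitted characters
        mask = self.transformer.generate_square_subsequent_mask(target_tokens.size(1))
        out = self.transformer(src, tgt, tgt_mask=mask)
        return self.output_proj(out)

# Usage: features from a 1-second dummy waveform against dummy target tokens
wave = torch.randn(1, 16000)
feats = melspec(wave).transpose(1, 2)        # (1, time, 80)
tokens = torch.randint(0, VOCAB_SIZE, (1, 20))
logits = TransformerASR()(feats, tokens)     # (1, 20, VOCAB_SIZE)
```

In an actual training setup, the logits would be scored against the reference transcripts with a cross-entropy (or CTC/attention hybrid) objective, and decoding would be rescored with the n-gram word-based LM mentioned in the abstract.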

Keywords


Cite This Article

APA Style
Hadwan, M., Alsayadi, H.A., AL-Hagree, S. (2023). An end-to-end transformer-based automatic speech recognition for qur’an reciters. Computers, Materials & Continua, 74(2), 3471-3487. https://doi.org/10.32604/cmc.2023.033457
Vancouver Style
Hadwan M, Alsayadi HA, AL-Hagree S. An end-to-end transformer-based automatic speech recognition for qur’an reciters. Comput Mater Contin. 2023;74(2):3471-3487. https://doi.org/10.32604/cmc.2023.033457
IEEE Style
M. Hadwan, H. A. Alsayadi, and S. AL-Hagree, “An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters,” Comput. Mater. Contin., vol. 74, no. 2, pp. 3471-3487, 2023. https://doi.org/10.32604/cmc.2023.033457



Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.