Search Results (8)
  • Open Access

    ARTICLE

    Improved Speech Emotion Recognition Focusing on High-Level Data Representations and Swift Feature Extraction Calculation

    Akmalbek Abdusalomov1, Alpamis Kutlimuratov2, Rashid Nasimov3, Taeg Keun Whangbo1,*

    CMC-Computers, Materials & Continua, Vol.77, No.3, pp. 2915-2933, 2023, DOI:10.32604/cmc.2023.044466 - 26 December 2023

    Abstract The performance of a speech emotion recognition (SER) system is heavily influenced by the efficacy of its feature extraction techniques. This study was designed to advance SER by optimizing feature extraction, specifically through the incorporation of high-resolution Mel-spectrograms and the expedited calculation of Mel Frequency Cepstral Coefficients (MFCC). It aimed to refine the system’s accuracy by identifying and mitigating shortcomings common in current approaches. Ultimately, the primary objective was to elevate both the intricacy and effectiveness of our SER model, with a focus on augmenting its proficiency in…
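
The abstract names two concrete feature types: high-resolution Mel-spectrograms and MFCCs. Below is a minimal sketch (not the authors' code) computing both with librosa; the file name, sample rate, and frame settings are illustrative assumptions.

```python
# Hedged sketch of the two feature types named above, using librosa.
# "speech.wav", the sample rate, and frame settings are assumptions.
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=16000)  # hypothetical input clip

# High-resolution Mel-spectrogram: many Mel bands, short (10 ms) hop.
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=1024, hop_length=160, n_mels=128
)
log_mel = librosa.power_to_db(mel, ref=np.max)

# MFCCs; librosa derives them from a Mel-spectrogram via a DCT.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(log_mel.shape, mfcc.shape)  # (n_mels, frames), (n_mfcc, frames)
```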

  • Open Access

    ARTICLE

    Profiling of Urban Noise Using Artificial Intelligence

    Le Quang Thao1,2,*, Duong Duc Cuong2, Tran Thi Tuong Anh3, Tran Duc Luong4

    Computer Systems Science and Engineering, Vol.45, No.2, pp. 1309-1321, 2023, DOI:10.32604/csse.2023.031010 - 03 November 2022

    Abstract Noise pollution tends to receive less attention than other types of pollution; however, it greatly impacts human quality of life, causing sleep disruption, stress, or hearing impairment. Profiling urban sound by identifying noise sources in cities could improve livability by reducing exposure to noise pollution through methods such as noise control, planning of the soundscape environment, or selection of safe living spaces. In this paper, we propose a self-attention long short-term memory (LSTM) method that improves sound classification over previous baselines. An attention mechanism…
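
For readers wanting the shape of such a model, here is a minimal PyTorch sketch of an LSTM with attention pooling of the kind the abstract describes; the feature size, hidden width, and class count are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of an attention-pooled LSTM classifier (PyTorch).
# All sizes below are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    def __init__(self, n_features=40, hidden=128, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)      # scores each time step
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                     # x: (batch, time, features)
        out, _ = self.lstm(x)                 # (batch, time, hidden)
        w = torch.softmax(self.attn(out).squeeze(-1), dim=1)
        context = (w.unsqueeze(-1) * out).sum(dim=1)  # weighted average
        return self.head(context)             # class logits

logits = AttentionLSTM()(torch.randn(2, 100, 40))  # two 100-frame clips
```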

  • Open Access

    REVIEW

    Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems

    Sneha Basak1, Himanshi Agrawal1, Shreya Jena1, Shilpa Gite2,*, Mrinal Bachute2, Biswajeet Pradhan3,4,5,*, Mazen Assiri4

    CMES-Computer Modeling in Engineering & Sciences, Vol.135, No.2, pp. 1053-1089, 2023, DOI:10.32604/cmes.2022.021755 - 27 October 2022

    Abstract Speech recognition systems have become a distinct family of human-computer interaction (HCI) technologies. Speech is one of the most naturally developed human abilities, and speech signal processing opens up a transparent, hands-free computing experience. This paper presents a retrospective yet modern view of the world of speech recognition systems. The development of Automatic Speech Recognition (ASR) has seen quite a few milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of various…

  • Open Access

    ARTICLE

    Automatic Detection of Outliers in Multi-Channel EMG Signals Using MFCC and SVM

    Muhammad Irfan1, Khalil Ullah2, Fazal Muhammad3,*, Salman Khan3, Faisal Althobiani4, Muhammad Usman5, Mohammed Alshareef4, Shadi Alghaffari4, Saifur Rahman1

    Intelligent Automation & Soft Computing, Vol.36, No.1, pp. 169-181, 2023, DOI:10.32604/iasc.2023.032337 - 29 September 2022

    Abstract The automatic detection of noisy channels in surface electromyogram (sEMG) signals at the time of recording is critical to building a noise-free EMG dataset. An EMG signal recorded with high-level noise contamination is unusable for any healthcare application. In this work, a new machine learning-based paradigm is proposed to automate the detection of low-level and high-level noise occurring in different channels of high-density, multi-channel sEMG signals. A modified version of mel frequency cepstral coefficients (mMFCC) is proposed for the extraction of features…
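
The paper's mMFCC variant is not reproduced here; as a hedged stand-in, the sketch below summarizes each channel with standard MFCCs and trains an SVM to flag noisy channels. The sampling rate, shapes, and toy labels are assumptions.

```python
# Hedged sketch: standard MFCCs stand in for the paper's mMFCC variant;
# an SVM then flags noisy sEMG channels. All data below is synthetic.
import numpy as np
import librosa
from sklearn.svm import SVC

def channel_features(emg, fs=2048):
    """Mean MFCC vector summarizing one sEMG channel."""
    mfcc = librosa.feature.mfcc(y=emg, sr=fs, n_mfcc=13)
    return mfcc.mean(axis=1)

rng = np.random.default_rng(0)
X = np.stack([channel_features(rng.standard_normal(4096)) for _ in range(100)])
y = rng.integers(0, 2, size=100)            # 0 = clean, 1 = noisy (toy labels)

clf = SVC(kernel="rbf").fit(X[:80], y[:80])
print(clf.predict(X[80:]))                  # predicted channel quality
```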

  • Open Access

    ARTICLE

    The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition

    Mohammad Amaz Uddin1, Mohammad Salah Uddin Chowdury1, Mayeen Uddin Khandaker2,*, Nissren Tamam3, Abdelmoneim Sulieman4

    CMC-Computers, Materials & Continua, Vol.74, No.1, pp. 1709-1722, 2023, DOI:10.32604/cmc.2023.031177 - 22 September 2022

    Abstract Human speech indirectly conveys the speaker’s mental state or emotion. Artificial Intelligence (AI)-based techniques may bring about a revolution by recognizing emotion from speech. In this study, we introduce a robust method for emotion recognition from human speech that combines an effective preprocessing technique with a deep learning-based mixed model consisting of a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN). About 2800 audio files were extracted from the Toronto emotional speech set (TESS) database for this study. A high-pass and a Savitzky-Golay filter have been used to…
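
The truncated sentence points at the preprocessing stage; below is a minimal SciPy sketch of a high-pass filter followed by Savitzky-Golay smoothing. The cutoff, filter order, and window settings are illustrative assumptions, not the paper's values.

```python
# Hedged sketch of the named preprocessing: high-pass filtering followed
# by Savitzky-Golay smoothing. All settings below are assumptions.
import librosa
from scipy.signal import butter, filtfilt, savgol_filter

y, sr = librosa.load("speech.wav", sr=16000)   # hypothetical TESS-style clip

b, a = butter(4, 100, btype="highpass", fs=sr) # drop low-frequency rumble
y_hp = filtfilt(b, a, y)

y_clean = savgol_filter(y_hp, window_length=11, polyorder=3)
```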

  • Open Access

    ARTICLE

    Arabic Music Genre Classification Using Deep Convolutional Neural Networks (CNNs)

    Laiali Almazaydeh1,*, Saleh Atiewi2, Arar Al Tawil3, Khaled Elleithy4

    CMC-Computers, Materials & Continua, Vol.72, No.3, pp. 5443-5458, 2022, DOI:10.32604/cmc.2022.025526 - 21 April 2022

    Abstract Genre is one of the key features that categorizes music according to characteristic patterns. However, Arabic music content on the web is poorly labeled by genre, making the automatic classification of Arabic audio genres challenging. For this reason, our objective in this research is first to construct a well-annotated dataset of five of the most well-known Arabic music genres, namely Eastern Takht, Rai, Muwashshah, the poem, and Mawwal, and then to present a comprehensive empirical comparison of deep Convolutional Neural Network (CNN) architectures on Arabic music genre classification. In this work, …
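
As a sketch of the kind of architecture such a comparison covers, here is a small PyTorch CNN over log-Mel-spectrogram inputs; the depth, channel counts, and input shape are assumptions, with only the five-genre output taken from the abstract.

```python
# Small CNN over log-Mel-spectrogram "images" (PyTorch sketch). Depth and
# channel counts are assumptions; the five classes match the five genres.
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),         # collapse to one vector
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                    # x: (batch, 1, mels, frames)
        return self.classifier(self.features(x).flatten(1))

logits = GenreCNN()(torch.randn(4, 1, 128, 130))  # four spectrogram crops
```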

  • Open Access

    ARTICLE

    Automatic Speaker Recognition Using Mel-Frequency Cepstral Coefficients Through Machine Learning

    Uğur Ayvaz1, Hüseyin Gürüler2, Faheem Khan3, Naveed Ahmed4, Taegkeun Whangbo3,*, Abdusalomov Akmalbek Bobomirzaevich3

    CMC-Computers, Materials & Continua, Vol.71, No.3, pp. 5511-5521, 2022, DOI:10.32604/cmc.2022.023278 - 14 January 2022

    Abstract Automatic speaker recognition (ASR) systems belong to the field of human-machine interaction, and scientists have been using feature extraction and feature matching methods to analyze and synthesize voice signals. One of the most commonly used methods for feature extraction is Mel Frequency Cepstral Coefficients (MFCCs). Recent research shows that MFCCs are successful in processing voice signals with high accuracy. MFCCs represent a sequence of voice-signal-specific features. This experimental analysis is proposed to distinguish Turkish speakers by extracting MFCCs from speech recordings. Since human perception of sound is not linear, after the…
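
A hedged sketch of the pipeline the abstract outlines: each recording is summarized by its mean MFCC vector and matched to a speaker with a simple classifier. The file names, labels, and the k-NN choice are illustrative assumptions; the paper's exact classifier may differ.

```python
# Hedged sketch: mean-MFCC embeddings matched with k-NN. File names and
# the classifier choice are assumptions, not the paper's exact setup.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def embed(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

files = ["spk1_a.wav", "spk1_b.wav", "spk2_a.wav"]   # hypothetical clips
labels = ["speaker1", "speaker1", "speaker2"]

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(np.stack([embed(f) for f in files]), labels)
print(clf.predict([embed("unknown.wav")]))           # closest speaker
```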

  • Open Access

    ARTICLE

    Age-Based Automatic Voice Conversion Using Blood Relation for Voice Impaired

    Palli Padmini1, C. Paramasivam1, G. Jyothish Lal2, Sadeen Alharbi3,*, Kaustav Bhowmick4

    CMC-Computers, Materials & Continua, Vol.70, No.2, pp. 4027-4051, 2022, DOI:10.32604/cmc.2022.020065 - 27 September 2021

    Abstract This work presents a statistical method to translate human voices across age groups, based on commonalities in the voices of blood relations. The age-translated voices have been naturalized by extracting blood-relation features, e.g., pitch, duration, and energy, using Mel Frequency Cepstrum Coefficients (MFCC), for social compatibility of the voice-impaired. The system has been demonstrated using standard English and an Indian language. The voice samples for resynthesis were derived from 12 families, with member ages ranging from 8 to 80 years. The voice-age translation was performed using the pitch-synchronous overlap-and-add (PSOLA) approach, by modulation of extracted voice…
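
PSOLA itself is not reproduced here; as a crude stand-in, the sketch below modulates the two features the abstract highlights, pitch and duration, using librosa's phase-vocoder-based effects. The shift amounts and file names are illustrative assumptions.

```python
# Not PSOLA: a crude stand-in that modulates pitch and duration with
# librosa's phase-vocoder effects. Amounts and files are assumptions.
import librosa
import soundfile as sf

y, sr = librosa.load("child_voice.wav", sr=16000)    # hypothetical sample

y_deep = librosa.effects.pitch_shift(y, sr=sr, n_steps=-4)  # lower pitch
y_slow = librosa.effects.time_stretch(y_deep, rate=0.9)     # lengthen ~11%

sf.write("age_translated.wav", y_slow, sr)
```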

Displaying 1-8 of 8 results.