Tech Science Press - Publisher of Open Access Journals

Open Access

ARTICLE

RSG-Conformer: ReLU-Based Sparse and Grouped Conformer for Audio-Visual Speech Recognition

Yewei Xiao, Xin Du^*, Wei Zeng

CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.072145 - 12 January 2026

Abstract Audio-visual speech recognition (AVSR), which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions, has attracted significant research interest. However, Conformer-based architectures remain computational expensive due to the quadratic increase in the spatial and temporal complexity of their softmax-based attention mechanisms with sequence length. In addition, Conformer-based architectures may not provide sufficient flexibility for modeling local dependencies at different granularities. To mitigate these limitations, this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer (RSG-Conformer) architecture. Specifically, we propose a Global-enhanced Sparse… More >

Open Access

ARTICLE

Speech Emotion Recognition Based on the Adaptive Acoustic Enhancement and Refined Attention Mechanism

Jun Li¹, Chunyan Liang^1,*, Zhiguo Liu¹, Fengpei Ge²

CMC-Computers, Materials & Continua, Vol.86, No.3, 2026, DOI:10.32604/cmc.2025.071011 - 12 January 2026

Abstract To enhance speech emotion recognition capability, this study constructs a speech emotion recognition model integrating the adaptive acoustic mixup (AAM) and improved coordinate and shuffle attention (ICASA) methods. The AAM method optimizes data augmentation by combining a sample selection strategy and dynamic interpolation coefficients, thus enabling information fusion of speech data with different emotions at the acoustic level. The ICASA method enhances feature extraction capability through dynamic fusion of the improved coordinate attention (ICA) and shuffle attention (SA) techniques. The ICA technique reduces computational overhead by employing depth-separable convolution and an h-swish activation function and More >

Open Access

ARTICLE

A Synthetic Speech Detection Model Combining Local-Global Dependency

Jiahui Song, Yuepeng Zhang, Wenhao Yuan^*

CMC-Computers, Materials & Continua, Vol.86, No.1, pp. 1-15, 2026, DOI:10.32604/cmc.2025.069918 - 10 November 2025

Abstract Synthetic speech detection is an essential task in the field of voice security, aimed at identifying deceptive voice attacks generated by text-to-speech (TTS) systems or voice conversion (VC) systems. In this paper, we propose a synthetic speech detection model called TFTransformer, which integrates both local and global features to enhance detection capabilities by effectively modeling local and global dependencies. Structurally, the model is divided into two main components: a front-end and a back-end. The front-end of the model uses a combination of SincLayer and two-dimensional (2D) convolution to extract high-level feature maps (HFM) containing local… More >

Open Access

ARTICLE

Mordukhovich Subdifferential Optimization Framework for Multi-Criteria Voice Cloning of Pathological Speech

Rytis Maskeliūnas¹, Robertas Damaševičius^1,*, Audrius Kulikajevas¹, Kipras Pribuišis², Nora Ulozaitė-Stanienė², Virgilijus Uloza²

CMES-Computer Modeling in Engineering & Sciences, Vol.145, No.3, pp. 4203-4223, 2025, DOI:10.32604/cmes.2025.072790 - 23 December 2025

Abstract This study introduces a novel voice cloning framework driven by Mordukhovich Subdifferential Optimization (MSO) to address the complex multi-objective challenges of pathological speech synthesis in under-resourced Lithuanian language with unique phonemes not present in most pre-trained models. Unlike existing voice synthesis models that often optimize for a single objective or are restricted to major languages, our approach explicitly balances four competing criteria: speech naturalness, speaker similarity, computational efficiency, and adaptability to pathological voice patterns. We evaluate four model configurations combining Lithuanian and English encoders, synthesizers, and vocoders. The hybrid model (English encoder, Lithuanian synthesizer, English More >

Open Access

ARTICLE

Using Hate Speech Detection Techniques to Prevent Violence and Foster Community Safety

Ayaz Hussain¹, Asad Hayat², Muhammad Hasnain^1,*

Journal on Artificial Intelligence, Vol.7, pp. 485-498, 2025, DOI:10.32604/jai.2025.071933 - 17 November 2025

Abstract Violent hate speech and scapegoating people against one another have emerged as a rising worldwide issue. But identifying and combating such content is crucial to create safer and more inclusive societies. The current study conducted research using Machine Learning models to classify hate speech and overcome the limitations posed in the existing detection techniques. Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbour (KNN) and Decision Tree were used on top of a publicly available hate speech dataset. The data was preprocessed by cleaning the text and tokenization and using normalization techniques to efficiently train the… More >

Open Access

ARTICLE

A Black-Box Speech Adversarial Attack Method Based on Enhanced Neural Predictors in Industrial IoT

Yun Zhang, Zhenhua Yu^*, Xufei Hu, Xuya Cong, Ou Ye

CMC-Computers, Materials & Continua, Vol.84, No.3, pp. 5403-5426, 2025, DOI:10.32604/cmc.2025.067120 - 30 July 2025

Abstract Devices in Industrial Internet of Things are vulnerable to voice adversarial attacks. Studying adversarial speech samples is crucial for enhancing the security of automatic speech recognition systems in Industrial Internet of Things devices. Current black-box attack methods often face challenges such as complex search processes and excessive perturbation generation. To address these issues, this paper proposes a black-box voice adversarial attack method based on enhanced neural predictors. This method searches for minimal perturbations in the perturbation space, employing an optimization process guided by a self-attention neural predictor to identify the optimal perturbation direction. This direction… More >

Open Access

ARTICLE

Enhancing Phoneme Labeling in Dysarthric Speech with Digital Twin-Driven Multi-Modal Architecture

Saeed Alzahrani¹, Nazar Hussain², Farah Mohammad^3,*

CMC-Computers, Materials & Continua, Vol.84, No.3, pp. 4825-4849, 2025, DOI:10.32604/cmc.2025.066322 - 30 July 2025

Abstract Digital twin technology is revolutionizing personalized healthcare by creating dynamic virtual replicas of individual patients. This paper presents a novel multi-modal architecture leveraging digital twins to enhance precision in predictive diagnostics and treatment planning of phoneme labeling. By integrating real-time images, electronic health records, and genomic information, the system enables personalized simulations for disease progression modeling, treatment response prediction, and preventive care strategies. In dysarthric speech, which is characterized by articulation imprecision, temporal misalignments, and phoneme distortions, existing models struggle to capture these irregularities. Traditional approaches, often relying solely on audio features, fail to address… More >

Open Access

ARTICLE

Deep Learning-Based Lip-Reading for Vocal Impaired Patient Rehabilitation

Chiara Innocente^1,*, Matteo Boemio², Gianmarco Lorenzetti², Ilaria Pulito², Diego Romagnoli², Valeria Saponaro², Giorgia Marullo¹, Luca Ulrich¹, Enrico Vezzetti¹

CMES-Computer Modeling in Engineering & Sciences, Vol.143, No.2, pp. 1355-1379, 2025, DOI:10.32604/cmes.2025.063186 - 30 May 2025

Abstract Lip-reading technology, based on visual speech decoding and automatic speech recognition, offers a promising solution to overcoming communication barriers, particularly for individuals with temporary or permanent speech impairments. However, most Visual Speech Recognition (VSR) research has primarily focused on the English language and general-purpose applications, limiting its practical applicability in medical and rehabilitative settings. This study introduces the first Deep Learning (DL) based lip-reading system for the Italian language designed to assist individuals with vocal cord pathologies in daily interactions, facilitating communication for patients recovering from vocal cord surgeries, whether temporarily or permanently impaired. To… More >

Open Access

ARTICLE

Integrating Speech-to-Text for Image Generation Using Generative Adversarial Networks

Smita Mahajan¹, Shilpa Gite^1,2, Biswajeet Pradhan^3,*, Abdullah Alamri⁴, Shaunak Inamdar⁵, Deva Shriyansh⁵, Akshat Ashish Shah⁵, Shruti Agarwal⁵

CMES-Computer Modeling in Engineering & Sciences, Vol.143, No.2, pp. 2001-2026, 2025, DOI:10.32604/cmes.2025.058456 - 30 May 2025

Abstract The development of generative architectures has resulted in numerous novel deep-learning models that generate images using text inputs. However, humans naturally use speech for visualization prompts. Therefore, this paper proposes an architecture that integrates speech prompts as input to image-generation Generative Adversarial Networks (GANs) model, leveraging Speech-to-Text translation along with the CLIP + VQGAN model. The proposed method involves translating speech prompts into text, which is then used by the Contrastive Language-Image Pretraining (CLIP) + Vector Quantized Generative Adversarial Network (VQGAN) model to generate images. This paper outlines the steps required to implement such a… More >

Open Access

ARTICLE

ALCTS—An Assistive Learning and Communicative Tool for Speech and Hearing Impaired Students

Shabana Ziyad Puthu Vedu^1,*, Wafaa A. Ghonaim², Naglaa M. Mostafa³, Pradeep Kumar Singh⁴

CMC-Computers, Materials & Continua, Vol.83, No.2, pp. 2599-2617, 2025, DOI:10.32604/cmc.2025.062695 - 16 April 2025

Abstract Hearing and Speech impairment can be congenital or acquired. Hearing and speech-impaired students often hesitate to pursue higher education in reputable institutions due to their challenges. However, the development of automated assistive learning tools within the educational field has empowered disabled students to pursue higher education in any field of study. Assistive learning devices enable students to access institutional resources and facilities fully. The proposed assistive learning and communication tool allows hearing and speech-impaired students to interact productively with their teachers and classmates. This tool converts the audio signals into sign language videos for the… More >

Displaying 1-10 on page 1 of 111. Per Page

View

935

Download

430

View

575

Download

221

View

1036

Download

374

View

1025

Download

329

View

523

Download

183

View

1361

Download

846

View

1178

Download

627

View

2150

Download

901

View

1983

Download

723

View

1349

Download

534

Further Information

Guidelines

Follow Us

Join Us

Contact Us

WhatsApp: