Search Results (2)
  • Open Access

    ARTICLE

    Robust Audio-Visual Fusion for Emotion Recognition Based on Cross-Modal Learning under Noisy Conditions

    A-Seong Moon, Seungyeon Jeong, Donghee Kim, Mohd Asyraf Zulkifley, Bong-Soo Sohn*, Jaesung Lee*

    CMC-Computers, Materials & Continua, Vol.85, No.2, pp. 2851-2872, 2025, DOI:10.32604/cmc.2025.067103 - 23 September 2025

    Abstract: Emotion recognition under uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems. The current study introduces an audio-visual recognition framework designed to address performance degradation caused by environmental interference, such as background noise, overlapping speech, and visual obstructions. The proposed framework employs a structured fusion approach, combining early-stage feature-level integration with decision-level coordination guided by temporal attention mechanisms. Audio data are transformed into mel-spectrogram representations, and visual data are represented as raw frame sequences. Spatial and temporal features are extracted through convolutional and transformer-based encoders, allowing the framework to capture…

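    The abstract outlines a recognizable pattern: per-modality convolutional front-ends feeding transformer encoders, feature-level concatenation across modalities, and temporal attention over the fused sequence. The PyTorch sketch below illustrates that general pattern only; every module, dimension, and hyperparameter is a hypothetical stand-in, since the paper's actual architecture is not given in this listing.

```python
# A minimal, illustrative sketch of the fusion pattern described in the
# abstract: mel-spectrogram audio and raw video frames pass through
# convolutional + transformer encoders, are fused at the feature level,
# and are pooled by temporal attention. All names and sizes are assumed.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Per-frame conv features followed by a transformer over time."""

    def __init__(self, in_channels: int, d_model: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # one d_model vector per time step
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) -> per-step embeddings (B, T, d_model)
        b, t = x.shape[:2]
        feats = self.conv(x.flatten(0, 1)).flatten(1).view(b, t, -1)
        return self.transformer(feats)


class AudioVisualFusion(nn.Module):
    def __init__(self, d_model: int = 128, num_emotions: int = 7):
        super().__init__()
        self.audio_enc = ModalityEncoder(in_channels=1, d_model=d_model)  # mel-spectrogram chunks
        self.video_enc = ModalityEncoder(in_channels=3, d_model=d_model)  # RGB frames
        # Temporal attention pooling: one learned query attends over time,
        # which is one way to down-weight noisy frames or audio segments.
        self.query = nn.Parameter(torch.randn(1, 1, 2 * d_model))
        self.attn = nn.MultiheadAttention(2 * d_model, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * d_model, num_emotions)

    def forward(self, audio: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
        # Early, feature-level fusion: concatenate per-step embeddings.
        fused = torch.cat([self.audio_enc(audio), self.video_enc(video)], dim=-1)
        pooled, _ = self.attn(self.query.expand(fused.size(0), -1, -1), fused, fused)
        return self.classifier(pooled.squeeze(1))


if __name__ == "__main__":
    model = AudioVisualFusion()
    mel = torch.randn(2, 16, 1, 64, 64)      # (batch, time, 1, mel bins, frames)
    frames = torch.randn(2, 16, 3, 64, 64)   # (batch, time, RGB, H, W)
    print(model(mel, frames).shape)          # torch.Size([2, 7])
```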

  • Open Access

    ARTICLE

    Cross-Modal Simplex Center Learning for Speech-Face Association

    Qiming Ma, Fanliang Bu*, Rong Wang, Lingbin Bu, Yifan Wang, Zhiyuan Li

    CMC-Computers, Materials & Continua, Vol.82, No.3, pp. 5169-5184, 2025, DOI:10.32604/cmc.2025.061187 - 06 March 2025

    Abstract: Speech-face association aims to achieve identity matching between facial images and voice segments by aligning cross-modal features. Existing research primarily focuses on learning shared-space representations and computing one-to-one similarities between cross-modal sample pairs to establish their correlation. However, these approaches do not fully account for intra-class variations between the modalities or the many-to-many relationships among cross-modal samples, which are crucial for robust association modeling. To address these challenges, we propose a novel framework that leverages global information to align voice and face embeddings while effectively correlating identity information embedded in both modalities. First, we jointly…
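    Only a truncated abstract is shown, but "simplex center learning" commonly refers to fixing one class center per identity at the vertices of a regular simplex and pulling embeddings from both modalities toward their shared identity center, giving a global (rather than pairwise) alignment signal. The PyTorch sketch below illustrates that reading; the simplex construction, loss form, and all parameters are assumptions, not the authors' implementation.

```python
# A hedged sketch of shared simplex class centers for cross-modal
# (voice/face) identity alignment. Centers are fixed, unit-norm vertices
# of a regular simplex; both modalities are pulled to the same center.
import math

import torch
import torch.nn as nn


def simplex_centers(num_classes: int, dim: int) -> torch.Tensor:
    # Vertices of a regular simplex: pairwise cosine similarity between
    # any two distinct centers is -1/(N-1), i.e., maximally separated.
    assert dim >= num_classes
    i = torch.eye(num_classes)
    ones = torch.ones(num_classes, num_classes) / num_classes
    m = math.sqrt(num_classes / (num_classes - 1)) * (i - ones)  # (N, N)
    # Rotate into the embedding space with orthonormal columns.
    q, _ = torch.linalg.qr(torch.randn(dim, num_classes))
    return (q @ m).T  # (N, dim), unit-norm rows


class SimplexCenterLoss(nn.Module):
    def __init__(self, num_ids: int, dim: int, temperature: float = 0.1):
        super().__init__()
        # Fixed, non-trainable identity centers shared by both modalities.
        self.register_buffer("centers", simplex_centers(num_ids, dim))
        self.t = temperature

    def forward(self, voice: torch.Tensor, face: torch.Tensor,
                labels: torch.Tensor) -> torch.Tensor:
        # Classify each embedding against every center by cosine
        # similarity; both modalities share the same target center.
        loss = 0.0
        for emb in (voice, face):
            emb = nn.functional.normalize(emb, dim=-1)
            logits = emb @ self.centers.T / self.t
            loss = loss + nn.functional.cross_entropy(logits, labels)
        return loss / 2


if __name__ == "__main__":
    torch.manual_seed(0)
    loss_fn = SimplexCenterLoss(num_ids=100, dim=256)
    voice = torch.randn(8, 256)   # output of a hypothetical voice encoder
    face = torch.randn(8, 256)    # output of a hypothetical face encoder
    labels = torch.randint(0, 100, (8,))
    print(loss_fn(voice, face, labels))
```

    Because the centers are fixed rather than learned, they act as global anchors: every sample of an identity, from either modality, is optimized toward the same point, which addresses the many-to-many correspondence the abstract highlights without enumerating sample pairs.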
