Qiming Ma, Fanliang Bu*, Rong Wang, Lingbin Bu, Yifan Wang, Zhiyuan Li
CMC-Computers, Materials & Continua, Vol.82, No.3, pp. 5169-5184, 2025, DOI:10.32604/cmc.2025.061187
- 06 March 2025
Abstract Speech-face association aims to achieve identity matching between facial images and voice segments by aligning cross-modal features. Existing research primarily focuses on learning shared-space representations and computing one-to-one similarities between cross-modal sample pairs to establish their correlation. However, these approaches do not fully account for intra-class variations between the modalities or the many-to-many relationships among cross-modal samples, which are crucial for robust association modeling. To address these challenges, we propose a novel framework that leverages global information to align voice and face embeddings while effectively correlating identity information embedded in both modalities. First, we jointly…
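As a rough illustration of the one-to-one similarity scoring the abstract refers to, the sketch below computes cosine similarities between face and voice embeddings projected into a shared space and matches each face to its highest-scoring voice. The dimensions, projection matrices, and variable names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): one-to-one cross-modal matching by
# cosine similarity between face and voice embeddings in a shared space.
# All dimensions and projections here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

num_ids = 4
face_dim, voice_dim, shared_dim = 512, 192, 128

# Stand-ins for per-identity features from pretrained face / voice encoders.
face_feats = rng.normal(size=(num_ids, face_dim))
voice_feats = rng.normal(size=(num_ids, voice_dim))

# Hypothetical learned linear projections into the shared space.
W_face = rng.normal(size=(face_dim, shared_dim))
W_voice = rng.normal(size=(voice_dim, shared_dim))

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

face_emb = l2_normalize(face_feats @ W_face)
voice_emb = l2_normalize(voice_feats @ W_voice)

# Pairwise cosine similarities: entry (i, j) scores face i against voice j.
sim = face_emb @ voice_emb.T

# One-to-one matching: for each face, pick the highest-scoring voice.
matches = sim.argmax(axis=1)
print("similarity matrix:\n", np.round(sim, 3))
print("predicted voice index per face:", matches)
```

This pairwise-scoring view is what the abstract contrasts with its proposed use of global information and many-to-many relationships among cross-modal samples.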