Somin Park¹, Mpabulungi Mark¹, Bogyung Park², Hyunki Hong¹,*
CMC-Computers, Materials & Continua, Vol.77, No.1, pp. 1009-1030, 2023, DOI:10.32604/cmc.2023.041332
- 31 October 2023
Abstract Speech emotion recognition (SER) is essential for frictionless human-machine interaction, where machines respond to human instructions with context-aware actions. The properties of individuals’ voices vary with culture, language, gender, and personality. These variations in speaker-specific properties may hamper the performance of standard representations in downstream tasks such as SER. This study demonstrates the significance of speaker-specific speech characteristics and shows how accounting for them can improve the performance of SER models. In the proposed approach, two wav2vec-based modules (a speaker-identification network and an emotion classification network) are trained with the ArcFace loss.…
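As an illustration of the ArcFace loss named in the abstract, the following is a minimal pure-Python sketch for a single sample. It is not the authors' implementation: the function names, the toy two-dimensional vectors, and the default `scale` and `margin` values are assumptions chosen for readability, and the usual safeguard for angles whose sum with the margin exceeds π is omitted. The idea is to L2-normalize the embedding and the class weight vectors, add an angular margin to the angle of the target class only, and then apply a scaled softmax cross-entropy.

```python
import math


def l2_normalize(vector):
    """Return the vector scaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def arcface_loss(embedding, class_weights, target, scale=30.0, margin=0.5):
    """ArcFace-style loss for one sample (illustrative sketch).

    embedding     -- feature vector, e.g. produced by a speech encoder
    class_weights -- one weight vector per class
    target        -- index of the correct class
    scale, margin -- hypothetical defaults, not taken from the paper
    """
    unit_embedding = l2_normalize(embedding)
    logits = []
    for class_index, weight in enumerate(class_weights):
        unit_weight = l2_normalize(weight)
        cosine = sum(a * b for a, b in zip(unit_embedding, unit_weight))
        cosine = max(-1.0, min(1.0, cosine))  # guard acos against rounding
        if class_index == target:
            # Additive angular margin, applied to the target class only.
            cosine = math.cos(math.acos(cosine) + margin)
        logits.append(scale * cosine)

    # Numerically stable softmax cross-entropy over the scaled logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    return log_sum_exp - logits[target]
```

With `margin=0.0` the function reduces to a softmax cross-entropy over scaled cosine similarities; a positive margin penalizes the target class further, which pushes embeddings of the same class closer together in angle.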