Yuanle Chen¹, Haobo Wang¹, Chunyu Liu¹, Linyi Wang², Jiaxin Liu¹, Wei Wu¹,*
CMES-Computer Modeling in Engineering & Sciences, Vol.139, No.3, pp. 2985-3009, 2024, DOI:10.32604/cmes.2023.046837
11 March 2024
Abstract: Recently, there have been significant advancements in the study of semantic communication in single-modal scenarios. However, the ability to process information in multi-modal environments remains limited. Inspired by the research and applications of natural language processing across different modalities, our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality videos. Specifically, we propose a deep learning-based Multi-Modal Mutual Enhancement Video Semantic Communication system, called M3E-VSC. Built upon a Vector Quantized Generative Adversarial Network (VQGAN), our system aims to leverage mutual enhancement among different modalities by using text as the main …