Ou Ye, Xinli Wei, Zhenhua Yu*, Yan Fu, Ying Yang
CMC-Computers, Materials & Continua, Vol.78, No.1, pp. 1071-1093, 2024, DOI:10.32604/cmc.2023.046418
- 30 January 2024
Abstract In encoder-decoder video captioning methods, an encoder extracts a limited set of visual features and a decoder generates a natural-language sentence describing the video content. However, such methods depend on a single video input source and few visual labels, and the generated sentences are often poorly aligned semantically with the video content, which makes these methods ill-suited to accurately comprehending and describing videos. To address this issue, this paper proposes a video captioning method based on semantic topic-guided generation. First, a 3D convolutional neural network…