Feng Yan, Wushouer Silamu, Yanbing Li*
CMC-Computers, Materials & Continua, Vol. 76, No. 1, pp. 65-80, 2023, DOI: 10.32604/cmc.2023.038177
Published: 08 June 2023
Abstract: Visual question answering (VQA) requires a deep understanding of images and their corresponding textual questions in order to answer questions about images accurately. However, existing models tend to focus only on the visual information in an image and ignore its implicit knowledge, which limits how deeply the image content is understood. Images contain more than visual objects: some include textual information about the scene, and more complex images encode relationships between individual visual objects. Firstly, this paper proposes a model that uses image descriptions for feature enhancement. This model encodes…
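As a rough illustration of the caption-based feature-enhancement idea described in the abstract, the following is a minimal sketch, assuming generated image descriptions are encoded with a text encoder and fused with region-level visual features before answer classification. All module names, dimensions, and the fusion scheme are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the paper's implementation) of caption-based feature
# enhancement for VQA: the image description text is encoded and fused with
# visual region features before predicting an answer.
import torch
import torch.nn as nn

class CaptionEnhancedVQA(nn.Module):
    def __init__(self, vocab_size=10000, hidden=512, num_answers=3129):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # shared GRU encoder for the question and the generated image description
        self.text_enc = nn.GRU(hidden, hidden, batch_first=True)
        self.vis_proj = nn.Linear(2048, hidden)           # e.g. region features from a detector
        # the question attends over visual regions enriched with caption tokens
        self.fuse = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(hidden, num_answers)

    def forward(self, question_ids, caption_ids, region_feats):
        _, q = self.text_enc(self.embed(question_ids))     # (1, B, H) question summary
        c, _ = self.text_enc(self.embed(caption_ids))      # (B, Tc, H) caption tokens
        v = self.vis_proj(region_feats)                    # (B, R, H) visual regions
        ctx = torch.cat([v, c], dim=1)                     # visual + textual context
        fused, _ = self.fuse(q.transpose(0, 1), ctx, ctx)  # (B, 1, H) attended features
        return self.classifier(fused.squeeze(1))           # (B, num_answers) answer logits
```

In this sketch the image description would come from an off-the-shelf captioning model, so the textual branch supplies scene-level context (text appearing in the scene, relationships between objects) that raw region features alone may miss.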