Yue Li, Jin Liu*, Shengjie Shang
Journal on Big Data, Vol.3, No.3, pp. 111-118, 2021, DOI:10.32604/jbd.2021.017169
- 22 November 2021
Abstract: Visual Question Answering (VQA) has attracted extensive research
attention and has recently become a hot topic in deep learning. Advances in
computer vision and natural language processing have driven progress in
this research area. The main opportunities for improving the performance
of VQA systems lie in the feature extraction, multimodal fusion, and answer
prediction modules. One unsolved issue is that the image feature extraction
module in popular VQA models struggles to extract fine-grained features from
objects of different scales. In this paper, a novel feature extraction network that
combines multi-scale convolution and self-attention ...