
Search Results (2)
  • Open Access

    ARTICLE

    Improving VQA via Dual-Level Feature Embedding Network

    Yaru Song*, Huahu Xu, Dikai Fang

    Intelligent Automation & Soft Computing, Vol.39, No.3, pp. 397-416, 2024, DOI:10.32604/iasc.2023.040521 - 11 July 2024

    Abstract Visual Question Answering (VQA) has sparked widespread interest as a crucial task in integrating vision and language. VQA systems primarily use attention mechanisms to associate relevant visual regions with the input question so that it can be answered effectively. Detection-based features extracted by an object detection network capture the visual attention distribution over predetermined detection frames and provide object-level insight, which helps answer questions about foreground objects. However, lacking fine-grained detail, they cannot answer questions about background forms that fall outside the detection boxes, which is where grid-based features have the advantage. In…
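    The dual-level idea in the abstract above (question-guided attention over object-level detection features and fine-grained grid features, then fused) can be illustrated with a minimal numpy sketch. This is not the paper's implementation; the dimensions, the dot-product attention, and the concatenation fusion are all illustrative assumptions.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax.
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def question_guided_attention(q, feats):
        """Pool feature vectors, weighting each by its similarity to question q."""
        scores = feats @ q / np.sqrt(q.shape[0])   # scaled dot-product scores
        weights = softmax(scores)                  # attention distribution
        return weights @ feats, weights            # attended context vector

    rng = np.random.default_rng(0)
    q = rng.standard_normal(64)           # question embedding (assumed dim 64)
    det = rng.standard_normal((36, 64))   # object-level (detection-based) features
    grid = rng.standard_normal((49, 64))  # 7x7 grid-based features

    det_ctx, _ = question_guided_attention(q, det)
    grid_ctx, _ = question_guided_attention(q, grid)
    fused = np.concatenate([det_ctx, grid_ctx])   # dual-level embedding, shape (128,)
    ```

    Attending over both streams lets the question pick out foreground objects from the detection features and background detail from the grid features before fusion.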

  • Open Access

    ARTICLE

    WMA: A Multi-Scale Self-Attention Feature Extraction Network Based on Weight Sharing for VQA

    Yue Li, Jin Liu*, Shengjie Shang

    Journal on Big Data, Vol.3, No.3, pp. 111-118, 2021, DOI:10.32604/jbd.2021.017169 - 22 November 2021

    Abstract Visual Question Answering (VQA) has attracted extensive research focus and has recently become a hot topic in deep learning. Advances in computer vision and natural language processing have driven this research area forward. The keys to improving the performance of a VQA system lie in its feature extraction, multimodal fusion, and answer prediction modules. An unsolved issue remains in popular VQA image feature extraction modules: they struggle to extract fine-grained features from objects of different scales. In this paper, a novel feature extraction network that combines multi-scale convolution and self-attention…
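    The combination the abstract above describes (multi-scale convolution with weight sharing, followed by self-attention) can be sketched in numpy: one shared kernel is reused at several dilation rates to cover different scales, and self-attention then mixes the per-scale responses. The 1-D signal, kernel size, and dilation rates are illustrative assumptions, not the paper's architecture.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def dilated_conv1d(x, kernel, dilation):
        """Same-padded 1-D convolution; the same kernel serves every scale."""
        k = len(kernel)
        pad = dilation * (k // 2)
        xp = np.pad(x, (pad, pad))
        return np.array([sum(kernel[j] * xp[i + j * dilation] for j in range(k))
                         for i in range(len(x))])

    def self_attention(feats):
        # Plain scaled dot-product self-attention over the rows of feats.
        scores = feats @ feats.T / np.sqrt(feats.shape[1])
        return softmax(scores, axis=-1) @ feats

    rng = np.random.default_rng(1)
    x = rng.standard_normal(16)
    shared_kernel = rng.standard_normal(3)   # one kernel, shared across all scales
    multi = np.stack([dilated_conv1d(x, shared_kernel, d) for d in (1, 2, 4)],
                     axis=1)                 # (16, 3): one column per scale
    out = self_attention(multi)              # refined multi-scale features, (16, 3)
    ```

    Sharing one kernel across dilations keeps the parameter count of the multi-scale stage constant in the number of scales, which is the usual motivation for weight sharing in such designs.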
