Search Results (21)
  • Open Access

    ARTICLE

    Performance vs. Complexity Comparative Analysis of Multimodal Bilinear Pooling Fusion Approaches for Deep Learning-Based Visual Arabic-Question Answering Systems

    Sarah M. Kamel1,*, Mai A. Fadel2, Lamiaa Elrefaei1,3, Shimaa I. Hassan1,4

    CMES-Computer Modeling in Engineering & Sciences, Vol.143, No.1, pp. 373-411, 2025, DOI:10.32604/cmes.2025.062837 - 11 April 2025

    Abstract Visual question answering (VQA) is a multimodal task, involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions about real-world images, in Arabic. To support a robust VQA system, we work in two directions: (1) Using deep neural networks to semantically represent the given image and question in a fine-grained manner, namely ResNet-152 and Gated Recurrent Units (GRU). (2) Studying the role of the utilized multimodal bilinear…

  • Open Access

    ARTICLE

    UniTrans: Unified Parameter-Efficient Transfer Learning and Multimodal Alignment for Large Multimodal Foundation Model

    Jiakang Sun1,2, Ke Chen1,2, Xinyang He1,2, Xu Liu1,2, Ke Li1,2, Cheng Peng1,2,*

    CMC-Computers, Materials & Continua, Vol.83, No.1, pp. 219-238, 2025, DOI:10.32604/cmc.2025.059745 - 26 March 2025

    Abstract With the advancements in parameter-efficient transfer learning techniques, it has become feasible to leverage large pre-trained language models for downstream tasks under low-cost and low-resource conditions. However, applying this technique to multimodal knowledge transfer introduces a significant challenge: ensuring alignment across modalities while minimizing the number of additional parameters required for downstream task adaptation. This paper introduces UniTrans, a framework aimed at facilitating efficient knowledge transfer across multiple modalities. UniTrans leverages Vector-based Cross-modal Random Matrix Adaptation to enable fine-tuning with minimal parameter overhead. To further enhance modality alignment, we introduce two key components: the Multimodal…

  • Open Access

    ARTICLE

    Adjusted Reasoning Module for Deep Visual Question Answering Using Vision Transformer

    Christine Dewi1,3, Hanna Prillysca Chernovita2, Stephen Abednego Philemon1, Christian Adi Ananta1, Abbott Po Shun Chen4,*

    CMC-Computers, Materials & Continua, Vol.81, No.3, pp. 4195-4216, 2024, DOI:10.32604/cmc.2024.057453 - 19 December 2024

    Abstract Visual Question Answering (VQA) is an interdisciplinary artificial intelligence (AI) activity that integrates computer vision and natural language processing. Its purpose is to empower machines to respond to questions by utilizing visual information. A VQA system typically takes an image and a natural language query as input and produces a textual answer as output. One major obstacle in VQA is identifying a successful method to extract and merge textual and visual data. We examine “Fusion” Models that use information from both the text encoder and picture encoder to efficiently perform the visual question-answering challenge. For…

  • Open Access

    ARTICLE

    DPAL-BERT: A Faster and Lighter Question Answering Model

    Lirong Yin1, Lei Wang1, Zhuohang Cai2, Siyu Lu2,*, Ruiyang Wang2, Ahmed AlSanad3, Salman A. AlQahtani3, Xiaobing Chen4, Zhengtong Yin5, Xiaolu Li6, Wenfeng Zheng2,3,*

    CMES-Computer Modeling in Engineering & Sciences, Vol.141, No.1, pp. 771-786, 2024, DOI:10.32604/cmes.2024.052622 - 20 August 2024

    Abstract Recent advancements in natural language processing have given rise to numerous pre-training language models in question-answering systems. However, with the constant evolution of algorithms, data, and computing power, the increasing size and complexity of these models have led to increased training costs and reduced efficiency. This study aims to minimize the inference time of such models while maintaining computational performance. It also proposes a novel Distillation model for PAL-BERT (DPAL-BERT), specifically, employs knowledge distillation, using the PAL-BERT model as the teacher model to train two student models: DPAL-BERT-Bi and DPAL-BERT-C. This research enhances the dataset…

  • Open Access

    ARTICLE

    Improving VQA via Dual-Level Feature Embedding Network

    Yaru Song*, Huahu Xu, Dikai Fang

    Intelligent Automation & Soft Computing, Vol.39, No.3, pp. 397-416, 2024, DOI:10.32604/iasc.2023.040521 - 11 July 2024

    Abstract Visual Question Answering (VQA) has sparked widespread interest as a crucial task in integrating vision and language. VQA primarily uses attention mechanisms to effectively answer questions to associate relevant visual regions with input questions. The detection-based features extracted by the object detection network aim to acquire the visual attention distribution on a predetermined detection frame and provide object-level insights to answer questions about foreground objects more effectively. However, it cannot answer the question about the background forms without detection boxes due to the lack of fine-grained details, which is the advantage of grid-based features. In…

  • Open Access

    ARTICLE

    PAL-BERT: An Improved Question Answering Model

    Wenfeng Zheng1, Siyu Lu1, Zhuohang Cai1, Ruiyang Wang1, Lei Wang2, Lirong Yin2,*

    CMES-Computer Modeling in Engineering & Sciences, Vol.139, No.3, pp. 2729-2745, 2024, DOI:10.32604/cmes.2023.046692 - 11 March 2024

    Abstract In the field of natural language processing (NLP), there have been various pre-training language models in recent years, with question answering systems gaining significant attention. However, as algorithms, data, and computing power advance, the issue of increasingly larger models and a growing number of parameters has surfaced. Consequently, model training has become more costly and less efficient. To enhance the efficiency and accuracy of the training process while reducing the model volume, this paper proposes a first-order pruning model PAL-BERT based on the ALBERT model according to the characteristics of question-answering (QA) system and language…

  • Open Access

    ARTICLE

    MVCE-Net: Multi-View Region Feature and Caption Enhancement Co-Attention Network for Visual Question Answering

    Feng Yan1, Wushouer Silamu2, Yanbing Li1,*

    CMC-Computers, Materials & Continua, Vol.76, No.1, pp. 65-80, 2023, DOI:10.32604/cmc.2023.038177 - 08 June 2023

    Abstract Visual question answering (VQA) requires a deep understanding of images and their corresponding textual questions to answer questions about images more accurately. However, existing models tend to ignore the implicit knowledge in the images and focus only on the visual information in the images, which limits the understanding depth of the image content. The images contain more than just visual objects, some images contain textual information about the scene, and slightly more complex images contain relationships between individual visual objects. Firstly, this paper proposes a model using image description for feature enhancement. This model encodes…

  • Open Access

    ARTICLE

    Improved Blending Attention Mechanism in Visual Question Answering

    Siyu Lu1, Yueming Ding1, Zhengtong Yin2, Mingzhe Liu3,*, Xuan Liu4, Wenfeng Zheng1,*, Lirong Yin5

    Computer Systems Science and Engineering, Vol.47, No.1, pp. 1149-1161, 2023, DOI:10.32604/csse.2023.038598 - 26 May 2023

    Abstract Visual question answering (VQA) has attracted more and more attention in computer vision and natural language processing. Scholars are committed to studying how to better integrate image features and text features to achieve better results in VQA tasks. Analysis of all features may cause information redundancy and heavy computational burden. Attention mechanism is a wise way to solve this problem. However, using single attention mechanism may cause incomplete concern of features. This paper improves the attention mechanism method and proposes a hybrid attention mechanism that combines the spatial attention mechanism method and the channel attention…

  • Open Access

    ARTICLE

    Expert Recommendation in Community Question Answering via Heterogeneous Content Network Embedding

    Hong Li1,*, Jianjun Li1, Guohui Li1, Rong Gao2, Lingyu Yan2

    CMC-Computers, Materials & Continua, Vol.75, No.1, pp. 1687-1709, 2023, DOI:10.32604/cmc.2023.035239 - 06 February 2023

    Abstract Expert Recommendation (ER) aims to identify domain experts with high expertise and willingness to provide answers to questions in Community Question Answering (CQA) web services. How to model questions and users in the heterogeneous content network is critical to this task. Most traditional methods focus on modeling questions and users based on the textual content left in the community while ignoring the structural properties of heterogeneous CQA networks and always suffering from textual data sparsity issues. Recent approaches take advantage of structural proximities between nodes and attempt to fuse the textual content of nodes for…

  • Open Access

    ARTICLE

    Information Extraction Based on Multi-turn Question Answering for Analyzing Korean Research Trends

    Seongung Jo1, Heung-Seon Oh1,*, Sanghun Im1, Gibaeg Kim1, Seonho Kim2

    CMC-Computers, Materials & Continua, Vol.74, No.2, pp. 2967-2980, 2023, DOI:10.32604/cmc.2023.031983 - 31 October 2022

    Abstract Analyzing Research and Development (R&D) trends is important because it can influence future decisions regarding R&D direction. In typical trend analysis, topic or technology taxonomies are employed to compute the popularities of the topics or codes over time. Although it is simple and effective, the taxonomies are difficult to manage because new technologies are introduced rapidly. Therefore, recent studies exploit deep learning to extract pre-defined targets such as problems and solutions. Based on the recent advances in question answering (QA) using deep learning, we adopt a multi-turn QA model to extract problems and solutions from…

Displaying results 1-10 of 21.