Tech Science Press - Publisher of Open Access Journals

Open Access

ARTICLE

Performance vs. Complexity Comparative Analysis of Multimodal Bilinear Pooling Fusion Approaches for Deep Learning-Based Visual Arabic-Question Answering Systems

Sarah M. Kamel^1,*, Mai A. Fadel^2, Lamiaa Elrefaei^1,3, Shimaa I. Hassan^1,4

CMES-Computer Modeling in Engineering & Sciences, Vol.143, No.1, pp. 373-411, 2025, DOI:10.32604/cmes.2025.062837 - 11 April 2025

Abstract: Visual question answering (VQA) is a multimodal task that requires a deep understanding of the image scene and the question’s meaning, as well as capturing the relevant correlations between the two modalities to infer the appropriate answer. In this paper, we propose a VQA system intended to answer yes/no questions in Arabic about real-world images. To support a robust VQA system, we work in two directions: (1) using deep neural networks, namely ResNet-152 and Gated Recurrent Units (GRU), to semantically represent the given image and question in a fine-grained manner; and (2) studying the role of the utilized multimodal bilinear…
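To make the performance-vs.-complexity trade-off behind bilinear fusion concrete, here is a minimal pure-Python sketch contrasting full bilinear pooling with a low-rank, Hadamard-product (MLB-style) factorization. The abstract is truncated before naming the bilinear pooling variants the authors compare, so the choice of these two variants, the function names, and the sizes are assumptions for illustration: 2048 matches the pooled ResNet-152 feature size, while the question-feature size of 1024, rank of 1200 and joint output size of 1000 are hypothetical.

```python
import math
import random


def full_bilinear(x, y, W):
    """Full bilinear pooling: z[k] = sum_ij x[i] * W[k][i][j] * y[j].

    W holds one dx-by-dy matrix per output unit, i.e. dx * dy * o weights.
    """
    return [
        sum(x[i] * Wk[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))
        for Wk in W
    ]


def low_rank_bilinear(x, y, U, V, P):
    """Low-rank bilinear pooling: z = P^T (tanh(U^T x) * tanh(V^T y)).

    U is dx-by-r, V is dy-by-r, P is r-by-o; '*' is the element-wise (Hadamard) product.
    """
    r = len(U[0])
    hx = [math.tanh(sum(x[i] * U[i][c] for i in range(len(x)))) for c in range(r)]
    hy = [math.tanh(sum(y[j] * V[j][c] for j in range(len(y)))) for c in range(r)]
    h = [a * b for a, b in zip(hx, hy)]  # joint representation of image and question
    return [sum(h[c] * P[c][k] for c in range(r)) for k in range(len(P[0]))]


def param_counts(dx, dy, r, o):
    """Weights needed by (full, low-rank) pooling, ignoring biases."""
    return dx * dy * o, dx * r + dy * r + r * o


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dx, dy, r, o = 6, 4, 3, 2  # toy sizes so both forward passes run instantly
    x = [rng.uniform(-1, 1) for _ in range(dx)]  # stands in for an image feature
    y = [rng.uniform(-1, 1) for _ in range(dy)]  # stands in for a question feature
    W = [rand_matrix(rng, dx, dy) for _ in range(o)]
    print(full_bilinear(x, y, W))
    print(low_rank_bilinear(x, y, rand_matrix(rng, dx, r),
                            rand_matrix(rng, dy, r), rand_matrix(rng, r, o)))
    print(param_counts(2048, 1024, 1200, 1000))  # (2097152000, 4886400)
```

The full variant's weight count grows with the product dx·dy·o, while the factorized one grows linearly in each dimension, which is the usual motivation for low-rank bilinear fusion.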

Open Access

ARTICLE

UniTrans: Unified Parameter-Efficient Transfer Learning and Multimodal Alignment for Large Multimodal Foundation Model

Jiakang Sun^1,2, Ke Chen^1,2, Xinyang He^1,2, Xu Liu^1,2, Ke Li^1,2, Cheng Peng^1,2,*

CMC-Computers, Materials & Continua, Vol.83, No.1, pp. 219-238, 2025, DOI:10.32604/cmc.2025.059745 - 26 March 2025

Abstract: With advances in parameter-efficient transfer learning techniques, it has become feasible to leverage large pre-trained language models for downstream tasks under low-cost and low-resource conditions. However, applying this technique to multimodal knowledge transfer introduces a significant challenge: ensuring alignment across modalities while minimizing the number of additional parameters required for downstream task adaptation. This paper introduces UniTrans, a framework aimed at facilitating efficient knowledge transfer across multiple modalities. UniTrans leverages Vector-based Cross-modal Random Matrix Adaptation to enable fine-tuning with minimal parameter overhead. To further enhance modality alignment, we introduce two key components: the Multimodal…
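The abstract names Vector-based Cross-modal Random Matrix Adaptation but is truncated before describing it. As an assumption, the sketch below shows plain VeRA-style adaptation as used for single-modality models: frozen, randomly initialized low-rank matrices A and B (shared across layers in VeRA) are scaled by two small trainable vectors d and b, so each adapted layer adds only r + out trainable values. The cross-modal extension in UniTrans is not reproduced, and all function names are hypothetical.

```python
import random


def matvec(M, v):
    """Multiply a row-major matrix M (list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vera_forward(x, W0, A, B, d, b):
    """h = W0 x + diag(b) B diag(d) A x, with W0, A, B frozen and d, b trainable."""
    base = matvec(W0, x)                                  # frozen pre-trained projection
    ax = [di * v for di, v in zip(d, matvec(A, x))]       # A is r x in, scaled by d (length r)
    delta = [bi * v for bi, v in zip(b, matvec(B, ax))]   # B is out x r, scaled by b (length out)
    return [u + v for u, v in zip(base, delta)]


def trainable_params(d_in, d_out, r):
    """Per-layer trainable parameters: VeRA-style vectors vs. a LoRA-style matrix pair."""
    return {"vera": r + d_out, "lora": r * (d_in + d_out)}


if __name__ == "__main__":
    rng = random.Random(0)
    d_in, d_out, r = 4, 3, 2
    W0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
    A = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]   # frozen random
    B = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d_out)]  # frozen random
    d = [0.1] * r        # trainable scaling vector
    b = [0.0] * d_out    # trainable, zero at start so the layer initially equals W0
    x = [1.0, 0.0, -1.0, 0.5]
    print(vera_forward(x, W0, A, B, d, b) == matvec(W0, x))  # True
    print(trainable_params(768, 768, 256))
```

Because only d and b are trained, the per-layer overhead is independent of the input width, which is what keeps the parameter cost of adaptation small.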

Open Access

ARTICLE

Adjusted Reasoning Module for Deep Visual Question Answering Using Vision Transformer

Christine Dewi^1,3, Hanna Prillysca Chernovita^2, Stephen Abednego Philemon^1, Christian Adi Ananta^1, Abbott Po Shun Chen^4,*

CMC-Computers, Materials & Continua, Vol.81, No.3, pp. 4195-4216, 2024, DOI:10.32604/cmc.2024.057453 - 19 December 2024

Abstract: Visual Question Answering (VQA) is an interdisciplinary artificial intelligence (AI) task that integrates computer vision and natural language processing. Its purpose is to enable machines to answer questions using visual information. A VQA system typically takes an image and a natural language query as input and produces a textual answer as output. One major obstacle in VQA is finding an effective method to extract and merge textual and visual data. We examine “fusion” models that use information from both the text encoder and the image encoder to perform the visual question answering task efficiently. For…

Open Access

ARTICLE

DPAL-BERT: A Faster and Lighter Question Answering Model

Lirong Yin^1, Lei Wang^1, Zhuohang Cai^2, Siyu Lu^2,*, Ruiyang Wang^2, Ahmed AlSanad^3, Salman A. AlQahtani^3, Xiaobing Chen^4, Zhengtong Yin^5, Xiaolu Li^6, Wenfeng Zheng^2,3,*

CMES-Computer Modeling in Engineering & Sciences, Vol.141, No.1, pp. 771-786, 2024, DOI:10.32604/cmes.2024.052622 - 20 August 2024

Abstract: Recent advancements in natural language processing have given rise to numerous pre-trained language models for question-answering systems. However, with the constant evolution of algorithms, data, and computing power, the increasing size and complexity of these models have led to higher training costs and reduced efficiency. This study aims to minimize the inference time of such models while maintaining computational performance. It proposes a novel distillation model for PAL-BERT (DPAL-BERT), which employs knowledge distillation with the PAL-BERT model as the teacher to train two student models: DPAL-BERT-Bi and DPAL-BERT-C. This research enhances the dataset…
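The abstract does not give the distillation objective used to train DPAL-BERT-Bi and DPAL-BERT-C. As an assumption, the sketch below uses the common Hinton-style loss: a temperature-softened KL divergence between teacher and student distributions, scaled by T², mixed with ordinary cross-entropy on the true label. For extractive QA the same loss would typically be applied separately to the start-position and end-position distributions; the temperature T, weight alpha and all names are hypothetical defaults.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    hard = -math.log(softmax(student_logits)[label])  # cross-entropy at T = 1
    return alpha * T * T * kl + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # e.g. teacher scores for three candidate start positions
    close = [1.8, 0.6, -0.9]    # student that mimics the teacher
    far = [-1.0, 0.5, 2.0]      # student that disagrees with the teacher
    print(distillation_loss(close, teacher, label=0))
    print(distillation_loss(far, teacher, label=0))  # larger than the line above
```

The T² factor keeps the gradient scale of the soft term roughly comparable to the hard term as the temperature changes.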

Open Access

ARTICLE

Improving VQA via Dual-Level Feature Embedding Network

Yaru Song^*, Huahu Xu, Dikai Fang

Intelligent Automation & Soft Computing, Vol.39, No.3, pp. 397-416, 2024, DOI:10.32604/iasc.2023.040521 - 11 July 2024

Abstract: Visual Question Answering (VQA) has sparked widespread interest as a crucial task in integrating vision and language. VQA models primarily use attention mechanisms to associate relevant visual regions with input questions so that they can answer those questions effectively. Detection-based features extracted by an object detection network aim to capture the visual attention distribution over predetermined detection boxes and provide object-level insights, which help answer questions about foreground objects more effectively. However, such features cannot answer questions about background regions that have no detection boxes, because they lack fine-grained details; capturing those details is the advantage of grid-based features. In…

Open Access

ARTICLE

PAL-BERT: An Improved Question Answering Model

Wenfeng Zheng^1, Siyu Lu^1, Zhuohang Cai^1, Ruiyang Wang^1, Lei Wang^2, Lirong Yin^2,*

CMES-Computer Modeling in Engineering & Sciences, Vol.139, No.3, pp. 2729-2745, 2024, DOI:10.32604/cmes.2023.046692 - 11 March 2024

Abstract: In the field of natural language processing (NLP), various pre-trained language models have emerged in recent years, and question answering systems have gained significant attention. However, as algorithms, data, and computing power advance, models have grown increasingly large and their parameter counts keep rising. Consequently, model training has become more costly and less efficient. To enhance the efficiency and accuracy of the training process while reducing the model volume, this paper proposes PAL-BERT, a first-order pruning model based on ALBERT and designed around the characteristics of question-answering (QA) systems and language…
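The abstract calls PAL-BERT a first-order pruning model but is truncated before defining its pruning criterion. As an assumption, the sketch below uses one common first-order score, |w · ∂L/∂w|, a first-order Taylor estimate of how much the loss would change if a weight were set to zero, and masks the lowest-scoring fraction of weights. Movement pruning, another first-order method, instead accumulates -w · ∂L/∂w over training; which variant the authors use is not stated here, and the function names are hypothetical.

```python
def first_order_scores(weights, grads):
    """Importance of each weight: |w * dL/dw| (first-order Taylor estimate)."""
    return [abs(w * g) for w, g in zip(weights, grads)]


def prune_mask(scores, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of lowest-scoring weights."""
    n_prune = int(len(scores) * sparsity)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    mask = [1] * len(scores)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


if __name__ == "__main__":
    weights = [0.5, -2.0, 0.1, 1.0]
    grads = [0.1, 0.01, 3.0, -0.2]  # gradients from a hypothetical QA fine-tuning batch
    mask = prune_mask(first_order_scores(weights, grads), sparsity=0.5)
    print(mask)  # [0, 0, 1, 1]
    print([w * m for w, m in zip(weights, mask)])
```

Here the largest-magnitude weight (-2.0) is pruned because its gradient is tiny, which is where a first-order criterion differs from plain magnitude pruning.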

Open Access

ARTICLE

MVCE-Net: Multi-View Region Feature and Caption Enhancement Co-Attention Network for Visual Question Answering

Feng Yan^1, Wushouer Silamu^2, Yanbing Li^1,*

CMC-Computers, Materials & Continua, Vol.76, No.1, pp. 65-80, 2023, DOI:10.32604/cmc.2023.038177 - 08 June 2023

Abstract: Visual question answering (VQA) requires a deep understanding of images and their corresponding textual questions in order to answer questions about images accurately. However, existing models tend to ignore the implicit knowledge in images and focus only on their visual information, which limits how deeply they understand the image content. Images contain more than just visual objects: some contain textual information about the scene, and more complex images contain relationships between individual visual objects. First, this paper proposes a model that uses image descriptions for feature enhancement. This model encodes…

Open Access

ARTICLE

Improved Blending Attention Mechanism in Visual Question Answering

Siyu Lu^1, Yueming Ding^1, Zhengtong Yin^2, Mingzhe Liu^3,*, Xuan Liu^4, Wenfeng Zheng^1,*, Lirong Yin^5

Computer Systems Science and Engineering, Vol.47, No.1, pp. 1149-1161, 2023, DOI:10.32604/csse.2023.038598 - 26 May 2023

Abstract: Visual question answering (VQA) has attracted increasing attention in computer vision and natural language processing. Researchers are studying how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden. The attention mechanism is an effective way to address this problem. However, a single attention mechanism may attend to features incompletely. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines the spatial attention mechanism and the channel attention…
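The abstract describes combining spatial and channel attention but is truncated before giving the formulation. The sketch below illustrates the general idea in a deliberately simplified, parameter-free form, in the spirit of sequential channel-then-spatial modules such as CBAM: each channel is reweighted by a sigmoid of its mean activation, then each spatial position is reweighted by a sigmoid of its mean across channels. Real modules learn these weights with small MLP or convolution layers and, in VQA, usually condition them on the question; those parts are omitted as an assumption for brevity, and all names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(F):
    """F is a C x N feature map (C channels, N spatial positions); reweight each channel."""
    weights = [sigmoid(sum(ch) / len(ch)) for ch in F]
    return [[w * v for v in ch] for w, ch in zip(weights, F)]


def spatial_attention(F):
    """Reweight each spatial position by a sigmoid of its mean over channels."""
    C, N = len(F), len(F[0])
    weights = [sigmoid(sum(F[c][n] for c in range(C)) / C) for n in range(N)]
    return [[weights[n] * ch[n] for n in range(N)] for ch in F]


def hybrid_attention(F):
    """Channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(F))


if __name__ == "__main__":
    # 2 channels x 3 positions, e.g. a tiny slice of a CNN grid feature map
    F = [[1.0, -0.5, 2.0],
         [0.0, 3.0, -1.0]]
    for row in hybrid_attention(F):
        print([round(v, 3) for v in row])
```

Because both sets of weights lie in (0, 1), the module only attenuates features; which channels and positions survive depends on both passes rather than on either one alone.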

Open Access

ARTICLE

Expert Recommendation in Community Question Answering via Heterogeneous Content Network Embedding

Hong Li^1,*, Jianjun Li^1, Guohui Li^1, Rong Gao^2, Lingyu Yan^2

CMC-Computers, Materials & Continua, Vol.75, No.1, pp. 1687-1709, 2023, DOI:10.32604/cmc.2023.035239 - 06 February 2023

Abstract: Expert Recommendation (ER) aims to identify domain experts with high expertise and willingness to answer questions in Community Question Answering (CQA) web services. How to model questions and users in the heterogeneous content network is critical to this task. Most traditional methods model questions and users based on the textual content left in the community while ignoring the structural properties of heterogeneous CQA networks, and they suffer from textual data sparsity. Recent approaches take advantage of structural proximities between nodes and attempt to fuse the textual content of nodes for…

Open Access

ARTICLE

Information Extraction Based on Multi-turn Question Answering for Analyzing Korean Research Trends

Seongung Jo^1, Heung-Seon Oh^1,*, Sanghun Im^1, Gibaeg Kim^1, Seonho Kim^2

CMC-Computers, Materials & Continua, Vol.74, No.2, pp. 2967-2980, 2023, DOI:10.32604/cmc.2023.031983 - 31 October 2022

Abstract: Analyzing Research and Development (R&D) trends is important because it can influence future decisions regarding R&D direction. In typical trend analysis, topic or technology taxonomies are employed to compute the popularity of topics or codes over time. Although this approach is simple and effective, the taxonomies are difficult to manage because new technologies are introduced rapidly. Therefore, recent studies exploit deep learning to extract pre-defined targets such as problems and solutions. Building on recent advances in question answering (QA) using deep learning, we adopt a multi-turn QA model to extract problems and solutions from…

Displaying 1-10 on page 1 of 21.