Open Access
ARTICLE
A Novelty Framework in Image-Captioning with Visual Attention-Based Refined Visual Features
1 School of Computer Science and Engineering, Central South University, Changsha, 410083, China
2 Electronic Engineering and Information Science Department, University of Science and Technology of China, Hefei, 230026, China
3 EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, 11586, Saudi Arabia
* Corresponding Authors: Alaa Thobhani. Email: ; Xiaoyan Kui. Email:
Computers, Materials & Continua 2025, 82(3), 3943-3964. https://doi.org/10.32604/cmc.2025.060788
Received 10 November 2024; Accepted 24 December 2024; Issue published 06 March 2025
Abstract
Image captioning, the task of generating descriptive sentences for images, has advanced significantly with the integration of semantic information. However, traditional models still rely on static visual features that do not evolve with the changing linguistic context, which can hinder the ability to form meaningful connections between the image and the generated captions. This limitation often leads to captions that are less accurate or descriptive. In this paper, we propose a novel approach to enhance image captioning by introducing dynamic interactions where visual features continuously adapt to the evolving linguistic context. Our model strengthens the alignment between visual and linguistic elements, resulting in more coherent and contextually appropriate captions. Specifically, we introduce two innovative modules: the Visual Weighting Module (VWM) and the Enhanced Features Attention Module (EFAM). The VWM adjusts visual features using partial attention, enabling dynamic reweighting of the visual inputs, while the EFAM further refines these features to improve their relevance to the generated caption. By continuously adjusting visual features in response to the linguistic context, our model bridges the gap between static visual features and dynamic language generation. We demonstrate the effectiveness of our approach through experiments on the MS-COCO dataset, where our method outperforms state-of-the-art techniques in terms of caption quality and contextual relevance. Our results show that dynamic visual-linguistic alignment significantly enhances image captioning performance.
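As a rough illustration of the attention-based visual reweighting described in the abstract, the sketch below shows how region-level visual features might be re-scored against the decoder's current hidden (linguistic) state at each generation step. It is a minimal, assumption-laden sketch: the module name, tensor shapes, and layer sizes are placeholders, not the paper's exact VWM or EFAM formulation, which is defined in the full method section.

```python
# Illustrative sketch only: attention-based reweighting of visual features
# conditioned on the decoder's hidden state. Names and dimensions are assumed,
# not taken from the paper's actual VWM/EFAM definitions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualReweighting(nn.Module):
    def __init__(self, vis_dim: int, hid_dim: int, att_dim: int = 512):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, att_dim)   # project region features
        self.hid_proj = nn.Linear(hid_dim, att_dim)   # project decoder state
        self.score = nn.Linear(att_dim, 1)            # scalar score per region

    def forward(self, regions: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # regions: (batch, num_regions, vis_dim); hidden: (batch, hid_dim)
        e = torch.tanh(self.vis_proj(regions) + self.hid_proj(hidden).unsqueeze(1))
        alpha = F.softmax(self.score(e).squeeze(-1), dim=-1)  # (batch, num_regions)
        # Reweight (rather than pool) the regions so downstream modules
        # receive visual features adapted to the current linguistic context.
        return regions * alpha.unsqueeze(-1)

# Usage sketch: refresh the visual features at every decoding step.
# refined = VisualReweighting(2048, 512)(region_feats, decoder_state)
```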
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.