Open Access

REVIEW

A Survey on Enhancing Image Captioning with Advanced Strategies and Techniques

Alaa Thobhani1,*, Beiji Zou1, Xiaoyan Kui1,*, Amr Abdussalam2, Muhammad Asim3, Sajid Shah3, Mohammed ELAffendi3

1 School of Computer Science and Engineering, Central South University, Changsha, 410083, China
2 Electronic Engineering and Information Science Department, University of Science and Technology of China, Hefei, 230026, China
3 EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, 11586, Saudi Arabia

* Corresponding Authors: Alaa Thobhani; Xiaoyan Kui

(This article belongs to the Special Issue: Recent Advances in Signal Processing and Computer Vision)

Computer Modeling in Engineering & Sciences 2025, 142(3), 2247-2280. https://doi.org/10.32604/cmes.2025.059192

Abstract

Image captioning has attracted significant research effort over the last decade. The goal is to generate meaningful, syntactically accurate sentences that describe the visual content depicted in photographs. Many real-world applications rely on image captioning, such as helping people with visual impairments understand their surroundings. To formulate a coherent and relevant textual description, computer vision techniques are first used to comprehend the visual content of an image, and natural language processing methods are then used to produce the sentence. Numerous approaches and models have been developed to address this multifaceted problem, and several have proved to be state-of-the-art solutions in the field. This work offers a focused perspective that emphasizes the most critical strategies and techniques for enhancing image caption generation. Rather than reviewing all previous image captioning work, we analyze techniques that yield significant performance improvements, including visual attention methods, the use of different types of semantic information in captions, and multi-caption generation techniques. Further, advancements such as neural architecture search, few-shot learning, multi-phase learning, and cross-modal embedding within image caption networks are examined for their effects. The comprehensive quantitative analysis conducted in this study identifies cutting-edge methodologies and sheds light on their impact on image captioning technology.

Cite This Article

APA Style
Thobhani, A., Zou, B., Kui, X., Abdussalam, A., Asim, M. et al. (2025). A survey on enhancing image captioning with advanced strategies and techniques. Computer Modeling in Engineering & Sciences, 142(3), 2247–2280. https://doi.org/10.32604/cmes.2025.059192
Vancouver Style
Thobhani A, Zou B, Kui X, Abdussalam A, Asim M, Shah S, et al. A survey on enhancing image captioning with advanced strategies and techniques. Comput Model Eng Sci. 2025;142(3):2247–2280. https://doi.org/10.32604/cmes.2025.059192
IEEE Style
A. Thobhani et al., “A Survey on Enhancing Image Captioning with Advanced Strategies and Techniques,” Comput. Model. Eng. Sci., vol. 142, no. 3, pp. 2247–2280, 2025. https://doi.org/10.32604/cmes.2025.059192



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.