Open Access

REVIEW

A Review on Vision-Language-Based Approaches: Challenges and Applications

Huu-Tuong Ho1,#, Luong Vuong Nguyen1,#, Minh-Tien Pham1, Quang-Huy Pham1, Quang-Duong Tran1, Duong Nguyen Minh Huy2, Tri-Hai Nguyen3,*

1 Department of Artificial Intelligence, FPT University, Danang, 550000, Vietnam
2 Department of Business, FPT University, Danang, 550000, Vietnam
3 Faculty of Information Technology, School of Technology, Van Lang University, Ho Chi Minh City, 70000, Vietnam

* Corresponding Author: Tri-Hai Nguyen. Email: email
# These authors contributed equally to this work

(This article belongs to the Special Issue: New Trends in Image Processing)

Computers, Materials & Continua 2025, 82(2), 1733-1756. https://doi.org/10.32604/cmc.2025.060363

Abstract

In multimodal learning, Vision-Language Models (VLMs) have become a critical research focus because they integrate textual and visual data. These models have shown significant promise both in language-centered tasks, such as visual question answering, and in computer vision applications, including image captioning and image-text retrieval, which highlights their adaptability to complex multimodal datasets. In this work, we review the landscape of Bootstrapping Language-Image Pre-training (BLIP) and other VLM techniques. We conduct a comparative analysis to assess the strengths, limitations, and task applicability of VLMs, and we examine challenges such as scalability, data quality, and fine-tuning complexity. The work concludes by outlining potential future directions in VLM research, focusing on enhancing model interpretability, addressing ethical implications, and advancing multimodal integration in real-world applications.

Keywords

Bootstrapping language-image pre-training (BLIP); multimodal learning; vision-language model (VLM); vision-language pre-training (VLP)

Cite This Article

APA Style
Ho, H., Nguyen, L.V., Pham, M., Pham, Q., Tran, Q. et al. (2025). A review on vision-language-based approaches: challenges and applications. Computers, Materials & Continua, 82(2), 1733–1756. https://doi.org/10.32604/cmc.2025.060363
Vancouver Style
Ho H, Nguyen LV, Pham M, Pham Q, Tran Q, Huy DNM, et al. A review on vision-language-based approaches: challenges and applications. Comput Mater Contin. 2025;82(2):1733–1756. https://doi.org/10.32604/cmc.2025.060363
IEEE Style
H. Ho et al., “A Review on Vision-Language-Based Approaches: Challenges and Applications,” Comput. Mater. Contin., vol. 82, no. 2, pp. 1733–1756, 2025. https://doi.org/10.32604/cmc.2025.060363



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.