Open Access

ARTICLE

Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment

Emran Al-Buraihy, Dan Wang*

Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China

* Corresponding Author: Dan Wang. Email: email

Computers, Materials & Continua 2024, 79(3), 3913-3938. https://doi.org/10.32604/cmc.2024.048104

Abstract

Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models with semantic matching techniques. Experiments on the Flickr8k and AraImg2k benchmark datasets, which feature images with English and Arabic descriptions, show substantial performance improvements over state-of-the-art methods. Our model, equipped with an Image & Cross-Language Semantic Matching module and a Target Language Domain Evaluation module, significantly enhances the semantic relevance of the generated descriptions. For English-to-Arabic and Arabic-to-English cross-language image description, our approach achieves CIDEr scores of 87.9% for English and 81.7% for Arabic, underscoring the contributions of our methodology. Comparative analyses with previous works further confirm the superior performance of our approach, and visual results show that our model generates captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include extending to more languages and incorporating more diverse visual and textual data sources.
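The abstract describes a semantic matching module that scores candidate captions against the image, but the paper's implementation details are not reproduced here. As a rough, hedged illustration of the general idea only (the function names, embeddings, and scoring scheme below are illustrative assumptions, not the authors' method), one common approach ranks caption embeddings by cosine similarity to an image embedding:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_match_score(image_emb, caption_embs):
    """Score each candidate caption embedding against the image
    embedding; return the best score and its index."""
    scores = [cosine_similarity(image_emb, c) for c in caption_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best

# Toy vectors standing in for real image/text encoder outputs.
image_emb = [0.9, 0.1, 0.3]
captions = [
    [0.8, 0.2, 0.4],   # semantically close caption
    [0.1, 0.9, 0.0],   # unrelated caption
]
score, idx = semantic_match_score(image_emb, captions)
print(f"best caption index: {idx}, score: {score:.3f}")
```

In practice the embeddings would come from trained multimodal encoders and the score would feed into caption selection or training objectives; this sketch only conveys the matching-by-similarity intuition.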

Keywords


Cite This Article

APA Style
Al-Buraihy, E., Wang, D. (2024). Enhancing cross-lingual image description: A multimodal approach for semantic relevance and stylistic alignment. Computers, Materials & Continua, 79(3), 3913-3938. https://doi.org/10.32604/cmc.2024.048104
Vancouver Style
Al-Buraihy E, Wang D. Enhancing cross-lingual image description: A multimodal approach for semantic relevance and stylistic alignment. Comput Mater Contin. 2024;79(3):3913-3938. https://doi.org/10.32604/cmc.2024.048104
IEEE Style
E. Al-Buraihy and D. Wang, “Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment,” Comput. Mater. Contin., vol. 79, no. 3, pp. 3913-3938, 2024. https://doi.org/10.32604/cmc.2024.048104



Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.