Alaa Thobhani1,*, Beiji Zou1, Xiaoyan Kui1, Amr Abdussalam2, Muhammad Asim3, Naveed Ahmed4, Mohammed Ali Alshara4,5
CMC-Computers, Materials & Continua, Vol.81, No.2, pp. 2873-2894, 2024, DOI:10.32604/cmc.2024.054841
Published: 18 November 2024
Abstract Image captioning has gained increasing attention in recent years. Visual characteristics found in input images play a crucial role in generating high-quality captions. Prior studies have used visual attention mechanisms to dynamically focus on localized regions of the input image, improving the identification of relevant image regions at each step of caption generation. However, equipping image captioning models with the ability to select the most relevant visual features from the input image and attend to them can significantly improve the utilization of these features, which in turn enhances captioning network performance. In light…
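The visual attention described in the abstract scores each image region against the decoder's current state and forms a weighted visual context. Below is a minimal NumPy sketch of a generic additive (Bahdanau-style) attention step; all names, dimensions, and weight matrices are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def visual_attention(regions, hidden, W_r, W_h, w_a):
    # regions: (k, d) features for k image regions
    # hidden:  (h,)   decoder state at the current caption step
    # score each region with additive attention, then normalize
    scores = np.tanh(regions @ W_r + hidden @ W_h) @ w_a  # (k,)
    alpha = softmax(scores)       # attention weights over regions
    context = alpha @ regions     # (d,) weighted visual context vector
    return context, alpha

# toy dimensions for illustration only
rng = np.random.default_rng(0)
k, d, h, a = 5, 8, 6, 4
regions = rng.normal(size=(k, d))
hidden = rng.normal(size=(h,))
W_r = rng.normal(size=(d, a))
W_h = rng.normal(size=(h, a))
w_a = rng.normal(size=(a,))

ctx, alpha = visual_attention(regions, hidden, W_r, W_h, w_a)
print(alpha.sum())  # attention weights sum to 1
```

At each decoding step the weights `alpha` shift toward the regions most relevant to the next word, which is the dynamic focusing behavior the abstract refers to.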