Zelin Deng1,*, Bo Zhou1, Pei He2, Jianfeng Huang3, Osama Alfarraj4, Amr Tolba4,5
CMC-Computers, Materials & Continua, Vol.70, No.1, pp. 2065-2081, 2022, DOI:10.32604/cmc.2022.019328
- 07 September 2021
Abstract Image captioning aims to generate a corresponding description of an image. In recent years, neural encoder-decoder models have been the dominant approach, in which a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network are used to translate an image into a natural language description. Among these approaches, visual attention mechanisms are widely used to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. However, most conventional visual attention mechanisms are based on high-level image features, ignoring the effects of other image features, and giving insufficient consideration to the…