Open Access

ARTICLE

Feedback LSTM Network Based on Attention for Image Description Generator

Zhaowei Qu1,*, Bingyu Cao1, Xiaoru Wang1, Fu Li2, Peirong Xu1, Luhan Zhang1

1 Beijing Key Laboratory of Network System and Network Culture, Beijing University of Posts and Telecommunications, Beijing, 100876, China.
2 Department of Electrical and Computer Engineering, Portland State University, Portland, OR 97207-0751, USA.

* Corresponding Author: Zhaowei Qu. Email: email.

Computers, Materials & Continua 2019, 59(2), 575-589. https://doi.org/10.32604/cmc.2019.05569

Abstract

Images are complex multimedia data that contain rich semantic information. Most current image description generation algorithms produce only plain descriptions that fail to distinguish primary from secondary objects, yielding insufficient high-level semantics and low scores under public evaluation criteria. The core problem is the lack of an effective network for generating high-level semantic sentences that describe the motion and state of the principal object in detail. To address this issue, this paper proposes the Attention-based Feedback Long Short-Term Memory Network (AFLN). Built on the existing encoder-decoder framework, our method comprises two independent subtasks: an attention-based feedback LSTM network in the decoding phase and the Convolutional Block Attention Module (CBAM) in the encoding phase. First, we propose an attention-based network that feeds back the features corresponding to the word generated by the previous LSTM decoding unit. We implement this feedback guidance through a related-field mapping algorithm, which quantifies the correlation between the previous word and the subsequent word, so that the main object can be tracked and described in highlighted detail. Second, we exploit the attention idea by applying CBAM, a lightweight and general module, after the last layer of a pretrained VGG-16 network; it enhances the expressiveness of the encoded image features by combining channel and spatial attention maps at negligible overhead. Extensive experiments on the COCO dataset validate the superiority of our network over state-of-the-art algorithms, both in evaluation scores and in the generated descriptions: the BLEU-4 score increases from 0.291 to 0.301, and the CIDEr score rises from 0.912 to 0.952.
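To illustrate the encoding-side idea, the following is a minimal sketch of CBAM-style refinement applied to a convolutional feature map (such as the output of the last VGG-16 convolutional block). It is not the authors' implementation: for brevity it omits CBAM's shared MLP and 7x7 convolution, computing the channel and spatial attention gates directly from the average- and max-pooled statistics, and it uses plain nested lists rather than tensors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cbam_refine(features):
    """Simplified CBAM refinement of a C x H x W feature map.

    Channel attention gates each channel by its pooled spatial
    statistics; spatial attention then gates each location by its
    pooled channel statistics, mirroring CBAM's two sequential stages.
    """
    C = len(features)
    H = len(features[0])
    W = len(features[0][0])

    # Channel attention: pool each channel over all spatial positions,
    # then squash the pooled statistics into a per-channel gate in (0, 1).
    refined = []
    for c in range(C):
        vals = [features[c][i][j] for i in range(H) for j in range(W)]
        gate = sigmoid(sum(vals) / len(vals) + max(vals))
        refined.append([[v * gate for v in row] for row in features[c]])

    # Spatial attention: pool across channels at each location,
    # then scale every channel at that location by the resulting gate.
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            col = [refined[c][i][j] for c in range(C)]
            gate = sigmoid(sum(col) / C + max(col))
            for c in range(C):
                out[c][i][j] = refined[c][i][j] * gate
    return out
```

Because both gates lie in (0, 1), the refinement rescales rather than replaces the encoder features, which is why the module adds negligible overhead to the pretrained backbone.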

Keywords


Cite This Article

Z. Qu, B. Cao, X. Wang, F. Li, P. Xu et al., "Feedback LSTM network based on attention for image description generator," Computers, Materials & Continua, vol. 59, no. 2, pp. 575–589, 2019. https://doi.org/10.32604/cmc.2019.05569


This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
