Open Access

ARTICLE

Feedback LSTM Network Based on Attention for Image Description Generator

Zhaowei Qu1,*, Bingyu Cao1, Xiaoru Wang1, Fu Li2, Peirong Xu1, Luhan Zhang1

1 Beijing Key Laboratory of Network System and Network Culture, Beijing University of Posts and Telecommunications, Beijing, 100876, China.
2 Department of Electrical and Computer Engineering, Portland State University, Portland, OR 97207-0751, USA.

* Corresponding Author: Zhaowei Qu. Email: email.

Computers, Materials & Continua 2019, 59(2), 575-589. https://doi.org/10.32604/cmc.2019.05569

Abstract

Images are complex multimedia data that contain rich semantic information. Most current image description generation algorithms produce only plain descriptions that fail to distinguish primary from secondary objects, yielding insufficient high-level semantics and low scores under public evaluation criteria. The core problem is the lack of an effective network for generating high-level semantic sentences that describe the motion and state of the principal object in detail. To address this issue, this paper proposes the Attention-based Feedback Long Short-Term Memory Network (AFLN). Built on the existing encoder-decoder framework, our method comprises two independent subtasks: an attention-based feedback LSTM network in the decoding phase and the Convolutional Block Attention Module (CBAM) in the encoding phase. First, we propose an attention-based network that feeds back the features corresponding to the word generated by the previous LSTM decoding unit. We implement this feedback guidance through a related-field mapping algorithm, which quantifies the correlation between the previous word and the subsequent word, so that the main object can be tracked and described in highlighted detail. Second, we exploit the attention idea by applying CBAM, a lightweight and general module, after the last layer of a pretrained VGG-16 network; it enhances the expressiveness of the encoded image features by combining channel and spatial attention maps at negligible overhead. Extensive experiments on the COCO dataset validate the superiority of our network over state-of-the-art algorithms, both in evaluation scores and in the generated descriptions: the BLEU-4 score increases from 0.291 to 0.301, and the CIDEr score rises from 0.912 to 0.952.
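To illustrate the encoding-side idea, the following is a minimal sketch of CBAM-style refinement applied to a convolutional feature map (such as the output of the last VGG-16 convolutional block). It is not the authors' implementation: for brevity it omits CBAM's shared MLP and 7x7 convolution, computing the channel and spatial attention gates directly from the average- and max-pooled statistics, and it uses plain nested lists rather than tensors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cbam_refine(features):
    """Simplified CBAM refinement of a C x H x W feature map.

    Channel attention gates each channel by its pooled spatial
    statistics; spatial attention then gates each location by its
    pooled channel statistics, mirroring CBAM's two sequential stages.
    """
    C = len(features)
    H = len(features[0])
    W = len(features[0][0])

    # Channel attention: pool each channel over all spatial positions,
    # then squash the pooled statistics into a per-channel gate in (0, 1).
    refined = []
    for c in range(C):
        vals = [features[c][i][j] for i in range(H) for j in range(W)]
        gate = sigmoid(sum(vals) / len(vals) + max(vals))
        refined.append([[v * gate for v in row] for row in features[c]])

    # Spatial attention: pool across channels at each location,
    # then scale every channel at that location by the resulting gate.
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            col = [refined[c][i][j] for c in range(C)]
            gate = sigmoid(sum(col) / C + max(col))
            for c in range(C):
                out[c][i][j] = refined[c][i][j] * gate
    return out
```

Because both gates lie in (0, 1), the refinement rescales rather than replaces the encoder features, which is why the module adds negligible overhead to the pretrained backbone.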

Keywords


Cite This Article

Z. Qu, B. Cao, X. Wang, F. Li, P. Xu et al., "Feedback LSTM network based on attention for image description generator," Computers, Materials & Continua, vol. 59, no. 2, pp. 575–589, 2019. https://doi.org/10.32604/cmc.2019.05569


This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
