Computer and Modernization ›› 2020, Vol. 0 ›› Issue (06): 95-.


Image Captioning Based on Adaptive Attention Model

  

  1. (College of Electronic Engineering, Tianjin University of Technology and Education, Tianjin 300222, China)
Received: 2019-10-15  Online: 2020-06-24  Published: 2020-06-28

Abstract: Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word, yet the decoder likely needs little or no visual information from the image to predict non-visual words such as "the" and "of". In this paper, an adaptive attention model is proposed: the encoder uses a Faster R-CNN network to extract salient image features, while the LSTM decoder incorporates a visual sentinel. At each time step, the model automatically decides whether to rely on visual signals or solely on the language model. The model is evaluated on the Flickr30K and MS-COCO datasets, and the experimental results show that it effectively improves the quality of image captioning.
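The adaptive-attention step described above can be sketched numerically. In the standard visual-sentinel formulation, the decoder scores the image regions and a sentinel vector together; the softmax weight assigned to the sentinel acts as a gate that mixes the sentinel into the visual context. The sketch below uses randomly initialized weights purely for illustration; all parameter names (W_v, W_g, W_s, w_h) and the dimensions are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

k, d, a = 5, 8, 8  # regions, feature dim, attention dim (illustrative sizes)

# Hypothetical learned parameters (random here, for the sketch only)
W_v = rng.standard_normal((a, d))
W_g = rng.standard_normal((a, d))
W_s = rng.standard_normal((a, d))
w_h = rng.standard_normal(a)

V = rng.standard_normal((k, d))  # region features from the CNN encoder
h = rng.standard_normal(d)       # LSTM hidden state at time step t
s = rng.standard_normal(d)       # visual sentinel distilled from the LSTM memory

# Attention scores for the k image regions, plus one score for the sentinel
z = np.tanh(V @ W_v.T + h @ W_g.T) @ w_h   # shape (k,)
z_s = np.tanh(W_s @ s + W_g @ h) @ w_h     # scalar

alpha_hat = softmax(np.append(z, z_s))     # distribution over k regions + sentinel
beta = alpha_hat[-1]                       # sentinel gate in [0, 1]

c = softmax(z) @ V                         # purely visual context vector
c_hat = beta * s + (1.0 - beta) * c        # adaptive context fed to word prediction
```

When beta is close to 1 the decoder leans on the language model (via the sentinel) for non-visual words; when beta is close to 0 it attends to the image regions.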

Key words: attention mechanism, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), image captioning
