Computer and Modernization ›› 2021, Vol. 0 ›› Issue (09): 1-6.

Multi-scene Fusion Algorithm for Fine-grained Image Caption

  

  1. (College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China)
  • Online: 2021-09-14 Published: 2021-09-14

Abstract: To address the poor performance of image captioning across different scenes, a multi-scene image caption generation algorithm based on a convolutional neural network and prior knowledge is proposed. The algorithm generates visual semantic units with a convolutional neural network, uses named entity recognition to identify and predict the image scene, and then uses the classification result to automatically adjust the focusing parameter of the self-attention mechanism and calculate the multi-scene attention score. Finally, the resulting region encodings and semantic prior knowledge are fed into a Transformer text generator to guide sentence generation. The results show that the algorithm effectively addresses the problem of captions lacking key scene information. Evaluation metrics are used to assess the model on the MSCOCO and Flickr30k datasets; its CIDEr score on MSCOCO reaches 1.210, which is better than that of similar image description generation models.

Key words: image caption, CNN, NER, multi-scene attention, Transformer structure
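
The abstract does not define the focusing parameter or the multi-scene attention score, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the focusing parameter acts like an inverse temperature on the attention softmax and is computed as the expected value of per-scene settings under the predicted scene distribution. The names `scene_attention`, `scene_probs`, and `scene_focus` are hypothetical, and the random region features stand in for CNN outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scene_attention(regions, scene_probs, scene_focus):
    """Self-attention over region features with a scene-dependent focus.

    regions:     (n, d) region features (assumed to come from a CNN)
    scene_probs: (s,) predicted scene distribution, summing to 1
    scene_focus: (s,) positive focusing value per scene (assumed)
    """
    d = regions.shape[-1]
    # Assumption: the focusing parameter is the expectation of the
    # per-scene values under the predicted scene distribution.
    gamma = float(scene_probs @ scene_focus)
    # Plain scaled dot-product scores, with queries = keys = regions.
    scores = regions @ regions.T / np.sqrt(d)
    # A larger gamma sharpens the attention; a smaller gamma flattens it.
    weights = softmax(gamma * scores, axis=-1)
    return weights @ regions, weights, gamma

rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))       # 5 regions, 8-dim features
scene_focus = np.array([0.5, 4.0])      # hypothetical per-scene values

# Scene 0 predicted with certainty: diffuse attention.
out_soft, w_soft, g_soft = scene_attention(
    regions, np.array([1.0, 0.0]), scene_focus)
# Scene 1 predicted with certainty: focused attention.
out_sharp, w_sharp, g_sharp = scene_attention(
    regions, np.array([0.0, 1.0]), scene_focus)
```

In this sketch the scene prediction changes only how concentrated the attention weights are. The attended features `out_soft` or `out_sharp` would then be passed, together with any semantic prior, to a text generator, which is outside the scope of the example.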