Computer and Modernization ›› 2021, Vol. 0 ›› Issue (03): 108-114.

• Image Processing •

An Image Description Algorithm Based on Object Detection and Part of Speech Analysis

  1. (Information Faculty, Beijing University of Technology, Beijing 100124, China)
  • Online:2020-03-30 Published:2021-03-24
  • About the authors: Gao Yifan (高逸凡, born 1993), male, from Wuwei, Gansu; master's student; research interests: deep learning and image processing; E-mail: gao_yifan@163.com. Wang Yong (王勇, born 1974), male, from Linqu, Shandong; associate professor, Ph.D.; research interests: parallel and distributed computing; E-mail: wangy@bjut.edu.cn.

Abstract: Existing attention-based image description methods often generate descriptions that are only weakly related to the image content. To address this problem, this paper proposes an image description algorithm based on object detection and part-of-speech analysis. Building on the attention mechanism, the method uses an object detection algorithm to extract information from the image, then processes the extracted information with a recurrent neural network equipped with attention to generate the description sentence. While generating words, the algorithm predicts the part of speech of each word and selects a different neural network according to the predicted part of speech, which strengthens the correlation between the description and the original image. Experimental results show that, under several objective evaluation metrics for image description, the sentences generated by the proposed algorithm improve to varying degrees over those of existing algorithms. In subjective evaluation, they also describe the image content more accurately and fluently.

Key words: image description, recurrent neural network, attention mechanism, object detection, deep learning, natural language processing
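
The abstract states that the algorithm predicts the part of speech of each word and selects a different neural network accordingly, but it gives no architectural details. The following is only a minimal sketch of that routing idea under stated assumptions, not the authors' implementation: the names `VISUAL_POS`, `attend`, and `route_by_pos`, the choice of which parts of speech count as "visual", and the use of dot-product attention over detected-region features are all hypothetical.

```python
import math

# Assumption: words with these tags are grounded in image regions.
# The paper's actual grouping of parts of speech is not given in the abstract.
VISUAL_POS = {"NOUN", "ADJ", "VERB"}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, region_features):
    """Dot-product attention over detected-region feature vectors.

    Returns the attention-weighted context vector and the weights.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, region_features))
        for i in range(dim)
    ]
    return context, weights


def route_by_pos(pos_probs, hidden, region_features, visual_branch, language_branch):
    """Choose a word-generation branch from the predicted part of speech.

    pos_probs maps a POS tag to its predicted probability. If the most
    likely tag is in VISUAL_POS, the visual branch receives an attended
    image context; otherwise the language-only branch is used.
    """
    pos = max(pos_probs, key=pos_probs.get)
    if pos in VISUAL_POS:
        context, _ = attend(hidden, region_features)
        return pos, visual_branch(hidden, context)
    return pos, language_branch(hidden)
```

In a full captioning model, `hidden` would be the recurrent decoder state at the current step, `region_features` would come from the object detector, and the two branches would be learned output networks. Here they are plain callables so that the routing logic can be read on its own.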