Computer and Modernization (计算机与现代化)

• Artificial Intelligence •

An Improved LSTM Model in the Application of Image Caption Generation

  1. (College of Science, Dalian Maritime University, Dalian 116000, China)
  • Received: 2019-08-04 Online: 2020-04-22 Published: 2020-04-24
  • About the authors: Wang Zhiping (王志平, born 1964), male, from Ezhou, Hubei, professor, Ph.D., research interests: network topological properties and reliability, E-mail: wzp@dlmu.edu.cn; Zheng Baoyou (郑宝友, born 1996), male, from Linyi, Shandong, master's student, research interests: deep learning and computer vision, E-mail: 17824833955@163.com; Liu Yiwei (刘仪伟, born 1994), male, from Huangshi, Hubei, master's student, research interest: computer vision, E-mail: 174494244@qq.com.
  • Supported by:
    Fundamental Research Funds for the Central Universities (3132019323)

Abstract: The traditional long short-term memory (LSTM) neural network suffers from premature saturation. To address this problem and generate more accurate descriptions of a given image, this paper proposes a long short-term memory neural network model based on the inverse tangent (arctangent) function, called ITLSTM. First, a classical convolutional neural network model is used to extract image features. Then, the ITLSTM neural network model is used to represent the description corresponding to the image. Finally, the model is evaluated on the Flickr8K dataset and compared with several classical image caption generation models, such as Google NIC. The experimental results show that the proposed model effectively improves the accuracy of image caption generation.

Key words: image caption generation, inverse tangent function, long short-term memory (LSTM) neural network, convolutional neural network
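
The saturation issue mentioned in the abstract can be illustrated with a small sketch. The abstract does not state where the inverse tangent function enters the cell, so the code below assumes, purely for illustration, that arctan replaces tanh as the squashing function of a standard LSTM cell. The names `lstm_step`, `w`, and `act` are hypothetical and are not taken from the paper. The sketch compares the derivatives of tanh and arctan, then runs one step of a single-unit cell with a pluggable activation.

```python
import math


def tanh_grad(x: float) -> float:
    """d/dx tanh(x) = 1 - tanh(x)^2; shrinks exponentially as |x| grows."""
    return 1.0 - math.tanh(x) ** 2


def atan_grad(x: float) -> float:
    """d/dx arctan(x) = 1 / (1 + x^2); shrinks only polynomially."""
    return 1.0 / (1.0 + x * x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid via the tanh identity, which avoids overflow in exp."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def lstm_step(x, h_prev, c_prev, w, act=math.tanh):
    """One time step of a single-unit (scalar) LSTM cell.

    `w` is a hypothetical layout: gate name -> (input weight, recurrent
    weight, bias) for the gates "i", "f", "o" and the candidate "g".
    `act` is the squashing function; passing math.atan is this sketch's
    assumption about an arctangent-based variant, not the paper's definition.
    """
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))   # input gate
    f = sigmoid(pre("f"))   # forget gate
    o = sigmoid(pre("o"))   # output gate
    g = act(pre("g"))       # candidate cell value
    c = f * c_prev + i * g  # new cell state
    h = o * act(c)          # new hidden state
    return h, c


if __name__ == "__main__":
    # Print both derivatives at growing inputs to compare how fast they vanish.
    for x in (0.0, 1.0, 3.0, 5.0, 10.0):
        print(f"x={x:5.1f}  tanh'={tanh_grad(x):.3e}  atan'={atan_grad(x):.3e}")
```

Under these assumptions, the gradient through arctan at large pre-activations stays orders of magnitude larger than the gradient through tanh, which is one plausible reading of "premature saturation". Note that arctan takes values in (-π/2, π/2) rather than (-1, 1), so a drop-in replacement also changes the scale of the cell and hidden states.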

CLC Number: