[1] MA M, WANG B L, WU Q, et al. Visual scene description and its effect evaluation[J]. Journal of Software, 2019,30(4):867-883.
[2] JIN H Z, LIU X L, HU Z K. An image caption generation model combining global and local features[J]. Journal of Applied Sciences, 2019,37(4):501-509.
[3] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors[J]. Cognitive Modeling, 1988,5(3):1.
[4] ELMAN J L. Finding structure in time[J]. Cognitive Science, 1990,14(2):179-211.
[5] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: A neural image caption generator[C]// Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2015:3156-3164.
[6] CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J/OL]. (2014-09-03)[2019-11-24]. https://arxiv.org/pdf/1406.1078.
[7] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. 2015:91-99.
[8] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J/OL]. (2016-05-19)[2019-11-24]. https://arxiv.org/pdf/1409.0473.pdf.
[9] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997,9(8):1735-1780.
[10] CHUNG J, GULCEHRE C, CHO K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J/OL]. (2014-12-11)[2019-11-24]. https://arxiv.org/pdf/1412.3555.
[11] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors[J]. Nature, 1986,323:533-536.
[12] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015,521:436-444.
[13] DENG Z R, ZHANG B J, JIANG Z Q, et al. Image caption model fusing word2vec and attention mechanism[J]. Computer Science, 2019,46(4):268-273.
[14] CHEN X L, FANG H, LIN T Y, et al. Microsoft COCO captions: Data collection and evaluation server[J/OL]. (2015-04-03)[2019-11-24]. https://arxiv.org/pdf/1504.00325.
[15] SHI Y, WANG Y, WU S Q. Machine translation system based on the self-attention model[J]. Computer and Modernization, 2019(7):9-14.
[16] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C]// Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. 2010:249-256.
[17] HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al. Improving neural networks by preventing co-adaptation of feature detectors[J/OL]. (2012-07-03)[2019-11-24]. https://arxiv.org/pdf/1207.0580.
[18] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation[C]// Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. 2002:311-318.
[19] LIN C Y. ROUGE: A package for automatic evaluation of summaries[C]// Proceedings of the Workshop on Text Summarization Branches Out. 2004:74-81.
[20] BANERJEE S, LAVIE A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments[C]// Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005:65-72.
[21] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: Consensus-based image description evaluation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:4566-4575.
[22] MAO J H, XU W, YANG Y, et al. Deep captioning with multimodal recurrent neural networks (m-RNN)[J/OL]. (2015-06-11)[2019-11-24]. https://arxiv.org/abs/1412.6632.
[23] XU K, LEI BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]// Proceedings of the 32nd International Conference on International Conference on Machine Learning. 2015:2048-2057.
[24] JIANG L, XU M, LIU T, et al. DeepVS: A deep learning based video saliency prediction approach[C]// Proceedings of the European Conference on Computer Vision (ECCV). 2018:602-617.
[25] ANDERSON P, HE X D, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018:6077-6086.