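[1] CHEN L J, ZHANG Y, ZHANG Y M, et al. Image caption generation algorithm based on multi-attention and multi-scale feature fusion[J]. Journal of Computer Applications, 2019,39(2):354-359.
[2] ZHANG J, YANG Z Y. A literature review of image caption generation methods[J]. Intelligent Computer and Applications, 2019(5):45-49.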
[3] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: A neural image caption generator[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. 2015:3156-3164.
[4] KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012,25(2):1097-1105.
[5] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997,9(8):1735-1780.
[6] MAO J H, XU W, YANG Y, et al. Explain images with multimodal recurrent neural networks[J]. Computer Science, 2014,arXiv:1410.1090.
[7] XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]// International Conference on Machine Learning. 2015:2048-2057.
[8] HERDADE S, KAPPELER A, BOAKYE K, et al. Image captioning: Transforming objects into words[C]// Advances in Neural Information Processing Systems. 2019:11137-11147.
[9] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778.
[10] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017,39(6):1137-1149.
[11] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. Machine Learning, 2016,arXiv:1609.02907.
[12] BASTINGS J, TITOV I, AZIZ W, et al. Graph convolutional encoders for syntax-aware neural machine translation[J]. Computation and Language, 2017,arXiv:1704.04675.
[13] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. 2014:3104-3112.
[14] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. Computation and Language, 2014,arXiv:1409.0473.
[15] ZELLERS R, YATSKAR M, THOMSON S, et al. Neural motifs: Scene graph parsing with global context[J]. Computer Vision and Pattern Recognition, 2017,arXiv:1711.06640.
[16] KRISHNA R, ZHU Y, GROTH O, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations[J]. International Journal of Computer Vision, 2017,123(1):32-73.
[17] SHANG J B, LIU L Y, GU X T, et al. Learning named entity tagger using domain-specific dictionary[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018:2054-2064.
[18] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017:5998-6008.
[19] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: Consensus-based image description evaluation[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. 2015:4566-4575.
[20] PLUMMER B A, WANG L, CERVANTES C M, et al. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models[C]// Proceedings of the IEEE International Conference on Computer Vision. 2015:2641-2649.
[21] KARPATHY A, FEI-FEI L. Deep visual-semantic alignments for generating image descriptions[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:3128-3137.
[22] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation[C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002:311-318.
[23] BANERJEE S, LAVIE A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments[C]// Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005:65-72.
[24] LIN C Y, HOVY E. Automatic evaluation of summaries using n-gram co-occurrence statistics[C]// Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. 2003:150-157.
[25] KINGMA D, BA J. Adam: A method for stochastic optimization[J]. Machine Learning, 2014,arXiv:1412.6980.
[26] ANDERSON P, HE X, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:6077-6086.
[27] ZHA Z J, LIU D Q, ZHANG H W, et al. Context-aware visual policy network for sequence-level image captioning[C]// Proceedings of the 26th ACM International Conference on Multimedia. 2018:1416-1424.
[28] YAO T, PAN Y W, LI Y H, et al. Exploring visual relationship for image captioning[C]// Proceedings of the European Conference on Computer Vision. 2018:684-699.