[1] |
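CHEN L J, ZHANG Y, ZHANG Y M, et al. Image caption generation algorithm based on multi-attention and multi-scale feature fusion[J]. Journal of Computer Applications, 2019,39(2):354-359. (in Chinese)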
|
[2] |
ZHANG J, YANG Z Y. A literature review of image caption generation methods[J]. Intelligent Computer and Applications, 2019(5):45-49. (in Chinese)
|
[3] |
VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: A neural image caption generator[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. 2015:3156-3164.
|
[4] |
KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012,25(2):1097-1105.
|
[5] |
HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997,9(8):1735-1780.
|
[6] |
MAO J H, XU W, YANG Y, et al. Explain images with multimodal recurrent neural networks[J]. Computer Science, 2014,arXiv:1410.1090.
|
[7] |
XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]// International Conference on Machine Learning. 2015:2048-2057.
|
[8] |
HERDADE S, KAPPELER A, BOAKYE K, et al. Image captioning: Transforming objects into words[C]// Advances in Neural Information Processing Systems. 2019:11137-11147.
|
[9] |
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778.
|
[10] |
REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017,39(6):1137-1149.
|
[11] |
KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. Machine Learning, 2016,arXiv:1609.02907.
|
[12] |
BASTINGS J, TITOV I, AZIZ W, et al. Graph convolutional encoders for syntax-aware neural machine translation[J]. Computation and Language, 2017,arXiv:1704.04675.
|
[13] |
SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. 2014:3104-3112.
|
[14] |
BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. Computation and Language, 2014,arXiv:1409.0473.
|
[15] |
ZELLERS R, YATSKAR M, THOMSON S, et al. Neural motifs: Scene graph parsing with global context[J]. Computer Vision and Pattern Recognition, 2017,arXiv:1711.06640.
|
[16] |
KRISHNA R, ZHU Y, GROTH O, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations[J]. International Journal of Computer Vision, 2017,123(1):32-73.
|
[17] |
SHANG J B, LIU L Y, GU X T, et al. Learning named entity tagger using domain-specific dictionary[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018:2054-2064.
|
[18] |
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017:5998-6008.
|
[19] |
VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: Consensus-based image description evaluation[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. 2015:4566-4575.
|
[20] |
PLUMMER B A, WANG L, CERVANTES C M, et al. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models[C]// Proceedings of the IEEE International Conference on Computer Vision. 2015:2641-2649.
|
[21] |
KARPATHY A, FEI-FEI L. Deep visual-semantic alignments for generating image descriptions[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:3128-3137.
|
[22] |
PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation[C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002:311-318.
|
[23] |
BANERJEE S, LAVIE A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments[C]// Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005:65-72.
|
[24] |
LIN C Y, HOVY E. Automatic evaluation of summaries using n-gram co-occurrence statistics[C]// Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. 2003:150-157.
|
[25] |
KINGMA D, BA J. Adam: A method for stochastic optimization[J]. Machine Learning, 2014,arXiv:1412.6980.
|
[26] |
ANDERSON P, HE X, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:6077-6086.
|
[27] |
ZHA Z J, LIU D Q, ZHANG H W, et al. Context-aware visual policy network for sequence-level image captioning[C]// Proceedings of the 26th ACM International Conference on Multimedia. 2018:1416-1424.
|
[28] |
YAO T, PAN Y W, LI Y H, et al. Exploring visual relationship for image captioning[C]// Proceedings of the European Conference on Computer Vision. 2018:684-699.
|