[1] MIAO Y, ZHAO Z S, YANG Y L, et al. A survey of image captioning techniques[J]. Computer Science, 2020,47(12):149-160. (in Chinese)
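[2] MA Y C, LIU Y J, XIE Q, et al. A survey of automatic image annotation techniques[J]. Journal of Computer Research and Development, 2020,57(11):2348-2374. (in Chinese)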
[3] KULKARNI G, PREMRAJ V, ORDONEZ V, et al. Babytalk: Understanding and generating simple image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013,35(12):2891-2903.
[4] DATTA R, JOSHI D, LI J, et al. Image retrieval: Ideas, influences, and trends of the new age[J]. ACM Computing Surveys, 2008,40(2):1-60.
[5] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[J]. arXiv preprint arXiv:1409.0473, 2014.
[6] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: A neural image caption generator[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:3156-3164.
[7] LI Y, CHENG H H, LIANG X Y, et al. CNN image caption generation[J]. Journal of Xidian University, 2019,46(2):152-157. (in Chinese)
[8] MAO J H, XU W, YANG Y, et al. Deep captioning with multimodal recurrent neural networks (m-RNN)[J]. arXiv preprint arXiv:1412.6632, 2014.
[9] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997,9(8):1735-1780.
[10] XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]// International Conference on Machine Learning. PMLR, 2015:2048-2057.
[11] ANDERSON P, HE X D, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:6077-6086.
[12] CHEN L, ZHANG H W, XIAO J, et al. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:5659-5667.
[13] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020. DOI: 10.1109/CVPR42600.2020.01155.
[14] HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020,42(8):2011-2023.
[15] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778.
[16] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017:6000-6010.
[17] CHEN H T, WANG Y H, GUO T Y, et al. Pre-trained image processing transformer[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021:12299-12310.
[18] LIU W T, LU X M. Research progress of Transformer based on computer vision[J]. Computer Engineering and Applications, 2022,58(6):1-16. (in Chinese)
[19] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]// European Conference on Computer Vision. Springer, 2014:740-755.
[20] KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017,39(4):664-676.
[21] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation[C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002:311-318.
[22] DENKOWSKI M, LAVIE A. Meteor universal: Language specific translation evaluation for any target language[C]// Proceedings of the 9th Workshop on Statistical Machine Translation. 2014:376-380.
[23] LIN C Y. ROUGE: A package for automatic evaluation of summaries[C]// Proceedings of the Workshop on Text Summarization Branches Out. 2004:74-81.
[24] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: Consensus-based image description evaluation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:4566-4575.
[25] HAWKINS D M. The problem of overfitting[J]. Journal of Chemical Information and Computer Sciences, 2004,44(1):1-12.
[26] ANEJA J, DESHPANDE A, SCHWING A G. Convolutional image captioning[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:5561-5570.
[27] ZHANG Z J, WU Q, WANG Y, et al. High-quality image captioning with fine-grained and semantic-guided visual attention[J]. IEEE Transactions on Multimedia, 2019,21(7):1681-1693.