Computer and Modernization ›› 2023, Vol. 0 ›› Issue (05): 8-12.
Online: 2023-06-06
Published: 2023-06-06
LIU Jing, CHEN Jin-guang. Image Caption Generation Method Based on Channel Attention and Transformer[J]. Computer and Modernization, 2023, 0(05): 8-12.