A Review of Deep Neural Networks Combined with Attention Mechanism

Abstract

Abstract: Attention mechanisms have become one of the research hotspots for improving the learning ability of deep neural networks. Given the wide interest they have attracted, this paper gives a fairly comprehensive analysis of attention mechanisms in deep neural networks from three aspects: the classification of attention mechanisms, the ways they are combined with deep neural networks, and their specific applications in natural language processing and computer vision. Specifically, attention mechanisms are divided into soft attention, hard attention, and self-attention, and the advantages and disadvantages of the three are compared. The common ways of incorporating attention into recurrent neural networks and convolutional neural networks are then discussed, together with representative model structures for each. Next, applications of attention mechanisms are illustrated using natural language processing and computer vision as examples. Finally, development trends of attention mechanisms are analyzed, with the aim of providing clues and directions for subsequent research.

Key words: attention mechanisms, deep learning, neural networks, attention models
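
The three categories named in the abstract can be contrasted with a small sketch. This is not code from the paper: it is a minimal pure-Python illustration, assuming dot-product scoring and, for self-attention, identity projections in place of learned query/key/value weights. The function names (`soft_attention`, `hard_attention`, `self_attention`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_attention(query, keys, values):
    """Soft attention: a differentiable weighted average over all values."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


def hard_attention(query, keys, values, rng=None):
    """Hard attention: select exactly one value.

    Without rng the highest-scoring position is taken; with rng the index is
    sampled from the attention distribution (the discrete choice is what makes
    this variant non-differentiable).
    """
    weights = softmax([dot(query, k) for k in keys])
    positions = range(len(weights))
    if rng is None:
        idx = max(positions, key=weights.__getitem__)
    else:
        idx = rng.choices(positions, weights=weights, k=1)[0]
    return list(values[idx]), idx


def self_attention(xs):
    """Self-attention: queries, keys and values all come from the same sequence.

    Assumption: identity projections; scores are scaled by sqrt(d).
    """
    scale = math.sqrt(len(xs[0]))
    return [soft_attention([q / scale for q in x], xs, xs)[0] for x in xs]


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
query = [2.0, 0.0]

context, weights = soft_attention(query, keys, values)   # blends both values
picked, index = hard_attention(query, keys, values)      # picks one value
sampled, _ = hard_attention(query, keys, values, rng=random.Random(0))
mixed = self_attention(keys)                              # one output per position
```

Soft attention returns a blend of every value, hard attention returns a single selected value, and self-attention applies the soft form with each element of the sequence acting as its own query.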

HUANGFU Xiao-ying, QIAN Hui-min, HUANG Min. A Review of Deep Neural Networks Combined with Attention Mechanism[J]. Computer and Modernization, 2023, 0(02): 40-49.

