Computer and Modernization ›› 2021, Vol. 0 ›› Issue (12): 79-84.

• Image Processing •

Deep Connected Ultra-lightweight Subspace Attention Mechanism

  ZHANG Chenxiao, PAN Qing, WANG Xiaoling

  1. (School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou 310018, China)
  • Online: 2021-12-24  Published: 2021-12-24
  • About the authors: ZHANG Chenxiao (1997—), male, born in Wenzhou, Zhejiang, master's student, research interest: computer vision, E-mail: 19020090020@pop.zjsu.edu.cn; PAN Qing (1997—), female, master's student, research interest: computer vision, E-mail: 1454556150@qq.com; WANG Xiaoling (1967—), male, professor-level senior engineer, Ph.D., research interest: computer vision, E-mail: wangxiaoling@zjgsu.edu.cn.
  • Supported by:
    Key Research and Development Program of Zhejiang Province (2018C01084)

Abstract: To address the heavy computation or parameter overhead that existing attention mechanisms incur when deployed in compact convolutional neural networks, an improved ultra-lightweight subspace attention module is proposed. First, the deep connected subspace attention mechanism (DCSAM) divides the feature map into several feature subspaces and derives a separate attention feature map for each subspace. Second, the way each feature subspace performs spatial calibration is improved. Finally, connections are established between preceding and succeeding feature subspaces, enabling information to flow between them. This subspace attention mechanism can learn multi-scale, multi-frequency feature representations, making it better suited to fine-grained classification tasks, and it is orthogonal and complementary to the attention mechanisms used in existing vision models. Experimental results show that on the ImageNet-1K and Stanford Cars datasets, MobileNetV2 achieves top-accuracy gains of 0.48 and about 2 percentage points respectively, while its parameter count and floating-point operations are reduced by 12% and 24% respectively.

Key words: compact, attention mechanism, deep connection, feature subspace

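The paper's own implementation is not reproduced on this page. As a rough illustration of the mechanism described in the abstract, the following is a minimal PyTorch sketch: channels are split into feature subspaces, a tiny attention branch produces one spatial attention map per subspace, and each subspace receives the previous subspace's attended output (the "deep connection"). The module name DCSAMSketch, the depthwise-plus-pointwise attention branch, the softmax spatial gating, and the additive form of the inter-subspace connection are all illustrative assumptions, not the authors' exact design.

# Hypothetical DCSAM-style module (not the authors' code).
import torch
import torch.nn as nn

class DCSAMSketch(nn.Module):
    def __init__(self, channels: int, num_subspaces: int = 4):
        super().__init__()
        assert channels % num_subspaces == 0
        self.g = num_subspaces
        sub_ch = channels // num_subspaces
        # One tiny attention branch per feature subspace:
        # depthwise conv for spatial calibration within the subspace,
        # then a pointwise conv that collapses it to one attention map.
        self.attn = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(sub_ch, sub_ch, 3, padding=1,
                          groups=sub_ch, bias=False),
                nn.Conv2d(sub_ch, 1, 1, bias=False),
            )
            for _ in range(self.g)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the feature map channel-wise into g feature subspaces.
        subspaces = torch.chunk(x, self.g, dim=1)
        outs, prev = [], None
        for sub, attn in zip(subspaces, self.attn):
            # Deep connection: let information flow from the previous
            # subspace into the current one (additive form is assumed).
            if prev is not None:
                sub = sub + prev
            # Per-subspace attention map, normalized over spatial positions.
            a = attn(sub)
            a = torch.softmax(a.flatten(2), dim=-1).view_as(a)
            out = sub * a + sub  # re-weight and keep a residual path
            outs.append(out)
            prev = out
        return torch.cat(outs, dim=1)

# Usage: drop in after a feature map inside a compact backbone
# such as MobileNetV2.
if __name__ == "__main__":
    m = DCSAMSketch(channels=64, num_subspaces=4)
    y = m(torch.randn(2, 64, 28, 28))
    print(y.shape)  # torch.Size([2, 64, 28, 28])

Because each attention branch works on a narrow channel slice with depthwise and pointwise convolutions only, the added parameter and FLOP cost stays small, which is consistent with the ultra-lightweight goal stated in the abstract.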