Fine-grained Image Recognition Based on Deep Neural Network with Amplified Multi-attention Mechanism

doi:10.3969/j.issn.1006-2475.2019.09.015

Abstract

Most existing attention-based methods for fine-grained image recognition do not consider the correlations among local parts of the target. In addition, most earlier methods rely on multi-stage or multi-scale mechanisms, which makes them inefficient and difficult to train end-to-end. This paper proposes a method that can adjust the relationships among different parts of different input images. Building on this idea, an attention mechanism learns the features of each attended region in each image, and an amplified multi-attention mechanism then reinforces this effect, so that images of the same category have similar attention patterns while images of different categories have different ones. The method can also be trained end-to-end.

Key words: multiple attention mechanism, end-to-end, fine-grained image recognition
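The goal stated in the abstract, that images of the same category should produce similar attention features while images of different categories produce dissimilar ones, can be illustrated with a generic pairwise contrastive loss. The sketch below is only an illustration under assumptions: the function name `attention_contrastive_loss`, the `margin` parameter, and the toy feature vectors are hypothetical, and the abstract does not state which loss the paper actually uses.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def attention_contrastive_loss(features, labels, margin=1.0):
    """Mean pairwise contrastive loss over attention feature vectors.

    Assumption: a generic contrastive loss, not the paper's own objective.
    Same-class pairs are penalized by their squared distance (pulled together);
    different-class pairs are penalized only when closer than `margin`
    (pushed apart).
    """
    total, pairs = 0.0, 0
    for i, j in combinations(range(len(features)), 2):
        d = euclidean(features[i], features[j])
        if labels[i] == labels[j]:
            total += d ** 2
        else:
            total += max(0.0, margin - d) ** 2
        pairs += 1
    return total / pairs if pairs else 0.0


# Toy attention features for three images (hypothetical values).
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Labels consistent with the features: identical vectors share a class.
loss_good = attention_contrastive_loss(feats, [0, 0, 1])

# Labels inconsistent with the features: identical vectors differ in class.
loss_bad = attention_contrastive_loss(feats, [0, 1, 0])
```

With these toy inputs, the labelling that matches the feature geometry gives a lower loss than the mismatched one, which is the behavior such a constraint is meant to encourage during training.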

CLC number: TP319

ZHOU Chen-yi, FENG Yu, XU Yi-bai, LU Shan. Fine-grained Image Recognition Based on Deep Neural Network with Amplified Multi-attention Mechanism[J]. Computer and Modernization, 2019, 0(09): 83-.

