Fine-grained Image Recognition Based on Deep Neural Network with Amplified Multi-attention Mechanism

doi:10.3969/j.issn.1006-2475.2019.09.015

Abstract

Most existing attention-based methods for fine-grained image recognition do not consider the correlations among local regions of the target object. In addition, most previous methods rely on multi-stage or multi-scale pipelines, which are inefficient and difficult to train end-to-end. This paper proposes that the relationships between different parts of different input images can be adjusted during learning. Building on this idea, the proposed attention-based method learns features for each attended region of each image. An amplified multi-attention mechanism then strengthens this effect, so that images of the same category receive similar attention while images of different categories receive different attention, and the whole network can still be trained end-to-end.
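The abstract does not give the network details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each image yields a set of local feature vectors (one per spatial location of a CNN feature map), that K hypothetical learned query vectors turn these into K attention-pooled part features, and that a standard pairwise contrastive loss on cosine similarity (with an assumed `margin` parameter) encourages same-category images to have similar part features and different-category images to have dissimilar ones. The names `attention_pool`, `contrastive_attention_loss` and `margin` are illustrative assumptions, and the "amplified" step of the paper is not modelled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Vector], queries: Sequence[Vector]) -> List[Vector]:
    """Pool per-location features into one part feature per attention query.

    `features` holds one vector per spatial location of a feature map;
    `queries` are hypothetical learned attention vectors (one per part).
    """
    dim = len(features[0])
    parts = []
    for q in queries:
        # Attention weights: dot-product score of each location, normalised over locations.
        weights = softmax([sum(a * b for a, b in zip(q, f)) for f in features])
        # Attention-weighted sum of the local features gives this part's descriptor.
        parts.append([sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)])
    return parts


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)  # epsilon avoids division by zero


def contrastive_attention_loss(parts_per_image: Sequence[Sequence[Vector]],
                               labels: Sequence[str], margin: float = 0.2) -> float:
    """Pairwise contrastive loss applied separately to each attention slot k.

    Same-class pairs are pulled towards cosine similarity 1; different-class
    pairs are penalised only when their similarity exceeds `margin`.
    """
    total, count = 0.0, 0
    n = len(parts_per_image)
    for i in range(n):
        for j in range(i + 1, n):
            # Compare the k-th part of image i with the k-th part of image j.
            for p_i, p_j in zip(parts_per_image[i], parts_per_image[j]):
                sim = cosine(p_i, p_j)
                if labels[i] == labels[j]:
                    total += 1.0 - sim
                else:
                    total += max(0.0, sim - margin)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy 3-location feature maps for three images; two attention queries (K = 2).
    queries = [[2.0, 0.0], [0.0, 2.0]]
    images = [
        [[1.0, 0.1], [0.2, 1.0], [0.5, 0.5]],
        [[0.9, 0.0], [0.1, 1.1], [0.4, 0.6]],
        [[0.0, 1.0], [1.0, 0.0], [0.3, 0.3]],
    ]
    labels = ["sparrow", "sparrow", "warbler"]
    parts = [attention_pool(img, queries) for img in images]
    print(contrastive_attention_loss(parts, labels))
```

Applying the loss per attention slot, rather than to one global feature, is what lets each attention specialise: the k-th attention of two same-class images is pushed to describe the same kind of region, while the margin only discourages different-class images from looking alike without forcing them to be orthogonal.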

Key words: multiple attention mechanism, end-to-end, fine-grained image recognition

CLC Number: TP319

ZHOU Chen-yi, FENG Yu, XU Yi-bai, LU Shan. Fine-grained Image Recognition Based on Deep Neural Network with Amplified Multi-attention Mechanism[J]. Computer and Modernization, 2019, 0(09): 83-.

