[1] HAFNER J, SAWHNEY H S, EQUITZ W, et al. Efficient color histogram indexing for quadratic form distance functions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995,17(7):729-736.
[2] PENTLAND A P. Fractal-based description of natural scenes[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984,6(6):661-674.
[3] SCHMID C, MOHR R. Local grayvalue invariants for image retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997,19(5):530-535.
[4] LOWE D G. Object recognition from local scale-invariant features[C]// Proceedings of the 7th IEEE International Conference on Computer Vision. 1999,2:1150-1157.
[5] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004,60(2):91-110.
[6] BAY H, TUYTELAARS T, VAN GOOL L. SURF: Speeded up robust features[C]// European Conference on Computer Vision. 2006:404-417.
[7] LIN D, SHEN X Y, LU C W, et al. Deep LAC: Deep localization, alignment and classification for fine-grained recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2015:1666-1674.
[8] KRAUSE J, JIN H L, YANG J C, et al. Fine-grained recognition without part annotations[C]// IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2015:5546-5555.
[9] BRANSON S, VAN HORN G, BELONGIE S, et al. Bird Species Categorization Using Pose Normalized Deep Convolutional Nets[EB/OL]. (2014-06-11)[2019-02-13]. https://arxiv.org/abs/1406.2952.
[10]ZHANG N, DONAHUE J, GIRSHICK R, et al. Part-based R-CNNs for finegrained category detection[C]// European Conference on Computer Vision (ECCV). 2014:arXiv:1407.3867.
[11]ZHANG N, PALURI M, RANZATO M, et al. Panda: Pose aligned networks for deep attribute modeling[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. 2014:1637-1644.
[12]MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. 2014,2:2204-2212.
[13]FU J L, ZHENG H L, MEI T. Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2017:4476-4484.
[14]HU J, SHEN L, SUN G. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (Early Access), 2019:DOI:10.1109/TPAMI.2019.2913372.
[15]FARRELL R, OZA O, ZHANG N, et al. Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance[C]// Proceedings of the 2011 International Conference on Computer Vision. 2011:161-168.
[16]PARKHIO M, VEDALDI A, JAWAHAR C, et al. The truth about cats and dogs[C]// Proceedings of the 2011 International Conference on Computer Vision. 2011:1427-1434.
[17]KRAUSE J, GEBRU T, DENG J, et al. Learning features and parts for fine-grained recognition[C]// Proceedings of the 22nd International Conference on Pattern Recognition. 2014:26-33.
[18]KRAUSE J, STARK M, DENG J, et al. 3D object representations for fine grained categorization[C]// Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013:554-561.
[19]JADERBERG M, SIMONYAN K, ZISSERMAN A, et al. Spatial transformer networks[C]// Advances in Neural Information Processing Systems(NIPS). 2015.
[20]LIN T Y, ROYCHOWDHURY A, MAJI S. Bilinear CNN models for fine-grained visual recognition[C]// IEEE International Conference on Computer Vision(ICCV). 2015:1449-1457.
[21]ZHU Y, ZHOU Y Z, YE Q X, et al. Soft proposal networks for weakly supervised object localization[C]// IEEE International Conference on Computer Vision (ICCV). 2017:1859-1868.
[22]ZHOU B, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:2921-2929.
[23]KULIS B. Metric Learning: A Survey[M]. Now Foundations and Trends, 2013.
[24]SALAKHUTDINOV R, HINTON G E.Learning a nonlinear embedding by preserving class neighbourhood structure[C]// Proceedings of the 11th International Conference on Artificial Intelligence and Statistics. 2007:412-419.
[25]SOHN K. Improved deep metric learning with multi-class n-pair loss objective[C]// Advances in Neural Information Processing Systems (NIPS). 2016.
[26]ZHAO B, FENG J S, WU X, et al. A survey on deep learning-based fine-grained object classification and semantic segmentation[J]. International Journal of Automation and Computing, 2017,14(2):119-135.
[27]WANG D Q, SHEN Z Q, SHAO J, et al. Multiple granularity descriptors for fine-grained categorization[C]// Proceedings of the IEEE International Conference on Computer Vision(ICCV). 2015:2399-2406.
[28]ZHENG H L, FU J L, MEI T. Learning multi-attention convolutional neural network for fine-grained image recognition[C]// IEEE International Conference on Computer Vision(ICCV). 2017:5219-5227.
[29]WANG F, JIANG M Q, QIAN C, et al. Residual attention network for image classification[C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017:6450-6458.
[30]ZHANG X P, XIONG H K, ZHOU W G, et al. Picking deep filter responses for fine-grained image recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2016:1134-1142.
[31]LIU X, XIA T, WANG J, et al. Fully Convolutional Attention Networks for Fine-grained Recognition[EB/OL]. (2017-03-21)[2019-02-13]. https://arxiv.org/abs/1603.06765.
[32]BARGAL S A, ZUNINO A, KIM D, et al. Excitation backprop for RNNs[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2018:1440-1449.
[33]ZHANG J M, BARGAL S, LIN Z, et al. Top-down neural attention by excitation backprop[J]. International Journal of Computer Vision, 2018,126(10):1084-1102.
[34]ZHANG X L,WEI Y C, FENG J S, et al. Adversarial complementary learning for weakly supervised object localization[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). 2018:1325-1334.
[35]SUN M, YUAN Y C, ZHOU F, et al. Multi-attention multi-class constraint for fine-grained image recognition[C]// European Conference on Computer Vision(ECCV). 2018:834-850.
[36]ZHOU Z H. A brief introduction to weakly supervised learning[J]. National Science Review, 2018,5(1):44-53.
[37]PENG X, TANG Z Q, YANG F, et al. Jointly optimize data augmentation and network training: Adversarial data augmentation in human pose estimation[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). 2018:2226-2234.
[38]HUANG G, LIU Z, MAATEN L, et al. Densely connected convolutional networks[C]// IEEE Conference on Computer Vision and Pattern Recognition(CVPR). 2017:2261-2269.
[39]HU T, QI H G, XU J Z, et al. Facial Landmarks Detection by Self-iterative Regression Based Landmarks-attention Network[EB/OL]. (2018-03-01)[2019-02-13].https://arxiv.org/pdf/1803.06598.pdf.
[40]DEVRIES T, TAYLOR G W. Improved Regularization of Convolutional Neural Networks with Cutout[EB/OL].(2017-08-01)[2019-02-13]. https://arxiv.org/pdf/1708.04552.pdf. |