Low-data Fine-grained Image Classification Based on Self-distillation and Self-attention Enhancement
(1. Jiangsu Youth Science and Technology Center, Nanjing 210000, China; 2. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210000, China)
ZHANG Jingying¹, GENG Lin², LIU Ningzhong². Low-data Fine-grained Image Classification Based on Self-distillation and Self-attention Enhancement[J]. Computer and Modernization, 2025(9): 27-34.