Low-data Fine-grained Image Classification Based on Self-distillation and Self-attention Enhancement

doi:10.3969/j.issn.1006-2475.2025.09.004

Abstract

Training a fine-grained image classification (FGIC) model with limited data is highly challenging, because the subtle differences between categories may not be easy to discern. A common strategy is to use pre-trained network models to generate effective feature representations. However, when a pre-trained model is fine-tuned on limited fine-grained data, it often tends to extract less relevant features, which leads to overfitting. To address these issues, this paper designs a new FGIC method for low-data conditions, named SDA-Net, which optimizes the feature learning process by fusing a spatial self-attention mechanism with self-distillation. This effectively mitigates the overfitting caused by data scarcity and improves the performance of deep neural networks in low-data settings. Specifically, SDA-Net improves intra-class representations by introducing spatial self-attention, which encodes contextual information into local features. Meanwhile, a distillation branch is introduced and a distillation loss is applied to augmented input samples, enabling deep enhancement and transfer of knowledge within the network. A comprehensive evaluation on three fine-grained benchmark datasets shows that SDA-Net achieves significant performance gains over both traditional fine-tuning methods and the current state-of-the-art (SOTA) low-data FGIC strategy. In three scenarios using only 10% of the training data, relative accuracy improves by 30%, 47%, and 29%, respectively, compared with standard ResNet-50, and by 15%, 28%, and 17%, respectively, compared with the SOTA method.
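The abstract does not give SDA-Net's exact attention design. As a rough illustration of how spatial self-attention can encode contextual information into local features, the sketch below assumes a SAGAN-style (non-local) block applied to a single C×H×W feature map: every spatial position attends to every other position, and the attended context is added back to the original features through a residual scale `gamma`. The function names, the projection matrices `w_q`, `w_k`, `w_v` (standing in for 1×1 convolutions), and the value of `gamma` are hypothetical, and NumPy is used only to keep the example self-contained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(feat, w_q, w_k, w_v, gamma=0.1):
    """Hypothetical SAGAN-style spatial self-attention (not the paper's code).

    feat: (C, H, W) feature map; w_q, w_k: (C', C); w_v: (C, C).
    The projection matrices play the role of 1x1 convolutions.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)        # (C, N): one column per spatial position
    q = w_q @ x                       # (C', N) queries
    k = w_k @ x                       # (C', N) keys
    v = w_v @ x                       # (C, N) values
    attn = softmax(q.T @ k, axis=-1)  # (N, N): row i = weights of position i over all positions
    context = v @ attn.T              # (C, N): context[:, i] = sum_j attn[i, j] * v[:, j]
    out = x + gamma * context         # residual: local features plus global context
    return out.reshape(c, h, w)


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w_q, w_k = rng.standard_normal((2, 8)), rng.standard_normal((2, 8))
w_v = rng.standard_normal((8, 8))
print(spatial_self_attention(feat, w_q, w_k, w_v).shape)  # (8, 4, 4)
```

Setting `gamma` to 0 returns the input unchanged; in SAGAN-style designs this scale is typically learned starting from 0, so the network can introduce contextual information gradually.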
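The abstract likewise does not specify the form of the distillation loss. A minimal sketch, assuming a Hinton-style temperature-scaled KL divergence: the main branch's prediction on the original image serves as a fixed soft target, and the distillation branch's prediction on an augmented copy of the same image is pulled toward it, alongside ordinary cross-entropy on the ground-truth label. The function names, the temperature `T`, the weight `alpha`, and the choice of which branch acts as the target are all assumptions for illustration.

```python
import numpy as np


def log_softmax(z, axis=-1):
    # Numerically stable log-softmax along `axis`.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def self_distillation_loss(main_logits, branch_logits, labels, T=4.0, alpha=0.5):
    """Hypothetical self-distillation loss (not the paper's exact formulation).

    main_logits:   (B, K) main-branch logits on the original images (soft target;
                   in an autodiff framework this input would be detached).
    branch_logits: (B, K) distillation-branch logits on augmented images.
    labels:        (B,) integer class ids.
    """
    idx = np.arange(labels.shape[0])
    # Cross-entropy of the branch prediction against the ground-truth labels.
    ce = -log_softmax(branch_logits)[idx, labels].mean()
    # KL(p_target || p_branch) between temperature-softened distributions.
    log_pt = log_softmax(main_logits / T)
    log_ps = log_softmax(branch_logits / T)
    kl = (np.exp(log_pt) * (log_pt - log_ps)).sum(axis=1).mean()
    # The T**2 factor keeps the soft-target term's gradient scale comparable
    # across temperatures, as in Hinton et al.'s knowledge distillation.
    return (1 - alpha) * ce + alpha * T**2 * kl


labels = np.array([0, 1])
uniform = np.zeros((2, 4))  # identical, uniform logits: the KL term is 0
print(round(float(self_distillation_loss(uniform, uniform, labels, alpha=0.0)), 4))  # 1.3863 (= ln 4)
```

Because the soft target comes from the same network rather than a separately trained teacher, no extra model is needed; stopping gradients on `main_logits` keeps the target itself from being pushed toward the augmented prediction.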

Key words: deep learning, fine-grained image classification, low-data learning, self-distillation, self-attention, data augmentation

CLC Number: TP391

ZHANG Jingying, GENG Lin, LIU Ningzhong. Low-data Fine-grained Image Classification Based on Self-distillation and Self-attention Enhancement[J]. Computer and Modernization, 2025, 0(09): 27-34.
