Computer and Modernization ›› 2025, Vol. 0 ›› Issue (09): 27-34. DOI: 10.3969/j.issn.1006-2475.2025.09.004


Low-data Fine-grained Image Classification Based on Self-distillation and Self-attention Enhancement

(1. Jiangsu Youth Science and Technology Center, Nanjing 210000, China;
2. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210000, China)
Online: 2025-09-24    Published: 2025-09-24

Abstract: Training a fine-grained image classification (FGIC) model with limited data is highly challenging, because the subtle differences between categories are difficult to discern. A common strategy is to use a pre-trained network to generate effective feature representations. However, when the pre-trained model is fine-tuned with only a small amount of fine-grained data, it tends to extract irrelevant features and consequently overfits. To address this issue, this paper proposes SDA-Net, an FGIC method for low-data conditions that optimizes feature learning by fusing a spatial self-attention mechanism with self-distillation, thereby mitigating the overfitting caused by data scarcity and improving the performance of deep neural networks in low-data settings. Specifically, SDA-Net improves intra-class representation by introducing spatial self-attention that encodes contextual information into local features. Meanwhile, a distillation branch is added and a distillation loss is applied to augmented input samples, which deepens the enhancement and transfer of knowledge within the network. A comprehensive evaluation on three fine-grained benchmark datasets shows that SDA-Net achieves significant performance gains over both conventional fine-tuning methods and current state-of-the-art (SOTA) low-data FGIC strategies. In the three 10% low-data settings, its relative accuracy improves by 30%, 47%, and 29% over a standard ResNet-50, and by 15%, 28%, and 17% over the SOTA methods, respectively.
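
The following is a minimal PyTorch sketch of the general idea described in the abstract, not the authors' released implementation: the module names (SpatialSelfAttention, SDANetSketch), the choice of non-local dot-product attention over the backbone feature map, the noise-based stand-in for data augmentation, and the temperature/weight values are all assumptions made for illustration.

# Illustrative sketch only: spatial self-attention over ResNet-50 feature maps,
# plus a self-distillation loss between predictions on original and augmented views.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class SpatialSelfAttention(nn.Module):
    """Encodes contextual information into local features via dot-product
    attention over the H*W spatial positions of a feature map."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        inner = channels // reduction
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(x).flatten(2)                      # (B, C', HW)
        attn = torch.softmax(q @ k, dim=-1)             # (B, HW, HW)
        v = self.value(x).flatten(2).transpose(1, 2)    # (B, HW, C)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out                     # residual refinement


class SDANetSketch(nn.Module):
    """Pre-trained backbone + spatial self-attention + classifier; the same
    network serves as teacher (original view) and student (augmented view)."""

    def __init__(self, num_classes: int):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.attention = SpatialSelfAttention(2048)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(2048, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.attention(self.features(x))
        return self.fc(self.pool(f).flatten(1))


def self_distillation_loss(model, x, x_aug, labels, temperature=4.0, alpha=0.5):
    """Cross-entropy on both views plus a KL term that pushes predictions on
    the augmented view toward those on the original (detached) view."""
    logits = model(x)
    logits_aug = model(x_aug)
    ce = F.cross_entropy(logits, labels) + F.cross_entropy(logits_aug, labels)
    kd = F.kl_div(
        F.log_softmax(logits_aug / temperature, dim=1),
        F.softmax(logits.detach() / temperature, dim=1),  # teacher signal
        reduction="batchmean",
    ) * temperature ** 2
    return ce + alpha * kd


if __name__ == "__main__":
    model = SDANetSketch(num_classes=200)       # e.g. a 200-class bird dataset
    x = torch.randn(4, 3, 224, 224)             # original batch
    x_aug = x + 0.1 * torch.randn_like(x)       # stand-in for real augmentation
    labels = torch.randint(0, 200, (4,))
    loss = self_distillation_loss(model, x, x_aug, labels)
    loss.backward()
    print(float(loss))

In this sketch the distillation teacher is simply the same network evaluated on the unaugmented input with gradients detached, which is one common way to realize self-distillation without a separate teacher model; the paper's actual branch structure and loss weighting may differ.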

Key words: deep learning, fine-grained image classification, low-data learning, self-distillation, self-attention, data augmentation
