计算机与现代化 (Computer and Modernization) ›› 2025, Vol. 0 ›› Issue (09): 27-34. doi: 10.3969/j.issn.1006-2475.2025.09.004

• Artificial Intelligence •

Low-data Fine-grained Image Classification Based on Self-distillation and Self-attention Enhancement

ZHANG Jingying, GENG Lin, LIU Ningzhong

  1. (1. Jiangsu Youth Science and Technology Center, Nanjing 210000, China; 2. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210000, China)
  • Online: 2025-09-24  Published: 2025-09-24
  • About the authors: ZHANG Jingying (b. 1996), female, from Dezhou, Shandong; assistant researcher, M.S.; research interests: computer vision; E-mail: zhangjingying2024@163.com. GENG Lin (b. 1989), female, from Xuzhou, Jiangsu; Ph.D. candidate; research interests: computer vision; E-mail: beargl@163.com. Corresponding author: LIU Ningzhong (b. 1975), male, from Nanjing, Jiangsu; professor, Ph.D.; research interests: computer vision, pattern recognition, artificial intelligence; E-mail: lnz_nuaa@163.com.
  • Funding:
     Supported by the Major Program of Basic Research on Frontier Leading Technology of Jiangsu Province (BK20222012)






Abstract: Training a fine-grained image classification (FGIC) model with limited data is a major challenge, as the subtle differences between categories may not be easily discernible. A common strategy is to use a pre-trained network to generate effective feature representations. However, when the pre-trained model is fine-tuned on limited fine-grained data, it tends to extract less relevant features, which leads to overfitting. To address this problem, this paper designs a new low-data FGIC method named SDA-Net. By fusing a spatial self-attention mechanism with self-distillation, SDA-Net optimizes the feature-learning process, effectively mitigating the overfitting caused by data scarcity and improving the performance of deep neural networks in low-data settings. Specifically, SDA-Net introduces spatial self-attention to encode contextual information into local features, improving intra-class representations. Meanwhile, a distillation branch is introduced and a distillation loss is applied to augmented input samples, enabling deep enhancement and transfer of knowledge within the network. A comprehensive evaluation on three fine-grained benchmark datasets shows that SDA-Net achieves significant gains over both traditional fine-tuning methods and current state-of-the-art (SOTA) low-data FGIC techniques. In the three 10% low-data scenarios, relative accuracy improves by 30%, 47%, and 29% over standard ResNet-50, and by 15%, 28%, and 17% over the SOTA methods.
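To make the abstract's two components concrete, here is a minimal PyTorch-style sketch. It is an illustration under stated assumptions, not the authors' SDA-Net implementation: SpatialSelfAttention shows one standard way a spatial self-attention layer can encode contextual information into local CNN features, and self_distillation_loss shows one way a distillation loss can be applied to an augmented input view, with the clean view's predictions serving as the in-network teacher. All names, the channel-reduction factor, and the hyperparameters alpha and tau are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelfAttention(nn.Module):
    """Non-local spatial attention over CNN feature maps (B, C, H, W).
    Illustrative stand-in for the paper's spatial self-attention module."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C//8)
        k = self.key(x).flatten(2)                     # (B, C//8, HW)
        attn = F.softmax(q @ k, dim=-1)                # (B, HW, HW): affinity of each location to all others
        v = self.value(x).flatten(2)                   # (B, C, HW)
        ctx = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * ctx                    # context-enhanced local features

def self_distillation_loss(logits_clean, logits_aug, labels,
                           alpha: float = 0.5, tau: float = 4.0):
    """Cross-entropy on the clean view plus a KL distillation term that pushes
    the augmented view's predictions toward the (detached) clean-view
    predictions. alpha and tau are assumed hyperparameters."""
    ce = F.cross_entropy(logits_clean, labels)
    kd = F.kl_div(F.log_softmax(logits_aug / tau, dim=1),
                  F.softmax(logits_clean.detach() / tau, dim=1),
                  reduction="batchmean") * tau * tau
    return ce + alpha * kd

In training, logits_clean and logits_aug would come from the same backbone applied to an image and an augmented copy of it, so the network distills its own knowledge rather than relying on a separate teacher; where the attention block sits (for example, after the last ResNet-50 stage) is likewise an assumption.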

Key words: deep learning, fine-grained image classification, low-data learning, self-distillation, self-attention, data augmentation

CLC number: