Computer and Modernization ›› 2024, Vol. 0 ›› Issue (10): 74-79. doi: 10.3969/j.issn.1006-2475.2024.10.012

• Image Processing •

Fine-grained Image Classification Based on Res2Net and Recursive Gated Convolution

  1. (School of Physics and Electronic Engineering, Shanxi University, Taiyuan 030006, China)
  • Online: 2024-10-29 Published: 2024-10-30
  • Funding:
    2021 Shanxi Province Basic Research Program (Free Exploration) Project (202103021223029)

Abstract: Extracting discriminative regions plays a crucial role in fine-grained image classification. Existing fine-grained image classification methods overlook multi-scale image information and the interaction between adjacent spatial positions, which makes it difficult to extract subtle features accurately. In addition, conventional CNN methods capture long-range semantic information poorly and are weak at extracting global image information. To address these issues, a fine-grained classification algorithm based on Res2Net and a recursive gated convolution module is designed. In this network, the weakly supervised data augmentation network (WS-DAN) is used for data augmentation to prevent overfitting, and Res2Net serves as the feature extraction network, extracting image information at different scales and enlarging the receptive field of the network layers. A recursive gated convolution module is also introduced into the network to further fuse information and realize high-order feature interactions, improving the modeling capability of the network. The proposed method achieves accuracies of 90.36%, 93.1%, and 94.3% on three public datasets, CUB-200-2011, Stanford Dogs, and FGVC-Aircraft, respectively, and can effectively extract subtle image features for classification.

Key words: deep learning, fine-grained classification, Res2Net, recursive gated convolution
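
The two building blocks named in the abstract can be illustrated with a toy sketch. The code below is not the authors' network: it works on 1D lists, uses a fixed 3-tap filter (`conv3`, a hypothetical stand-in for a learned convolution), and omits the channel projections and scaling that a real recursive gated convolution would include. It only shows the general idea that a Res2Net-style hierarchical split widens the receptive field group by group, and that recursive gating multiplies the running feature by a spatially filtered gate at each step, so n steps yield n-th order interactions.

```python
from typing import Callable, List

Signal = List[float]


def conv3(x: Signal) -> Signal:
    """Fixed 3-tap sum filter with zero padding (stand-in for a learned conv)."""
    padded = [0.0] + x + [0.0]
    return [padded[i] + padded[i + 1] + padded[i + 2] for i in range(len(x))]


def res2net_split(groups: List[Signal],
                  conv: Callable[[Signal], Signal] = conv3) -> List[Signal]:
    """Res2Net-style hierarchical connections over channel groups.

    y_1 = x_1, y_2 = conv(x_2), y_i = conv(x_i + y_{i-1}) for i > 2.
    Later groups see the output of earlier ones, so their receptive
    field is larger.
    """
    outputs = [groups[0]]          # first group passes through unchanged
    prev = None
    for x in groups[1:]:
        inp = x if prev is None else [a + b for a, b in zip(x, prev)]
        prev = conv(inp)
        outputs.append(prev)
    return outputs


def recursive_gating(p0: Signal, qs: List[Signal],
                     conv: Callable[[Signal], Signal] = conv3) -> Signal:
    """Simplified recursive gating: p_{k+1} = conv(q_k) * p_k (elementwise).

    Assumption: identity projections and no scaling factor, for brevity.
    """
    p = p0
    for q in qs:
        gate = conv(q)             # spatially mixed gate
        p = [g * v for g, v in zip(gate, p)]
    return p


def support_width(x: Signal) -> int:
    """Number of non-zero positions, used here as a proxy for receptive field."""
    return sum(1 for v in x if v != 0.0)


if __name__ == "__main__":
    impulse = [0.0] * 4 + [1.0] + [0.0] * 4
    outs = res2net_split([list(impulse) for _ in range(4)])
    print([support_width(y) for y in outs])
    print(recursive_gating([1.0, 2.0, 3.0], [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]))
```

Feeding the same impulse to four groups shows each successive group responding over a wider span, which is the multi-scale effect the abstract attributes to Res2Net.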
