Computer and Modernization, 2024, Vol. 0, Issue (03): 97-104. doi: 10.3969/j.issn.1006-2475.2024.03.016

• Image Processing •

Global Cross-layer Interaction Networks Learning Fine-grained Images Features Representation

  

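  (1. College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China; 2. Guiyang Aluminum-magnesium Design and Research Institute Co., Ltd., Guiyang, Guizhou 550009, China)
  • Online: 2024-03-28  Published: 2024-04-28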
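  • About the authors: ZHANG Gaoyi (张高义, b. 1998), male, from Weining, Guizhou, master's student; research interests: computer vision, image processing; E-mail: 1667568637@qq.com. Corresponding author: XU Yang (徐杨, b. 1980), male, from Guiyang, Guizhou, associate professor, Ph.D.; research interests: data acquisition, machine learning; E-mail: xuy@gzu.edu.cn. CAO Bin (曹斌, b. 1963), male, from Guiyang, Guizhou, research fellow, Ph.D.; research interests: mechatronics, industrial control, information security, intelligent management; E-mail: caobinh@sina.com. SHI Jin (石进, b. 1995), male, from Xingyi, Guizhou, master's student; research interest: image recognition; E-mail: 1499627570@qq.com.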
  • Funding:
    Science and Technology Program of Guizhou Province (Qianheke Support [2021] General 176)



Abstract: The key task in fine-grained visual categorization is to extract highly discriminative features. Previous models have commonly addressed this with bilinear pooling and its variants. However, most of these methods ignore either intra-layer or inter-layer feature interactions, and such insufficient interaction tends to lose discriminative information or to leave it mixed with excessive redundant information. To address these problems, the Global Cross-layer Interaction (GCI) network is designed as a new method for learning fine-grained image features and feature representations. The proposed hierarchical bicubic pooling method balances the extraction of discriminative information against the filtering of redundant information, and models feature interactions within and between layers simultaneously. Further analysis of the inter-layer interaction computation shows that it can readily be combined with existing channel attention mechanisms to form an interactive attention mechanism, which improves the backbone network's ability to extract key features. Finally, the feature extraction network built from the interactive attention mechanism is fused with bicubic pooling to obtain GCI, which extracts robust fine-grained image feature representations. Experiments on three fine-grained benchmark datasets show that hierarchical bicubic pooling achieves the best results within the hierarchical interactive pooling framework, with classification accuracies of 87.4%, 93.2% and 92.1% on CUB-200-2011, Stanford-Cars and FGVC-Aircraft, respectively; after the interactive attention mechanism is integrated, accuracy rises further to 88.5%, 95.1% and 93.9%.
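The abstract contrasts intra-layer and inter-layer feature interactions. As a hedged illustration of the two baseline operations it builds on, the sketch below implements plain bilinear pooling (intra-layer: averaged outer products of one layer's features, as in bilinear CNNs) and a cross-layer interaction in the style of hierarchical bilinear pooling (element-wise product of two projected layers, summed over locations), each followed by the usual signed square root and L2 normalization. It is written in pure Python for readability; the function names, the row-per-location layout and the projection matrices `u` and `v` are assumptions made for illustration. It does not reproduce the paper's hierarchical bicubic pooling or the GCI network, whose exact formulations are not given in this abstract.

```python
import math
from typing import List

# Assumed layout: one row per spatial location, one column per channel.
Matrix = List[List[float]]


def signed_sqrt_l2(vec: List[float]) -> List[float]:
    """Signed square root, then L2 normalisation (standard bilinear post-processing)."""
    v = [math.copysign(math.sqrt(abs(x)), x) for x in vec]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against an all-zero vector
    return [x / norm for x in v]


def bilinear_pool(feats: Matrix) -> List[float]:
    """Intra-layer interaction: average of the outer products x_i x_i^T, flattened to c*c values."""
    n, c = len(feats), len(feats[0])
    gram = [0.0] * (c * c)
    for x in feats:
        for a in range(c):
            for b in range(c):
                gram[a * c + b] += x[a] * x[b] / n
    return signed_sqrt_l2(gram)


def project(feats: Matrix, weight: Matrix) -> Matrix:
    """Per-location linear projection (equivalent to a 1x1 convolution); weight is c_in x d."""
    d = len(weight[0])
    return [[sum(x[k] * weight[k][j] for k in range(len(x))) for j in range(d)] for x in feats]


def cross_layer_pool(feats_a: Matrix, feats_b: Matrix, u: Matrix, v: Matrix) -> List[float]:
    """Inter-layer interaction in the style of hierarchical bilinear pooling:
    project both layers, multiply element-wise at each location, then sum over locations.
    Both layers are assumed to be sampled at the same spatial locations."""
    pa, pb = project(feats_a, u), project(feats_b, v)
    d = len(pa[0])
    z = [sum(ra[j] * rb[j] for ra, rb in zip(pa, pb)) for j in range(d)]
    return signed_sqrt_l2(z)


if __name__ == "__main__":
    # Toy features: 2 locations x 2 channels from two hypothetical layers.
    layer_a = [[1.0, 0.0], [0.0, 2.0]]
    layer_b = [[0.5, 1.0], [1.0, 2.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections U and V
    print(bilinear_pool(layer_a))
    print(cross_layer_pool(layer_a, layer_b, identity, identity))
```

In a trained network `u` and `v` would be learned and the pooled vector would feed a classifier. Note the size trade-off: the bilinear vector has c*c entries and grows quadratically with the channel count, whereas the cross-layer vector has d entries, set by the projection width.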

Key words: fine-grained image recognition, global cross-layer interaction networks, hierarchical bicubic pooling, intra- and inter-layer feature interactions, interactive attention mechanism
