Computer and Modernization ›› 2024, Vol. 0 ›› Issue (03): 97-104. doi: 10.3969/j.issn.1006-2475.2024.03.016


Global Cross-layer Interaction Networks for Learning Fine-grained Image Feature Representations

  

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
  2. Guiyang Aluminum-magnesium Design and Research Institute Co., Ltd., Guiyang 550009, China
  • Online: 2024-03-28 Published: 2024-04-28

Abstract: The key task in fine-grained visual categorization is to extract highly discriminative features. Previous models often address this with bilinear pooling and its variants. However, most of these methods ignore intra-layer or inter-layer feature interactions, and such insufficient interaction can easily lose discriminative information or leave it mixed with too much redundant information. To address these problems, a new method for learning fine-grained image feature representations, the Global Cross-layer Interaction (GCI) network, is designed. The proposed hierarchical bicubic pooling method balances the extraction of discriminative information against the filtering of redundant information, and models feature interactions within and between layers simultaneously. The interactive computing structure is combined with an existing channel attention mechanism to form an interactive attention mechanism, which improves the backbone network's ability to extract key features. Finally, the feature extraction network built with the interactive attention mechanism is fused with the bicubic pooling method to obtain GCI, which extracts robust fine-grained image feature representations. Experiments on three fine-grained benchmark datasets show that hierarchical bicubic pooling achieves the best results within the hierarchical interactive pooling framework, with classification accuracies of 87.4%, 93.2% and 92.1% on CUB-200-2011, Stanford-Cars and FGVC-Aircraft, respectively. After the interactive attention mechanism is integrated, the accuracies improve further to 88.5%, 95.1% and 93.9%.
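For readers unfamiliar with the bilinear pooling that the abstract builds on, the following is a minimal sketch of the generic technique, not the paper's hierarchical bicubic pooling, whose details are not given here. It assumes each feature map is a list of spatial positions holding channel values, and it uses the common signed square root and L2 normalization steps. The function names (`bilinear_pool`, `normalize`, `intra_and_inter_descriptor`) and the toy inputs are hypothetical and chosen only for illustration.

```python
import math


def bilinear_pool(x, y):
    """Average the outer product of two feature maps over spatial positions.

    x: list of N positions, each a list of C channel values.
    y: list of N positions, each a list of D channel values.
    Returns a flat list of C * D values.
    """
    if len(x) != len(y):
        raise ValueError("feature maps must cover the same spatial positions")
    c, d = len(x[0]), len(y[0])
    pooled = [0.0] * (c * d)
    for xi, yi in zip(x, y):
        for a in range(c):
            for b in range(d):
                pooled[a * d + b] += xi[a] * yi[b]
    n = len(x)
    return [v / n for v in pooled]


def normalize(v, eps=1e-12):
    """Signed square root followed by L2 normalization."""
    s = [math.copysign(math.sqrt(abs(t)), t) for t in v]
    norm = math.sqrt(sum(t * t for t in s))
    return [t / (norm + eps) for t in s]


def intra_and_inter_descriptor(layer_a, layer_b):
    """Concatenate intra-layer terms (a with a, b with b) and the
    inter-layer term (a with b), then normalize.

    This combination is an illustrative assumption, not the GCI design.
    """
    parts = (
        bilinear_pool(layer_a, layer_a)
        + bilinear_pool(layer_b, layer_b)
        + bilinear_pool(layer_a, layer_b)
    )
    return normalize(parts)


# Toy example: two layers, two spatial positions, two channels each.
layer_a = [[1.0, 0.0], [0.0, 2.0]]
layer_b = [[1.0, 1.0], [2.0, 0.0]]
descriptor = intra_and_inter_descriptor(layer_a, layer_b)
```

The intra-layer terms capture channel co-activations within one layer, while the inter-layer term captures co-activations across layers at the same spatial position. A descriptor that omits either kind of term illustrates the "insufficient interaction" the abstract describes.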

Key words: fine-grained image recognition, global cross-layer interaction networks, hierarchical bicubic pooling, intra- and inter-layer feature interactions, interactive attention mechanism
