A Multi-label Image Classification Model Based on Dual Feature Attention

doi:10.3969/j.issn.1006-2475.2013.12.008

Abstract

Abstract: To address the insufficient extraction of multi-region image features and the difficulty of modeling semantic relationships between image features and labels in multi-label image classification, this paper proposes a multi-label image classification model based on dual feature attention. First, an image feature attention module relates features across multiple regions of the whole image through attention, strengthening image feature extraction. Second, a joint feature attention module represents the correlation between image features and label embeddings, fusing labels and image regions across modalities to obtain a better mapping between them. Experimental results show that the model achieves good classification results on both the VOC2007 and COCO2014 multi-label image classification datasets and improves substantially on existing algorithms in the reported performance metrics, verifying its effectiveness.

Key words: image classification, multi-label, attention mechanism, deep learning, feature association
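The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes both modules reduce to plain scaled dot-product attention, uses a single head with no learned projections, and the function names, the residual connection, and the dot-product scoring head are all hypothetical choices made only for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights


def dual_feature_attention(regions, label_embeddings):
    """Hypothetical two-stage pipeline; both inputs share one feature size."""
    # Stage 1 (image feature attention): every region attends to all regions.
    attended, _ = attention(regions, regions, regions)
    # Assumed residual connection, not stated in the abstract.
    attended = [[a + r for a, r in zip(a_row, r_row)]
                for a_row, r_row in zip(attended, regions)]
    # Stage 2 (joint feature attention): label embeddings query the regions.
    label_feats, weights = attention(label_embeddings, attended, attended)
    # Assumed scoring head: dot product of each label feature with its embedding.
    logits = [sum(f * e for f, e in zip(feat, emb))
              for feat, emb in zip(label_feats, label_embeddings)]
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return probs, weights


random.seed(0)
dim = 8
# 49 region features (e.g. a 7x7 feature map) and 20 label embeddings (VOC2007 has 20 classes).
regions = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(49)]
labels = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
probs, weights = dual_feature_attention(regions, labels)
print(len(probs), len(weights[0]))
```

Each row of `weights` is a distribution over image regions for one label, which is the label-to-region mapping the abstract refers to. A real model would add learned query, key, and value projections, multiple heads, and a trained classifier on top of a CNN or transformer backbone.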

CLC number:

TP391

QIU Kai-xing, FENG Guang. A Multi-label Image Classification Model Based on Dual Feature Attention[J]. Computer and Modernization, 2023(12): 41-47.
