A Multi-label Image Classification Method Combining CNN and Interactive Features

Abstract

Abstract: Images are ubiquitous in daily life, so image classification has considerable practical value. Current multi-label image classification methods often suffer from low accuracy and high computational cost, because the neural network models are complex and the extracted image features carry too little information. To address these problems, this paper proposes MLCNN-IF, a multi-label classification method that combines a convolutional neural network (CNN) with interactive features. MLCNN-IF works in two steps. First, a lightweight network with only 9 layers (MLCNN), built on the basic structure of a traditional CNN, processes the image data and extracts features. Second, the interactive feature method combines the individual features extracted by MLCNN into combined features, yielding a new and richer feature set. Experiments on four multi-label image datasets show that MLCNN-IF achieves better classification results than AlexNet, GoogLeNet and VGG16, improving accuracy and precision by 9% and 4.8% on average, respectively. In addition, the simpler MLCNN structure reduces the number of model parameters and the time complexity.

Key words: convolutional neural network, multi-label learning, deep learning, image classification, interactive feature
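
The second step, deriving combined features from the individual features, can be illustrated with a small sketch. The abstract does not say how the interactive features are computed, so the code below assumes the common form in which each new feature is the product of a pair of distinct original features. The function name `add_interaction_features` and the sample values are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def add_interaction_features(features):
    """Append the product of every pair of distinct features to each sample.

    `features` is a list of samples, each a list of floats (for example,
    the flattened output of a feature-extraction network).
    Assumption: pairwise products stand in for the paper's interactive
    feature method, which the abstract does not specify.
    """
    expanded = []
    for sample in features:
        # One new feature per unordered pair (i, j) with i < j.
        pairwise = [a * b for a, b in combinations(sample, 2)]
        expanded.append(list(sample) + pairwise)
    return expanded


# Two samples with three hypothetical extracted features each.
extracted = [[1.0, 2.0, 3.0], [0.5, 0.0, 4.0]]
enriched = add_interaction_features(extracted)
print(enriched[0])  # the 3 original features followed by 3 pairwise products
```

Under this assumption, a sample with d features gains d(d-1)/2 combined features, so the enriched set grows quadratically with the size of the extracted feature vector. That growth is one reason to keep the feature extractor's output small.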

WANG Pan-hong, ZHU Chang-ming. Multi-label Image Classification Method Combined CNN and Interactive Features[J]. Computer and Modernization, 2022(9): 85-92.
