Remote Sensing Image Object Detection Based on Improved MoCo

Abstract

Intelligent processing of satellite remote sensing images suffers from inconsistent annotation standards and uneven data distribution. As a result, few samples are usable and object detection performance is poor. To address this, an object detection algorithm based on the MoCo (Momentum Contrast) unsupervised contrastive learning model is proposed. The detection framework is YOLOv5 with ResNet50 as the backbone network. The ResNet50 weights obtained by contrastive learning are kept fixed and receive no gradient updates while YOLOv5 is trained on the downstream detection task. The contrastive learning experiments are carried out on the AID dataset, where the improved MoCo v2 reaches a best top-1 accuracy of 95.888%. In the downstream detection task on the TGRS-HRRSD dataset, the improved MoCo v2 pretrained weights yield an mAP@.5:.95 of 67.8%. This is 5.6 percentage points higher than training without pretrained weights (62.2%). The results demonstrate the effectiveness of the improved MoCo contrastive learning model and show that it also raises detection accuracy in the downstream detection task.

Key words: unsupervised contrastive learning, remote sensing image detection, attention mechanism, YOLOv5
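The abstract names MoCo but does not describe its mechanics. The minimal pure-Python sketch below illustrates three ingredients of the original MoCo design:

- a key encoder updated as a momentum (exponential moving) average of the query encoder;
- a FIFO queue of past keys that serve as negatives;
- the InfoNCE loss, which treats the matching key as the positive class.

The sketch assumes encoder parameters are flattened into plain lists and features are short L2-normalized vectors. The helper names, momentum m = 0.999 and temperature tau = 0.2 are illustrative assumptions. The paper's own improvements to MoCo v2 are not modeled here.

```python
import math
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """MoCo key-encoder update: theta_k <- m * theta_k + (1 - m) * theta_q (element-wise)."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def l2_normalize(v):
    """Scale a feature vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(q, k_pos, negatives, tau=0.2):
    """InfoNCE = cross-entropy over [positive, negatives...] with the positive at index 0."""
    logits = [dot(q, k_pos) / tau] + [dot(q, k) / tau for k in negatives]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[0]


class KeyQueue:
    """Fixed-size dictionary of negative keys; enqueuing beyond `size` drops the oldest keys."""

    def __init__(self, size):
        self.keys = deque(maxlen=size)

    def enqueue(self, batch_keys):
        self.keys.extend(batch_keys)


# One illustrative step with 2-D features (real MoCo uses encoder outputs, e.g. 128-D).
queue = KeyQueue(size=4)
queue.enqueue([l2_normalize([0.0, 1.0]), l2_normalize([-1.0, 0.0])])  # keys from earlier batches
q = l2_normalize([1.0, 0.1])      # query-encoder feature of one augmented view
k_pos = l2_normalize([0.9, 0.2])  # key-encoder feature of another view of the same image
loss = info_nce_loss(q, k_pos, list(queue.keys))
queue.enqueue([k_pos])            # current keys become negatives for later batches
key_params = momentum_update([0.0, 0.0], [1.0, 2.0])
print(round(loss, 4), len(queue.keys), key_params)
```

Using `deque(maxlen=...)` makes the dequeue step implicit: appending new keys silently drops the oldest ones. This mirrors how MoCo keeps the dictionary size independent of the batch size. The slow momentum update keeps the queued keys consistent with one another even though they were produced at different training steps.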

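The detection result is reported as mAP@.5:.95. This is average precision (AP) averaged over the ten IoU thresholds 0.50, 0.55, ..., 0.95 and then over classes. The sketch below computes this quantity under simplifying assumptions:

- boxes are (x1, y1, x2, y2) tuples;
- each detection is greedily matched to its highest-IoU ground-truth box (VOC style);
- AP uses all-point interpolation.

The COCO evaluator uses 101-point interpolation plus extra rules such as crowd regions and detection limits. These hypothetical helpers therefore illustrate the metric rather than reproduce the evaluation code used in the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_thr):
    """All-point interpolated AP for one class at one IoU threshold.

    detections: list of (image_id, score, box); ground_truths: dict image_id -> list of boxes.
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    precisions, recalls, tp = [], [], 0
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    for rank, (img, _score, box) in enumerate(ranked, start=1):
        # Greedy VOC-style matching: take the ground-truth box with the highest IoU.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the envelope, summed over recall increments.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def map_50_95(detections_by_class, ground_truths_by_class):
    """Mean over classes of AP averaged over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    per_class = []
    for cls, gts in ground_truths_by_class.items():
        dets = detections_by_class.get(cls, [])
        per_class.append(sum(average_precision(dets, gts, t) for t in thresholds) / len(thresholds))
    return sum(per_class) / len(per_class) if per_class else 0.0
```

Consider a single prediction that overlaps its only ground-truth box with IoU 0.8. It is a true positive at the seven thresholds 0.50 to 0.80 and a false positive at 0.85 to 0.95, giving mAP@.5:.95 = 0.7, whereas mAP@.5 would be 1.0. This is why mAP@.5:.95 rewards precise localization more strictly than mAP@.5.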
JIAO Xin-quan, LI Rui-kang, CHEN Jian-jun. Remote Sensing Image Object Detection Based on Improved MoCo[J]. Computer and Modernization, 2022(12): 88-94.
