HRNet Image Semantic Segmentation Algorithm Combined with Attention Mechanism

doi:10.3969/j.issn.1006-2475.2023.10.010

Abstract

Abstract: Mainstream semantic segmentation algorithms still lose small objects and produce imprecise segmentations. To address these problems, this paper improves the HRNet network model by integrating an attention mechanism that generates more effective feature maps. In the original model, low-resolution images are fused directly into high-resolution images, which leaves the resulting feature maps short of detail. A multi-level upsampling mechanism is proposed so that fusion between images of different resolutions is smoother and gives better results. Depthwise separable convolution is also used to reduce the number of model parameters. The proposed model keeps a relatively high image resolution throughout, preserves the spatial information of the feature maps, and improves the segmentation of small objects. On the PASCAL VOC2012 augmented dataset the model reaches an mIoU of 80.87%, an improvement of 1.54 percentage points over the original model.

Key words: image semantic segmentation, attention mechanism, high resolution, depthwise separable convolution
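
The abstract credits depthwise separable convolution with reducing the model's parameters. As a rough illustration of why that happens, the sketch below counts the weights of a standard convolution and of a depthwise convolution followed by a 1 x 1 pointwise convolution. The layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) and the function names are assumptions chosen for the example; they are not taken from the paper's network.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, used only for illustration.
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
ratio = separable / standard   # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 4))
```

For these assumed sizes the separable layer needs about 12% of the weights of the standard layer, and the ratio 1/c_out + 1/k^2 shows that the saving grows with the kernel size and the number of output channels.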

CLC number: TP391

YE Si-jia, WEI Yan, DU Han-yu, DENG Jin-zhi. HRNet Image Semantic Segmentation Algorithm Combined with Attention Mechanism[J]. Computer and Modernization, 2023, 0(10): 65-69.

