Computer and Modernization ›› 2023, Vol. 0 ›› Issue (10): 65-69. doi: 10.3969/j.issn.1006-2475.2023.10.010

• Image Processing •

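HRNet Image Semantic Segmentation Algorithm Combined with Attention Mechanism

  1. (College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China)
  • Online: 2023-10-26 Published: 2023-10-26
  • About the authors: YE Sijia (叶思佳, born 1998), female, from Zhongxian, Chongqing, master's student, research interest: image semantic segmentation, E-mail: 895121532@qq.com; corresponding author: WEI Yan (魏延, born 1970), male, from Luxian, Sichuan, professor and master's supervisor, research interest: educational big data, E-mail: weiyan@cqnu.edu.cn; DU Hanyu (杜韩宇, born 1997), male, from Suzhou, Anhui, master's student, research interest: low-light image enhancement, E-mail: duhanyu5@163.com; DENG Jinzhi (邓金枝, born 1995), female, from Nanchong, Sichuan, master's student, research interest: text-to-image generation, E-mail: 1334548213@qq.com.
  • Funding:
    Chongqing Key Project of Technology Innovation and Application Development (cstc2019jscx-mbdxX0061)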

Abstract: Current mainstream semantic segmentation algorithms still suffer from problems such as the loss of small objects and imprecise segmentation. To address these problems, this paper improves the HRNet network model by integrating an attention mechanism to generate more effective feature maps. In the original model, low-resolution images are fused directly into high-resolution images, which leaves the resulting feature maps short on detail. A multi-level upsampling mechanism is proposed to make the fusion between images of different resolutions smoother and so achieve better fusion results. Depthwise separable convolution is also used to reduce the number of model parameters. The proposed model maintains a relatively high image resolution throughout, preserves the spatial information of the feature maps, and improves the segmentation of small objects. On the PASCAL VOC2012 augmented dataset the mIoU reaches 80.87%, an improvement of 1.54 percentage points over the original model.

Key words: image semantic segmentation, attention mechanism, high resolution, depthwise separable convolution
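
Two quantities mentioned in the abstract can be made concrete with a small sketch: the weight savings of a depthwise separable convolution relative to a standard convolution, and the mIoU metric computed from a confusion matrix. The function names, the channel sizes (64 to 128, 3 x 3 kernel), and the toy confusion matrix are illustrative assumptions, not values from the paper, and the paper's own evaluation code may differ in detail (for example, in how absent classes are handled).

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a k x k depthwise convolution (one filter per
    input channel) followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] counts pixels of ground-truth class i predicted as
    class j. Classes that appear in neither ground truth nor prediction
    are skipped (an assumption; conventions vary).
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # missed pixels
        fp = sum(confusion[r][c] for r in range(n)) - tp    # false alarms
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
    std = standard_conv_params(64, 128, 3)
    sep = depthwise_separable_params(64, 128, 3)
    print(std, sep, round(sep / std, 4))

    # Toy two-class confusion matrix, for illustration only.
    print(round(mean_iou([[3, 1], [1, 5]]), 4))
```

For these assumed sizes the separable layer needs 8768 weights against 73728 for the standard layer, a ratio of 1/c_out + 1/k² ≈ 0.119. This ratio is where the parameter reduction comes from.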
