Computer and Modernization ›› 2023, Vol. 0 ›› Issue (06): 76-81. doi: 10.3969/j.issn.1006-2475.2023.06.013

• Image Processing •

Monocular Depth Estimation Method by Aggregating Multi-dimensional Attention Features

LIU Jia-jia1, HU Xu-xin2, YU Ping2

  1. Shenzhen Power Supply Planning and Design Institute Co., Ltd., Shenzhen 518000, Guangdong, China;
  2. School of Electrical and Electronic Engineering, North China Electric Power University (Baoding), Baoding 071000, Hebei, China
  • Received: 2022-06-06 Revised: 2022-07-17 Online: 2023-06-28 Published: 2023-06-28
  • Corresponding author: YU Ping (born 1963), female, associate professor, bachelor's degree, research interests: image processing and wireless communication, E-mail: well_yp@sina.com.
  • About the authors: LIU Jia-jia (born 1985), male, from Huainan, Anhui, engineer, bachelor's degree, research interest: computer applications, E-mail: 181509351@qq.com; HU Xu-xin (born 1994), female, master's degree, research interest: deep learning, E-mail: 446844661@qq.com.
  • Funding:
    Science and Technology Project of Shenzhen Power Supply Bureau Co., Ltd., China Southern Power Grid (090000KK52180035)

Abstract: To improve the prediction accuracy of monocular depth estimation networks, this paper studies how multi-dimensional attention mechanisms affect such networks and designs an optimized channel and spatial attention module. Starting from a convolutional neural network framework based on the local plane guidance layer, a placement scheme for the module is designed to build a network structure that fully exploits the multi-dimensional attention mechanism. Combining these two improvements yields a high-performance monocular depth estimation network that aggregates channel and spatial attention features. Experiments on the KITTI depth dataset and the NYU Depth V2 dataset verify the effectiveness of the optimized module and the strong performance of the aggregated network. Compared with the convolutional neural network based on the local plane guidance layer, the aggregated network handles the overall features of an image better and predicts depth more accurately, with several evaluation metrics improving to varying degrees. The depth maps generated by the aggregated network also show more object contours and detail.

Key words: monocular depth estimation, convolutional neural network, channel attention, spatial attention
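
The abstract names channel attention and spatial attention but does not give the design of the optimized module. As a rough illustration of the two ideas only, the NumPy sketch below follows a generic CBAM-style formulation: channel weights come from globally pooled descriptors, and spatial weights come from channel-pooled maps. This is an assumption for illustration, not the authors' module. The function names, the weight shapes `w1`, `w2`, and `kernel`, and the channel-then-spatial ordering are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Reweight channels of feat (C, H, W) with a shared two-layer MLP.

    w1: (C // r, C) and w2: (C, C // r) are hypothetical MLP weights.
    """
    avg = feat.mean(axis=(1, 2))   # (C,) global average pooling
    mx = feat.max(axis=(1, 2))     # (C,) global max pooling

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer

    scale = sigmoid(mlp(avg) + mlp(mx))       # (C,) weights in (0, 1)
    return feat * scale[:, None, None]

def spatial_attention(feat, kernel):
    """Reweight spatial positions of feat (C, H, W).

    kernel: (2, k, k) hypothetical conv weights (k odd) applied to the
    channel-wise average and max maps with zero padding.
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = feat.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(out)[None, :, :]    # broadcast over channels

# Toy usage with random weights (illustrative values only).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 6, 6))
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
refined = spatial_attention(channel_attention(feat, w1, w2), kernel)
print(refined.shape)  # (8, 6, 6)
```

Both steps multiply the feature map by weights in (0, 1), so the output keeps the input shape and can be dropped between existing layers. Where such a block sits in the network is exactly the placement question the abstract says the paper investigates.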

CLC Number: