Computer and Modernization ›› 2024, Vol. 0 ›› Issue (09): 121-126.doi: 10.3969/j.issn.1006-2475.2024.09.020


Multi-scale Depth Fusion Monocular Depth Estimation Based on Transposed Attention

  

  (1. School of Physics and Optoelectronic Engineering, Guangdong University of Technology, Guangzhou 510006, China;
   2. Guangdong Provincial Key Laboratory of Information Photonics Technology, Guangzhou 510006, China)
  • Online: 2024-09-27    Published: 2024-09-29

Abstract: Monocular depth estimation is a fundamental task in computer vision that predicts a depth map from a single image, recovering the depth at each pixel position. This paper proposes a novel network architecture for monocular depth estimation that further improves predictive accuracy. Transposed attention applies the self-attention mechanism across channels, allowing the network to focus on specific regions of the image while reducing parameter count and computation; by exchanging information across channels, it effectively captures fine-grained regions and edge details. The paper presents an improved version of transposed attention that preserves semantic information with fewer parameters. Multi-scale depth fusion exploits the fact that different channels extract features corresponding to different depths: it computes the average depth of each channel to strengthen the model's depth perception, and it models long-range dependencies along the vertical direction, which separates the edges between objects and mitigates the loss of fine-grained information. Finally, the effectiveness of the proposed modules is validated through experiments on the NYU Depth V2 and KITTI datasets, where the network demonstrates strong performance.
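For readers who want a concrete picture of the mechanisms described above, the following PyTorch sketch illustrates channel-wise ("transposed") self-attention together with a simple per-channel average-depth statistic. It is a minimal sketch under the assumption that transposed attention here follows the common formulation in which attention is computed across channels rather than spatial positions; all names (TransposedAttention, num_heads, temperature, channel_average_depth) are illustrative and are not taken from the paper.

# Hypothetical sketch of channel-wise ("transposed") self-attention.
# Assumption: attention is computed over the channel dimension, so its cost
# scales with C x C instead of (H*W) x (H*W); names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedAttention(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        # Learnable temperature rescales the channel-by-channel attention map.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)

        # Reshape to (batch, heads, channels_per_head, pixels).
        def split(t):
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w)
        q, k, v = map(split, (q, k, v))

        # Normalize along the pixel axis, then attend channel-to-channel:
        # the attention map has shape (b, heads, c/heads, c/heads).
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        attn = attn.softmax(dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return self.proj(out)

# Per-channel average depth (one scalar per channel), a rough stand-in for
# the "channel average depth" cue used by multi-scale depth fusion.
def channel_average_depth(features):
    return features.mean(dim=(2, 3))  # shape (b, c)

Because the attention map is (C/heads) x (C/heads) rather than (H*W) x (H*W), the cost grows with the number of channels instead of the image resolution, which is the parameter- and computation-saving property the abstract refers to.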

Key words: deep learning, monocular depth estimation, transposed attention, multi-scale depth fusion, channel average depth
