Computer and Modernization ›› 2023, Vol. 0 ›› Issue (02): 34-39.

Monocular Depth and Pose Estimation Based on Conditional Convolution and Polarized Self-attention

  

  1. (School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China)
  • Online: 2023-04-10  Published: 2023-04-10

Abstract: This paper proposes a novel monocular depth and pose estimation framework that follows the view-synthesis-based, self-supervised structure-from-motion paradigm and introduces conditional convolution and polarized self-attention. Conditional convolution assigns multiple groups of dynamic, input-dependent weights to the input data; once the weights are integrated, they share a single convolution operation, which increases model capacity without significantly increasing computational cost. Because the integrity of image information strongly affects depth estimation performance, polarized self-attention uses polarized filtering to keep the data at high resolution along the channel or spatial dimension, which helps prevent the loss of fine-grained and structural information. The dimension orthogonal to the channel or spatial dimension is compressed to reduce computation, and nonlinear functions enhance and dynamically map the feature intensity range lost during compression. The self-attention mechanism also enables long-range modeling of the data across its dimensions. Experiments on the KITTI dataset show that the proposed model performs well on self-supervised monocular depth and pose estimation tasks.
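
The idea of integrating several weight groups before one shared convolution can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the single-channel nested-list images, the function names, and the routing step (global average pooling, a linear map, then a sigmoid) are all assumptions made for illustration.

```python
import math


def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation (no padding, stride 1) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def routing_weights(image, routing_params):
    """Assumed routing: global average pool -> linear map -> sigmoid.

    routing_params holds one hypothetical (weight, bias) pair per kernel group.
    """
    pooled = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    return [1.0 / (1.0 + math.exp(-(w * pooled + b))) for w, b in routing_params]


def mix_kernels(kernel_groups, alphas):
    """Integrate the kernel groups into one kernel using the routing weights."""
    kh, kw = len(kernel_groups[0]), len(kernel_groups[0][0])
    return [
        [sum(a * k[r][c] for a, k in zip(alphas, kernel_groups)) for c in range(kw)]
        for r in range(kh)
    ]


def cond_conv2d(image, kernel_groups, routing_params):
    """Input-dependent convolution: mix the kernels first, then convolve once."""
    alphas = routing_weights(image, routing_params)
    return conv2d_valid(image, mix_kernels(kernel_groups, alphas))


# Example with two hypothetical 2x2 kernel groups on a 3x3 image.
image = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 1.0, 1.0]]
kernel_groups = [[[1.0, 0.0], [0.0, -1.0]], [[0.5, 0.5], [0.5, 0.5]]]
routing_params = [(0.5, 0.0), (-0.5, 0.1)]
output = cond_conv2d(image, kernel_groups, routing_params)
```

Because convolution is linear in the kernel, mixing the kernels first gives the same result as running every kernel group separately and mixing the outputs, but it needs only one convolution per input. That is the sense in which capacity grows without a matching growth in computation.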

Key words: monocular depth estimation; pose estimation; self-supervised learning; conditional convolution; polarized filter; self-attention