Computer and Modernization, 2024, Vol. 0, Issue (11): 106-112. doi: 10.3969/j.issn.1006-2475.2024.11.016

Multi-view 3D Reconstruction Based on Improved Self-attention Mechanism

  

  1. (College of Electronics and Electrical Engineering, Ningxia University, Yinchuan 750021, China)
  • Online: 2024-11-29   Published: 2024-12-10

Abstract: Current multi-view 3D reconstruction methods adapt poorly to high-resolution scenes, produce reconstructions with poor completeness, and ignore global context information. To address these problems, this paper proposes MVFSAM-CasMVSNet, a 3D reconstruction network that fuses deformable convolution with an improved self-attention mechanism. First, a deformable convolution module dedicated to multi-view stereo reconstruction is designed to adaptively adjust the range over which features are extracted and to strengthen feature extraction at abrupt depth changes. Second, considering the correlation of depth information and the feature interactions among multiple views, a multi-view fusion self-attention module is designed: it aggregates long-range context information within each view using linear self-attention with low computational complexity, and captures the depth dependencies between the reference view and the source views using improved multi-head attention. Finally, the cost volume is constructed and regularized from coarse to fine using a multi-stage strategy, and the depth map is generated from the higher-resolution cost volume. Test results on the DTU dataset show that, compared with the baseline model, MVFSAM-CasMVSNet improves completeness, accuracy, and the overall score by 15.6%, 7.4%, and 11.8%, respectively, and achieves the best overall score among the existing models compared. Experimental results on the Tanks and Temples dataset show that the network improves the average F-score by 6.5% over the baseline model. The proposed method therefore shows strong reconstruction quality and generalization ability for high-resolution scenes in multi-view 3D reconstruction.
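The abstract does not give the formulation of the linear self-attention used inside each view, so the following is only a minimal NumPy sketch of one common variant, assuming the kernel feature map phi(x) = elu(x) + 1. The function names `feature_map`, `linear_attention`, and `quadratic_attention` are hypothetical and are not taken from the paper. The sketch illustrates why the cost is linear in the number of tokens N: the product of keys and values is summed once into a small (d, d_v) matrix, so the N x N attention matrix is never formed.

```python
import numpy as np


def feature_map(x):
    # phi(x) = elu(x) + 1, strictly positive (an assumed choice, not from the paper).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention in O(N * d * d_v) time.

    q, k: arrays of shape (N, d); v: array of shape (N, d_v).
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                    # (d, d_v): keys and values summed once over all tokens
    norm = phi_q @ phi_k.sum(axis=0)    # (N,): per-query normalizer
    return (phi_q @ kv) / (norm[:, None] + eps)


def quadratic_attention(q, k, v, eps=1e-6):
    """Same kernelized attention, computed through the explicit (N, N) matrix."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T            # (N, N): the matrix the linear form avoids
    return (scores @ v) / (scores.sum(axis=1, keepdims=True) + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, dim = 1024, 32            # e.g. a flattened 32 x 32 feature map
    q = rng.standard_normal((n_tokens, dim))
    k = rng.standard_normal((n_tokens, dim))
    v = rng.standard_normal((n_tokens, dim))
    out = linear_attention(q, k, v)
    print(out.shape, np.allclose(out, quadratic_attention(q, k, v)))
```

Both functions compute the same quantity; only the order of the matrix products differs. That reordering is what makes this style of attention attractive for high-resolution feature maps, where N grows with the image size.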

Key words: 3D reconstruction, deep learning, multi-view stereo, self-attention mechanism

CLC Number: