Computer and Modernization ›› 2024, Vol. 0 ›› Issue (11): 106-112. doi: 10.3969/j.issn.1006-2475.2024.11.016

• Image Processing •


Multi-view 3D Reconstruction Based on Improved Self-attention Mechanism

  1. (College of Electronics and Electrical Engineering, Ningxia University, Yinchuan 750021, China)
  • Online: 2024-11-29  Published: 2024-12-10
  • Supported by: Natural Science Foundation of Ningxia Hui Autonomous Region (2021AAC03113)


Abstract: To address the problems that current multi-view 3D reconstruction methods cannot adapt to high-resolution scenes, suffer from poor completeness, and ignore global background information, this paper proposes MVFSAM-CasMVSNet, a 3D reconstruction network that fuses deformable convolution with an improved self-attention mechanism. Firstly, a deformable convolution module dedicated to the multi-view stereo reconstruction task is designed to adaptively adjust the sampling range of feature extraction and strengthen feature extraction at depth discontinuities. Secondly, considering the correlation of depth information and the feature interaction among multiple views, a multi-view fusion self-attention module is designed: a linear self-attention with low computational complexity aggregates long-range context information within each view, and an improved multi-head attention captures the depth dependencies between the reference view and the source views. Finally, a multi-stage strategy constructs and regularizes the matching cost volume from coarse to fine, and the depth map is generated from the higher-resolution cost volume. Test results on the DTU dataset show that, compared with the baseline model, MVFSAM-CasMVSNet improves completeness, accuracy, and the overall metric by 15.6%, 7.4%, and 11.8% respectively, and achieves the best overall metric among existing models. Meanwhile, experimental results on the Tanks and Temples dataset show an average F-score improvement of 6.5% over the baseline model. The network achieves excellent reconstruction quality and generalization ability for high-resolution scenes in multi-view 3D reconstruction.
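To make the first stage concrete, the sketch below shows a minimal deformable-convolution feature block of the kind the abstract describes, assuming PyTorch and torchvision's DeformConv2d. The block structure and channel sizes are illustrative assumptions, not the paper's released implementation: a small offset branch predicts per-pixel sampling offsets so the receptive field can adapt to depth discontinuities.

```python
# A minimal sketch (not the authors' code) of a deformable-convolution
# feature block for MVS. Assumes PyTorch + torchvision; sizes are placeholders.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformFeatureBlock(nn.Module):
    """Predicts per-pixel sampling offsets, then applies a deformable conv,
    letting the sampling grid follow depth discontinuities."""
    def __init__(self, in_ch: int = 32, out_ch: int = 32, k: int = 3):
        super().__init__()
        # One (dy, dx) offset pair per kernel tap: 2 * k * k channels.
        self.offset = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset(x)                 # (B, 2*k*k, H, W)
        return self.act(self.deform(x, offsets))

if __name__ == "__main__":
    feat = torch.randn(2, 32, 64, 80)            # (B, C, H, W) feature map
    print(DeformFeatureBlock()(feat).shape)      # torch.Size([2, 32, 64, 80])
```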
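The second stage combines two forms of attention. The sketch below, again a hedged PyTorch approximation rather than the paper's exact formulation, pairs an O(N) linear self-attention in the "efficient attention" style (softmax over query channels and over key positions, so the N×N attention matrix is never formed) for intra-view long-range context, with PyTorch's stock nn.MultiheadAttention standing in for the paper's improved multi-head cross-view attention.

```python
# Sketch of the two attention steps: linear self-attention within a view,
# then cross-view attention from the reference view to a source view.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSelfAttention(nn.Module):
    """O(N) attention: softmax(Q over channels) @ (softmax(K over positions)^T @ V)."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.to_qkv = nn.Linear(dim, 3 * dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) tokens from a flattened H*W feature map
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q = F.softmax(q, dim=-1)                     # normalize over channels
        k = F.softmax(k, dim=1)                      # normalize over positions
        ctx = torch.einsum("bnc,bnd->bcd", k, v)     # (B, C, C) global context
        return torch.einsum("bnc,bcd->bnd", q, ctx)  # never builds an N x N map

# Cross-view step: reference-view tokens query source-view tokens,
# modeling inter-view depth dependencies.
intra = LinearSelfAttention(dim=32)
cross = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)

ref = intra(torch.randn(2, 64 * 80, 32))   # reference view, (B, N, C)
src = intra(torch.randn(2, 64 * 80, 32))   # one source view
fused, _ = cross(query=ref, key=src, value=src)
print(fused.shape)                          # torch.Size([2, 5120, 32])
```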
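Finally, the coarse-to-fine strategy can be summarized by how each stage re-samples depth hypotheses. The runnable sketch below abstracts away feature warping, cost-volume construction, and 3D-CNN regularization (replaced by a random stand-in) and only illustrates the cascade narrowing typical of CasMVSNet-style pipelines; all depth ranges and plane counts are illustrative values, not the paper's settings.

```python
# Schematic cascade: each stage samples fewer depth planes over a tighter
# range centered on the previous stage's depth estimate.
import torch

def depth_hypotheses(center: torch.Tensor, half_range: float, n: int):
    """Sample n depth planes uniformly in [center - half_range, center + half_range]."""
    steps = torch.linspace(-1.0, 1.0, n, device=center.device)        # (n,)
    return center.unsqueeze(1) + half_range * steps.view(1, n, 1, 1)  # (B, n, H, W)

B, H, W = 1, 32, 40
depth = torch.full((B, H, W), 680.0)         # coarse estimate (illustrative, ~DTU midpoint)
for n, half in [(32, 128.0), (8, 16.0)]:     # fewer planes, tighter range per stage
    hyps = depth_hypotheses(depth, half, n)  # (B, n, H, W) candidate depth planes
    # ... build cost volume from warped features, regularize with a 3D CNN ...
    probs = torch.softmax(torch.randn(B, n, H, W), dim=1)  # stand-in probabilities
    depth = (probs * hyps).sum(dim=1)        # expected (soft-argmin) depth per pixel
    # (upsampling to the next stage's higher resolution omitted for brevity)
```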

Key words: 3D reconstruction, deep learning, multi-view stereo, self-attention mechanism
