Multi-view 3D Reconstruction Based on Improved Self-attention Mechanism

doi:10.3969/j.issn.1006-2475.2024.11.016

Abstract

Current multi-view 3D reconstruction methods adapt poorly to high-resolution scenes, produce reconstructions with poor completeness, and ignore global context information. To address these problems, this paper proposes MVFSAM-CasMVSNet, a 3D reconstruction network that fuses deformable convolution with an improved self-attention mechanism. First, a deformable convolution module designed specifically for multi-view stereo reconstruction adaptively adjusts the range over which features are extracted and strengthens feature extraction at depth discontinuities. Second, to account for the correlation of depth information and the feature interactions among views, a multi-view fusion self-attention module is designed: linear self-attention with low computational complexity aggregates long-range context within each view, and improved multi-head attention captures the depth dependencies between the reference view and the source views. Finally, the cost volume is constructed and regularized from coarse to fine using a multi-stage strategy, and the depth map is generated from the higher-resolution cost volume. Results on the DTU dataset show that, compared with the baseline model, MVFSAM-CasMVSNet improves completeness, accuracy, and overall score by 15.6%, 7.4%, and 11.8%, respectively, and achieves the best overall score among the existing models compared. Experiments on the Tanks and Temples dataset show an average F-score improvement of 6.5% over the baseline model. The proposed method achieves strong reconstruction quality and generalization on high-resolution scenes in multi-view 3D reconstruction.
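To illustrate why linear self-attention has low computational complexity, the following is a minimal NumPy sketch. It is not the network's actual module: the abstract does not specify the formulation, so the kernel feature map `elu(x) + 1`, the function names, and the tensor shapes are all assumptions chosen for illustration. The idea shown is that reordering the matrix products avoids building the n × n attention matrix, so the cost grows linearly with the number of pixels n rather than quadratically.

```python
import numpy as np


def feature_map(x):
    # Assumed positive feature map: elu(x) + 1 (x + 1 for x > 0, exp(x) otherwise).
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(q, k, v):
    """Kernelized attention in O(n * d * d_v) time.

    q, k: (n, d) queries and keys; v: (n, d_v) values.
    The (n, n) attention matrix is never materialized.
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v            # (d, d_v): keys and values summarized once
    z = phi_k.sum(axis=0)       # (d,): normalizer shared by all queries
    return (phi_q @ kv) / (phi_q @ z)[:, None]


def quadratic_reference(q, k, v):
    """Same attention computed the O(n^2) way, used only as a check."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    weights = phi_q @ phi_k.T   # (n, n) similarity matrix
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, d_v = 64, 8, 16       # hypothetical sizes: n flattened pixels of one view
    q = rng.standard_normal((n, d))
    k = rng.standard_normal((n, d))
    v = rng.standard_normal((n, d_v))
    out = linear_attention(q, k, v)
    print(out.shape, np.allclose(out, quadratic_reference(q, k, v)))
```

Both functions return the same result; only the order of multiplication differs, which is what makes this kind of attention practical on high-resolution feature maps.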

Key words: 3D reconstruction, deep learning, multi-view stereo, self-attention mechanism

CLC Number: TP391.4

QI Xian, LIU Daming, CHANG Jiaxin. Multi-view 3D Reconstruction Based on Improved Self-attention Mechanism[J]. Computer and Modernization, 2024, 0(11): 106-112.

