MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions
Project Page | Paper
If you find our work useful for your research, please consider citing our paper:
@article{DBLP:journals/corr/abs-2104-13325,
author = {Zhenpei Yang and
Zhile Ren and
Qi Shan and
Qixing Huang},
title = {{MVS2D:} Efficient Multi-view Stereo via Attention-Driven 2D Convolutions},
journal = {CoRR},
volume = {abs/2104.13325},
year = {2021},
url = {https://arxiv.org/abs/2104.13325},
eprinttype = {arXiv},
eprint = {2104.13325},
timestamp = {Tue, 04 May 2021 15:12:43 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2104-13325.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Changelog
Nov 27 2021
- Initial release. Note that the released code achieves improved results over those reported in the initial arXiv pre-print. In addition, we include the evaluation on the DTU dataset. We will update our paper soon.
Installation
The code is tested with CUDA 10.1. Please use the following commands to install the dependencies:
conda create --name mvs2d python=3.7
conda activate mvs2d
pip install -r requirements.txt
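Optionally, you can verify that the environment sees your GPUs before moving on. A minimal check, assuming PyTorch is among the dependencies installed from requirements.txt:

```python
# Quick sanity check of the CUDA setup
# (assumes PyTorch was installed through requirements.txt).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU count:", torch.cuda.device_count())
    print("GPU 0:", torch.cuda.get_device_name(0))
```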
The folder structure should look like the following once all data and pretrained models are downloaded. Download links are provided in each dataset section below.
.
├── configs
├── datasets
├── demo
├── networks
├── scripts
├── pretrained_model
│   ├── demon
│   ├── dtu
│   └── scannet
├── data
│   ├── DeMoN
│   ├── DTU_hr
│   ├── SampleSet
│   ├── ScanNet
│   └── ScanNet_3_frame_jitter_pose.npy
└── splits
    ├── DeMoN_samples_test_2_frame.npy
    ├── DeMoN_samples_train_2_frame.npy
    ├── ScanNet_3_frame_test.npy
    ├── ScanNet_3_frame_train.npy
    └── ScanNet_3_frame_val.npy
Demo
After downloading the pretrained models for ScanNet, run the following command to make a prediction on the sample data.
python demo.py --cfg configs/scannet/release.conf
The results are saved as demo.png.
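To take a quick look at the saved prediction, a minimal viewer along these lines works (assuming matplotlib is available in your environment):

```python
# Display the demo output written by demo.py (assumes matplotlib is installed).
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img = mpimg.imread("demo.png")
plt.imshow(img)
plt.axis("off")
plt.show()
```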
Training & Testing
We use 4 NVIDIA V100 GPUs for training. You may need to modify CUDA_VISIBLE_DEVICES and the batch size to accommodate your GPU resources.
ScanNet
Download
data | split | pretrained models | noisy pose
Training
First download and extract the ScanNet training data and split. Then run the following command to train our model.
bash scripts/scannet/train.sh
To train the multi-scale attention model, add --robust 1 to the training command in scripts/scannet/train.sh.
To train our model with noisy input poses, add --perturb_pose 1 to the training command in scripts/scannet/train.sh.
Testing
First download and extract the data, split, and pretrained models.
Then run:
bash scripts/scannet/test.sh
You should get something like this:
abs_rel | sq_rel | log10 | rmse | rmse_log | a1 | a2 | a3 | abs_diff | abs_diff_median | thre1 | thre3 | thre5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.059 | 0.016 | 0.026 | 0.157 | 0.084 | 0.964 | 0.995 | 0.999 | 0.108 | 0.079 | 0.856 | 0.974 | 0.996 |
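The columns follow the standard depth-estimation metrics. Below is a sketch of how such metrics are commonly computed, assuming masked numpy arrays of predicted and ground-truth depth; it illustrates the usual definitions rather than reproducing the repository's evaluation code (the thre* columns are implementation-specific distance thresholds and are omitted here):

```python
# Standard depth evaluation metrics (illustrative sketch; the exact
# definitions used by the repository's evaluation code may differ slightly).
import numpy as np

def depth_metrics(pred, gt):
    """pred, gt: 1-D numpy arrays of valid (masked) depths in meters."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()          # threshold accuracies a1, a2, a3
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(pred - gt) / gt)        # absolute relative error
    sq_rel = np.mean((pred - gt) ** 2 / gt)          # squared relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))        # root mean squared error
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
    log10 = np.mean(np.abs(np.log10(pred) - np.log10(gt)))
    abs_diff = np.mean(np.abs(pred - gt))            # mean absolute error
    abs_diff_median = np.median(np.abs(pred - gt))   # median absolute error
    return dict(abs_rel=abs_rel, sq_rel=sq_rel, log10=log10, rmse=rmse,
                rmse_log=rmse_log, a1=a1, a2=a2, a3=a3,
                abs_diff=abs_diff, abs_diff_median=abs_diff_median)
```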
SUN3D/RGBD/Scenes11
Download
data | split | pretrained models
Training
First download and extract the DeMoN training data and split. Then run the following command to train our model.
bash scripts/demon/train.sh
Testing
First download and extract the data, split, and pretrained models.
Then run:
bash scripts/demon/test.sh
You should get something like this:
dataset rgbd: 160
abs_rel | sq_rel | log10 | rmse | rmse_log | a1 | a2 | a3 | abs_diff | abs_diff_median | thre1 | thre3 | thre5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.082 | 0.165 | 0.047 | 0.440 | 0.147 | 0.921 | 0.939 | 0.948 | 0.325 | 0.284 | 0.753 | 0.894 | 0.933 |
dataset scenes11: 256
abs_rel | sq_rel | log10 | rmse | rmse_log | a1 | a2 | a3 | abs_diff | abs_diff_median | thre1 | thre3 | thre5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.046 | 0.080 | 0.018 | 0.439 | 0.107 | 0.976 | 0.989 | 0.993 | 0.155 | 0.058 | 0.822 | 0.945 | 0.979 |
dataset sun3d: 160
abs_rel | sq_rel | log10 | rmse | rmse_log | a1 | a2 | a3 | abs_diff | abs_diff_median | thre1 | thre3 | thre5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.099 | 0.055 | 0.044 | 0.304 | 0.137 | 0.893 | 0.970 | 0.993 | 0.224 | 0.171 | 0.649 | 0.890 | 0.969 |
-> Done!
depth
abs_rel | sq_rel | log10 | rmse | rmse_log | a1 | a2 | a3 | abs_diff | abs_diff_median | thre1 | thre3 | thre5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.071 | 0.096 | 0.033 | 0.402 | 0.127 | 0.938 | 0.970 | 0.981 | 0.222 | 0.152 | 0.755 | 0.915 | 0.963 |
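The final depth table appears to be the sample-count-weighted average of the three per-dataset tables above (160 + 256 + 160 = 576 test samples). For example, for abs_rel:

```python
# Check that the overall abs_rel matches the sample-count-weighted
# average of the per-dataset values reported above.
counts = {"rgbd": 160, "scenes11": 256, "sun3d": 160}
abs_rel = {"rgbd": 0.082, "scenes11": 0.046, "sun3d": 0.099}

weighted = sum(abs_rel[d] * counts[d] for d in counts) / sum(counts.values())
print(round(weighted, 3))  # 0.071, matching the overall table
```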
DTU
Download
data | eval data | eval toolkit | pretrained models
Training
First download and extract the DTU training data. Then run the following command to train our model.
bash scripts/dtu/train.sh
Testing
First download and extract the DTU eval data and pretrained models.
The following command performs three steps together:
1. Generate depth predictions on the DTU test set.
2. Fuse the depth predictions into a final point cloud.
3. Evaluate the predicted point cloud.
Note that we re-implement the original Matlab evaluation of the DTU dataset in Python.
bash scripts/dtu/test.sh
You should get something like this:
Acc 0.4051747996189477
Comp 0.2776021161518006
F-score 0.34138845788537414
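Acc and Comp are the usual DTU point-cloud metrics: the mean distance (in millimeters) from the predicted point cloud to the ground truth (accuracy) and from the ground truth to the prediction (completeness); the reported F-score here equals their arithmetic mean. Below is a simplified sketch of this style of evaluation, assuming scipy is available and omitting the observability masks and distance caps that the full evaluation applies:

```python
# Simplified DTU-style point-cloud evaluation (illustrative sketch only;
# the full evaluation also applies observability masks and distance caps).
import numpy as np
from scipy.spatial import cKDTree

def eval_point_cloud(pred_pts, gt_pts):
    """pred_pts, gt_pts: (N, 3) arrays of 3D points in millimeters."""
    # Accuracy: how close each predicted point lies to the ground-truth surface.
    acc = cKDTree(gt_pts).query(pred_pts)[0].mean()
    # Completeness: how well the prediction covers the ground-truth surface.
    comp = cKDTree(pred_pts).query(gt_pts)[0].mean()
    # Combined score: arithmetic mean of the two (lower is better).
    return acc, comp, 0.5 * (acc + comp)
```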
Acknowledgement
The fusion code for the DTU dataset is heavily built upon PatchMatchNet.