[NeurIPS'22] MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction

Zehao Yu · Songyou Peng · Michael Niemeyer · Torsten Sattler · Andreas Geiger

NeurIPS 2022

Paper | Project Page | SDFStudio

We demonstrate that state-of-the-art depth and normal cues extracted from monocular images are complementary to reconstruction cues and hence significantly improve the performance of implicit surface reconstruction methods.


Update

MonoSDF is integrated into SDFStudio, where monocular depth and normal cues can be applied to UniSurf and NeuS. Please check it out.

Setup

Installation

Clone the repository and create an anaconda environment called monosdf using

git clone git@github.com:autonomousvision/monosdf.git
cd monosdf

conda create -y -n monosdf python=3.8
conda activate monosdf

conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
conda install cudatoolkit-dev=11.3 -c conda-forge

pip install -r requirements.txt

The hash encoder will be compiled on the fly when running the code.

Dataset

To download the preprocessed data, run the following script. The data for DTU, Replica, and Tanks and Temples is adapted from VolSDF, Nice-SLAM, and Vis-MVSNet, respectively.

bash scripts/download_dataset.sh

Training

Run the following command to train monosdf:

cd ./code
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf CONFIG  --scan_id SCAN_ID

where CONFIG is the config file in code/confs, and SCAN_ID is the id of the scene to reconstruct.

We provide example commands for training on the DTU, ScanNet, and Replica datasets:

# DTU scan65
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/dtu_mlp_3views.conf  --scan_id 65

# ScanNet scan 1 (scene_0050_00)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/scannet_mlp.conf  --scan_id 1

# Replica scan 1 (room0)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/replica_mlp.conf  --scan_id 1

We created an individual config file for each Tanks and Temples scene, so you don't need to set the scan_id. To train on the courtroom scene, run:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/tnt_mlp_1.conf

We also generated high-resolution monocular cues for the courtroom scene, and it is better to train on them with more GPUs. First download the dataset:

bash scripts/download_highres_TNT.sh

Then run training with 8 GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node 8 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/tnt_highres_grids_courtroom.conf

Of course, you can also train on all other scenes with multiple GPUs.
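When switching between single-GPU and multi-GPU runs, it is easy to list a different number of devices in CUDA_VISIBLE_DEVICES than the value passed to --nproc_per_node. The following is a minimal sketch of a helper that derives both from one list of GPU ids. The function names build_train_command and format_command are hypothetical and not part of the repository; the flags are only the ones shown in the commands above.

```python
import shlex


def build_train_command(conf, gpu_ids, scan_id=None):
    """Return (env, argv) for launching training/exp_runner.py.

    Hypothetical helper: it only assembles the flags used in this README
    and keeps --nproc_per_node equal to the number of listed GPUs.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids)}
    argv = [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(len(gpu_ids)),  # one process per visible GPU
        "--nnodes=1", "--node_rank=0",
        "training/exp_runner.py",
        "--conf", conf,
    ]
    if scan_id is not None:
        # Tanks and Temples configs do not need a scan id, so it is optional.
        argv += ["--scan_id", str(scan_id)]
    return env, argv


def format_command(env, argv):
    """Render the command as a single shell line for copy and paste."""
    prefix = " ".join(f"{key}={value}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    env, argv = build_train_command("confs/dtu_mlp_3views.conf", [0], scan_id=65)
    print(format_command(env, argv))
```

Run the printed line from the code directory, or pass env and argv to subprocess.run if you prefer to launch training from Python.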

Evaluations

DTU

First, download the ground truth DTU point clouds:

bash scripts/download_dtu_ground_truth.sh

Then evaluate the quality of the extracted meshes (scan 65, for example):

python evaluate_single_scene.py --input_mesh scan65_mesh.ply --scan_id 65 --output_dir dtu_scan65

We also provide a script for evaluating all DTU scenes:

python evaluate.py

Evaluation results are saved to evaluation/DTU.csv by default. Please check the script for more details.
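To get a quick overview of the saved results, a sketch like the one below can read the CSV and average its numeric columns. It assumes the first row of the file is a header and makes no assumption about the column names, which are defined by the evaluation script. The helper names load_results and numeric_column_means are hypothetical.

```python
import csv


def load_results(csv_path):
    """Read a results CSV into a list of dicts keyed by the file's own header."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


def numeric_column_means(rows):
    """Average every column whose values all parse as floats."""
    means = {}
    if not rows:
        return means
    for key in rows[0]:
        try:
            values = [float(row[key]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip non-numeric or incomplete columns
        means[key] = sum(values) / len(values)
    return means


if __name__ == "__main__":
    # Default output path mentioned above; adjust it to your working directory.
    rows = load_results("evaluation/DTU.csv")
    for column, mean in numeric_column_means(rows).items():
        print(f"{column}: {mean:.4f}")
```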

Replica

Evaluate on one scene (scan 1, room0, for example):

cd replica_eval
python evaluate_single_scene.py --input_mesh replica_scan1_mesh.ply --scan_id 1 --output_dir replica_scan1

We also provide a script for evaluating all Replica scenes:

cd replica_eval
python evaluate.py

Please check the script for more details.

ScanNet

cd scannet_eval
python evaluate.py

Please check the script for more details.

Tanks and Temples

You need to submit the reconstruction results to the official evaluation server; please follow their guidance. We also provide an example of our submission here for reference.

Custom dataset

We provide an example of how to train monosdf on custom data (the Apartment scene from Nice-SLAM). First, download the dataset and run the script to subsample training images, normalize camera poses, etc.

bash scripts/download_apartment.sh 
cd preprocess
python nice_slam_apartment_to_monosdf.py

Then, we can extract monocular depths and normals (please install the Omnidata model before running the commands):

python extract_monocular_cues.py --task depth --img_path ../data/Apartment/scan1/image --output_path ../data/Apartment/scan1 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS
python extract_monocular_cues.py --task normal --img_path ../data/Apartment/scan1/image --output_path ../data/Apartment/scan1 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS

Finally, we train monosdf as

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node 1 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/nice_slam_grids.conf
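For your own data, the pose normalization step mentioned above typically means shifting and scaling the cameras so that the scene fits a fixed region around the origin. The following is a minimal sketch of that idea, not a description of what nice_slam_apartment_to_monosdf.py does: the bounding-box centering, the target_radius parameter, and the function name normalize_camera_centers are all assumptions made for illustration.

```python
import math


def normalize_camera_centers(centers, target_radius=1.0):
    """Shift and scale camera centers so they fit in a sphere of target_radius.

    centers: list of (x, y, z) camera positions in world coordinates.
    Returns (normalized, offset, scale) with
    normalized = (center - offset) * scale.
    """
    if not centers:
        raise ValueError("need at least one camera center")
    # Use the midpoint of the axis-aligned bounding box as the new origin.
    mins = [min(c[i] for c in centers) for i in range(3)]
    maxs = [max(c[i] for c in centers) for i in range(3)]
    offset = [(lo + hi) / 2.0 for lo, hi in zip(mins, maxs)]
    shifted = [[c[i] - offset[i] for i in range(3)] for c in centers]
    # Scale so the farthest camera lands exactly on the target sphere.
    max_dist = max(math.sqrt(sum(v * v for v in p)) for p in shifted)
    scale = target_radius / max_dist if max_dist > 0 else 1.0
    normalized = [[v * scale for v in p] for p in shifted]
    return normalized, offset, scale


if __name__ == "__main__":
    cams = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    normalized, offset, scale = normalize_camera_centers(cams)
    print(offset, scale)
```

The same offset and scale would have to be applied to the translation part of every camera pose, and to any depth values expressed in the original units, so that images, poses, and geometry stay consistent.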

Pretrained Models

First download the pretrained models with

bash scripts/download_pretrained.sh

Then you can run inference (DTU, for example) with:

cd code
python evaluation/eval.py --conf confs/dtu_mlp_3views.conf --checkpoint ../pretrained_models/dtu_3views_mlp/scan65.pth --scan_id 65 --resolution 512 --eval_rendering --evals_folder ../pretrained_results

You can also run the following script to extract all the meshes:

python scripts/extract_all_meshes_from_pretrained_models.py

High-resolution Cues

Here we provide scripts to generate high-resolution cues and to train with them. Please refer to our supplementary material for more details.

First, download the Tanks and Temples dataset from here and unzip it to data/tanksandtemples. Then run the script to create overlapping patches:

cd preprocess
python generate_high_res_map.py --mode create_patches

and run the Omnidata model to predict monocular cues for each patch

python extract_monocular_cues.py --task depth --img_path ./highres_tmp/scan1/image/ --output_path ./highres_tmp/scan1 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS
python extract_monocular_cues.py --task normal --img_path ./highres_tmp/scan1/image/ --output_path ./highres_tmp/scan1 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS

This step takes a long time (~2 hours) since there are many patches and the model only uses a batch size of 1.

Then run the script again to merge the output of Omnidata.

python generate_high_res_map.py --mode merge_patches

Now you can train the model with

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node 8 --nnodes=1 --node_rank=0 training/exp_runner.py --conf confs/tnt_highres_grids_courtroom.conf

Please note that the script for generating high-resolution cues only works for the Tanks and Temples dataset. You need to adapt it if you want to apply it to other datasets.
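If you adapt the patch-based pipeline to another dataset, the core bookkeeping is where the overlapping patches start and how their predictions are combined. The sketch below illustrates this along one image axis. It is not the logic of generate_high_res_map.py: the helper names patch_starts and merge_1d, and the patch and overlap sizes, are hypothetical. Plain averaging also ignores any per-patch scale or shift differences in predicted depth, which a real merge step may need to handle.

```python
def patch_starts(length, patch, overlap):
    """Start offsets of overlapping patches that cover [0, length)."""
    if patch <= 0 or not 0 <= overlap < patch:
        raise ValueError("need patch > 0 and 0 <= overlap < patch")
    if length <= patch:
        return [0]
    stride = patch - overlap
    starts = list(range(0, length - patch, stride))
    starts.append(length - patch)  # last patch sits flush with the border
    return starts


def merge_1d(length, patch, starts, patches):
    """Merge per-patch values back into one row by averaging the overlaps."""
    total = [0.0] * length
    count = [0] * length
    for start, values in zip(starts, patches):
        for i, value in enumerate(values[:patch]):
            if start + i < length:
                total[start + i] += value
                count[start + i] += 1
    return [t / c for t, c in zip(total, count)]


if __name__ == "__main__":
    starts = patch_starts(length=6, patch=4, overlap=2)
    merged = merge_1d(6, 4, starts, [[1, 1, 1, 1], [3, 3, 3, 3]])
    print(starts, merged)
```

For a 2D image, the same offsets can be computed independently for rows and columns, and each (row, column) pair then defines one patch.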

Acknowledgements

This project is built upon VolSDF. We use pretrained Omnidata models for monocular depth and normal extraction. The CUDA implementation of multi-resolution hash encoding is based on torch-ngp. Evaluation scripts for DTU, Replica, and ScanNet are taken from DTUeval-python, Nice-SLAM, and manhattan-sdf, respectively. We thank all the authors for their great work and repos.

Citation

If you find our code or paper useful, please cite

@article{Yu2022MonoSDF,
  author    = {Yu, Zehao and Peng, Songyou and Niemeyer, Michael and Sattler, Torsten and Geiger, Andreas},
  title     = {MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction},
  journal   = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2022},
}
