[ICCV 2019] Monocular depth estimation from a single image

Monodepth2

This is the reference PyTorch implementation for training and testing depth estimation models using the method described in

Digging into Self-Supervised Monocular Depth Prediction

Clément Godard, Oisin Mac Aodha, Michael Firman and Gabriel J. Brostow

ICCV 2019 (arXiv pdf)

This code is for non-commercial use; please see the license file for terms.

If you find our work useful in your research please consider citing our paper:

@article{monodepth2,
  title     = {Digging into Self-Supervised Monocular Depth Prediction},
  author    = {Cl{\'{e}}ment Godard and
               Oisin {Mac Aodha} and
               Michael Firman and
               Gabriel J. Brostow},
  booktitle = {The International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2019}
}

โš™๏ธ Setup

Assuming a fresh Anaconda distribution, you can install the dependencies with:

conda install pytorch=0.4.1 torchvision=0.2.1 -c pytorch
pip install tensorboardX==1.4
conda install opencv=3.3.1   # just needed for evaluation

We ran our experiments with PyTorch 0.4.1, CUDA 9.1, Python 3.6.6 and Ubuntu 18.04. We have also successfully trained models with PyTorch 1.0, and our code is compatible with Python 2.7. You may have issues installing OpenCV version 3.3.1 if you use Python 3.7. We recommend creating a virtual environment with Python 3.6.6:

conda create -n monodepth2 python=3.6.6 anaconda

๐Ÿ–ผ๏ธ Prediction for a single image

You can predict scaled disparity for a single image with:

python test_simple.py --image_path assets/test_image.jpg --model_name mono+stereo_640x192

or, if you are using a stereo-trained model, you can estimate metric depth with

python test_simple.py --image_path assets/test_image.jpg --model_name mono+stereo_640x192 --pred_metric_depth

On its first run either of these commands will download the mono+stereo_640x192 pretrained model (99MB) into the models/ folder. We provide the following options for --model_name:

| --model_name | Training modality | ImageNet pretrained? | Model resolution | KITTI abs. rel. error | delta < 1.25 |
|---|---|---|---|---|---|
| mono_640x192 | Mono | Yes | 640 x 192 | 0.115 | 0.877 |
| stereo_640x192 | Stereo | Yes | 640 x 192 | 0.109 | 0.864 |
| mono+stereo_640x192 | Mono + Stereo | Yes | 640 x 192 | 0.106 | 0.874 |
| mono_1024x320 | Mono | Yes | 1024 x 320 | 0.115 | 0.879 |
| stereo_1024x320 | Stereo | Yes | 1024 x 320 | 0.107 | 0.874 |
| mono+stereo_1024x320 | Mono + Stereo | Yes | 1024 x 320 | 0.106 | 0.876 |
| mono_no_pt_640x192 | Mono | No | 640 x 192 | 0.132 | 0.845 |
| stereo_no_pt_640x192 | Stereo | No | 640 x 192 | 0.130 | 0.831 |
| mono+stereo_no_pt_640x192 | Mono + Stereo | No | 640 x 192 | 0.127 | 0.836 |

You can also download models trained on the odometry split with monocular and mono+stereo training modalities.

Finally, we provide ResNet-50 depth estimation models, one trained with ImageNet pretrained weights and one trained from scratch. Make sure to set --num_layers 50 if using these.

💾 KITTI training data

You can download the entire raw KITTI dataset by running:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/

Then unzip with

cd kitti_data
unzip "*.zip"
cd ..

Warning: it weighs about 175GB, so make sure you have enough space to unzip too!

Our default settings expect that you have converted the png images to jpeg with this command, which also deletes the raw KITTI .png files:

find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'

or you can skip this conversion step and train from raw png files by adding the flag --png when training, at the expense of slower load times.

The above conversion command creates images which match our experiments, where KITTI .png images were converted to .jpg on Ubuntu 16.04 with default chroma subsampling 2x2,1x1,1x1. We found that Ubuntu 18.04 defaults to 2x2,2x2,2x2, which gives different results, hence the explicit parameter in the conversion command.

You can also place the KITTI dataset wherever you like and point towards it with the --data_path flag during training and evaluation.

Splits

The train/test/validation splits are defined in the splits/ folder. By default, the code will train a depth model using Zhou's subset of the standard Eigen split of KITTI, which is designed for monocular training. You can also train a model using the new benchmark split or the odometry split by setting the --split flag.

Custom dataset

You can train on a custom monocular or stereo dataset by writing a new dataloader class which inherits from MonoDataset – see the KITTIDataset class in datasets/kitti_dataset.py for an example.

โณ Training

By default models and tensorboard event files are saved to ~/tmp/<model_name>. This can be changed with the --log_dir flag.

Monocular training:

python train.py --model_name mono_model

Stereo training:

Our code defaults to using Zhou's subsampled Eigen training data. For stereo-only training we have to specify that we want to use the full Eigen training set – see paper for details.

python train.py --model_name stereo_model \
  --frame_ids 0 --use_stereo --split eigen_full

Monocular + stereo training:

python train.py --model_name mono+stereo_model \
  --frame_ids 0 -1 1 --use_stereo

GPUs

The code can only be run on a single GPU. You can specify which GPU to use with the CUDA_VISIBLE_DEVICES environment variable:

CUDA_VISIBLE_DEVICES=2 python train.py --model_name mono_model

All our experiments were performed on a single NVIDIA Titan Xp.

| Training modality | Approximate GPU memory | Approximate training time |
|---|---|---|
| Mono | 9GB | 12 hours |
| Stereo | 6GB | 8 hours |
| Mono + Stereo | 11GB | 15 hours |
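
The three training invocations above differ only in a few flags, so they can be generated from one small helper when scripting several runs. The following is a minimal sketch, not part of the repository: `build_train_command` and `MODALITY_FLAGS` are hypothetical names, and the flags are copied from the commands in this section.

```python
import os

# Extra flags per training modality, copied from the commands in this section.
MODALITY_FLAGS = {
    "mono": [],
    "stereo": ["--frame_ids", "0", "--use_stereo", "--split", "eigen_full"],
    "mono+stereo": ["--frame_ids", "0", "-1", "1", "--use_stereo"],
}


def build_train_command(modality, model_name, gpu=None):
    """Return (argv, env) for one training run.

    Hypothetical helper for illustration; it only assembles the command
    and does not start training.
    """
    if modality not in MODALITY_FLAGS:
        raise ValueError("unknown modality: {}".format(modality))
    argv = ["python", "train.py", "--model_name", model_name]
    argv = argv + MODALITY_FLAGS[modality]
    # Copy the current environment and optionally pin the run to one GPU.
    env = dict(os.environ)
    if gpu is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return argv, env


if __name__ == "__main__":
    argv, env = build_train_command("stereo", "stereo_model", gpu=2)
    print("CUDA_VISIBLE_DEVICES={} {}".format(
        env["CUDA_VISIBLE_DEVICES"], " ".join(argv)))
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)` from the repository root.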

💽 Finetuning a pretrained model

Add the following to the training command to load an existing model for finetuning:

python train.py --model_name finetuned_mono --load_weights_folder ~/tmp/mono_model/models/weights_19
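
If you do not want to hard-code the epoch number, the most recent checkpoint can be located programmatically. This is a minimal sketch that assumes the folder layout implied by the example paths in this README (`<log_dir>/<model_name>/models/weights_<epoch>`); `latest_weights_folder` is a hypothetical helper, not part of the repository.

```python
import os
import re


def latest_weights_folder(log_dir, model_name):
    """Return the weights_<N> folder with the largest N, or None if none exists.

    Assumes checkpoints live in <log_dir>/<model_name>/models/weights_<N>.
    """
    models_dir = os.path.join(os.path.expanduser(log_dir), model_name, "models")
    if not os.path.isdir(models_dir):
        return None
    best_epoch, best_path = -1, None
    for name in os.listdir(models_dir):
        match = re.fullmatch(r"weights_(\d+)", name)
        path = os.path.join(models_dir, name)
        # Compare epochs numerically so that weights_19 beats weights_3.
        if match and os.path.isdir(path) and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


if __name__ == "__main__":
    print(latest_weights_folder("~/tmp", "mono_model"))
```

The returned path can then be passed to --load_weights_folder.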

🔧 Other training options

Run python train.py -h (or look at options.py) to see the range of other training options, such as learning rates and ablation settings.

📊 KITTI evaluation

To prepare the ground truth depth maps run:

python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark

...assuming that you have placed the KITTI dataset in the default location of ./kitti_data/.

The following example command evaluates the epoch 19 weights of a model named mono_model:

python evaluate_depth.py --load_weights_folder ~/tmp/mono_model/models/weights_19/ --eval_mono

For stereo models, you must use the --eval_stereo flag (see note below):

python evaluate_depth.py --load_weights_folder ~/tmp/stereo_model/models/weights_19/ --eval_stereo

If you train your own model with our code, you are likely to see slight differences from the published results because of randomness in weight initialization and data loading.

An additional parameter --eval_split can be set. The three different values possible for eval_split are explained here:

| --eval_split | Test set size | For models trained with... | Description |
|---|---|---|---|
| eigen | 697 | --split eigen_zhou (default) or --split eigen_full | The standard Eigen test files |
| eigen_benchmark | 652 | --split eigen_zhou (default) or --split eigen_full | Evaluate with the improved ground truth from the new KITTI depth benchmark |
| benchmark | 500 | --split benchmark | The new KITTI depth benchmark test files |

Because no ground truth is available for the new KITTI depth benchmark, no scores will be reported when --eval_split benchmark is set. Instead, a set of .png images will be saved to disk ready for upload to the evaluation server.
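
For reference, the two scores quoted in the pretrained-model table, abs. rel. error and delta < 1.25, are commonly defined as the mean of |gt - pred| / gt and the fraction of pixels where max(gt / pred, pred / gt) < 1.25. The sketch below illustrates those textbook definitions on plain Python lists; `abs_rel_and_delta` is a hypothetical name, and any masking, cropping or depth clipping that an evaluation script applies before computing the scores is omitted.

```python
def abs_rel_and_delta(gt, pred, threshold=1.25):
    """Return (abs_rel, delta) for paired positive depth values.

    abs_rel: mean of |gt - pred| / gt            (lower is better)
    delta:   share of pairs with max(gt/pred, pred/gt) < threshold
             (higher is better)
    """
    if not gt or len(gt) != len(pred):
        raise ValueError("gt and pred must be non-empty and the same length")
    n = len(gt)
    abs_rel = sum(abs(g - p) / g for g, p in zip(gt, pred)) / n
    inliers = sum(1 for g, p in zip(gt, pred) if max(g / p, p / g) < threshold)
    return abs_rel, inliers / n


if __name__ == "__main__":
    gt_depth = [10.0, 20.0, 40.0]    # made-up ground truth depths
    pred_depth = [10.0, 24.0, 20.0]  # made-up predictions
    abs_rel, delta = abs_rel_and_delta(gt_depth, pred_depth)
    print("abs_rel={:.3f} delta={:.3f}".format(abs_rel, delta))
```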

External disparities evaluation

Finally you can also use evaluate_depth.py to evaluate raw disparities (or inverse depth) from other methods by using the --ext_disp_to_eval flag:

python evaluate_depth.py --ext_disp_to_eval ~/other_method_disp.npy

📷📷 Note on stereo evaluation

Our stereo models are trained with an effective baseline of 0.1 units, while the actual KITTI stereo rig has a baseline of 0.54m. This means a scaling of 5.4 must be applied for evaluation. In addition, for models trained with stereo supervision we disable median scaling. Setting the --eval_stereo flag when evaluating will automatically disable median scaling and scale predicted depths by 5.4.
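
The arithmetic behind the two scaling modes can be sketched as follows. This is an illustration on plain Python lists, not the code in evaluate_depth.py; the function names are hypothetical, and the median-scaling variant follows the usual convention for monocular models of rescaling each prediction by the ratio of ground truth median to predicted median.

```python
from statistics import median

TRAINING_BASELINE = 0.1   # effective baseline used during stereo training
KITTI_BASELINE_M = 0.54   # actual KITTI stereo baseline in metres
STEREO_SCALE_FACTOR = KITTI_BASELINE_M / TRAINING_BASELINE  # about 5.4


def scale_stereo_depth(pred_depth):
    """Convert depths from a stereo-trained model to metres (fixed factor)."""
    return [d * STEREO_SCALE_FACTOR for d in pred_depth]


def median_scale_depth(pred_depth, gt_depth):
    """Rescale a prediction so its median matches the ground truth median.

    Used when the model's scale is unknown; not applied to stereo models.
    """
    ratio = median(gt_depth) / median(pred_depth)
    return [d * ratio for d in pred_depth]


if __name__ == "__main__":
    print(scale_stereo_depth([1.0, 2.0]))
    print(median_scale_depth([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```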

โคด๏ธโคต๏ธ Odometry evaluation

We include code for evaluating poses predicted by models trained with --split odom --dataset kitti_odom --data_path /path/to/kitti/odometry/dataset.

For this evaluation, the KITTI odometry dataset (color, 65GB) and ground truth poses zip files must be downloaded. As above, we assume that the pngs have been converted to jpgs.

If this data has been unzipped to folder kitti_odom, a model can be evaluated with:

python evaluate_pose.py --eval_split odom_9 --load_weights_folder ./odom_split.M/models/weights_29 --data_path kitti_odom/
python evaluate_pose.py --eval_split odom_10 --load_weights_folder ./odom_split.M/models/weights_29 --data_path kitti_odom/

📦 Precomputed results

You can download our precomputed disparity predictions from the following links:

| Training modality | Input size | .npy filesize | Eigen disparities |
|---|---|---|---|
| Mono | 640 x 192 | 343 MB | Download 🔗 |
| Stereo | 640 x 192 | 343 MB | Download 🔗 |
| Mono + Stereo | 640 x 192 | 343 MB | Download 🔗 |
| Mono | 1024 x 320 | 914 MB | Download 🔗 |
| Stereo | 1024 x 320 | 914 MB | Download 🔗 |
| Mono + Stereo | 1024 x 320 | 914 MB | Download 🔗 |

๐Ÿ‘ฉโ€โš–๏ธ License

Copyright © Niantic, Inc. 2019. Patent Pending. All rights reserved. Please see the license file for terms.