• Stars: 139
  • Rank: 253,505 (Top 6%)
  • Language: Python
  • License: MIT License
  • Created: about 1 year ago
  • Updated: 12 months ago

Repository Details

Code for Monocular Visual-Inertial Depth Estimation (ICRA 2023)

Monocular Visual-Inertial Depth Estimation

This repository contains code and models for our paper:

Monocular Visual-Inertial Depth Estimation
Diana Wofk, René Ranftl, Matthias Müller, Vladlen Koltun

For a quick overview of the work, you can watch the short talk and teaser on YouTube.

Introduction

Methodology Diagram

We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry (VIO) to produce dense depth estimates with metric scale. Our approach consists of three stages:

  1. Input processing: RGB and IMU data feed into monocular depth estimation alongside visual-inertial odometry.
  2. Global scale and shift alignment: monocular depth estimates are fitted to the sparse depth from VIO in a least-squares manner.
  3. Learning-based dense scale alignment: globally-aligned depth is locally realigned using a dense scale map regressed by the ScaleMapLearner (SML).

The images at the bottom of the diagram above show a VOID sample being processed through our pipeline; from left to right: the input RGB, ground truth depth, sparse depth from VIO, globally-aligned depth, scale map scaffolding, dense scale map regressed by SML, and the final depth output.
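
To make the global alignment stage concrete, below is a minimal sketch of fitting a scale and shift by least squares over the sparse VIO points. The function and variable names are illustrative assumptions and do not reproduce the repository's implementation, which may, for example, operate in inverse-depth space:

    import numpy as np

    def fit_scale_and_shift(pred_depth, sparse_depth):
        # Fit s, t so that s * pred_depth + t approximates the sparse
        # metric depth from VIO at the valid (nonzero) sparse pixels.
        mask = sparse_depth > 0
        x = pred_depth[mask].astype(np.float64)
        y = sparse_depth[mask].astype(np.float64)
        A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
        (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
        return s, t

    # Globally-aligned depth (illustrative): ga_depth = s * pred_depth + t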

Teaser Figure

Setup

  1. Setup dependencies:

    conda env create -f environment.yaml
    conda activate vi-depth
  2. Pick one or more ScaleMapLearner (SML) models and download the corresponding weights to the weights folder.

    +------------------+-----------------+-----------------+------------------+
    | Depth Predictor  | SML on VOID 150 | SML on VOID 500 | SML on VOID 1500 |
    +------------------+-----------------+-----------------+------------------+
    | DPT-BEiT-Large   | model           | model           | model            |
    | DPT-SwinV2-Large | model           | model           | model            |
    | DPT-Large        | model           | model           | model            |
    | DPT-Hybrid       | model*          | model           | model            |
    | DPT-SwinV2-Tiny  | model           | model           | model            |
    | DPT-LeViT        | model           | model           | model            |
    | MiDaS-small      | model           | model           | model            |
    +------------------+-----------------+-----------------+------------------+

    *Also available with pretraining on TartanAir: model
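
    After downloading, the scripts below expect the checkpoints to sit directly in the weights folder, named following the pattern sml_model.dpredictor.<depth_predictor>.nsamples.<nsamples>.ckpt. For example (the exact filenames depend on which models you download):

    weights
    ├── sml_model.dpredictor.dpt_beit_large_512.nsamples.150.ckpt
    └── ...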

Inference

  1. Place inputs into the input folder. An input image and corresponding sparse metric depth map are expected:

    input
    โ”œโ”€โ”€ image                   # RGB image
    โ”‚   โ”œโ”€โ”€ <timestamp>.png
    โ”‚   โ””โ”€โ”€ ...
    โ””โ”€โ”€ sparse_depth            # sparse metric depth map
        โ”œโ”€โ”€ <timestamp>.png     # as 16b PNG
        โ””โ”€โ”€ ...

    The load_sparse_depth function in run.py may need to be modified depending on how sparse depth is stored; by default, the depth storage method used in the VOID dataset is assumed (see the sketch after this list).

  2. Run the run.py script as follows:

    DEPTH_PREDICTOR="dpt_beit_large_512"
    NSAMPLES=150
    SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt"
    
    python run.py -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH --save-output
  3. The --save-output flag enables saving outputs to the output folder. By default, the following outputs will be saved per sample:

    output
    โ”œโ”€โ”€ ga_depth                # metric depth map after global alignment
    โ”‚   โ”œโ”€โ”€ <timestamp>.pfm     # as PFM
    โ”‚   โ”œโ”€โ”€ <timestamp>.png     # as 16b PNG
    โ”‚   โ””โ”€โ”€ ...
    โ””โ”€โ”€ sml_depth               # metric depth map output by SML
        โ”œโ”€โ”€ <timestamp>.pfm     # as PFM
        โ”œโ”€โ”€ <timestamp>.png     # as 16b PNG
        โ””โ”€โ”€ ...
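
As a starting point for adapting load_sparse_depth, here is a minimal sketch that reads a 16-bit PNG sparse depth map. The division by 256 assumes a KITTI-style encoding (depth in meters stored as meters * 256), so check the VOID dataset repo for the exact convention:

    import numpy as np
    from PIL import Image

    def load_sparse_depth(path):
        # Read a 16-bit PNG and convert it to metric depth in meters.
        # Assumption: values are stored as depth_in_meters * 256, and
        # zero marks pixels without a sparse measurement.
        depth_png = np.array(Image.open(path), dtype=np.float32)
        depth = depth_png / 256.0
        depth[depth_png == 0] = 0.0
        return depth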

Evaluation

Models provided in this repo were trained on the VOID dataset.

  1. Download the VOID dataset following the instructions in the VOID dataset repo.

  2. To evaluate on VOID test sets, run the evaluate.py script as follows:

    DATASET_PATH="/path/to/void_release/"
    
    DEPTH_PREDICTOR="dpt_beit_large_512"
    NSAMPLES=150
    SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt"
    
    python evaluate.py -ds $DATASET_PATH -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH

    Results for the example shown above:

    Averaging metrics for globally-aligned depth over 800 samples
    Averaging metrics for SML-aligned depth over 800 samples
    +---------+----------+----------+
    |  metric | GA Only  |  GA+SML  |
    +---------+----------+----------+
    |   RMSE  |  191.36  |  142.85  |
    |   MAE   |  115.84  |   76.95  |
    |  AbsRel |    0.069 |    0.046 |
    |  iRMSE  |   72.70  |   57.13  |
    |   iMAE  |   49.32  |   34.25  |
    | iAbsRel |    0.071 |    0.048 |
    +---------+----------+----------+
    

    To evaluate on VOID test sets at different densities (void_150, void_500, void_1500), change the NSAMPLES argument above accordingly.
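
    The rows in the table are standard depth-error metrics and their inverse-depth counterparts (the i-prefixed rows); for VOID, depth errors are typically reported in millimeters. Below is a minimal sketch of how such metrics are commonly computed; it is not taken from evaluate.py, which may differ in masking, units, and scaling:

    import numpy as np

    def depth_metrics(pred, gt):
        # RMSE, MAE, AbsRel and their inverse-depth variants, computed
        # over pixels with valid (positive) ground truth and prediction.
        valid = (gt > 0) & (pred > 0)
        p, g = pred[valid], gt[valid]
        rmse = np.sqrt(np.mean((p - g) ** 2))
        mae = np.mean(np.abs(p - g))
        absrel = np.mean(np.abs(p - g) / g)
        ip, ig = 1.0 / p, 1.0 / g          # inverse depth
        irmse = np.sqrt(np.mean((ip - ig) ** 2))
        imae = np.mean(np.abs(ip - ig))
        iabsrel = np.mean(np.abs(ip - ig) / ig)
        return {"RMSE": rmse, "MAE": mae, "AbsRel": absrel,
                "iRMSE": irmse, "iMAE": imae, "iAbsRel": iabsrel}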

Citation

If you reference our work, please consider citing the following:

@inproceedings{wofk2023videpth,
    author      = {{Wofk, Diana and Ranftl, Ren\'{e} and M{\"u}ller, Matthias and Koltun, Vladlen}},
    title       = {{Monocular Visual-Inertial Depth Estimation}},
    booktitle   = {{IEEE International Conference on Robotics and Automation (ICRA)}},
    year        = {{2023}}
}

Acknowledgements

Our work builds on and uses code from MiDaS, timm, and PyTorch Lightning. We'd like to thank the authors for making these libraries and frameworks available.

More Repositories

  1. Open3D - Open3D: A Modern Library for 3D Data Processing (C++, 10,396 stars)
  2. MiDaS - Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022" (Python, 4,041 stars)
  3. OpenBot - OpenBot leverages smartphones as brains for low-cost robots. We have designed a small electric vehicle that costs about $50 and serves as a robot body. Our software stack for Android smartphones supports advanced robotics workloads such as person following and real-time autonomous navigation. (Swift, 2,679 stars)
  4. DPT - Dense Prediction Transformers (Python, 1,794 stars)
  5. ZoeDepth - Metric depth estimation from a single image (Jupyter Notebook, 1,750 stars)
  6. Open3D-ML - An extension of Open3D to address 3D Machine Learning tasks (Python, 1,644 stars)
  7. PhotorealismEnhancement - Code & Data for Enhancing Photorealism Enhancement (Python, 1,237 stars)
  8. MultiObjectiveOptimization - Source code for Neural Information Processing Systems (NeurIPS) 2018 paper "Multi-Task Learning as Multi-Objective Optimization" (Python, 753 stars)
  9. lang-seg - Language-Driven Semantic Segmentation (Jupyter Notebook, 654 stars)
  10. FastGlobalRegistration - Fast Global Registration (C++, 489 stars)
  11. Open3D-PointNet2-Semantic3D - Semantic3D segmentation with Open3D and PointNet++ (Python, 461 stars)
  12. FreeViewSynthesis - Code repository for "Free View Synthesis", ECCV 2020 (Python, 262 stars)
  13. StableViewSynthesis (Python, 212 stars)
  14. DeepLagrangianFluids - Code repository for "Lagrangian Fluid Simulation with Continuous Convolutions", ICLR 2020 (Python, 187 stars)
  15. spear - SPEAR: A Simulator for Photorealistic Embodied AI Research (C++, 173 stars)
  16. DirectFuturePrediction - Code for the paper "Learning to Act by Predicting the Future", Alexey Dosovitskiy and Vladlen Koltun, ICLR 2017 (Python, 152 stars)
  17. NPHard - Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search (Python, 139 stars)
  18. redwood-3dscan (Python, 100 stars)
  19. Intseg - Interactive Image Segmentation with Latent Diversity (Python, 78 stars)
  20. TanksAndTemples - Toolbox for the TanksAndTemples benchmark website (Python, 58 stars)
  21. dcflow - Code for the paper "Accurate Optical Flow via Direct Cost Volume Processing. Jia Xu, René Ranftl, and Vladlen Koltun. CVPR 2017" (C++, 52 stars)
  22. adaptive-surface-reconstruction - Adaptive Surface Reconstruction for 3D Data Processing (Python, 48 stars)
  23. DFE (Python, 43 stars)
  24. open3d-cmake-find-package - Find pre-installed Open3D package in CMake (C++, 42 stars)
  25. vision-for-action - Code to accompany "Does computer vision matter for action?" (Python, 41 stars)
  26. LMRS - Source code for ICLR 2020 paper: "Learning to Guide Random Search" (Python, 39 stars)
  27. open3d_downloads - Hosting Open3D test data for development use (23 stars)
  28. Open3D-3rdparty (C, 20 stars)
  29. open3d-cmake-external-project - Use Open3D as a CMake external project (CMake, 15 stars)
  30. 0shot-object-insertion - Simulation and robot code for contact-rich household object insertion (ICRA 2023) (Python, 11 stars)
  31. objects-with-lighting (8 stars)
  32. Open3D-Viewer (C++, 7 stars)
  33. generalized-smoothing - Companion code for the ICML 2022 paper "Generalizing Gaussian Smoothing for Random Search" (Python, 5 stars)
  34. Open3D-Python-CI - Testing Open3D Python package from PyPI and Conda (4 stars)
  35. MetaLearningTradeoffs - Source code for the NeurIPS 2020 paper "Modeling and Optimization Trade-off in Meta-learning" (Python, 4 stars)
  36. hello-world-docker-action (Dockerfile, 1 star)
  37. mshadow - Forked from https://github.com/dmlc/mshadow (C++, 1 star)