DeepV2D

This repository contains the source code for our paper:

DeepV2D: Video to Depth with Differentiable Structure from Motion
Zachary Teed and Jia Deng
International Conference on Learning Representations (ICLR) 2020

Requirements

Our code was tested using TensorFlow 1.12.0 and Python 3. To use the code, first create a clean virtualenv and install the following Python packages:

virtualenv --no-site-packages -p python3 deepv2d_env
source deepv2d_env/bin/activate
pip install tensorflow-gpu==1.12.0
pip install h5py
pip install easydict
pip install scipy
pip install opencv-python
pip install pyyaml
pip install toposort
pip install vtk
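After installing, it can help to confirm that every package is visible from inside the virtualenv. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module names are the standard import names for the pip packages above (`opencv-python` imports as `cv2`, `pyyaml` as `yaml`).

```python
import importlib.util

# Import names for the pip packages listed above.
# Note: opencv-python is imported as cv2 and pyyaml as yaml.
REQUIRED_MODULES = [
    "tensorflow", "h5py", "easydict", "scipy",
    "cv2", "yaml", "toposort", "vtk",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the modules can be found; it does not verify versions or GPU support.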

You can optionally compile our CUDA backprojection operator by running:

cd deepv2d/special_ops && ./make.sh && cd ../..

This will reduce peak GPU memory usage. You may need to change CUDALIB to point to the location where CUDA is installed.

Demos

Video to Depth (V2D)

Try it out on one of the provided test sequences. First download our pretrained models using the script:

./data/download_models.sh

or download them from Google Drive.

The demo code will output a depth map and display a point cloud for visualization. Once the depth map has appeared, press any key to open the point cloud visualization.

NYUv2:

python demos/demo_v2d.py --model=models/nyu.ckpt --sequence=data/demos/nyu_0

ScanNet:

python demos/demo_v2d.py --model=models/scannet.ckpt --sequence=data/demos/scannet_0

KITTI:

python demos/demo_v2d.py --model=models/kitti.ckpt --sequence=data/demos/kitti_0
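For readers unfamiliar with how a depth map relates to a point cloud, here is a minimal sketch of pinhole backprojection. It is an illustration under stated assumptions, not the demo's actual code: the function name `backproject` and the parameters `fx`, `fy`, `cx`, `cy` (focal lengths and principal point in pixels) are chosen here for the example.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a depth map into a list of 3D points (pinhole camera model).

    depth is a list of rows; pixel (u, v) with depth z maps to
    ((u - cx) * z / fx, (v - cy) * z / fy, z).
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


if __name__ == "__main__":
    # Tiny 2x2 depth map with one invalid pixel.
    toy_depth = [[1.0, 2.0], [0.0, 4.0]]
    for point in backproject(toy_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0):
        print(point)
```

A real implementation would vectorize this over the whole image; the loop form is used only to keep the mapping explicit.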

You can also run motion estimation in global mode, which updates all the poses jointly as a single optimization problem:

python demos/demo_v2d.py --model=models/nyu.ckpt --sequence=data/demos/nyu_0 --mode=global

Uncalibrated Video to Depth (V2D-Uncalibrated)

If you do not know the camera intrinsics, you can run DeepV2D in uncalibrated mode. In the uncalibrated setting, the motion module estimates the focal length during inference:

python demos/demo_uncalibrated.py --video=data/demos/golf.mov
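To make the role of the focal length concrete, the sketch below builds a 3x3 pinhole intrinsics matrix from a single focal length. This is only an illustration: the function name `intrinsics_from_focal` is hypothetical, and placing the principal point at the image center with equal focal lengths on both axes is an assumption made here, not something the repository states.

```python
def intrinsics_from_focal(focal, width, height):
    """Build a 3x3 pinhole intrinsics matrix as nested lists.

    Assumptions (for illustration only): square pixels, so the same focal
    length is used on both axes, and a principal point at the image center.
    """
    cx, cy = width / 2.0, height / 2.0
    return [
        [focal, 0.0, cx],
        [0.0, focal, cy],
        [0.0, 0.0, 1.0],
    ]


if __name__ == "__main__":
    for row in intrinsics_from_focal(500.0, 640, 480):
        print(row)
```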

SLAM / VO

DeepV2D can also be used for tracking and mapping on longer videos. First, download some test sequences:

./data/download_slam_sequences.sh

Try it out on NYU-Depth, ScanNet, TUM-RGBD, or KITTI. The --n_keyframes flag sets the number of keyframes; using more keyframes reduces drift but results in slower tracking.

python demos/demo_slam.py --dataset=kitti --n_keyframes=2
python demos/demo_slam.py --dataset=scannet --n_keyframes=3

The --cinematic flag forces the visualization to follow the camera:

python demos/demo_slam.py --dataset=nyu --n_keyframes=3 --cinematic

The --clear_points flag can be used so that only the point cloud of the current depth is plotted.

python demos/demo_slam.py --dataset=tum --n_keyframes=3 --clear_points
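Since the number of keyframes trades drift against speed, it can be convenient to try several values on the same dataset. The helper below is a hypothetical sketch (the name `keyframe_sweep` is not part of the repository) that only assembles the command lines shown above; the dataset names and flags are the ones used in this README.

```python
def keyframe_sweep(dataset, keyframe_counts, extra_flags=()):
    """Build one demo_slam.py command (as an argv list) per keyframe count."""
    known = ("nyu", "scannet", "tum", "kitti")
    if dataset not in known:
        raise ValueError("unknown dataset: %r" % (dataset,))
    commands = []
    for n in keyframe_counts:
        cmd = [
            "python", "demos/demo_slam.py",
            "--dataset=%s" % dataset,
            "--n_keyframes=%d" % n,
        ]
        cmd.extend(extra_flags)  # e.g. ["--cinematic"] or ["--clear_points"]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in keyframe_sweep("kitti", [2, 3]):
        # To actually run a command from the repository root, one could use
        # subprocess.run(cmd, check=True); here we only print it.
        print(" ".join(cmd))
```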

Evaluation

You can evaluate the trained models on any of the following datasets.

NYUv2:

./data/download_nyu_data.sh
python evaluation/eval_nyu.py --model=models/nyu.ckpt

KITTI:

First download the dataset using the script provided on the official website. Then run the evaluation script, where KITTI_PATH is the directory the dataset was downloaded to:

./data/download_kitti_data.sh
python evaluation/eval_kitti.py --model=models/kitti.ckpt --dataset_dir=KITTI_PATH

ScanNet:

First download the ScanNet dataset.

Then run the evaluation script, where SCANNET_PATH is the directory where you downloaded ScanNet:

python evaluation/eval_scannet.py --model=models/scannet.ckpt --dataset_dir=SCANNET_PATH

Training

You can train a model on any of the following datasets.

NYUv2:

First download the training tfrecords file (143Gb) containing the NYU data. Camera poses for NYU were estimated with ORB-SLAM2 using Kinect measurements; you can download the estimated poses from Google Drive.

Once the data has been downloaded, train the model by running the following command (training takes about 1 week on an Nvidia 1080Ti GPU):

python training/train_nyu.py --cfg=cfgs/nyu.yaml --name=nyu_model --tfrecords=nyu_train.tfrecords

Note: this creates a temporary directory which is used to store intermediate depth predictions. You can specify the location of the temporary directory using the --tmp flag. You can train on multiple GPUs using the --num_gpus flag. If you train with multiple GPUs, you can reduce the number of training iterations in cfgs/nyu.yaml.
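As a sketch of how these optional flags might be combined with the training command, the hypothetical helper below assembles an argument list. The `--flag=value` form for --tmp and --num_gpus is an assumption based on how the other flags in this README are written; the helper name `train_nyu_command` is not part of the repository.

```python
def train_nyu_command(name, tfrecords, tmp_dir=None, num_gpus=None):
    """Assemble the NYU training command as an argv list.

    tmp_dir and num_gpus are optional; the "--tmp=..." and "--num_gpus=..."
    syntax is assumed here, mirroring the other flags in this README.
    """
    cmd = [
        "python", "training/train_nyu.py",
        "--cfg=cfgs/nyu.yaml",
        "--name=%s" % name,
        "--tfrecords=%s" % tfrecords,
    ]
    if tmp_dir is not None:
        cmd.append("--tmp=%s" % tmp_dir)
    if num_gpus is not None:
        cmd.append("--num_gpus=%d" % num_gpus)
    return cmd


if __name__ == "__main__":
    # The scratch path below is a placeholder chosen for this example.
    print(" ".join(train_nyu_command(
        "nyu_model", "nyu_train.tfrecords",
        tmp_dir="/path/to/scratch", num_gpus=2)))
```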

KITTI:

First download the dataset using the script provided on the official website. Once the dataset has been downloaded, write the training sequences to a tfrecords file:

python training/write_tfrecords.py --dataset=kitti --dataset_dir=KITTI_DIR --records_file=kitti_train.tfrecords

You can now train the model (training takes about 1 week on an Nvidia 1080Ti GPU). Note: this creates a temporary directory which is used to store intermediate depth predictions. You can specify the location of the temporary directory using the --tmp flag. You can train on multiple GPUs using the --num_gpus flag.

python training/train_kitti.py --cfg=cfgs/kitti.yaml --name=kitti_model --tfrecords=kitti_train.tfrecords

ScanNet:

python training/train_scannet.py --cfg=cfgs/scannet.yaml --name=scannet_model --dataset_dir="path to scannet"
