• Stars: 464
• Rank: 94,450 (Top 2%)
• Language: Python
• License: MIT License
• Created: over 3 years ago
• Updated: almost 2 years ago

Repository Details

DD3D: "Is Pseudo-Lidar needed for Monocular 3D Object detection?"

Install // Datasets // Experiments // Models // License // Reference

Full video

Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

Installation

We recommend using docker (see nvidia-docker2 instructions) for a reproducible environment. To set up your environment, run the following in a terminal (only tested on Ubuntu 18.04):

git clone https://github.com/TRI-ML/dd3d.git
cd dd3d
# If you want to use docker (recommended)
make docker-build # CUDA 10.2
# Alternative docker image for cuda 11.1
# make docker-build DOCKERFILE=Dockerfile-cu111

Please check your NVIDIA driver version and its CUDA compatibility to determine which Dockerfile to use.
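
For example, nvidia-smi (a standard NVIDIA utility, not part of this repository) reports the installed driver version and the highest CUDA version it supports:

# shows the driver version and the highest supported CUDA version
nvidia-smi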

All commands below are listed as if run directly inside the container. To run any of them in a container, either start the container in interactive mode with make docker-dev and type the commands in the resulting shell, or run them in one step:

# single GPU
make docker-run COMMAND="<some-command>"
# multi GPU
make docker-run-mpi COMMAND="<some-command>"
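
For example, the single-GPU form can wrap any of the commands listed later in this document; the experiment name here is just one of those described below:

make docker-run COMMAND="./scripts/train.py +experiments=dd3d_kitti_dla34"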

If you want to use features related to AWS (for caching the output directory) and Weights & Biases (for experiment management/visualization), then you should create associated accounts and configure your shell with the following environment variables before building the docker image:

export AWS_SECRET_ACCESS_KEY="<something>"
export AWS_ACCESS_KEY_ID="<something>"
export AWS_DEFAULT_REGION="<something>"
export WANDB_ENTITY="<something>"
export WANDB_API_KEY="<something>"

You should also enable these features in the configuration, e.g. via WANDB.ENABLED and SYNC_OUTPUT_DIR_S3.ENABLED.
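
These flags can also be switched on per run; a minimal sketch, assuming the same KEY=VALUE override syntax used by the training commands below:

./scripts/train.py +experiments=dd3d_kitti_dla34 WANDB.ENABLED=True SYNC_OUTPUT_DIR_S3.ENABLED=True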

Datasets

By default, datasets are assumed to be downloaded in /data/datasets/<dataset-name> (can be a symbolic link). The dataset root is configurable by DATASET_ROOT.
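
For example, if your datasets live elsewhere, you can either symlink the default location or point DATASET_ROOT at your storage. The path /mnt/storage/datasets below is a placeholder, and the override assumes DATASET_ROOT accepts the same KEY=VALUE syntax as the other config keys:

# option 1: symlink the default location to your storage (placeholder path)
ln -s /mnt/storage/datasets /data/datasets
# option 2: override the dataset root for a single run (placeholder path)
./scripts/train.py +experiments=dd3d_kitti_dla34 DATASET_ROOT=/mnt/storage/datasets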

KITTI

The KITTI 3D dataset used in our experiments can be downloaded from the KITTI website. For convenience, we provide the standard splits used in 3DOP for training and evaluation:

# download a standard splits subset of KITTI
curl -s https://tri-ml-public.s3.amazonaws.com/github/dd3d/mv3d_kitti_splits.tar | sudo tar xv -C /data/datasets/KITTI3D
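
Note that tar -C expects the target directory to exist; if it does not, create it first (a small convenience step, not part of the original instructions):

sudo mkdir -p /data/datasets/KITTI3D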

The dataset must be organized as follows:

<DATASET_ROOT>
    └── KITTI3D
        ├── mv3d_kitti_splits
        │   ├── test.txt
        │   ├── train.txt
        │   ├── trainval.txt
        │   └── val.txt
        ├── testing
        │   ├── calib
        │   │   ├── 000000.txt
        │   │   ├── 000001.txt
        │   │   └── ...
        │   └── image_2
        │       ├── 000000.png
        │       ├── 000001.png
        │       └── ...
        └── training
            ├── calib
            │   ├── 000000.txt
            │   ├── 000001.txt
            │   └── ...
            ├── image_2
            │   ├── 000000.png
            │   ├── 000001.png
            │   └── ...
            └── label_2
                ├── 000000.txt
                ├── 000001.txt
                └── ...

nuScenes

The nuScenes dataset (v1.0) can be downloaded from the nuScenes website. The dataset must be organized as follows:

<DATASET_ROOT>
    └── nuScenes
        ├── samples
        │   ├── CAM_FRONT
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243012465.jpg
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243512465.jpg
        │   │   ├── ...
        │   │
        │   ├── CAM_FRONT_LEFT
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243004917.jpg
        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243504917.jpg
        │   │   ├── ...
        │   │
        │   ├── ...
        │
        ├── v1.0-trainval
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...
        │
        ├── v1.0-test
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...
        │
        ├── v1.0-mini
        │   ├── attribute.json
        │   ├── calibrated_sensor.json
        │   ├── category.json
        │   ├── ...

Pre-trained DD3D models

The DD3D models pre-trained on dense depth estimation using DDAD15M can be downloaded here:

| backbone | download |
|----------|----------|
| DLA34    | model    |
| V2-99    | model    |
| OmniML   | model    |

The OmniML model is optimized by OmniML for highly efficient deployment on target hardware. Compared to the standard DLA-34, it achieves a 1.75x speedup (measured on NVIDIA Xavier, int8, batch_size=1) and 60% fewer GFLOPs (measured with input size 512x896) at better accuracy. Please see the Models section for configs.

(Optional) Eigen-clean subset of KITTI raw.

To train our Pseudo-Lidar detector, we curated a new subset of the KITTI (raw) dataset and used it to fine-tune its depth network. This subset can be downloaded here. Each row contains a left and right image pair. The KITTI raw dataset can be downloaded here.

Validating installation

To validate and visualize the dataloader (including data augmentation), run the following:

./scripts/visualize_dataloader.py +experiments=dd3d_kitti_dla34 SOLVER.IMS_PER_BATCH=4

To validate the entire training loop (including evaluation and visualization), run the overfit experiment (trained on the test set):

./scripts/train.py +experiments=dd3d_kitti_dla34_overfit

| experiment | backbone | train mem. (GB) | train time (hr) | train log | Box AP (%) | BEV AP (%) | download |
|------------|----------|-----------------|-----------------|-----------|------------|------------|----------|
| config     | DLA-34   | 6               | 0.25            | log       | 84.54      | 88.83      | model    |

Experiments

Configuration

We use hydra to configure experiments, specifically following this pattern to organize and compose configurations. The experiments under configs/experiments describe the delta from the default configuration, and can be run as follows:

# omit the '.yaml' extension from the experiment file.
./scripts/train.py +experiments=<experiment-file> <config-override>
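
For example, a concrete instance of this pattern using the KITTI DLA-34 experiment and one of the overrides shown elsewhere in this document (the override value is illustrative):

./scripts/train.py +experiments=dd3d_kitti_dla34 TEST.IMS_PER_BATCH=8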

The configuration is modularized into components such as datasets, backbones, evaluators, and visualizers.

Using multiple GPUs

The training script supports (single-node) multi-GPU training and evaluation via mpirun, which is most conveniently executed with the make docker-run-mpi command (see above). Internally, the IMS_PER_BATCH parameters of the optimizer and the evaluator denote the total batch size, which is sharded across the available GPUs during training or evaluation. They must be set to a multiple of the number of available GPUs.
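
As a concrete sketch (the experiment name and batch sizes are illustrative; both IMS_PER_BATCH values must remain multiples of the GPU count):

# e.g., 8 GPUs with a total training batch size of 64 (8 images per GPU)
make docker-run-mpi COMMAND="./scripts/train.py +experiments=dd3d_kitti_dla34 SOLVER.IMS_PER_BATCH=64 TEST.IMS_PER_BATCH=8"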

Evaluation

One can run evaluation only, using the pretrained models:

./scripts/train.py +experiments=<some-experiment> EVAL_ONLY=True MODEL.CKPT=<path-to-pretrained-model>
# use smaller batch size for single-gpu
./scripts/train.py +experiments=<some-experiment> EVAL_ONLY=True MODEL.CKPT=<path-to-pretrained-model> TEST.IMS_PER_BATCH=4
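
For example, to evaluate the pretrained KITTI DLA-34 model on a single GPU (the checkpoint path is a placeholder for wherever you saved the downloaded model):

./scripts/train.py +experiments=dd3d_kitti_dla34 EVAL_ONLY=True MODEL.CKPT=/path/to/dd3d_kitti_dla34.pth TEST.IMS_PER_BATCH=4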

Gradient accumulation

If you have insufficient GPU memory for any experiment, you can use gradient accumulation by configuring ACCUMULATE_GRAD_BATCHES, at the cost of longer training time. For instance, if an experiment requires at least 400 GB of GPU memory (e.g., V2-99 on KITTI) and you have only 128 GB (e.g., 8 x 16G GPUs), then you can update the parameters every 4th step:

# The original batch size is 64.
./scripts/train.py +experiments=dd3d_kitti_v99 SOLVER.IMS_PER_BATCH=16 SOLVER.ACCUMULATE_GRAD_BATCHES=4

Models

All DLA-34 and V2-99 experiments here use 8 A100 40G GPUs, with gradient accumulation when more GPU memory is needed. We subsample the nuScenes validation set by a factor of 8 (2Hz → 0.25Hz) to save training time.

(*): Trained using 8 A5000 GPUs. (**): Benchmarked on NVIDIA Xavier.

KITTI

| experiment | backbone | train mem. (GB) | train time (hr) | GFLOPs | latency (ms) | train log | Box AP (%) | BEV AP (%) | download |
|------------|----------|-----------------|-----------------|--------|--------------|-----------|------------|------------|----------|
| config     | DLA-34   | 256             | 4.5             | 103    | 19.9**       | log       | 16.92      | 24.77      | model    |
| config     | V2-99    | 400             | 9.0             | 453    | -            | log       | 23.90      | 32.01      | model    |
| config     | OmniML   | 70*             | 3.0*            | 41     | 11.4**       | log       | 20.58      | 28.73      | model    |

nuScenes

| experiment | backbone | train mem. (GB) | train time (hr) | train log | mAP (%) | NDS | download |
|------------|----------|-----------------|-----------------|-----------|---------|-----|----------|
| config     | DLA-34   | TBD             | TBD             | TBD       | TBD     | TBD | TBD      |
| config     | V2-99    | TBD             | TBD             | TBD       | TBD     | TBD | TBD      |

License

The source code is released under the MIT license. We note that some code in this repository is adapted from the following repositories:

Reference

@inproceedings{park2021dd3d,
  author = {Dennis Park and Rares Ambrus and Vitor Guizilini and Jie Li and Adrien Gaidon},
  title = {Is Pseudo-Lidar needed for Monocular 3D Object detection?},
  booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
  primaryClass = {cs.CV},
  year = {2021},
}

More Repositories

1. packnet-sfm: TRI-ML Monocular Depth Estimation Repository (Python, 1,243 stars)
2. vidar (Python, 560 stars)
3. DDAD: Dense Depth for Autonomous Driving (DDAD) dataset (Python, 490 stars)
4. prismatic-vlms: A flexible and efficient codebase for training visually-conditioned language models (VLMs) (Python, 445 stars)
5. KP3D: Code for "Self-Supervised 3D Keypoint Learning for Ego-motion Estimation" (Python, 240 stars)
6. PF-Track: Implementation of PF-Track (Python, 203 stars)
7. KP2D (Python, 176 stars)
8. sdflabel: Official PyTorch implementation of CVPR 2020 oral "Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors" (Python, 161 stars)
9. realtime_panoptic: Official PyTorch implementation of CVPR 2020 Oral "Real-Time Panoptic Segmentation from Dense Detections" (Python, 115 stars)
10. permatrack: Implementation for Learning to Track with Object Permanence (Python, 112 stars)
11. camviz: Visualization Library (Python, 101 stars)
12. dgp: ML Dataset Governance Policy for Autonomous Vehicle Datasets (Python, 94 stars)
13. VEDet (Python, 39 stars)
14. RAP: Official code for the paper "RAP: Risk-Aware Prediction for Robust Planning" (https://arxiv.org/abs/2210.01368) (Python, 34 stars)
15. VOST: Code for the VOST dataset (Python, 22 stars)
16. RAM: Implementation for Object Permanence Emerges in a Random Walk along Memory (Python, 18 stars)
17. road: ROAD: Learning an Implicit Recursive Octree Auto-Decoder to Efficiently Encode 3D Shapes (CoRL 2022) (Python, 11 stars)
18. efm_datasets: TRI-ML Embodied Foundation Datasets (Python, 8 stars)
19. OctMAE: Zero-Shot Multi-Object Shape Completion (ECCV 2024) (Python, 5 stars)
20. refine: Official PyTorch implementation of the SIGGRAPH 2024 paper "ReFiNe: Recursive Field Networks for Cross-Modal Multi-Scene Representation" (Python, 5 stars)
21. stochastic_verification: Official repository for the paper "How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation" (Python, 5 stars)
22. HAICU (4 stars)
23. binomial_cis: Computation of binomial confidence intervals that achieve exact coverage (Jupyter Notebook, 4 stars)
24. vlm-evaluation: VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning (Python, 1 star)