
iDisc: Internal Discretization for Monocular Depth Estimation [CVPR 2023]

Luigi Piccinelli, Christos Sakaridis, Fisher Yu, CVPR 2023 (to appear)
Project Website (iDisc) · Paper (arXiv 2304.06334)

Visualization

KITTI

[animated depth predictions]

NYUv2-Depth

[animated depth predictions]

For more (and uncompressed) visual examples, please visit vis.xyz.

Citation

If you find our work useful in your research, please consider citing our publication:

    @inproceedings{piccinelli2023idisc,
      title = {iDisc: Internal Discretization for Monocular Depth Estimation},
      author = {Piccinelli, Luigi and Sakaridis, Christos and Yu, Fisher},
      booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      year = {2023}
    }

Abstract

Monocular depth estimation is fundamental for 3D scene understanding and downstream applications. However, even under the supervised setup, it remains challenging and ill-posed due to the lack of geometric constraints. We observe that although a scene can consist of millions of pixels, there are far fewer high-level patterns. We propose iDisc to learn those patterns with internal discretized representations. The method implicitly partitions the scene into a set of high-level concepts. In particular, our new module, Internal Discretization (ID), implements a continuous-discrete-continuous bottleneck to learn those concepts without supervision. In contrast to state-of-the-art methods, the proposed model does not enforce any explicit constraints or priors on the depth output. The whole network with the ID module can be trained end to end thanks to the attention-based bottleneck module. Our method sets the new state of the art with significant improvements on NYU-Depth v2 and KITTI, outperforming all published methods on the official KITTI benchmark. iDisc also achieves state-of-the-art results on surface normal estimation. Further, we explore the model's generalization capability via zero-shot testing. From there, we observe the compelling need to promote diversification in the outdoor scenario, and we introduce splits of two autonomous driving datasets, DDAD and Argoverse.
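
The continuous-discrete-continuous bottleneck can be pictured as two cross-attention passes: a set of learnable "concept" tokens first summarizes the continuous pixel features (discretization), and the pixel features are then re-expressed in terms of those tokens (back to continuous space). The sketch below is conceptual only, not the authors' actual ID module; the class name, feature dimension, and number of concepts are illustrative assumptions.

    import torch
    import torch.nn as nn

    class InternalDiscretizationSketch(nn.Module):
        """Conceptual continuous-discrete-continuous bottleneck
        (illustrative sketch, not the released implementation)."""

        def __init__(self, dim: int = 256, num_concepts: int = 32, heads: int = 8):
            super().__init__()
            # Learnable discrete "concept" tokens (hypothetical size).
            self.concepts = nn.Parameter(torch.randn(num_concepts, dim))
            # Discretization: concept tokens (queries) attend to pixel features.
            self.to_discrete = nn.MultiheadAttention(dim, heads, batch_first=True)
            # Back to continuous: pixel features attend to the concept tokens.
            self.to_continuous = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            # feats: (B, N, C) flattened pixel features.
            q = self.concepts.unsqueeze(0).expand(feats.shape[0], -1, -1)
            discrete, _ = self.to_discrete(q, feats, feats)         # (B, K, C)
            out, _ = self.to_continuous(feats, discrete, discrete)  # (B, N, C)
            return out

For example, features of shape (2, 4096, 256) are summarized into 32 concept tokens per image and projected back to (2, 4096, 256); because the number of tokens is fixed and small, the bottleneck forces the network to describe the scene with a compact set of patterns, and attention keeps the whole path differentiable end to end.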

Installation

Please refer to INSTALL.md for installation and to DATA.md for dataset preparation.

Get Started

Please see GETTING_STARTED.md for the basic usage of iDisc.

Model Zoo

General

We store the output predictions under the same relative path as the depth files of the corresponding dataset. For evaluation we used micro averaging, while some other depth repositories use macro averaging; the difference is on the order of tenths of a percentage point, but we found micro averaging more appropriate for datasets with uneven density distributions, e.g., due to point-cloud accumulation or depth cameras. Please note that each depth map is rescaled as in the original dataset in order to be stored as a .png file. In particular, to obtain metric depth, divide NYUv2 results by 1000 and results for all other datasets by 256. Normals need to be rescaled from [0, 255] to [-1, 1]. Predictions are not interpolated; that is, the output dimensions are one quarter of the input dimensions. For evaluation we upsampled them with bilinear interpolation and aligned corners.
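
As a concrete illustration of these conventions, here is a minimal sketch (the helper names are ours, not part of this repo) that converts stored .png predictions back to metric depth and normals, upsamples with aligned corners, and contrasts micro vs. macro averaging for the absolute relative error:

    import cv2
    import numpy as np
    import torch
    import torch.nn.functional as F

    def load_metric_depth(png_path, dataset):
        # Undo the per-dataset png scaling: NYUv2 is stored x1000, others x256.
        raw = cv2.imread(png_path, cv2.IMREAD_ANYDEPTH).astype(np.float32)
        return raw / (1000.0 if dataset == "nyu" else 256.0)

    def load_normals(png_path):
        # Rescale stored normals from [0, 255] back to [-1, 1].
        raw = cv2.imread(png_path, cv2.IMREAD_COLOR).astype(np.float32)
        return raw / 255.0 * 2.0 - 1.0

    def upsample_to_input(depth, input_hw):
        # Outputs are one quarter of the input size; evaluation restores full
        # resolution with bilinear interpolation and aligned corners.
        t = torch.from_numpy(depth)[None, None]
        up = F.interpolate(t, size=input_hw, mode="bilinear", align_corners=True)
        return up[0, 0].numpy()

    def abs_rel(preds, gts, micro=True):
        # Micro: pool all valid pixels across images, then average once.
        # Macro: average within each image first, then average over images.
        per_image = [np.abs(p[g > 0] - g[g > 0]) / g[g > 0]
                     for p, g in zip(preds, gts)]
        if micro:
            return float(np.concatenate(per_image).mean())
        return float(np.mean([e.mean() for e in per_image]))

On dense, evenly sampled ground truth the two averages nearly coincide; with uneven per-image point density (e.g., accumulated point clouds), micro averaging weights every valid pixel equally instead of every image.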

KITTI

| Backbone | d0.5 | d1 | d2 | RMSE | RMSE log | A.Rel | Sq.Rel | Config | Weights | Predictions |
|---|---|---|---|---|---|---|---|---|---|---|
| Resnet101 | 0.860 | 0.965 | 0.996 | 2.362 | 0.090 | 0.059 | 0.197 | config | weights | predictions |
| EfficientB5 | 0.852 | 0.963 | 0.994 | 2.510 | 0.094 | 0.063 | 0.223 | config | weights | predictions |
| Swin-Tiny | 0.870 | 0.968 | 0.996 | 2.291 | 0.087 | 0.058 | 0.184 | config | weights | predictions |
| Swin-Base | 0.885 | 0.974 | 0.997 | 2.149 | 0.081 | 0.054 | 0.159 | config | weights | predictions |
| Swin-Large | 0.896 | 0.977 | 0.997 | 2.067 | 0.077 | 0.050 | 0.145 | config | weights | predictions |

NYUv2

| Backbone | d1 | d2 | d3 | RMSE | A.Rel | Log10 | Config | Weights | Predictions |
|---|---|---|---|---|---|---|---|---|---|
| Resnet101 | 0.892 | 0.983 | 0.995 | 0.380 | 0.109 | 0.046 | config | weights | predictions |
| EfficientB5 | 0.903 | 0.986 | 0.997 | 0.369 | 0.104 | 0.044 | config | weights | predictions |
| Swin-Tiny | 0.894 | 0.983 | 0.996 | 0.377 | 0.109 | 0.045 | config | weights | predictions |
| Swin-Base | 0.926 | 0.989 | 0.997 | 0.327 | 0.091 | 0.039 | config | weights | predictions |
| Swin-Large | 0.940 | 0.993 | 0.999 | 0.313 | 0.086 | 0.037 | config | weights | predictions |

Normals

Results may differ (~0.1%) due to micro vs. macro averaging and bilinear vs. bicubic interpolation.

| Backbone | 11.5° | 22.5° | 30° | RMSE | Mean | Median | Config | Weights | Predictions |
|---|---|---|---|---|---|---|---|---|---|
| Swin-Large | 0.637 | 0.796 | 0.855 | 22.9 | 14.6 | 7.3 | config | weights | predictions |

DDAD

| Backbone | d1 | d2 | d3 | RMSE | RMSE log | A.Rel | Sq.Rel | Config | Weights | Predictions |
|---|---|---|---|---|---|---|---|---|---|---|
| Swin-Large | 0.809 | 0.934 | 0.971 | 8.989 | 0.221 | 0.163 | 1.85 | config | weights | predictions |

Argoverse

| Backbone | d1 | d2 | d3 | RMSE | RMSE log | A.Rel | Sq.Rel | Config | Weights | Predictions |
|---|---|---|---|---|---|---|---|---|---|---|
| Swin-Large | 0.821 | 0.923 | 0.960 | 7.567 | 0.243 | 0.163 | 2.22 | config | weights | predictions |

Zero-shot testing

| Train Dataset | Test Dataset | d1 | RMSE | A.Rel | Config | Weights |
|---|---|---|---|---|---|---|
| NYUv2 | SUN-RGBD | 0.838 | 0.387 | 0.128 | config | weights |
| NYUv2 | Diode | 0.810 | 0.721 | 0.156 | config | weights |
| KITTI | Argoverse | 0.560 | 12.18 | 0.269 | config | weights |
| KITTI | DDAD | 0.350 | 14.26 | 0.367 | config | weights |

License

This software is released under the Creative Commons BY-NC 4.0 license. You can view a license summary here.

Contributions

If you find any bugs in the code, please report them to
Luigi Piccinelli (lpiccinelli_at_ethz.ch).

Acknowledgement

This work is funded by Toyota Motor Europe via the research project TRACE-Zurich (Toyota Research on Automated Cars Europe).
