• Stars
    star
    221
  • Rank 176,137 (Top 4 %)
  • Language
    Jupyter Notebook
  • License
    Other
  • Created about 3 years ago
  • Updated over 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

[CVPR 2021] Monocular depth estimation using wavelets for efficiency

Single Image Depth Prediction with Wavelet Decomposition

Michaรซl Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit and Daniyar Turmukhambetov

CVPR 2021

[Link to paper]

kitti gif nyu gif

We introduce WaveletMonoDepth, which improves efficiency of standard encoder-decoder monocular depth estimation methods by exploiting wavelet decomposition.

5 minute CVPR presentation video link

๐Ÿง‘โ€๐Ÿซ Methodology

WaveletMonoDepth was implemented for two benchmarks, KITTI and NYUv2. For each dataset, we build our code upon a baseline code. Both baselines share a common encoder-decoder architecture, and we modify their decoder to provide a wavelet prediction.

Wavelets predictions are sparse, and can therefore be computed only at relevant locations, therefore saving a lot of unnecessary computations.

our architecture

The network is first trained with a dense convolutions in the decoder until convergence, and the dense convolutions are then replaced with sparse ones.

This is because the network first needs to learn to predict sparse wavelet coefficients before we can use sparse convolutions.

๐Ÿ—‚ Environment Requirements ๐Ÿ—‚

We recommend creating a new Anaconda environment to use WaveletMonoDepth. Use the following to setup a new environment:

conda env create -f environment.yml
conda activate wavelet-mdp

Our work uses Pytorch Wavelets, a great package from Fergal Cotter which implements the Inverse Discrete Wavelet Transform (IDWT) used in our work, and a lot more! To install Pytorch Wavelets, simply run:

git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .

๐Ÿš—๐Ÿšฆ KITTI ๐ŸŒณ๐Ÿ›ฃ

Depth Hints was used as a baseline for KITTI.

Depth Hints builds upon monodepth2. If you have questions about running the code, please see the issues in their repositories first.

โš™ Setup, Training and Evaluation

Please see the KITTI directory of this repository for details on how to train and evaluate our method.

๐Ÿ“Š Results ๐Ÿ“ฆ Trained models

Please find below the scores using dense convolutions to predict wavelet coefficients. Download links coming soon!

Model name Training modality Resolution abs_rel RMSE ฮด<1.25
Ours Resnet18 Stereo + DepthHints 640 x 192 0.106 4.693 0.876
Ours Resnet50 Stereo + DepthHints 640 x 192 0.105 4.625 0.879
Ours Resnet18 Stereo + DepthHints 1024 x 320 0.102 4.452 0.890
Ours Resnet50 Stereo + DepthHints 1024 x 320 0.097 4.387 0.891

๐ŸŽš Playing with sparsity

However the most interesting part is that we can make use of the sparsity property of the predicted wavelet coefficients to trade-off performance with efficiency, at a minimal cost on performance. We do so by tuning the threshold, and:

  • low thresholds values will lead to high performance but high number of computations,
  • high thresholds will lead to highly efficient computation, as convolutions will be computed only in a few pixel locations. This will have a minimal impact on performance.

sparsify kitti

Computing coefficients at only 10% of the pixels in the decoding process gives a relative score loss of less than 1.4%.

scores kitti

Our wavelet based method allows us to greatly reduce the number of computation in the decoder at a minimal expense in performance. We can measure the performance-vs-efficiency trade-off by evaluating scores vs FLOPs.

scores vs flops kitti

๐Ÿช‘๐Ÿ› NYUv2 ๐Ÿ›‹๐Ÿšช

Dense Depth was used as a baseline for NYUv2. Note that we used the experimental PyTorch implementation of DenseDepth. Note that compared to the original paper, we made a few different modifications:

  • we supervise depth directly instead of supervising disparity
  • we do not use SSIM
  • we use DenseNet161 as encoder instead of DenseNet169

โš™ Setup, Training and Evaluation

Please see the NYUv2 directory of this repository for details on how to train and evaluate our method.

๐Ÿ“Š Results and ๐Ÿ“ฆ Trained models

Please find below the scores and associated trained models, using dense convolutions to predict wavelet coefficients.

Model name Encoder Resolution abs_rel RMSE ฮด<1.25 ฮต_acc
Baseline DenseNet 640 x 480 0.1277 0.5479 0.8430 1.7170
Ours DenseNet 640 x 480 0.1258 0.5515 0.8451 1.8070
Baseline MobileNetv2 640 x 480 0.1772 0.6638 0.7419 1.8911
Ours MobileNetv2 640 x 480 0.1727 0.6776 0.7380 1.9732

๐ŸŽš Playing with sparsity

As with the KITTI dataset, we can tune the wavelet threshold to greatly reduce computation at minimal cost on performance.

sparsify nyu

Computing coefficients at only 5% of the pixels in the decoding process gives a relative depth score loss of less than 0.15%.

scores nyu

๐ŸŽฎ Try it yourself!

Try using our Jupyter notebooks to visualize results with different levels of sparsity, as well as compute the resulting computational saving in FLOPs. Notebooks can be found in <DATASET>/sparsity_test_notebook.ipynb where <DATASET> is either KITTI or NYUv2.

โœ๏ธ ๐Ÿ“„ Citation

If you find our work useful or interesting, please consider citing our paper:

@inproceedings{ramamonjisoa-2021-wavelet-monodepth,
  title     = {Single Image Depth Prediction with Wavelet Decomposition},
  author    = {Ramamonjisoa, Micha{\"{e}}l and
               Michael Firman and
               Jamie Watson and
               Vincent Lepetit and
               Daniyar Turmukhambetov},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  month = {June},
  year = {2021}
}

๐Ÿ‘ฉโ€โš–๏ธ License

Copyright ยฉ Niantic, Inc. 2021. Patent Pending. All rights reserved. Please see the license file for terms.

More Repositories

1

monodepth2

[ICCV 2019] Monocular depth estimation from a single image
Jupyter Notebook
4,013
star
2

simplerecon

[ECCV 2022] SimpleRecon: 3D Reconstruction Without 3D Convolutions
Python
1,252
star
3

manydepth

[CVPR 2021] Self-supervised depth estimation from short sequences
Python
600
star
4

stereo-from-mono

[ECCV 2020] Learning stereo from single images using monocular depth estimation networks
Python
379
star
5

mickey

[CVPR 2024 - Oral] Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
Python
335
star
6

ace

[CVPR 2023 - Highlight] Accelerated Coordinate Encoding (ACE): Learning to Relocalize in Minutes using RGB and Poses
Python
328
star
7

diffusionerf

[CVPR 2023] DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models
Python
281
star
8

footprints

[CVPR 2020] Estimation of the visible and hidden traversable space from a single color image
Python
220
star
9

map-free-reloc

[ECCV 2022] Map-free Visual Relocalization: Metric Pose Relative to a Single Image
Python
197
star
10

depth-hints

[ICCV 2019] Depth Hints are complementary depth suggestions which improve monocular depth estimation algorithms trained from stereo pairs
Jupyter Notebook
183
star
11

acezero

ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.
171
star
12

nerf-object-removal

[CVPR 2023] Removing Objects From Neural Radiance Fields
Python
95
star
13

marepo

[CVPR 2024 Highlight] Map-Relative Pose Regression for Visual Re-Localization
Python
79
star
14

scoring-without-correspondences

[CVPR 2023] Two-view Geometry Scoring Without Correspondences
Python
78
star
15

implicit-depth

[CVPR 2023] Virtual Occlusions Through Implicit Depth
Python
72
star
16

rectified-features

[ECCV 2020] Single image depth prediction allows us to rectify planar surfaces in images and extract view-invariant local features for better feature matching
63
star
17

image-box-overlap

[ECCV 2020] Training neural networks to predict visual overlap of images, through interpretable non-metric box embeddings
Jupyter Notebook
53
star
18

panoptic-forecasting

[CVPR 2021] Forecasting the panoptic segmentation of future video frames
Python
47
star
19

relpose-gnn

[3DV21] Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision, M. Tรผrkoวงlu et al.
Python
37
star
20

modron

Modron - Cloud security compliance
JavaScript
33
star
21

time-repeatability

[ICRA 2020] Learning to Predict Repeatability of Interest Points
6
star
22

nianticlabs.github.io

HTML
4
star
23

metagame-balance

[AAMAS 2023] Bilevel Entropy based Mechanism Design for Balancing Meta in Video Games
Python
4
star