isl-org/DPT



Dense Prediction Transformers

Vision Transformers for Dense Prediction

This repository contains code and models for our paper:

Vision Transformers for Dense Prediction
René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

Changelog

[March 2021] Initial release of inference code and models

Setup

Download the model weights and place them in the weights folder:

Monodepth:

Segmentation:

Set up dependencies:
```
pip install -r requirements.txt
```
The code was tested with Python 3.7, PyTorch 1.8.0, OpenCV 4.5.1, and timm 0.4.5.
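
To compare a local environment against the tested versions, a minimal sketch follows. It assumes the PyPI distribution names `torch`, `opencv-python`, and `timm` (requirements.txt may use different names, for example a headless OpenCV build). The helpers `installed_versions` and `version_mismatches` are hypothetical and not part of this repository. Note that `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib import metadata

# Versions reported as tested in this README. The distribution names are
# assumptions; adjust them to match requirements.txt.
TESTED = {"torch": "1.8.0", "opencv-python": "4.5.1", "timm": "0.4.5"}


def installed_versions(names):
    """Return {name: version string, or None if the distribution is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(installed, tested=TESTED):
    """Return (name, installed, tested) tuples for packages that differ from the tested version."""
    mismatches = []
    for name, want in tested.items():
        have = installed.get(name)
        # Simple prefix match, so "4.5.1.48" or "1.8.0+cu111" count as 4.5.1 and 1.8.0.
        if have is None or not have.startswith(want):
            mismatches.append((name, have, want))
    return mismatches
```

Calling `version_mismatches(installed_versions(TESTED))` returns an empty list when the environment matches. A mismatch does not necessarily mean the code will fail; it only flags a departure from the tested configuration.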

Usage

Place one or more input images in the folder input.
Run a monocular depth estimation model:
```
python run_monodepth.py
```
Or run a semantic segmentation model:
```
python run_segmentation.py
```
The results are written to the folders output_monodepth and output_semseg, respectively.

Use the flag -t to switch between different models. Possible options are dpt_hybrid (default) and dpt_large.
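
The usage steps above can be scripted. The sketch below assumes it is run from the repository root and relies only on what this README states: the script names, the `input` folder, and the `-t` flag with the two options listed above. The helper names (`list_inputs`, `build_command`, `run_model`) and the set of image suffixes are assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Options named in the sentence above; other -t values are not covered by this sketch.
MODEL_TYPES = ("dpt_hybrid", "dpt_large")
SCRIPTS = {"monodepth": "run_monodepth.py", "segmentation": "run_segmentation.py"}
# Assumption: the scripts accept common image types; the README does not list them.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_inputs(folder="input"):
    """Return the image files found in the input folder, sorted by path."""
    path = Path(folder)
    if not path.is_dir():
        return []
    return sorted(p for p in path.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def build_command(task="monodepth", model_type="dpt_hybrid"):
    """Build the command line for one of the two run scripts."""
    if task not in SCRIPTS:
        raise ValueError(f"unknown task: {task!r}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")
    return [sys.executable, SCRIPTS[task], "-t", model_type]


def run_model(task="monodepth", model_type="dpt_hybrid"):
    """Run the chosen script, refusing to start if the input folder has no images."""
    if not list_inputs():
        raise FileNotFoundError("no images found in ./input")
    subprocess.run(build_command(task, model_type), check=True)
```

For example, `run_model("segmentation", "dpt_large")` would invoke `run_segmentation.py -t dpt_large` with the current Python interpreter.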

Additional models:

Monodepth fine-tuned on KITTI: dpt_hybrid_kitti-cb926ef4.pt
Monodepth fine-tuned on NYUv2: dpt_hybrid_nyu-2ce69ec7.pt

Run with:

python run_monodepth.py -t [dpt_hybrid_kitti|dpt_hybrid_nyu]
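
Before running a fine-tuned model, it can help to confirm that its weight file is in place. The sketch below is a hypothetical helper, not part of this repository. It assumes the files keep the names listed above and sit in the `weights` folder described under Setup. The default models are left out because their file names are not given in this README.

```python
from pathlib import Path

# File names as listed in this README for the fine-tuned monodepth models.
FINETUNED_WEIGHTS = {
    "dpt_hybrid_kitti": "dpt_hybrid_kitti-cb926ef4.pt",
    "dpt_hybrid_nyu": "dpt_hybrid_nyu-2ce69ec7.pt",
}


def missing_weights(weights_dir="weights", wanted=FINETUNED_WEIGHTS):
    """Return the model types whose weight file is absent from weights_dir."""
    root = Path(weights_dir)
    return [model for model, fname in wanted.items() if not (root / fname).is_file()]
```

An empty list from `missing_weights()` means both fine-tuned weight files were found under `weights`.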

Evaluation

Hints on how to evaluate monodepth models can be found here: https://github.com/intel-isl/DPT/blob/main/EVALUATION.md

Citation

Please cite our papers if you use this code or any of the models.

@article{Ranftl2021,
	author    = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
	title     = {Vision Transformers for Dense Prediction},
	journal   = {ArXiv preprint},
	year      = {2021},
}

@article{Ranftl2020,
	author    = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
	title     = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
	journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
	year      = {2020},
}

Acknowledgements

Our work builds on and uses code from timm and PyTorch-Encoding. We'd like to thank the authors for making these libraries available.

License

MIT License
