PatchFusion
An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
Zhenyu Li, Shariq Farooq Bhat, Peter Wonka.
KAUST
DEMO
Our official Hugging Face demo is available here! You can test it with your own high-resolution images, even without a local GPU. Depth prediction plus ControlNet generation takes only about a minute!
Thanks for the kind support from hysts!
Environment setup
The project depends on:
- pytorch (Main framework)
- timm (Backbone helper for MiDaS)
- ZoeDepth (Main baseline)
- ControlNet (For potential application)
- pillow, matplotlib, scipy, h5py, opencv (utilities)
Install the environment using environment.yml:
Using mamba (fastest):
mamba env create -n patchfusion --file environment.yml
mamba activate patchfusion
Using conda:
conda env create -n patchfusion --file environment.yml
conda activate patchfusion
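To confirm the environment resolved correctly, you can run a quick import check (just a convenience sanity check, not an official setup step; the packages are the ones listed above):

python -c "import torch, timm, cv2, scipy, h5py; print('torch', torch.__version__)"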
Pre-trained Model
Download our pre-trained model here and put the checkpoint at nfs/patchfusion_u4k.pt in preparation for the following steps.
If you want to try the ControlNet demo, please download the pre-trained ControlNet model here and put the checkpoint at nfs/control_sd15_depth.pth.
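The commands below pass these checkpoint paths as relative paths, so the files must sit under nfs/ in the directory you run the scripts from. A quick way to verify the layout (a convenience check, not an official step):

ls -lh nfs/patchfusion_u4k.pt nfs/control_sd15_depth.pth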
Gradio Demo
We provide a UI demo built with Gradio. To get started, install the UI requirements:
pip install -r ui_requirements.txt
Launch the Gradio UI for depth estimation or image-to-3D:
python ./ui_prediction.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json
Launch the Gradio UI for depth-guided image generation with ControlNet:
python ./ui_generative.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json
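Once launched, Gradio prints a local URL to the console (typically http://127.0.0.1:7860, Gradio's default unless the script overrides it); open it in a browser to use the demo.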
User Inference
- Put your images in the folder path/to/your/folder
- Run the inference script:
python ./infer_user.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json --rgb_dir path/to/your/folder --show --show_path path/to/show --save --save_path path/to/save --mode r128 --boundary 0 --blur_mask
- Check the visualization results in path/to/show and the depth results in path/to/save, respectively.
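To inspect a saved result programmatically, you can load it with OpenCV. The one-liner below is only a sketch: it assumes the depth maps are written as 16-bit PNGs and uses a placeholder filename; check infer_user.py for the exact output format and depth scale.

python -c "import cv2; d = cv2.imread('path/to/save/example.png', cv2.IMREAD_UNCHANGED); print(d.shape, d.dtype, d.min(), d.max())"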
Args
- We recommend using --blur_mask to reduce patch artifacts, though we didn't use it in our standard evaluation process.
- --mode: select from p16, p49, and rn, where n is the number of randomly added patches.
- Please refer to infer_user.py for more details.
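For example, only the --mode flag changes between tiling strategies; the variant below swaps the r128 used earlier for the fixed p16 tiling from the list above (the exact speed/quality trade-off between modes is not documented here; see infer_user.py):

python ./infer_user.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json --rgb_dir path/to/your/folder --save --save_path path/to/save --mode p16 --boundary 0 --blur_mask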
Citation
If you find our work useful for your research, please consider citing our paper:
@article{li2023patchfusion,
  title={PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation},
  author={Zhenyu Li and Shariq Farooq Bhat and Peter Wonka},
  year={2023},
  eprint={2312.02284},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}