

PatchFusion

[CVPR 2024] An End-to-End Tile-Based Framework
for High-Resolution Monocular Metric Depth Estimation

Website | Paper | Hugging Face Space | Hugging Face Model | License: MIT

Zhenyu Li, Shariq Farooq Bhat, Peter Wonka.
KAUST

DEMO

Our official Hugging Face demo is available here! You can test it with your own high-resolution images, even without a local GPU! Depth prediction plus ControlNet generation takes only about a minute!

Thanks to hysts for the kind support!

Environment setup

The project depends on:

  • pytorch (Main framework)
  • timm (Backbone helper for MiDaS)
  • ZoeDepth (Main baseline)
  • ControlNet (For potential application)
  • pillow, matplotlib, scipy, h5py, opencv (utilities)

Install the environment using environment.yml:

Using mamba (fastest):

mamba env create -n patchfusion --file environment.yml
mamba activate patchfusion

Using conda:

conda env create -n patchfusion --file environment.yml
conda activate patchfusion
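
After activation, a quick import check confirms that the core dependencies resolved correctly (a minimal sketch; the module names follow the dependency list above):

# check_env.py -- sanity-check that the core dependencies import
import torch          # main framework
import timm           # backbone helper for MiDaS
import PIL, matplotlib, scipy, h5py
import cv2            # opencv utilities

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())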

Pre-Trained Model

Download our pre-trained model here and place the checkpoint at nfs/patchfusion_u4k.pt in preparation for the following steps.

If you want to try the ControlNet demo, please download the pre-trained ControlNet model here and place the checkpoint at nfs/control_sd15_depth.pth.
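
Before moving on, a short check can confirm that both checkpoints sit where the commands below expect them (a minimal sketch; the paths mirror those above):

# check_ckpts.py -- verify the checkpoints are in the expected locations
import os

for path in ["nfs/patchfusion_u4k.pt", "nfs/control_sd15_depth.pth"]:
    print(path, "found" if os.path.isfile(path) else "MISSING")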

Gradio Demo

We provide a UI demo built with Gradio. To get started, install the UI requirements:

pip install -r ui_requirements.txt

Launch the Gradio UI for depth estimation or image-to-3D:

python ./ui_prediction.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

Launch the Gradio UI for depth-guided image generation with ControlNet:

python ./ui_generative.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json
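
For orientation, a depth-estimation UI of this kind is typically wired as an image-to-image Gradio interface. The sketch below is illustrative only; predict_depth is a hypothetical placeholder, not the actual model call in ui_prediction.py:

# minimal Gradio wiring sketch (illustrative, not the repository's code)
import gradio as gr
import numpy as np

def predict_depth(image):
    # hypothetical placeholder for the PatchFusion forward pass
    depth = np.asarray(image.convert("L"), dtype=np.float32)
    depth = depth / max(float(depth.max()), 1.0)  # normalize to [0, 1]
    return (depth * 255).astype(np.uint8)

gr.Interface(fn=predict_depth,
             inputs=gr.Image(type="pil"),
             outputs=gr.Image(),
             title="Depth Estimation (sketch)").launch()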

User Inference

  1. Put your images in the folder path/to/your/folder

  2. Run the following command:

    python ./infer_user.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json --rgb_dir path/to/your/folder --show --show_path path/to/show --save --save_path path/to/save --mode r128 --boundary 0 --blur_mask
  3. Check the visualization results in path/to/show and the depth results in path/to/save.
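
The saved depth maps can then be loaded for downstream use. The snippet below assumes they are written as single-channel 16-bit PNGs, which is a common convention; check the save logic in infer_user.py for the actual format:

# load saved depth results (assumes 16-bit PNGs in path/to/save)
import glob
import cv2

for path in sorted(glob.glob("path/to/save/*.png")):
    depth = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # keep the full bit depth
    print(path, depth.shape, depth.dtype, depth.min(), depth.max())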

Args

  • We recommend using --blur_mask to reduce patch artifacts, though we didn't use it in our standard evaluation process.
  • --mode: select from p16, p49, and rn, where n is the number of randomly added patches (see the tiling sketch after this list).
  • Please refer to infer_user.py for more details.
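
To make the tiling modes concrete, the sketch below computes bounding boxes for a regular m x m patch grid; with m=4 this yields the 16 patches of mode p16 (illustrative only; the actual tiling, including the p49 pattern and the random patches of rn, lives in infer_user.py):

# bounding boxes for a regular m x m patch grid (illustrative)
def grid_patches(height, width, m=4):
    ph, pw = height // m, width // m  # patch size
    return [(r * ph, c * pw, (r + 1) * ph, (c + 1) * pw)  # (top, left, bottom, right)
            for r in range(m) for c in range(m)]

boxes = grid_patches(2160, 3840, m=4)  # 4x4 grid over a 4K frame -> 16 patches
print(len(boxes), boxes[0])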

Citation

If you find our work useful for your research, please consider citing our paper:

@article{li2023patchfusion,
    title={PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation},
    author={Zhenyu Li and Shariq Farooq Bhat and Peter Wonka},
    year={2023},
    eprint={2312.02284},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}