Prompt-Free Diffusion
This repo hosts the official implementation of:
Xingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, and Humphrey Shi, Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models, Paper arXiv Link.
News
- [2023.06.20]: SDWebUI plugin released; repo at this link
- [2023.05.25]: Our demo is running on HuggingFace 🤗
- [2023.05.25]: Repo created
Introduction
Prompt-Free Diffusion is a diffusion model that relies only on visual inputs to generate new images. It replaces the commonly used CLIP-based text encoder with a Semantic Context Encoder (SeeCoder). SeeCoder is reusable with most public T2I models as well as adaptive layers such as ControlNet, LoRA, and T2I-Adapter. Just drop it in and play!
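To make the drop-in idea concrete, here is a minimal, hypothetical sketch (not the official API of this repo): any encoder that emits a context tensor of the shape the T2I UNet cross-attends over can stand in for the CLIP text branch, which is why SeeCoder can substitute it without touching the UNet.

```python
import torch
import torch.nn as nn

class CLIPTextEncoderStub(nn.Module):
    """Stands in for the usual CLIP text branch: token ids -> (B, 77, 768)."""
    def __init__(self, vocab_size=49408, dim=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):          # token_ids: (B, 77) long tensor
        return self.embed(token_ids)       # (B, 77, 768)

class SeeCoderStub(nn.Module):
    """Stands in for SeeCoder: reference image -> (B, N, 768) visual context."""
    def __init__(self, dim=768):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=16, stride=16)

    def forward(self, image):              # image: (B, 3, H, W)
        feat = self.patchify(image)        # (B, 768, H/16, W/16)
        return feat.flatten(2).transpose(1, 2)  # (B, N, 768)

# The UNet only sees the context through cross-attention, so either encoder
# can feed the same denoising step unchanged:
def denoise_step(unet, latents, timestep, context):
    return unet(latents, timestep, context)
```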
Performance
Network
Setup
```bash
conda create -n prompt-free-diffusion python=3.10
conda activate prompt-free-diffusion
pip install torch==2.0.0+cu117 torchvision==0.15.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
```
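After installation, a quick generic PyTorch sanity check (not part of this repo) can confirm that the CUDA 11.7 build was picked up:

```python
import torch

print(torch.__version__)           # expect 2.0.0+cu117
print(torch.cuda.is_available())   # expect True on a CUDA 11.7 machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```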
Demo
We provide a WebUI empowered by Gradio. Start the WebUI with the following command:
```bash
python app.py
```
Pretrained models
To support the full functionality of our demo, you need the following models located in these paths:
```
└── pretrained
    ├── pfd
    │   ├── vae
    │   │   └── sd-v2-0-base-autokl.pth
    │   ├── diffuser
    │   │   ├── AbyssOrangeMix-v2.safetensors
    │   │   ├── AbyssOrangeMix-v3.safetensors
    │   │   ├── Anything-v4.safetensors
    │   │   ├── Deliberate-v2-0.safetensors
    │   │   ├── OpenJouney-v4.safetensors
    │   │   ├── RealisticVision-v2-0.safetensors
    │   │   └── SD-v1-5.safetensors
    │   └── seecoder
    │       ├── seecoder-v1-0.safetensors
    │       ├── seecoder-pa-v1-0.safetensors
    │       └── seecoder-anime-v1-0.safetensors
    ├── controlnet
    │   ├── control_sd15_canny_slimmed.safetensors
    │   ├── control_sd15_depth_slimmed.safetensors
    │   ├── control_sd15_hed_slimmed.safetensors
    │   ├── control_sd15_mlsd_slimmed.safetensors
    │   ├── control_sd15_normal_slimmed.safetensors
    │   ├── control_sd15_openpose_slimmed.safetensors
    │   ├── control_sd15_scribble_slimmed.safetensors
    │   ├── control_sd15_seg_slimmed.safetensors
    │   ├── control_v11p_sd15_canny_slimmed.safetensors
    │   ├── control_v11p_sd15_lineart_slimmed.safetensors
    │   ├── control_v11p_sd15_mlsd_slimmed.safetensors
    │   ├── control_v11p_sd15_openpose_slimmed.safetensors
    │   ├── control_v11p_sd15s2_lineart_anime_slimmed.safetensors
    │   └── control_v11p_sd15_softedge_slimmed.safetensors
    └── preprocess
        ├── hed
        │   └── ControlNetHED.pth
        ├── midas
        │   └── dpt_hybrid-midas-501f0c75.pt
        ├── mlsd
        │   └── mlsd_large_512_fp32.pth
        ├── openpose
        │   ├── body_pose_model.pth
        │   ├── facenet.pth
        │   └── hand_pose_model.pth
        └── pidinet
            └── table5_pidinet.pth
```
All models can be downloaded from our HuggingFace link.
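If you want to verify the layout before launching the demo, a small hypothetical checker (not part of this repo; paths taken from the tree above, extend the list as needed) could look like:

```python
# Hypothetical helper that checks whether a few of the files listed above
# are in place before launching the demo. Run from the repo root.
from pathlib import Path

EXPECTED = [
    "pretrained/pfd/vae/sd-v2-0-base-autokl.pth",
    "pretrained/pfd/diffuser/SD-v1-5.safetensors",
    "pretrained/pfd/seecoder/seecoder-v1-0.safetensors",
    "pretrained/controlnet/control_sd15_canny_slimmed.safetensors",
    "pretrained/preprocess/openpose/body_pose_model.pth",
]

missing = [p for p in EXPECTED if not Path(p).exists()]
if missing:
    print("Missing model files:")
    for p in missing:
        print("  -", p)
else:
    print("Core model files found.")
```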
Tools
We also provide tools to convert pretrained models from SDWebUI and the diffusers library to this codebase. Please modify the following files:

```
└── tools
    ├── get_controlnet.py
    └── model_conversion.py
```
You are expected to do some custom coding to make them work (e.g., changing hardcoded input/output file paths).
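As a rough illustration of the kind of customization involved (hypothetical paths and key mapping; the actual logic in model_conversion.py may differ), a conversion pass typically loads a checkpoint, remaps state-dict key prefixes, and saves the result:

```python
# Hypothetical sketch: load an sdwebui-style safetensors checkpoint,
# remap state-dict key prefixes, and save it back.
from safetensors.torch import load_file, save_file

SRC = "path/to/sdwebui_model.safetensors"    # hardcoded input, edit as needed
DST = "path/to/converted_model.safetensors"  # hardcoded output, edit as needed

state = load_file(SRC)
converted = {}
for key, tensor in state.items():
    # Example prefix remap; the real prefixes depend on both codebases.
    new_key = key.replace("model.diffusion_model.", "unet.")
    converted[new_key] = tensor

save_file(converted, DST)
print(f"Wrote {len(converted)} tensors to {DST}")
```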
Performance on Anime
Citation
```bibtex
@article{xu2023prompt,
  title={Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models},
  author={Xu, Xingqian and Guo, Jiayi and Wang, Zhangyang and Huang, Gao and Essa, Irfan and Shi, Humphrey},
  journal={arXiv preprint arXiv:2305.16223},
  year={2023}
}
```
Acknowledgement
Part of this codebase reorganizes/reimplements code from the following repositories: the Versatile Diffusion official GitHub and the ControlNet sdwebui GitHub, which are in turn greatly influenced by the LDM official GitHub and the DDPM official GitHub.