# ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation
## ⏳ To Do
- Release inference code
- Release pretrained models
- Release training code
- Quantitative evaluation code
- Hugging Face demo
## ⚙️ Set-up
Create a conda environment `vico` using:

```bash
conda env create -f environment.yaml
conda activate vico
```
## ⬇️ Download
Download the pretrained Stable Diffusion v1-4 checkpoint and place it under `models/ldm/stable-diffusion-v1/`.
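For reference, a minimal sketch of fetching the checkpoint, assuming the weights from the `CompVis/stable-diffusion-v-1-4-original` repository on Hugging Face (the download may require accepting the model license and authenticating first):

```bash
# Create the directory layout the scripts expect
mkdir -p models/ldm/stable-diffusion-v1

# Download sd-v1-4.ckpt (source is an assumption; accepting the license
# on Hugging Face may be required before this URL is accessible)
wget -O models/ldm/stable-diffusion-v1/sd-v1-4.ckpt \
  https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
```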
We provide pretrained checkpoints at 300, 350, and 400 training steps for 8 objects. You can download the sample images together with their corresponding pretrained checkpoints, or download the data for any single object:
| Object | Sample images | Checkpoints |
|---|---|---|
| barn | image | ckpt |
| batman | image | ckpt |
| clock | image | ckpt |
| dog7 | image | ckpt |
| monster toy | image | ckpt |
| pink sunglasses | image | ckpt |
| teddybear | image | ckpt |
| wooden pot | image | ckpt |
The datasets were originally collected and provided by Textual Inversion, DreamBooth, and Custom Diffusion.
## 🚀 Inference
Before running the inference command, please set:

- `REF_IMAGE_PATH`: Path of the reference image. It can be any image in the samples, like `batman/1.jpg`.
- `CHECKPOINT_PATH`: Path of the checkpoint weights. Its subfolder should be similar to `checkpoints/*-399.pt`.
- `OUTPUT_PATH`: Path for the generated images, e.g., `outputs/batman`.
```bash
python scripts/vico_txt2img.py \
  --ddim_eta 0.0 --n_samples 4 --n_iter 2 --scale 7.5 --ddim_steps 50 \
  --ckpt_path models/ldm/stable-diffusion-v1/sd-v1-4.ckpt \
  --image_path REF_IMAGE_PATH \
  --ft_path CHECKPOINT_PATH \
  --load_step 399 \
  --prompt "a photo of * on the beach" \
  --outdir OUTPUT_PATH
```
You can specify `load_step` (300, 350, or 400) and personalize the `prompt` (an "a photo of" prefix usually yields better results).
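For instance, a complete invocation for the batman sample might look like the following; the `samples/batman` and `logs/batman` paths are illustrative placeholders that depend on where you placed the downloaded samples and checkpoints:

```bash
# Illustrative paths: adjust to where the batman samples/checkpoints were extracted
python scripts/vico_txt2img.py \
  --ddim_eta 0.0 --n_samples 4 --n_iter 2 --scale 7.5 --ddim_steps 50 \
  --ckpt_path models/ldm/stable-diffusion-v1/sd-v1-4.ckpt \
  --image_path samples/batman/1.jpg \
  --ft_path logs/batman \
  --load_step 399 \
  --prompt "a photo of * on the beach" \
  --outdir outputs/batman
```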
## 💻 Training
Before running the training command, please set:

- `RUN_NAME`: Your run name; it will be the name of the log folder.
- `GPUS_USED`: GPUs you are using, e.g., "0,1,2,3" (4 RTX 3090 GPUs in our case).
- `TRAIN_DATA_ROOT`: Path of your training images.
- `INIT_WORD`: The word used to initialize the token representing your unique object, e.g., "dog" or "toy".
```bash
python main.py \
  --base configs/stable-diffusion/v1-finetune.yaml -t \
  --actual_resume models/ldm/stable-diffusion-v1/sd-v1-4.ckpt \
  -n RUN_NAME \
  --gpus GPUS_USED \
  --data_root TRAIN_DATA_ROOT \
  --init_word INIT_WORD
```
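As a concrete sketch, training on the dog7 images might look like this; the run name `vico_dog7` and data path `data/dog7` are hypothetical placeholders:

```bash
# Hypothetical run: 4 GPUs, training images under ./data/dog7
python main.py \
  --base configs/stable-diffusion/v1-finetune.yaml -t \
  --actual_resume models/ldm/stable-diffusion-v1/sd-v1-4.ckpt \
  -n vico_dog7 \
  --gpus 0,1,2,3 \
  --data_root data/dog7 \
  --init_word dog
```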
## 📖 Citation
If you use this code in your research, please consider citing our paper:
```bibtex
@inproceedings{Hao2023ViCo,
  title={ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation},
  author={Shaozhe Hao and Kai Han and Shihao Zhao and Kwan-Yee K. Wong},
  year={2023}
}
```
## 🙏 Acknowledgements
This code repository is based on the great work of Textual Inversion. Thanks!