
SINE
SINgle Image Editing with Text-to-Image Diffusion Models

Colab | Project | ArXiv

This repository contains the code for the CVPR 2023 paper SINE: SINgle Image Editing with Text-to-Image Diffusion Models. For more visualization results, please check our webpage.

SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang¹, Ligong Han¹, Arnab Ghosh², Dimitris Metaxas¹, and Jian Ren²
¹Rutgers University  ²Snap Inc.
CVPR 2023.

Setup

First, clone the repository:

git clone [email protected]:zhang-zx/SINE.git

Then, install the dependencies following the instructions.
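For example, with conda (a minimal sketch; the environment file and environment name are assumptions, following the LDM-style codebases this repository builds on):

cd SINE
# Create and activate the conda environment
# (environment.yaml and the env name 'sine' are assumed, not confirmed here)
conda env create -f environment.yaml
conda activate sine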

Alternatively, you can use the following Docker image:

docker pull sunggukcha/sine
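To start a container from that image, something like the following should work (a sketch using standard Docker flags; the /workspace mount point is an arbitrary choice, not documented by the image):

# Run with GPU access and the current checkout mounted into the container
docker run --gpus all -it --rm \
    -v "$(pwd)":/workspace \
    sunggukcha/sine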

To fine-tune the model, you first need to download the pre-trained Stable Diffusion model.
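For instance, the Stable Diffusion v1.4 checkpoint, which matches the Diffusers example below, can be downloaded from Hugging Face (a sketch; you may need to accept the model license on the Hugging Face page first):

# Download the SD v1.4 checkpoint; pass its path to --actual_resume below
wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt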

Data Preparation

The data we use in the paper can be found here.

Pre-trained Models

We provide some of the fine-tuned models together with the corresponding inference configuration files:

Image                                      Config    Checkpoint
castle (patch-based)                       config    ckpt
castle (w/o patch-based)                   config    ckpt
dog (patch-based)                          config    ckpt
dog (w/o patch-based)                      config    ckpt
Girl with a Pearl Earring (patch-based)    config    ckpt
Mona Lisa (patch-based)                    config    ckpt

Fine-tuning

Fine-tuning w/o patch-based training scheme

IMG_PATH=path/to/image        # the single image to fine-tune on
CLS_WRD='coarse class word'   # e.g. 'castle' or 'dog'
NAME='name of the experiment' # experiment name used for the log directory

python main.py \
    --base configs/stable-diffusion/v1-finetune_picture.yaml \
    -t --actual_resume /path/to/pre-trained/model \
    -n $NAME --gpus 0,  --logdir ./logs \
    --data_root $IMG_PATH \
    --reg_data_root $IMG_PATH --class_word $CLS_WRD 

Fine-tuning with patch-based training scheme

IMG_PATH=path/to/image
CLS_WRD='coarse class word'
NAME='name of the experiment'

python main.py \
    --base configs/stable-diffusion/v1-finetune_patch_picture.yaml \
    -t --actual_resume /path/to/pre-trained/model \
    -n $NAME --gpus 0,   --logdir ./logs \
    --data_root $IMG_PATH \
    --reg_data_root $IMG_PATH --class_word $CLS_WRD  
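The patch-based scheme is the variant proposed in the paper to support editing beyond the training resolution; the other variant fine-tunes on the full image directly (see the paper for details). As a concrete, hypothetical example (the file name, class word, and run name are made up):

# Hypothetical example: patch-based fine-tuning on a single castle photo
IMG_PATH=data/castle.jpg
CLS_WRD='castle'
NAME='castle_patch'

python main.py \
    --base configs/stable-diffusion/v1-finetune_patch_picture.yaml \
    -t --actual_resume sd-v1-4.ckpt \
    -n $NAME --gpus 0, --logdir ./logs \
    --data_root $IMG_PATH \
    --reg_data_root $IMG_PATH --class_word $CLS_WRD

After training, the checkpoint used for editing below is written under the run's log directory, e.g. <logdir>/checkpoints/last.ckpt.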

Model-based Image Editing

Editing with one model's guidance

LOG_DIR=/path/to/logdir
python scripts/stable_txt2img_guidance.py --ddim_eta 0.0 --n_iter 1 \
    --scale 10 --ddim_steps 100 \
    --sin_config configs/stable-diffusion/v1-inference.yaml \
    --sin_ckpt $LOG_DIR"/checkpoints/last.ckpt" \
    --prompt "prompt for pre-trained model[SEP]prompt for fine-tuned model" \
    --cond_beta 0.4 \
    --range_t_min 500 --range_t_max 1000 --single_guidance \
    --skip_save --H 512 --W 512 --n_samples 2 \
    --outdir $LOG_DIR
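Our reading of the key flags (see the paper for the precise definitions): the [SEP]-separated --prompt gives the editing prompt for the pre-trained model followed by the prompt for the fine-tuned model, --cond_beta weights the fine-tuned model's guidance, and --range_t_min/--range_t_max bound the denoising timesteps during which the model-based guidance is applied. A hypothetical filled-in call, continuing the castle example above (the log path and prompts are illustrative):

# Hypothetical example: edit the fine-tuned castle into a snowy scene
LOG_DIR=./logs/castle_patch
python scripts/stable_txt2img_guidance.py --ddim_eta 0.0 --n_iter 1 \
    --scale 10 --ddim_steps 100 \
    --sin_config configs/stable-diffusion/v1-inference.yaml \
    --sin_ckpt $LOG_DIR"/checkpoints/last.ckpt" \
    --prompt "a photo of a castle covered by snow[SEP]a photo of a castle" \
    --cond_beta 0.4 \
    --range_t_min 500 --range_t_max 1000 --single_guidance \
    --skip_save --H 512 --W 512 --n_samples 2 \
    --outdir $LOG_DIR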

Editing with multiple models' guidance

python scripts/stable_txt2img_multi_guidance.py --ddim_eta 0.0 --n_iter 2 \
    --scale 10 --ddim_steps 100 \
    --sin_ckpt path/to/ckpt1 path/to/ckpt2 \
    --sin_config ./configs/stable-diffusion/v1-inference.yaml \
    configs/stable-diffusion/v1-inference.yaml \
    --prompt "prompt for pre-trained model[SEP]prompt for fine-tuned model1[SEP]prompt for fine-tuned model2" \
    --beta 0.4 0.5 \
    --range_t_min 400 400 --range_t_max 1000 1000 --single_guidance \
    --H 512 --W 512 --n_samples 2 \
    --outdir path/to/output_dir
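Here --sin_ckpt and --sin_config each take one entry per fine-tuned model, --beta and --range_t_min/--range_t_max take one value per model, and the [SEP]-separated --prompt lists the pre-trained model's prompt first, followed by one prompt per fine-tuned model.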

Diffusers Library Example

Support for the Diffusers library is still under development. The results in our paper were obtained with the earlier LDM-based code described above.

Training

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export IMG_PATH="path/to/image"
export OUTPUT_DIR="path/to/output_dir"

accelerate launch diffusers_train.py  \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --train_text_encoder \
  --img_path=$IMG_PATH \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="prompt for fine-tuning" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=NUMBERS_OF_STEPS \
  --checkpointing_steps=FREQUENCY_FOR_CHECKPOINTING \
  --patch_based_training # OPTIONAL: add this flag for patch-based training scheme
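If you have not set up Accelerate on your machine yet, run its one-time configuration before launching training (standard Hugging Face Accelerate usage, not specific to this repository):

# One-time setup: choose GPU(s), mixed precision, etc.
accelerate config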

Sampling

python diffusers_sample.py \
    --pretrained_model_name_or_path "path/to/output_dir" \
    --prompt "prompt for fine-tuned model" \
    --editing_prompt 'prompt for pre-trained model'

Visualization Results

Some of the editing results are shown below. See more results on our webpage.

[Figure: example editing results]

Acknowledgments

In this code we build on the following implementations: Dreambooth-Stable-Diffusion and stable-diffusion. The Diffusers-based implementation is largely based on the DreamBooth example. Many thanks to their authors!

Reference

If our work or code helps you, please consider citing our paper. Thank you!

@article{zhang2022sine,
  title={SINE: SINgle Image Editing with Text-to-Image Diffusion Models},
  author={Zhang, Zhixing and Han, Ligong and Ghosh, Arnab and Metaxas, Dimitris and Ren, Jian},
  journal={arXiv preprint arXiv:2212.04489},
  year={2022}
}