
SINE
SINgle Image Editing with Text-to-Image Diffusion Models

Colab | Project | ArXiv

This repository contains the code for the CVPR 2023 paper SINE: SINgle Image Editing with Text-to-Image Diffusion Models. For more visualization results, please check our webpage.

SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang¹, Ligong Han¹, Arnab Ghosh², Dimitris Metaxas¹, and Jian Ren²
¹Rutgers University  ²Snap Inc.
CVPR 2023.

Setup

First, clone the repository:

git clone [email protected]:zhang-zx/SINE.git

Then, install the dependencies following the instructions.
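For example, with conda (a minimal sketch; the environment file and environment name are assumptions, following the LDM-style codebases this repository builds on):

cd SINE
# Create and activate the conda environment
# (environment.yaml and the env name 'sine' are assumed, not confirmed here)
conda env create -f environment.yaml
conda activate sine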

Alternatively, you can use the following Docker image:

docker pull sunggukcha/sine
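To start a container from that image, something like the following should work (a sketch using standard Docker flags; the /workspace mount point is an arbitrary choice, not documented by the image):

# Run with GPU access and the current checkout mounted into the container
docker run --gpus all -it --rm \
    -v "$(pwd)":/workspace \
    sunggukcha/sine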

To fine-tune the model, you first need to download the pre-trained Stable Diffusion model.
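For instance, the Stable Diffusion v1.4 checkpoint, which matches the Diffusers example below, can be downloaded from Hugging Face (a sketch; you may need to accept the model license on the Hugging Face page first):

# Download the SD v1.4 checkpoint; pass its path to --actual_resume below
wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt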

Data Preparation

The data we use in the paper can be found here.

Pre-trained Models

We provide some of the fine-tuned models together with the corresponding inference configuration files:

Image                                      Config    Checkpoint
castle (patch-based)                       config    ckpt
castle (w/o patch-based)                   config    ckpt
dog (patch-based)                          config    ckpt
dog (w/o patch-based)                      config    ckpt
Girl with a Pearl Earring (patch-based)    config    ckpt
Mona Lisa (patch-based)                    config    ckpt

Fine-tuning

Fine-tuning w/o patch-based training scheme

IMG_PATH=path/to/image        # the single image to fine-tune on
CLS_WRD='coarse class word'   # e.g. 'castle' or 'dog'
NAME='name of the experiment' # experiment name used for the log directory

python main.py \
    --base configs/stable-diffusion/v1-finetune_picture.yaml \
    -t --actual_resume /path/to/pre-trained/model \
    -n $NAME --gpus 0,  --logdir ./logs \
    --data_root $IMG_PATH \
    --reg_data_root $IMG_PATH --class_word $CLS_WRD 

Fine-tuning with patch-based training scheme

IMG_PATH=path/to/image
CLS_WRD='coarse class word'
NAME='name of the experiment'

python main.py \
    --base configs/stable-diffusion/v1-finetune_patch_picture.yaml \
    -t --actual_resume /path/to/pre-trained/model \
    -n $NAME --gpus 0,   --logdir ./logs \
    --data_root $IMG_PATH \
    --reg_data_root $IMG_PATH --class_word $CLS_WRD  
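The patch-based scheme is the variant proposed in the paper to support editing beyond the training resolution; the other variant fine-tunes on the full image directly (see the paper for details). As a concrete, hypothetical example (the file name, class word, and run name are made up):

# Hypothetical example: patch-based fine-tuning on a single castle photo
IMG_PATH=data/castle.jpg
CLS_WRD='castle'
NAME='castle_patch'

python main.py \
    --base configs/stable-diffusion/v1-finetune_patch_picture.yaml \
    -t --actual_resume sd-v1-4.ckpt \
    -n $NAME --gpus 0, --logdir ./logs \
    --data_root $IMG_PATH \
    --reg_data_root $IMG_PATH --class_word $CLS_WRD

After training, the checkpoint used for editing below is written under the run's log directory, e.g. <logdir>/checkpoints/last.ckpt.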

Model-based Image Editing

Editing with one model's guidance

LOG_DIR=/path/to/logdir
python scripts/stable_txt2img_guidance.py --ddim_eta 0.0 --n_iter 1 \
    --scale 10 --ddim_steps 100 \
    --sin_config configs/stable-diffusion/v1-inference.yaml \
    --sin_ckpt $LOG_DIR"/checkpoints/last.ckpt" \
    --prompt "prompt for pre-trained model[SEP]prompt for fine-tuned model" \
    --cond_beta 0.4 \
    --range_t_min 500 --range_t_max 1000 --single_guidance \
    --skip_save --H 512 --W 512 --n_samples 2 \
    --outdir $LOG_DIR
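Our reading of the key flags (see the paper for the precise definitions): the [SEP]-separated --prompt gives the editing prompt for the pre-trained model followed by the prompt for the fine-tuned model, --cond_beta weights the fine-tuned model's guidance, and --range_t_min/--range_t_max bound the denoising timesteps during which the model-based guidance is applied. A hypothetical filled-in call, continuing the castle example above (the log path and prompts are illustrative):

# Hypothetical example: edit the fine-tuned castle into a snowy scene
LOG_DIR=./logs/castle_patch
python scripts/stable_txt2img_guidance.py --ddim_eta 0.0 --n_iter 1 \
    --scale 10 --ddim_steps 100 \
    --sin_config configs/stable-diffusion/v1-inference.yaml \
    --sin_ckpt $LOG_DIR"/checkpoints/last.ckpt" \
    --prompt "a photo of a castle covered by snow[SEP]a photo of a castle" \
    --cond_beta 0.4 \
    --range_t_min 500 --range_t_max 1000 --single_guidance \
    --skip_save --H 512 --W 512 --n_samples 2 \
    --outdir $LOG_DIR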

Editing with multiple models' guidance

python scripts/stable_txt2img_multi_guidance.py --ddim_eta 0.0 --n_iter 2 \
    --scale 10 --ddim_steps 100 \
    --sin_ckpt path/to/ckpt1 path/to/ckpt2 \
    --sin_config ./configs/stable-diffusion/v1-inference.yaml \
    configs/stable-diffusion/v1-inference.yaml \
    --prompt "prompt for pre-trained model[SEP]prompt for fine-tuned model1[SEP]prompt for fine-tuned model2" \
    --beta 0.4 0.5 \
    --range_t_min 400 400 --range_t_max 1000 1000 --single_guidance \
    --H 512 --W 512 --n_samples 2 \
    --outdir path/to/output_dir
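Here --sin_ckpt and --sin_config each take one entry per fine-tuned model, --beta and --range_t_min/--range_t_max take one value per model, and the [SEP]-separated --prompt lists the pre-trained model's prompt first, followed by one prompt per fine-tuned model.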

Diffusers Library Example

Support for the Diffusers library is still under development. The results in our paper were obtained with the earlier LDM-based code described above.

Training

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export IMG_PATH="path/to/image"
export OUTPUT_DIR="path/to/output_dir"

accelerate launch diffusers_train.py  \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --train_text_encoder \
  --img_path=$IMG_PATH \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="prompt for fine-tuning" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=NUMBERS_OF_STEPS \
  --checkpointing_steps=FREQUENCY_FOR_CHECKPOINTING \
  --patch_based_training # OPTIONAL: add this flag for patch-based training scheme
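If you have not set up Accelerate on your machine yet, run its one-time configuration before launching training (standard Hugging Face Accelerate usage, not specific to this repository):

# One-time setup: choose GPU(s), mixed precision, etc.
accelerate config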

Sampling

python diffusers_sample.py \
    --pretrained_model_name_or_path "path/to/output_dir" \
    --prompt "prompt for fine-tuned model" \
    --editing_prompt 'prompt for pre-trained model'

Visualization Results

Some of the editing results are shown below. See more results on our webpage.

[Figure: example editing results]

Acknowledgments

In this code we build on the following implementations: Dreambooth-Stable-Diffusion and stable-diffusion. The Diffusers-based implementation is largely based on the DreamBooth example. Many thanks to their authors!

Reference

If our work or code helps you, please consider citing our paper. Thank you!

@article{zhang2022sine,
  title={SINE: SINgle Image Editing with Text-to-Image Diffusion Models},
  author={Zhang, Zhixing and Han, Ligong and Ghosh, Arnab and Metaxas, Dimitris and Ren, Jian},
  journal={arXiv preprint arXiv:2212.04489},
  year={2022}
}