Inserting Anybody in Diffusion Models via Celeb Basis (NeurIPS 23)
Ge Yuan, Xiaodong Cun, Yong Zhang, Maomao Li, Chenyang Qi, Xintao Wang, Ying Shan, Huicheng Zheng* (* Corresponding Author)
TL;DR: Integrating a unique individual into a pre-trained diffusion model with:
✅ just one facial photograph
✅ only 1024 learnable parameters
✅ about 3 minutes of tuning
✅ Textual-Inversion compatibility
✅ the ability to generate and interact with other (new person) concepts
Updates
- 2023/10/11: Our paper is accepted by NeurIPS'23!
- 2023/06/23: Code released!
How It Works
First, we collect about 1,500 celebrity names as the initial collection. Then, we manually filter this initial collection down to m = 691 names, based on the synthesis quality of the text-to-image diffusion model (Stable Diffusion) when prompted with each name. Next, each filtered name is tokenized and encoded into a celeb embedding group. Finally, we conduct Principal Component Analysis (PCA) to build a compact orthogonal basis.
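The sketch below illustrates this basis construction under simplifying assumptions; it is not the repository's exact code. It assumes each name is represented by its first two sub-word embeddings from the frozen CLIP word-embedding table, and runs an SVD-based PCA per token position (matching use_svd: True, use_flatten: False, and n_components: 512 in the config).

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Illustrative sketch only: build a per-position orthogonal basis from celeb name embeddings.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()
token_embedding = text_encoder.text_model.embeddings.token_embedding  # frozen word embeddings

names = open("./infer_images/wiki_names_v2.txt").read().splitlines()  # m = 691 filtered names

per_name = []
with torch.no_grad():
    for name in names:
        ids = tokenizer(name, return_tensors="pt").input_ids[0, 1:-1]  # drop BOS/EOS
        # Assumption: every name yields at least two sub-word tokens; keep the first two.
        per_name.append(token_embedding(ids)[:2])

X = torch.stack(per_name)          # (m, 2, 768)
mean = X.mean(dim=0)               # per-position mean embedding, shape (2, 768)
bases = []
for pos in range(2):               # separate PCA per token position (use_flatten: False)
    _, _, Vh = torch.linalg.svd(X[:, pos] - mean[pos], full_matrices=False)
    bases.append(Vh[:512])         # 512 principal directions (n_components: 512)
basis = torch.stack(bases)         # (2, 512, 768) compact orthogonal celeb basis
```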
We then personalize the model using an input photo. During training (left in the figure), we optimize the coefficients of the celeb basis with the help of a fixed face encoder. During inference (right), we combine the learned personalized coefficients with the shared celeb basis to generate images of the input identity.
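Conceptually, each new identity is represented by only 2 x 512 = 1024 coefficients over the frozen basis. The sketch below is a simplification (not the repository's EmbeddingManagerId); it shows how the two personalized word embeddings that replace the identity placeholder tokens are reconstructed, and that these coefficients are the only learnable parameters. The fixed face encoder and the diffusion denoising loss used in the actual optimization are omitted here.

```python
import torch

# Frozen quantities from the basis-construction step above (placeholder values here).
basis = torch.randn(2, 512, 768)   # shared celeb basis
mean = torch.randn(2, 768)         # per-position mean celeb embedding

# The only learnable parameters for one identity: 2 positions x 512 coefficients = 1024.
coeffs = torch.zeros(2, 512, requires_grad=True)
optimizer = torch.optim.AdamW([coeffs], lr=5e-3)  # hypothetical optimizer settings

def personalized_embeddings(coeffs: torch.Tensor) -> torch.Tensor:
    # Reconstruct the two word embeddings that stand in for the new identity
    # (they replace the 'sks' / 'ks' placeholder tokens in the text prompt).
    return mean + torch.einsum("pc,pcd->pd", coeffs, basis)  # (2, 768)

# In the real training loop these embeddings are injected into the frozen CLIP text
# encoder and optimized through Stable Diffusion's denoising loss, with a fixed face
# encoder providing the identity signal. Here we only verify shapes and parameter count.
print(personalized_embeddings(coeffs).shape, coeffs.numel())  # torch.Size([2, 768]) 1024
```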
More details can be found on our project page.
Setup
Our code is mainly based on Textual Inversion. We add some environment requirements for face alignment & recognition to the original Textual Inversion environment. To set up our environment, please run:
conda env create -f environment.yaml
conda activate sd
The pre-trained weights used in this repo include Stable Diffusion v1-4 and
CosFace R100 trained on Glint360K.
You may copy these pre-trained weights to ./weights, and the directory tree will look like:
CelebBasis/
|-- weights/
    |-- glint360k_cosface_r100_fp16_0.1/
        |-- backbone.pth (249MB)
    |-- sd-v1-4-full-ema.ckpt (7.17GB)
We use PIPNet to align and crop the face.
The PIPNet pre-trained weights can be downloaded from this link (provided by @justindujardin) or from our Baidu Yun Drive (extraction code: ygss). Please copy epoch59.pth and FaceBoxesV2.pth to CelebBasis/evaluation/face_align/PIPNet/weights/.
Usage
0. Face Alignment
To make the face recognition model work as expected, given an image of a person, we first align and crop the face following the FFHQ-Dataset pre-processing.
Assuming your image folder is /Your/Path/To/Images/ori/ and the output folder is /Your/Path/To/Images/ffhq/, you may run the following command to align & crop the images.
bash ./00_align_face.sh /Your/Path/To/Images/ori /Your/Path/To/Images/ffhq
Then, a pickle file named ffhq.pickle, which stores absolute paths, will be generated under /Your/Path/To/Images/; it is used later when setting up the training dataset.
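As a quick sanity check (illustrative only, assuming the pickle simply stores the list of absolute face paths described above), you can inspect the generated file like this:

```python
import pickle

# Load the list of absolute paths to the aligned and cropped faces.
with open("/Your/Path/To/Images/ffhq.pickle", "rb") as f:
    face_paths = pickle.load(f)

print(len(face_paths), face_paths[:2])
```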
For example, we provide the original and cropped StyleGAN-generated faces in our Baidu Yun Drive (code: ygss), where:
- stylegan3-r-ffhq-1024x1024 contains the original images (/Your/Path/To/Images/ori)
- stylegan3-r-ffhq-1024x1024_ffhq contains the cropped images (/Your/Path/To/Images/ffhq/)
- stylegan3-r-ffhq-1024x1024_ffhq.pickle is the pickle list file (/Your/Path/To/Images/ffhq.pickle)
We also provide some cropped faces in ./infer_images/dataset_stylegan3_10id/ffhq as an example and reference.
1. Personalization
The training config file is ./configs/stable-diffusion/aigc_id.yaml. The most important settings are listed as follows.
Important Data Settings
data:
  params:
    batch_size: 2 # we use batch_size 2
    train:
      target: ldm.data.face_id.FaceIdDatasetOneShot # or ldm.data.face_id.FaceIdDatasetStyleGAN3
      params:
        pickle_path: /Your/Path/To/Images/ffhq.pickle # pickle file generated by Face Alignment, consistent with 'target'
        num_ids: 2 # how many IDs are used for joint training
        specific_ids: [1, 2] # indices of the IDs used for training, e.g. [0,1,2,3,4,5,6,7,8,9]; 0 means the first
    validation:
      target: ldm.data.face_id.FaceIdDatasetOneShot
      params:
        pickle_path: /Your/Path/To/Images/ffhq.pickle # consistent with train.params.pickle_path
Important Model Settings
model:
  params:
    personalization_config:
      target: ldm.modules.embedding_manager.EmbeddingManagerId
      params:
        max_ids: 10 # max number of jointly learned IDs, should be >= data.train.num_ids
        num_embeds_per_token: 2 # consistent with [cond_stage_config]
        meta_mlp_depth: 1 # a single layer is enough
        meta_inner_dim: 512 # consistent with [n_components]
        test_mode: 'coefficient' # coefficient/embedding/image/all
        momentum: 0.99 # momentum for updating the saved dictionary
        save_fp16: False # save in FP16; the default is FP32
    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
      params:
        use_celeb: True # use the celeb basis
        use_svd: True # use the SVD version of PCA
        rm_repeats: True # removing repeated words can work better
        celeb_txt: "./infer_images/wiki_names_v2.txt" # celeb names; wiki_names_v1 or wiki_names_v2.txt
        n_components: 512 # consistent with [meta_inner_dim]
        use_flatten: False # flattening drops the word-position information
        num_embeds_per_token: 2 # consistent with [personalization_config]
Important Training Settings
lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 200 # 100 x num of IDs
  callbacks:
    image_logger:
      params:
        batch_frequency: 600 # 300 x num of IDs
  trainer:
    max_steps: 800 # 400 x num of IDs
Training
bash ./01_start_train.sh ./weights/sd-v1-4-full-ema.ckpt
Consequently, a project folder named traininYYYY-MM-DDTHH-MM-SS_celebbasis is generated under ./logs.
2. Generation
Edit the prompt file ./infer_images/example_prompt.txt, where sks denotes the first identity and ks denotes the second identity.
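For illustration, a hypothetical prompt line could read "a photo of sks person and ks person on the beach"; see the shipped example_prompt.txt for the actual prompt format.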
Optionally, in ./02_start_test.sh, you may modify the following variables as needed:
step_list=(799) # the step of trained '.pt' files, e.g. (99 199 299 399)
eval_id1_list=(0) # the ID index of the 1st person, e.g. (0 1 2 3 4)
eval_id2_list=(1) # the ID index of the 2nd person, e.g. (0 1 2 3 4)
Testing
bash ./02_start_test.sh "./weights/sd-v1-4-full-ema.ckpt" "./infer_images/example_prompt.txt" "traininYYYY-MM-DDTHH-MM-SS_celebbasis"
The generated images are saved under ./outputs/traininYYYY-MM-DDTHH-MM-SS_celebbasis.
3. (Optional) Extracting ID Coefficients
Optionally, you can extract the coefficients for each identity by running:
bash ./03_extract.sh "./weights/sd-v1-4-full-ema.ckpt" "traininYYYY-MM-DDTHH-MM-SS_celebbasis"
The extracted coefficients or embeddings are saved under ./weights/ti_id_embeddings/.
TODO
- release code
- release celeb basis names
- simplify the pipeline
- add diffusers support
- add SDXL support
- release google colab project
- release WebUI extension
- release automatic name filter
- fine-tuning with multiple persons
- fine-tuning with LoRA
BibTeX
@article{yuan2023celebbasis,
  title={Inserting Anybody in Diffusion Models via Celeb Basis},
  author={Yuan, Ge and Cun, Xiaodong and Zhang, Yong and Li, Maomao and Qi, Chenyang and Wang, Xintao and Shan, Ying and Zheng, Huicheng},
  journal={arXiv preprint arXiv:2306.00926},
  year={2023}
}