Code Release for DiffusionRig (CVPR 2023)

DiffusionRig

DiffusionRig: Learning Personalized Priors for Facial Appearance Editing
Zheng Ding, Xuaner Zhang, Zhihao Xia, Lars Jebe, Zhuowen Tu, Xiuming Zhang
CVPR 2023
arXiv / Project Page / Video / BibTex

Setup & Preparation

Environment Setup

conda create -n diffusionrig python=3.8
conda activate diffusionrig
conda install pytorch=1.11 cudatoolkit=11.3 torchvision -c pytorch
conda install mpi4py dlib scikit-learn scikit-image tqdm -c conda-forge
pip install lmdb opencv-python kornia yacs blobfile chumpy face_alignment

You also need to install pytorch3d to render the physical buffers:

conda install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html
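
After installation, a quick check that every package can be located may save a failed training launch later. The following is a minimal sketch, not part of the repository; the import names (for example `cv2` for opencv-python and `sklearn` for scikit-learn) are our mapping from the package names above.

```python
import importlib.util

# Import names for the packages installed above. The mapping from package
# name to import name (e.g. opencv-python -> cv2) is an assumption of this sketch.
REQUIRED_MODULES = [
    "torch", "torchvision", "mpi4py", "dlib", "sklearn", "skimage", "tqdm",
    "lmdb", "cv2", "kornia", "yacs", "blobfile", "chumpy", "face_alignment",
    "pytorch3d",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated diffusionrig environment. It only checks that the modules can be found, not that their versions match the ones listed above.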

DECA Setup

Before doing data preparation for training, please first download the source files and checkpoints of DECA to set it up (you will need to create an account to download FLAME resources):

  1. deca_model.tar: Visit this page to download the pretrained DECA model.
  2. generic_model.pkl: Visit this page to download FLAME 2020 and extract generic_model.pkl.
  3. FLAME_texture.npz: Visit this same page to download the FLAME texture space and extract FLAME_texture.npz.
  4. Download the other files listed below from DECA's Data Page and put them also in the data/ folder:
data/
  deca_model.tar
  generic_model.pkl
  FLAME_texture.npz
  fixed_displacement_256.npy
  head_template.obj
  landmark_embedding.npy
  mean_texture.jpg
  texture_data_256.npy
  uv_face_eye_mask.png
  uv_face_mask.png
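
To confirm that the data/ folder matches the listing above before extracting buffers, a small check such as the following can help. It is a sketch rather than a script shipped with the repository; the function name `missing_data_files` is hypothetical.

```python
from pathlib import Path

# File names copied from the data/ listing above.
REQUIRED_FILES = [
    "deca_model.tar",
    "generic_model.pkl",
    "FLAME_texture.npz",
    "fixed_displacement_256.npy",
    "head_template.obj",
    "landmark_embedding.npy",
    "mean_texture.jpg",
    "texture_data_256.npy",
    "uv_face_eye_mask.png",
    "uv_face_mask.png",
]


def missing_data_files(data_dir="data"):
    """Return the required file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("data/ contains all required files.")
```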

Data Preparation

We use FFHQ to train the first stage and a personal photo album to train the second stage. Before training, you need to extract, with DECA, the physical buffers for those images.

For FFHQ, run the following on the aligned images to create the dataset:

python scripts/create_data.py --data_dir PATH_TO_FFHQ_ALIGNED_IMAGES --output_dir ffhq256_deca.lmdb --image_size 256 --use_meanshape False

For the personal photo album (we use around 20 images per identity in our experiments), put all images into a folder and then align them by running:

python scripts/align.py -i PATH_TO_PERSONAL_PHOTO_ALBUM -o personal_images_aligned -s 256

Then, create a dataset by running:

python scripts/create_data.py --data_dir personal_images_aligned --output_dir personal_deca.lmdb --image_size 256 --use_meanshape True
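
If you prepare several albums, the two personal-album steps above can be generated together. The sketch below only builds the command lines, using the flags shown above; the helper name `personal_album_commands` and its default paths are illustrative, and it assumes `python` resolves to the diffusionrig environment.

```python
import shlex


def personal_album_commands(album_dir,
                            aligned_dir="personal_images_aligned",
                            lmdb_path="personal_deca.lmdb",
                            image_size=256):
    """Return the align and create_data commands for one personal album."""
    align = [
        "python", "scripts/align.py",
        "-i", album_dir, "-o", aligned_dir, "-s", str(image_size),
    ]
    create = [
        "python", "scripts/create_data.py",
        "--data_dir", aligned_dir,
        "--output_dir", lmdb_path,
        "--image_size", str(image_size),
        "--use_meanshape", "True",
    ]
    # Order matters: create_data.py reads the folder that align.py writes.
    return [align, create]


if __name__ == "__main__":
    for cmd in personal_album_commands("PATH_TO_PERSONAL_PHOTO_ALBUM"):
        print(shlex.join(cmd))
```

To execute the steps instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.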

Training

Stage 1: Learning Generic Face Priors

Our 256x256 model uses eight GPUs for Stage 1 training with a batch size of 32 per GPU:

mpiexec -n 8 python scripts/train.py --latent_dim 64 --encoder_type resnet18 \
    --log_dir log/stage1 --data_dir ffhq256_deca.lmdb --lr 1e-4 \
    --p2_weight True --image_size 256 --batch_size 32 --max_steps 50000 \
    --num_workers 8 --save_interval 5000 --stage 1

To keep the model training indefinitely, set --max_steps 0. If you want to resume a training process, simply add --resume_checkpoint PATH_TO_THE_MODEL.
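
When adapting the command to a different number of GPUs, it helps to see how many training samples the settings above imply. The arithmetic below is a sketch that assumes every step consumes one batch on each GPU; the FFHQ size of 70,000 images is a property of the public dataset and is not stated in this README.

```python
def samples_seen(num_gpus, batch_size_per_gpu, max_steps):
    """Total training samples processed, assuming one batch per GPU per step."""
    return num_gpus * batch_size_per_gpu * max_steps


FFHQ_NUM_IMAGES = 70_000  # public FFHQ dataset size (not from this README)

global_batch = 8 * 32                   # GPUs x per-GPU batch size
total = samples_seen(8, 32, 50_000)     # settings from the Stage 1 command

print(global_batch)                         # 256
print(total)                                # 12800000
print(round(total / FFHQ_NUM_IMAGES, 1))    # 182.9 passes over FFHQ
```

With fewer GPUs, keeping the product of GPU count and per-GPU batch size at 256 would preserve the global batch size, memory permitting; whether the results match the released model under such a change is not something this sketch can confirm.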

✅ We also provide the Stage 1 model trained by us here so that you can fast-forward to training your personalized model.

Stage 2: Learning Personalized Priors

Finetune the model on your tiny personal album:

mpiexec -n 1 python scripts/train.py --latent_dim 64 --encoder_type resnet18 \
    --log_dir log/stage2 --resume_checkpoint log/stage1/[MODEL_NAME].pt \
    --data_dir personal_deca.lmdb --lr 1e-5 \
    --p2_weight True --image_size 256 --batch_size 4 --max_steps 5000 \
    --num_workers 8 --save_interval 5000 --stage 2

Stage 2 finetuning takes around 30 minutes on a single NVIDIA V100 GPU.

Inference

We provide a script to edit face appearance by modifying the physical buffers. Run:

python scripts/inference.py --source SOURCE_IMAGE_FILE --target TARGET_IMAGE_FILE --output_dir OUTPUT_DIR --modes light --model_path PATH_TO_MODEL --meanshape PATH_TO_MEANSHAPE --timestep_respacing ddim20 

to use the physical parameters (e.g., lighting, expression, or head pose) of the target image to edit the source image.
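
To relight one source image with many targets, the command above can be generated once per target image. This is a sketch under assumptions: the helper `inference_commands`, the set of image extensions, and the one-output-folder-per-target layout are ours, while the flags and their values are the ones shown in the command above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def inference_commands(source, target_dir, output_root, model_path, meanshape):
    """Build one scripts/inference.py command per target image in target_dir."""
    commands = []
    for target in sorted(Path(target_dir).iterdir()):
        if target.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that does not look like an image
        commands.append([
            "python", "scripts/inference.py",
            "--source", str(source),
            "--target", str(target),
            # Hypothetical layout: one output folder per target image.
            "--output_dir", str(Path(output_root) / target.stem),
            "--modes", "light",
            "--model_path", str(model_path),
            "--meanshape", str(meanshape),
            "--timestep_respacing", "ddim20",
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.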

Issues or Questions?

If the issue is code-related, please open an issue here.

For questions, please also consider opening an issue, as it may benefit future readers. Otherwise, email Zheng Ding at [email protected].

Acknowledgements

This codebase was built upon and drew inspiration from Guided-Diffusion, DECA, and Diff-AE. We thank the authors for making those repositories public.
