3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation (Official Pytorch Implementation)
Project page | Paper
Junshu Tang, Bo Zhang, Binxin Yang, Ting Zhang, Dong Chen, Lizhuang Ma, and Fang Wen.
Abstract
In contrast to the traditional avatar creation pipeline which is a costly process, contemporary generative approaches directly learn the data distribution from photographs. While plenty of works extend unconditional generative models and achieve some level of controllability, it is still challenging to ensure multi-view consistency, especially in large poses. In this work, we propose a network that generates 3D-aware portraits while being controllable according to semantic parameters regarding pose, identity, expression and illumination. Our network uses neural scene representation to model 3D-aware portraits, whose generation is guided by a parametric face model that supports explicit control. While the latent disentanglement can be further enhanced by contrasting images with partially different attributes, there still exists noticeable inconsistency in non-face areas, e.g., hair and background, when animating expressions. We solve this by proposing a volume blending strategy in which we form a composite output by blending dynamic and static areas, with two parts segmented from the jointly learned semantic field. Our method outperforms prior arts in extensive experiments, producing realistic portraits with vivid expression in natural lighting when viewed from free viewpoints. It also demonstrates generalization ability to real images as well as out-of-domain data, showing great promise in real applications.
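The volume blending mentioned above can be pictured as a per-sample composite of two radiance fields, gated by a soft mask from the jointly learned semantic field. The snippet below is a minimal conceptual sketch, not code from this repository; the function name and the face_mask tensor are assumptions used only for illustration.

import torch

def blend_volumes(color_dyn, sigma_dyn, color_sta, sigma_sta, face_mask):
    # face_mask in [0, 1] stands in for the learned semantic field:
    # 1 marks the dynamic face region, 0 marks static areas such as hair and background.
    sigma = face_mask * sigma_dyn + (1.0 - face_mask) * sigma_sta
    color = face_mask * color_dyn + (1.0 - face_mask) * color_sta
    return color, sigma  # the blended samples then go through the usual volume rendering

# toy usage on random per-ray samples (1 ray, 64 samples)
c_d, s_d = torch.rand(1, 64, 3), torch.rand(1, 64, 1)
c_s, s_s = torch.rand(1, 64, 3), torch.rand(1, 64, 1)
m = torch.rand(1, 64, 1)
color, sigma = blend_volumes(c_d, s_d, c_s, s_s, m)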
Installation
Install dependencies:
pip install -r requirements.txt
pip install -U git+https://github.com/fadel/pytorch_ema
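As an optional smoke test (the snippet below is not part of this repository), you can confirm that PyTorch and the torch_ema package installed above import and run:

import torch
from torch_ema import ExponentialMovingAverage  # installed from fadel/pytorch_ema

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
model = torch.nn.Linear(4, 4)
ema = ExponentialMovingAverage(model.parameters(), decay=0.995)
ema.update()  # should run without raising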
Training requirements
- Nvdiffrast. We use Nvdiffrast, a PyTorch library that provides high-performance primitive operations for rasterization-based differentiable rendering.
git clone https://github.com/NVlabs/nvdiffrast.git
cd nvdiffrast/
python setup.py install
- Basel Face Model 2009 (BFM09). Get access to BFM09 using this link. After getting access, download 01_MorphableModel.mat. In addition, we use an Expression Basis provided by Guo et al. Download the Expression Basis (Exp_Pca.bin) using this link. Put both files in checkpoints/face_ckpt/BFM/.
- Face Reconstruction Model. We use this network to extract identity, expression, lighting, and pose coefficients. Download the pretrained model epoch_20.pth and put it in checkpoints/face_ckpt/face_ckp/recon_model.
- Face Recognition Model. We use ArcFace to extract deep face features. Download the pretrained model ms1mv3_arcface_r50_fp16/backbone.pth and put it in checkpoints/face_ckpt/face_ckp/recog_model.
- Face Landmark Detection. Download shape_predictor_68_face_landmarks.dat from Dlib and put it in checkpoints/face_ckpt/face_ckp/.
- Face Parsing Network. Download 79999_iter.pth from face-parsing.PyTorch and put it in checkpoints/face_ckpt/face_ckp/. A quick check that all of the above files are in place is sketched after this list.
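The following convenience check is not part of the repository; it simply verifies that the assets listed above sit at the expected paths. The exact file names under recon_model and recog_model follow the downloads and may differ depending on how you unpack them.

import os

required = [
    "checkpoints/face_ckpt/BFM/01_MorphableModel.mat",
    "checkpoints/face_ckpt/BFM/Exp_Pca.bin",
    "checkpoints/face_ckpt/face_ckp/recon_model/epoch_20.pth",
    "checkpoints/face_ckpt/face_ckp/recog_model/backbone.pth",  # from ms1mv3_arcface_r50_fp16
    "checkpoints/face_ckpt/face_ckp/shape_predictor_68_face_landmarks.dat",
    "checkpoints/face_ckpt/face_ckp/79999_iter.pth",
]
for path in required:
    print("found  " if os.path.isfile(path) else "MISSING", path)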
Quick Inference Using Pretrained Model
Download the pretrained models from here and save them in checkpoints/model. For the pretrained VAE decoder, please download our pretrained models from here and save them in checkpoints/vae_ckp/. We provide a test sequence here: please download obama/mat/*.mat and put the files in data/. Then run the following command:
python test.py --curriculum FFHQ_512 --load_dir checkpoints/model/ --output_dir results --blend_mode both --seeds 41
Train from Scratch
1) Prepare training data
- FFHQ. Download images1024x1024, resize the images to 512x512 resolution, and put them in data/ffhq/img (a resize sketch is given after this list).
- Preprocess. Run the following command, replacing aligned_image_path and mat_path with your desired output paths:
python preprocess.py --curriculum FFHQ_512 --image_dir data/ffhq/img --img_output_dir aligned_image_path --mat_output_dir mat_path
- RAVDESS. We select 10 videos and sample 400 images from each video, resulting in 96,000 images in total. We extract expression coefficients for each image. You can download these data from here.
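A minimal resize sketch for the FFHQ step above, assuming Pillow is installed and the downloaded images1024x1024 PNGs sit in a local folder named ffhq_1024 (the folder name and the choice of Lanczos resampling are assumptions, not part of the repository):

import os
from PIL import Image

src_dir, dst_dir = "ffhq_1024", "data/ffhq/img"
os.makedirs(dst_dir, exist_ok=True)
for name in sorted(os.listdir(src_dir)):
    if not name.lower().endswith(".png"):
        continue
    img = Image.open(os.path.join(src_dir, name)).convert("RGB")
    img.resize((512, 512), Image.LANCZOS).save(os.path.join(dst_dir, name))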
2) VAE-GAN training
python train_vae.py --curriculum VAE_ALL --output_dir results/vae --render_dir results/render --weight 0.0025 --factor id # id/exp/gamma
- You can also download our pretrained VAE decoders from here and save them in checkpoints/vae_ckp/.
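To train a decoder for each factor in turn, a loop such as the one below can be used. It is a hypothetical wrapper around the command above; the per-factor output and render directories are assumptions made only to keep the three runs from overwriting each other.

import subprocess

for factor in ["id", "exp", "gamma"]:
    subprocess.run([
        "python", "train_vae.py",
        "--curriculum", "VAE_ALL",
        "--output_dir", f"results/vae_{factor}",     # assumed per-factor directory
        "--render_dir", f"results/render_{factor}",  # assumed per-factor directory
        "--weight", "0.0025",
        "--factor", factor,
    ], check=True)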
3) Imitation learning
python train_control.py --curriculum FFHQ_512 --output_dir train_ffhq_512 --warmup1 5000 --warmup2 20000
4) Disentanglement learning
python train_control.py --curriculum FFHQ_512 --output_dir train_ffhq_512 --load_dir load_dir --set_step 20001 --warmup1 5000 --warmup2 20000 --second
Citation
If you use this code for your research, please cite our paper.
@article{tang2022explicitly,
title={Explicitly Controllable 3D-Aware Portrait Generation},
author={Tang, Junshu and Zhang, Bo and Yang, Binxin and Zhang, Ting and Chen, Dong and Ma, Lizhuang and Wen, Fang},
journal={arXiv preprint arXiv:2209.05434},
year={2022}
}
Acknowledgments
This code borrows heavily from pi-GAN, StyleGAN2 and Deep3DFaceRecon.