One-shot Implicit Animatable Avatars with Model-based Priors [ICCV2023]
ELICIT creates free-viewpoint motion videos from a single image by constructing an animatable NeRF avatar representation through one-shot learning.
Official repository of "One-shot Implicit Animatable Avatars with Model-based Priors".
What Can You Learn from ELICIT?
- The data-efficient pipeline for creating a 3D animatable avatar from a single image.
- How to use a CLIP-based semantic loss to infer the entire 3D appearance of the human body with the help of a rough SMPL shape (a minimal sketch follows this list).
- A segmentation-based sampling strategy that creates more realistic visual details and geometry for 3D avatars.
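To make the CLIP-based semantic loss concrete, here is a minimal sketch of the idea, assuming the openai/CLIP package (`pip install git+https://github.com/openai/CLIP.git`); `clip_semantic_loss` is an illustrative name, and the training code in this repository is the authoritative implementation.

```python
# Minimal sketch of a CLIP-based semantic loss (not the exact ELICIT loss):
# pull renderings of unseen body parts toward the CLIP embedding of the input image.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.eval()

def clip_semantic_loss(rendered, reference):
    """rendered, reference: (B, 3, 224, 224) tensors preprocessed with CLIP's
    normalization. `rendered` keeps gradients so the loss can back-propagate
    into the avatar NeRF; the reference embedding is a fixed target."""
    feat_render = model.encode_image(rendered)
    with torch.no_grad():
        feat_ref = model.encode_image(reference)
    feat_render = feat_render / feat_render.norm(dim=-1, keepdim=True)
    feat_ref = feat_ref / feat_ref.norm(dim=-1, keepdim=True)
    # 1 - cosine similarity: minimized when the rendering stays semantically
    # consistent with the reference view of the subject.
    return (1.0 - (feat_render * feat_ref).sum(dim=-1)).mean()
```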
Installation
Please follow the Installation Instruction to set up all the required packages.
Data
Results of the experiments
We provide result videos on our webpage for the qualitative and quantitative evaluations in our paper. We also provide checkpoints for those experiments in Google Drive.
Training data for re-implementation
For the datasets we use for quantitative evaluations (ZJU-MoCap, Human 3.6M), please prepare the original datasets in the same format as ZJU-MoCap, then use our scripts in tools to preprocess the datasets and render SMPL meshes for training.
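As a rough illustration of the mesh-rendering step, the sketch below exports one posed SMPL mesh per frame using the smplx and trimesh packages; `export_smpl_mesh` and the model path are illustrative, and the scripts in tools are the authoritative implementation.

```python
# Illustrative sketch of exporting a posed SMPL mesh per frame; the actual
# preprocessing lives in tools/. Assumes `pip install smplx trimesh` and an
# SMPL model downloaded from https://smpl.is.tue.mpg.de.
import smplx
import trimesh

# Placeholder path; point it at your downloaded SMPL model files.
model = smplx.create("models/smpl", model_type="smpl", gender="neutral")

def export_smpl_mesh(betas, body_pose, global_orient, transl, out_path):
    """betas: (1, 10), body_pose: (1, 69) axis-angle, global_orient and
    transl: (1, 3); torch tensors holding one frame's SMPL parameters."""
    output = model(betas=betas, body_pose=body_pose,
                   global_orient=global_orient, transl=transl)
    vertices = output.vertices.detach().cpu().numpy().squeeze()
    trimesh.Trimesh(vertices, model.faces, process=False).export(out_path)
```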
For customized single-image data, we provide examples from the DeepFashion dataset in dataset/fashion.
See more details in Data Instruction.
Getting Started
Training
python train.py --cfg configs/elicit/zju_mocap/377/smpl_init_texture.yaml # Run SMPL mesh initialization.
python train.py --cfg configs/elicit/zju_mocap/377/finetune.yaml # Run training on the input subject.
We also provide checkpoints for all the subjects in Google Drive; please unzip the file into the following structure:
${ELICIT_ROOT}
└── experiments
    └── elicit
        ├── zju_mocap
        ├── h36m
        └── fashion
Please refer to scripts for training all the quantitative experiments of novel pose synthesis and novel view synthesis on ZJU-MoCap and Human 3.6M.
Evaluation / Rendering
We also provide the rendering results of all quantitative experiments for ELICIT and the baselines in Google Drive. To calculate correct PSNR, SSIM, and LPIPS scores, please use the bounding masks in this file, which are generated by Neural Human Performer and Animatable-NeRF.
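As an illustration of how such bounding masks are typically applied, the sketch below restricts the three metrics to the subject's bounding box, assuming the scikit-image and lpips packages; `masked_metrics` and `bbox_crop` are illustrative names, and the evaluation code in this repository is authoritative.

```python
# Illustrative sketch of bbox-masked metrics: crop prediction and ground truth
# to the mask's bounding box before scoring, so empty background does not
# inflate PSNR/SSIM. Assumes `pip install scikit-image lpips`.
import numpy as np
import torch
import lpips
from skimage.metrics import structural_similarity

lpips_fn = lpips.LPIPS(net="alex")

def bbox_crop(img, mask):
    ys, xs = np.where(mask > 0)
    return img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def masked_metrics(pred, gt, mask):
    """pred, gt: (H, W, 3) float arrays in [0, 1]; mask: (H, W) bounding mask."""
    p, g = bbox_crop(pred, mask), bbox_crop(gt, mask)
    psnr = -10.0 * np.log10(np.mean((p - g) ** 2))
    ssim = structural_similarity(p, g, channel_axis=2, data_range=1.0)
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_fn(to_tensor(p), to_tensor(g)).item()  # LPIPS expects [-1, 1]
    return psnr, ssim, lp
```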
Evaluate novel pose synthesis.
python run.py --type movement --cfg configs/elicit/zju_mocap/377/finetune.yaml
Evaluate novel view synthesis.
python run.py --type freeview --cfg configs/elicit/zju_mocap/377/finetune.yaml freeview.use_gt_camera True
Freeview rendering on arbitrary frames.
python run.py --type freeview --cfg configs/elicit/zju_mocap/377/finetune.yaml freeview.frame_idx $FRAME_INDEX_TO_RENDER
The rendered frames and video will be saved at experiments/zju_mocap/377/latest.
Citation
@inproceedings{huang2022elicit,
title={One-shot Implicit Animatable Avatars with Model-based Priors},
author={Huang, Yangyi and Yi, Hongwei and Liu, Weiyang and Wang, Haofan and Wu, Boxi and Wang, Wenxiao and Lin, Binbin and Zhang, Debing and Cai, Deng},
booktitle={IEEE International Conference on Computer Vision (ICCV)},
year={2023}
}
Acknowledgments
Our implementation is mainly based on HumanNeRF, with reference to Animatable NeRF and AvatarCLIP. We thank the authors for their open-source contributions. In addition, we thank the authors of Animatable NeRF for their help with the data preprocessing of Human 3.6M.