One-Shot Free-View Neural Talking Head Synthesis
Unofficial PyTorch implementation of the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing".
Python 3.6 and PyTorch 1.7 are used.
Updates:
2021.11.05:
- Replace the Jacobian with the rotation matrix (assuming J = R) to avoid estimating the Jacobian.
- Correct the rotation matrix.
2021.11.17:
- Better generator, better performance (models and checkpoints have been released).
Driving | Beta Version | FOMM | New Version:
driving-beta-fomm-new.mp4
Train:
python run.py --config config/vox-256.yaml --device_ids 0,1,2,3,4,5,6,7
Demo:
python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame
Free-view (e.g. yaw=20, pitch=roll=0):
python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame --free_view --yaw 20 --pitch 0 --roll 0
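To illustrate what the yaw, pitch and roll angles passed on the command line represent, here is a minimal sketch that turns three angles in degrees into a 3x3 rotation matrix. The axis assignment (pitch about x, yaw about y, roll about z) and the composition order are assumptions made for this example; they are not taken from this repository, and the function names are hypothetical.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_to_rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """Build a rotation matrix from yaw, pitch and roll in degrees.

    Assumed convention (illustrative only): pitch rotates about the x axis,
    yaw about the y axis, roll about the z axis, composed as
    R = R_pitch * R_yaw * R_roll.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    roll = math.radians(roll_deg)

    r_pitch = [[1.0, 0.0, 0.0],
               [0.0, math.cos(pitch), -math.sin(pitch)],
               [0.0, math.sin(pitch), math.cos(pitch)]]
    r_yaw = [[math.cos(yaw), 0.0, math.sin(yaw)],
             [0.0, 1.0, 0.0],
             [-math.sin(yaw), 0.0, math.cos(yaw)]]
    r_roll = [[math.cos(roll), -math.sin(roll), 0.0],
              [math.sin(roll), math.cos(roll), 0.0],
              [0.0, 0.0, 1.0]]

    return matmul3(matmul3(r_pitch, r_yaw), r_roll)


# The free-view example above: yaw=20, pitch=roll=0.
R = euler_to_rotation_matrix(20, 0, 0)
```

With pitch and roll set to zero, the result reduces to a pure rotation about the vertical axis, which corresponds to turning the head left or right.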
Note: crop the raw driving video before running the demo. To get a cropping suggestion, first run:
python crop-video.py --inp driving_video.mp4
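When running the demo on many inputs, it can be convenient to assemble the command from a script. The following is a minimal sketch that builds the same argument list as the demo commands above; the helper name `build_demo_command` and its parameters are hypothetical, and only flags that appear in this README are used.

```python
import shlex


def build_demo_command(checkpoint, source_image, driving_video,
                       config="config/vox-256.yaml", free_view_angles=None):
    """Return the demo.py command as a list of arguments.

    free_view_angles is an optional (yaw, pitch, roll) tuple in degrees;
    when given, the free-view flags are appended.
    """
    cmd = [
        "python", "demo.py",
        "--config", config,
        "--checkpoint", checkpoint,
        "--source_image", source_image,
        "--driving_video", driving_video,
        "--relative", "--adapt_scale", "--find_best_frame",
    ]
    if free_view_angles is not None:
        yaw, pitch, roll = free_view_angles
        cmd += ["--free_view",
                "--yaw", str(yaw), "--pitch", str(pitch), "--roll", str(roll)]
    return cmd


cmd = build_demo_command("path/to/checkpoint", "path/to/source",
                         "path/to/driving", free_view_angles=(20, 0, 0))
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting problems with paths that contain spaces.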
Pretrained Model:
Model | Train Set | Baidu Netdisk | Media Fire |
---|---|---|---|
Vox-256-Beta | VoxCeleb-v1 | Baidu (PW: c0tc) | MF |
Vox-256-New | VoxCeleb-v1 | - | MF |
Vox-512 | VoxCeleb-v2 | soon | soon |
Note:
- For now, the Beta Version is not well tuned.
- For free-view synthesis, it is recommended that yaw, pitch and roll stay within ±45°, ±20° and ±20° respectively.
- Face Restoration algorithms (GPEN) can be used for post-processing to significantly improve the resolution.
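The recommended free-view ranges above can be enforced before calling the demo. This is a minimal sketch, assuming the angles are given in degrees; the helper name `clamp_free_view_angles` is hypothetical and not part of this repository.

```python
# Recommended free-view limits in degrees, taken from the note above.
RECOMMENDED_LIMITS = {"yaw": 45.0, "pitch": 20.0, "roll": 20.0}


def clamp_free_view_angles(yaw, pitch, roll):
    """Clamp each angle to its recommended symmetric range."""
    requested = {"yaw": yaw, "pitch": pitch, "roll": roll}
    return {
        name: max(-limit, min(limit, requested[name]))
        for name, limit in RECOMMENDED_LIMITS.items()
    }


angles = clamp_free_view_angles(yaw=60, pitch=-30, roll=5)
```

Clamping silently changes the requested pose, so a script may prefer to warn or raise an error instead when a value falls outside the range.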
Acknowledgement:
Thanks to NV, AliaksandrSiarohin and DeepHeadPose.