VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Kun Cheng, Xiaodong Cun, Yong Zhang, Menghan Xia, Fei Yin, Mingrui Zhu, Xuan Wang, Jue Wang, Nannan Wang
SIGGRAPH Asia 2022 Conference Track
Results in the Wild (contains audio)
[Video: Results_in_the_wild.mp4]
Environment
git clone https://github.com/vinthony/video-retalking.git
cd video-retalking
conda create -n video_retalking python=3.8
conda activate video_retalking
conda install ffmpeg
# Please follow the instructions from https://pytorch.org/get-started/previous-versions/
# This installation command only works on CUDA 11.1
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
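
After installation, you can sanity-check that the CUDA build of PyTorch is active (this check is ours, not part of the repo):

# Should print "1.9.0+cu111 True" on a machine with CUDA 11.1 drivers
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"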
Quick Inference
Pretrained Models
Please download our pre-trained models and put them in ./checkpoints.
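
If the directory does not exist yet, create it before copying the downloaded weights (a minimal sketch; keep the original checkpoint filenames):

mkdir -p ./checkpoints
# then move the downloaded model files into ./checkpoints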
Inference
python3 inference.py \
--face examples/face/1.mp4 \
--audio examples/audio/1.wav \
--outfile results/1_1.mp4
This script includes the data preprocessing steps, so you can test any talking-face video without manual alignment. Note, however, that DNet cannot handle extreme poses.
You can also control the expression by adding the following parameters (see the example command below):

- --exp_img: Pre-defined expression template. The default is "neutral"; you can also choose "smile" or pass the path of an image.
- --up_face: Choose "surprise" or "angry" to modify the expression of the upper face with GANimation.
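
For example, the following command combines both options, applying the "smile" template and a "surprise" upper-face expression (the output filename here is illustrative):

python3 inference.py \
  --face examples/face/1.mp4 \
  --audio examples/audio/1.wav \
  --exp_img smile \
  --up_face surprise \
  --outfile results/1_1_smile.mp4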
Citation
If you find our work useful in your research, please consider citing:
@misc{cheng2022videoretalking,
title={VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild},
author={Kun Cheng and Xiaodong Cun and Yong Zhang and Menghan Xia and Fei Yin and Mingrui Zhu and Xuan Wang and Jue Wang and Nannan Wang},
year={2022},
eprint={2211.14758},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Acknowledgement
Thanks to Wav2Lip, PIRenderer, GFP-GAN, GPEN, ganimation_replicate, STIT for sharing their code.
Related Work
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
- DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)
- 3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)
- T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)
Disclaimer
This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.