  • Language
    Python
  • License
    MIT License

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

VideoReTalking
Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

1 Xidian University   2 Tencent AI Lab   3 Tsinghua University

SIGGRAPH Asia 2022 Conference Track

We present VideoReTalking, a new system to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-syncing output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using the expression editing network, resulting in a video with the canonical expression. This video, together with the given audio, is then fed into the lip-sync network to generate a lip-syncing video. Finally, we improve the photo-realism of the synthesized faces through an identity-aware face enhancement network and post-processing. We use learning-based approaches for all three steps and all our modules can be tackled in a sequential pipeline without any user intervention.
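The three sequential tasks described above can be pictured as a simple composition of stages. The following is only a toy sketch of that data flow, not the repository's code: the stage functions, their names, and the use of strings in place of image frames are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Frame = str  # hypothetical stand-in; real frames would be image arrays


def run_pipeline(
    frames: Sequence[Frame],
    audio: str,
    edit_expression: Callable[[Frame], Frame],
    lip_sync: Callable[[List[Frame], str], List[Frame]],
    enhance: Callable[[Frame], Frame],
) -> List[Frame]:
    """Chain the three stages in the order given in the abstract."""
    # (1) Map every frame to the same canonical expression.
    canonical = [edit_expression(frame) for frame in frames]
    # (2) Drive the lips of the canonical video with the input audio.
    synced = lip_sync(canonical, audio)
    # (3) Enhance each synthesized face.
    return [enhance(frame) for frame in synced]


# Toy stages (hypothetical) that only tag each frame with the step applied.
def toy_edit_expression(frame: Frame) -> Frame:
    return f"{frame}|canonical"


def toy_lip_sync(frames: List[Frame], audio: str) -> List[Frame]:
    return [f"{frame}|sync({audio})" for frame in frames]


def toy_enhance(frame: Frame) -> Frame:
    return f"{frame}|enhanced"


result = run_pipeline(
    ["f0", "f1"], "a.wav", toy_edit_expression, toy_lip_sync, toy_enhance
)
print(result)
```

Each stage consumes only the output of the previous one, which is why the whole chain can run without user intervention.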

Figure: Pipeline

Results in the Wild (contains audio)

Video: Results_in_the_wild.mp4

Environment

git clone https://github.com/vinthony/video-retalking.git
cd video-retalking
conda create -n video_retalking python=3.8
conda activate video_retalking

conda install ffmpeg

# Please follow the instructions from https://pytorch.org/get-started/previous-versions/
# This installation command only works on CUDA 11.1
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

pip install -r requirements.txt
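
After running the commands above, a quick sanity check can confirm that the pieces are visible from the new conda environment. This is a minimal sketch, not part of the repository: the function name `check_environment` is hypothetical, and it only tests whether `ffmpeg` is on the PATH and whether the named modules can be located, not whether CUDA works.

```python
import importlib.util
import shutil
import sys


def check_environment(required_modules=("torch", "torchvision")):
    """Report the Python version, ffmpeg availability, and module presence."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }
    for name in required_modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch` is reported as present, `torch.cuda.is_available()` can then be called in the same environment to check the GPU setup.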

Quick Inference

Pretrained Models

Please download our pre-trained models and put them in ./checkpoints.
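
Before running inference, it can help to confirm that the download landed in the expected folder. The sketch below lists whatever files are under ./checkpoints with their sizes; the helper name `list_checkpoints` is hypothetical, and no particular checkpoint file names are assumed.

```python
from pathlib import Path


def list_checkpoints(checkpoint_dir="checkpoints"):
    """Return (relative path, size in bytes) for every file in the folder."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"checkpoint folder not found: {root}")
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return [(str(p.relative_to(root)), p.stat().st_size) for p in files]


if __name__ == "__main__":
    try:
        for name, size in list_checkpoints():
            # A zero-byte file usually means an interrupted download.
            note = "  <- empty file" if size == 0 else ""
            print(f"{size:>12d}  {name}{note}")
    except FileNotFoundError as err:
        print(err)
```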

Inference

python3 inference.py \
  --face examples/face/1.mp4 \
  --audio examples/audio/1.wav \
  --outfile results/1_1.mp4

This script includes the data preprocessing steps, so you can test any talking-face video without manual alignment. Note, however, that DNet cannot handle extreme poses.
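
To process several clips, the same command can be generated in a loop. The following is a sketch under stated assumptions: it pairs each `.mp4` in a face folder with the `.wav` of the same file stem, uses only the `--face`, `--audio`, and `--outfile` flags shown above, and must be run from the repository root. The helper names and the pairing rule are hypothetical, not part of the repository.

```python
import subprocess
from pathlib import Path


def build_command(face, audio, outfile):
    """Build the inference command using the flags shown in the README."""
    return [
        "python3", "inference.py",
        "--face", str(face),
        "--audio", str(audio),
        "--outfile", str(outfile),
    ]


def pair_by_stem(face_dir, audio_dir):
    """Pair videos and audio files that share a file stem (assumed rule)."""
    audios = {p.stem: p for p in Path(audio_dir).glob("*.wav")}
    return [
        (face, audios[face.stem])
        for face in sorted(Path(face_dir).glob("*.mp4"))
        if face.stem in audios
    ]


def run_batch(face_dir="examples/face", audio_dir="examples/audio",
              out_dir="results", dry_run=True):
    """Return the commands; execute them only when dry_run is False."""
    commands = []
    for face, audio in pair_by_stem(face_dir, audio_dir):
        outfile = Path(out_dir) / f"{face.stem}_{audio.stem}.mp4"
        cmd = build_command(face, audio, outfile)
        commands.append(cmd)
        if not dry_run:
            Path(out_dir).mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # stops at the first failure
    return commands


if __name__ == "__main__":
    for cmd in run_batch(dry_run=True):
        print(" ".join(cmd))
```

The dry-run default prints the commands first, so the pairing can be reviewed before any GPU time is spent.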

You can also control the expression by adding the following parameters:

--exp_img: Pre-defined expression template. The default is "neutral". You can choose "smile" or an image path.

--up_face: You can choose "surprise" or "angry" to modify the expression of the upper face with GANimation.
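
As an illustration of how the two options combine with the basic command, the sketch below appends them to an argument list. The helper `with_expression` is hypothetical, and the check on `--up_face` only mirrors the two values named in this README; the script itself may accept others.

```python
UP_FACE_CHOICES = ("surprise", "angry")  # values named in this README


def with_expression(base_cmd, exp_img=None, up_face=None):
    """Return a copy of base_cmd with the expression options appended."""
    cmd = list(base_cmd)
    if exp_img is not None:
        # "neutral", "smile", or a path to a template image.
        cmd += ["--exp_img", str(exp_img)]
    if up_face is not None:
        if up_face not in UP_FACE_CHOICES:
            raise ValueError(f"up_face must be one of {UP_FACE_CHOICES}")
        cmd += ["--up_face", up_face]
    return cmd


base = [
    "python3", "inference.py",
    "--face", "examples/face/1.mp4",
    "--audio", "examples/audio/1.wav",
    "--outfile", "results/1_1.mp4",
]
smile_cmd = with_expression(base, exp_img="smile", up_face="surprise")
print(" ".join(smile_cmd))
```

The resulting list can be passed to `subprocess.run` from the repository root, or joined into a single shell command as printed here.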

Citation

If you find our work useful in your research, please consider citing:

@misc{cheng2022videoretalking,
        title={VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild}, 
        author={Kun Cheng and Xiaodong Cun and Yong Zhang and Menghan Xia and Fei Yin and Mingrui Zhu and Xuan Wang and Jue Wang and Nannan Wang},
        year={2022},
        eprint={2211.14758},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
  }

Acknowledgement

Thanks to Wav2Lip, PIRenderer, GFP-GAN, GPEN, ganimation_replicate, STIT for sharing their code.

Disclaimer

This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.
