Awesome Talking Face
This repository organizes papers, code, and other resources related to talking face/head generation. Most papers are linked to the PDF provided by arXiv or OpenAccess; some, however, require an academic license to read (e.g., IEEE, Springer, and Elsevier journals).
🔆 This project is still ongoing; pull requests are welcome!!
If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to open a pull request. Even just letting me know the title of a paper is a big contribution. You can also open an issue or contact me directly via email.
⭐ If you find this repo useful, please star it!!!
2022.09 Update!
Thanks to everyone for the PRs! From now on, I'll occasionally include papers on video-driven talking face generation, since the community has begun to fold video-driven methods into the scope of talking face generation, even though that task was originally termed Face Reenactment.
So if you are looking for video-driven talking face generation, I suggest you star this repo and then also search for Face Reenactment; you'll find more :)
One more thing: please correct me if you find that any paper listed here as an arXiv paper has since been accepted to a conference or journal.
2021.11 Update!
I added a batch of papers that appeared over the past few months. This repo was originally intended to cover audio-driven talking face generation works. However, I found several text-based research works that are also very interesting, so I have included them here as well. Enjoy!
TO DO LIST
- Main paper list
- Add paper links
- Add code links where available
- Add project pages where available
- Datasets and surveys
Papers
2D Video - Person independent
2023
- Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks [arXiv 2023] Paper
- Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis [arXiv 2023] Paper
- Parametric Implicit Face Representation for Audio-Driven Facial Reenactment [CVPR 2023] Paper
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [CVPR 2023] Paper Code
- StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator [CVPR 2023] Paper ProjectPage Code
- Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos [arXiv 2023] Paper ProjectPage
- Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model [arXiv 2023] Paper
- High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [arXiv 2023] Paper
- StyleLipSync: Style-based Personalized Lip-sync Video Generation [arXiv 2023] Paper ProjectPage Code
- GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation [arXiv 2023] Paper ProjectPage
- High-Fidelity and Freely Controllable Talking Head Video Generation [CVPR 2023] Paper ProjectPage
- One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field [CVPR 2023] Paper ProjectPage
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert [CVPR 2023] Paper Code
- Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [arXiv 2023] Paper
- That's What I Said: Fully-Controllable Talking Face Generation [arXiv 2023] Paper ProjectPage
- Emotionally Enhanced Talking Face Generation [arXiv 2023] Paper Code ProjectPage
- A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation [MLSys Workshop 2023] Paper
- TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles [arXiv 2023] Paper
- FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions [ICME 2023] Paper
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [arXiv 2023] Paper ProjectPage
- OPT: One-shot Pose-controllable Talking Head Generation [ICASSP 2023] Paper
- DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions [ICASSP 2023] Paper Code ProjectPage
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [ICLR 2023] Paper Code ProjectPage
- OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR 2023] Paper Code
- Style Transfer for 2D Talking Head Animation [arXiv 2023] Paper
- READ Avatars: Realistic Emotion-controllable Audio Driven Avatars [arXiv 2023] Paper
- On the Audio-visual Synchronization for Lip-to-Speech Synthesis [arXiv 2023] Paper
- DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis [CVPR 2023] Paper
- Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation [arXiv 2023] Paper ProjectPage
- StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles [AAAI 2023] Paper Code
- Audio-Visual Face Reenactment [WACV 2023] Paper ProjectPage Code
2022
- Memories are One-to-Many Mapping Alleviators in Talking Face Generation [arXiv 2022] Paper ProjectPage
- Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers [SIGGRAPH Asia 2022] Paper
- Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors [arXiv 2022] Paper ProjectPage
- Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis [arXiv 2022] Paper ProjectPage
- SPACE: Speech-driven Portrait Animation with Controllable Expression [arXiv 2022] Paper ProjectPage
- Compressing Video Calls using Synthetic Talking Heads [BMVC 2022] Paper ProjectPage
- Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement [arXiv 2022] Paper
- StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation [arXiv 2022] Paper
- Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [arXiv 2022] Paper
- EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [SIGGRAPH 2022] Paper
- Talking Head from Speech Audio using a Pre-trained Image Generator [ACM MM 2022] Paper
- Latent Image Animator: Learning to Animate Images via Latent Space Navigation [ICLR 2022] Paper ProjectPage (note: this page auto-plays music) Code
- Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis [ECCV 2022] Paper ProjectPage Code
- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [ECCV 2022] Paper ProjectPage Code
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [ICASSP 2022] Paper ProjectPage Code
- StableFace: Analyzing and Improving Motion Stability for Talking Face Generation [arXiv 2022] Paper ProjectPage
- Emotion-Controllable Generalized Talking Face Generation [IJCAI 2022] Paper
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN [ECCV 2022] Paper Code ProjectPage
- DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [arXiv 2022] Paper
- Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions [arXiv 2022] Paper
- Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels [TMM 2022] Paper
- Depth-Aware Generative Adversarial Network for Talking Head Video Generation [CVPR 2022] Paper ProjectPage Code
- Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning [CVPR 2022] Paper Code ProjectPage
- Expressive Talking Head Generation with Granular Audio-Visual Control [CVPR 2022] Paper
- Talking Face Generation with Multilingual TTS [CVPR 2022 Demo] Paper DemoPage
- SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory [AAAI 2022] Paper
2021
- Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [SIGGRAPH Asia 2021] Paper Code
- Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis [ACMMM 2021] Paper Code
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
- FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [ICCV 2021] Paper Code
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [ICCV 2021] Paper
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [CVPR 2021] Paper Code ProjectPage
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing [CVPR 2021] Paper
- Audio-Driven Emotional Video Portraits [CVPR 2021] Paper Code
- AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person [arXiv 2021] Paper
- Talking Head Generation with Audio and Speech Related Facial Action Units [BMVC 2021] Paper
- Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [IJCAI 2021] Paper
- Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [AAAI 2021] Paper
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [arXiv 2021] Paper Code
2020
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [arXiv 2020] Paper Code
- A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild [ACMMM 2020] Paper Code
- Talking Face Generation with Expression-Tailored Generative Adversarial Network [ACMMM 2020] Paper
- Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [arXiv 2020] Paper Code
- A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [ICPR 2020] Paper
- Everybody's Talkin': Let Me Talk as You Want [arXiv 2020] Paper
- HeadGAN: Video-and-Audio-Driven Talking Head Synthesis [arXiv 2020] Paper
- Talking-head Generation with Rhythmic Head Motion [ECCV 2020] Paper
- Neural Voice Puppetry: Audio-driven Facial Reenactment [ECCV 2020] Paper ProjectPage Code
- Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis [CVPR 2020] Paper
- Robust One Shot Audio to Video Generation [CVPRW 2020] Paper
- MakeItTalk: Speaker-Aware Talking Head Animation [SIGGRAPH Asia 2020] Paper Code
- FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis [AAAI 2020] Paper
- Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose [AAAI 2020] Paper
- Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks [arXiv 2020] Paper
- Speech-Driven Facial Animation Using Polynomial Fusion of Features [arXiv 2020] Paper
- Animating Face using Disentangled Audio Representations [WACV 2020] Paper
Before 2020
- Realistic Speech-Driven Facial Animation with GANs [IJCV 2019] Paper ProjectPage
- Few-Shot Adversarial Learning of Realistic Neural Talking Head Models [ICCV 2019] Paper Code
- Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss [CVPR 2019] Paper Code
- Talking Face Generation by Adversarially Disentangled Audio-Visual Representation [AAAI 2019] Paper Code ProjectPage
- Lip Movements Generation at a Glance [ECCV 2018] Paper
- X2Face: A network for controlling face generation using images, audio, and pose codes [ECCV 2018] Paper Code ProjectPage
- Talking Face Generation by Conditional Recurrent Adversarial Network [IJCAI 2019] Paper Code
- Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks [arXiv 2018] Paper
- High-Resolution Talking Face Generation via Mutual Information Approximation [arXiv 2018] Paper
- Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network [arXiv 2018] Paper
- You said that? [BMVC 2017] Paper
2D Video - Person dependent
- Synthesizing Obama: Learning Lip Sync from Audio [SIGGRAPH 2017] Paper ProjectPage
- Photorealistic Adaptation and Interpolation of Facial Expressions Using HMMs and AAMs for Audio-Visual Speech Synthesis [ICIP 2017] Paper
- HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks [Journal of Computer and Communications 2017] Paper
- ObamaNet: Photo-realistic lip-sync from text [arXiv 2017] Paper
- A deep bidirectional LSTM approach for video-realistic talking head [Multimedia Tools Appl 2015] Paper
- Photo-Realistic Expressive Text to Talking Head Synthesis [Interspeech 2013] Paper
- Photo-Real Talking Head with Deep Bidirectional LSTM [ICASSP 2015] Paper
- Expressive Speech-Driven Facial Animation [TOG 2005] Paper
3D Animation
- EmoTalk: Speech-driven emotional disentanglement for 3D face animation [ICCV 2023] Paper ProjectPage
- FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning [arXiv 2023] Paper Code ProjectPage
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertices Attention [arXiv 2023] Paper
- Learning Audio-Driven Viseme Dynamics for 3D Face Animation [arXiv 2023] Paper ProjectPage
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior [CVPR 2023] Paper ProjectPage
- Expressive Speech-driven Facial Animation with controllable emotions [arXiv 2023] Paper
- Imitator: Personalized Speech-driven 3D Facial Animation [arXiv 2022] Paper ProjectPage
- PV3D: A 3D Generative Model for Portrait Video Generation [arXiv 2022] Paper ProjectPage
- Neural Emotion Director: Speech-preserving semantic control of facial expressions in “in-the-wild” videos [CVPR 2022] Paper Code
- FaceFormer: Speech-Driven 3D Facial Animation with Transformers [CVPR 2022] Paper Code ProjectPage
- LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization [CVPR 2021] Paper
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [ICCV 2021] Paper
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
- 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [arXiv 2021] Paper
- Modality Dropout for Improved Performance-driven Talking Faces [ICMI 2020] Paper
- Audio- and Gaze-driven Facial Animation of Codec Avatars [arXiv 2020] Paper
- Capture, Learning, and Synthesis of 3D Speaking Styles [CVPR 2019] Paper
- VisemeNet: Audio-Driven Animator-Centric Speech Animation [TOG 2018] Paper
- Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks [TAC 2018] Paper
- End-to-end Learning for 3D Facial Animation from Speech [ICMI 2018] Paper
- Visual Speech Emotion Conversion using Deep Learning for 3D Talking Head [MMAC 2018]
- A Deep Learning Approach for Generalized Speech Animation [SIGGRAPH 2017] Paper
- Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion [TOG 2017] Paper
- Speech-driven 3D Facial Animation with Implicit Emotional Awareness A Deep Learning Approach [CVPR 2017]
- Expressive Speech Driven Talking Avatar Synthesis with DBLSTM using Limited Amount of Emotional Bimodal Data [Interspeech 2016] Paper
- Real-Time Speech-Driven Face Animation With Expressions Using Neural Networks [TONN 2012] Paper
- Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar [SIST 2010] Paper
Datasets
- TalkingHead-1KH Link
- MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV 2020] ProjectPage
- VoxCeleb Link
- LRW Link
- LRS2 Link
- GRID Link
- CREMA-D Link
- MMFace4D Link
- DPCD Link Paper