Awesome Talking Face
This repository organizes papers, code, and other resources related to talking face/head generation. Most papers are linked to the PDF provided by arXiv or OpenAccess; some, however, require an academic license to read (e.g., IEEE, Springer, and Elsevier journals).
🔆 This project is still ongoing; pull requests are welcome!!
If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to open a pull request. Even just letting me know the title of a paper is a big contribution. You can also open an issue or contact me directly via email.
⭐ If you find this repo useful, please star it!!!
2022.09 Update!
Thanks to everyone for the PRs! From now on, I'll occasionally include papers on video-driven talking face generation, since the community has begun to fold video-driven methods into the scope of talking face generation, even though that task was originally termed Face Reenactment.
So if you are looking for video-driven talking face generation, I suggest you star this repo and then also search for Face Reenactment; you'll find more :)
One more thing: please correct me if you find that any paper listed here as an arXiv paper has since been accepted to a conference or journal.
2021.11 Update!
I added a batch of papers that appeared over the past few months. This repo was originally intended to cover audio-driven talking face generation works. However, I found several text-based research works that are also very interesting, so I have included them here as well. Enjoy!
TO DO LIST
- Main paper list
- Add paper links
- Add code links where available
- Add project pages where available
- Datasets and surveys
Papers
2D Video - Person independent
2023
- Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks [arXiv 2023] Paper
- Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis [arXiv 2023] Paper
- Parametric Implicit Face Representation for Audio-Driven Facial Reenactment [CVPR 2023] Paper
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [CVPR 2023] Paper Code
- StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator [CVPR 2023] Paper ProjectPage Code
- Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos [arXiv 2023] Paper ProjectPage
- Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model [arXiv 2023] Paper
- High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning [arXiv 2023] Paper
- StyleLipSync: Style-based Personalized Lip-sync Video Generation [arXiv 2023] Paper ProjectPage Code
- GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation [arXiv 2023] Paper ProjectPage
- High-Fidelity and Freely Controllable Talking Head Video Generation [CVPR 2023] Paper ProjectPage
- One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field [CVPR 2023] Paper ProjectPage
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert [CVPR 2023] Paper Code
- Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [arXiv 2023] Paper
- That's What I Said: Fully-Controllable Talking Face Generation [arXiv 2023] Paper ProjectPage
- Emotionally Enhanced Talking Face Generation [arXiv 2023] Paper Code ProjectPage
- A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation [MLSys Workshop 2023] Paper
- TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles [arXiv 2023] Paper
- FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions [ICME 2023] Paper
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [arXiv 2023] Paper ProjectPage
- OPT: One-shot Pose-controllable Talking Head Generation [ICASSP 2023] Paper
- DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions [ICASSP 2023] Paper Code ProjectPage
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [ICLR 2023] Paper Code ProjectPage
- OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR 2023] Paper Code
- Style Transfer for 2D Talking Head Animation [arXiv 2023] Paper
- READ Avatars: Realistic Emotion-controllable Audio Driven Avatars [arXiv 2023] Paper
- On the Audio-visual Synchronization for Lip-to-Speech Synthesis [arXiv 2023] Paper
- DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis [CVPR 2023] Paper
- Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation [arXiv 2023] Paper ProjectPage
- StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles [AAAI 2023] Paper Code
- Audio-Visual Face Reenactment [WACV 2023] Paper ProjectPage Code
2022
- Memories are One-to-Many Mapping Alleviators in Talking Face Generation [arXiv 2022] Paper ProjectPage
- Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers [SIGGRAPH Asia 2022] Paper
- Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors [arXiv 2022] Paper ProjectPage
- Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis [arXiv 2022] Paper ProjectPage
- SPACE: Speech-driven Portrait Animation with Controllable Expression [arXiv 2022] Paper ProjectPage
- Compressing Video Calls using Synthetic Talking Heads [BMVC 2022] Paper ProjectPage
- Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement [arXiv 2022] Paper
- StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation [arXiv 2022] Paper
- Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [arXiv 2022] Paper
- EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model [SIGGRAPH 2022] Paper
- Talking Head from Speech Audio using a Pre-trained Image Generator [ACM MM 2022] Paper
- Latent Image Animator: Learning to Animate Images via Latent Space Navigation [ICLR 2022] Paper ProjectPage (note: this page auto-plays music) Code
- Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis [ECCV 2022] Paper ProjectPage Code
- Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation [ECCV 2022] Paper ProjectPage Code
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [ICASSP 2022] Paper ProjectPage Code
- StableFace: Analyzing and Improving Motion Stability for Talking Face Generation [arXiv 2022] Paper ProjectPage
- Emotion-Controllable Generalized Talking Face Generation [IJCAI 2022] Paper
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN [ECCV 2022] Paper Code ProjectPage
- DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [arXiv 2022] Paper
- Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions [arXiv 2022] Paper
- Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels [TMM 2022] Paper
- Depth-Aware Generative Adversarial Network for Talking Head Video Generation [CVPR 2022] Paper ProjectPage Code
- Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning [CVPR 2022] Paper Code ProjectPage
- Expressive Talking Head Generation with Granular Audio-Visual Control [CVPR 2022] Paper
- Talking Face Generation with Multilingual TTS [CVPR 2022 Demo] Paper DemoPage
- SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory [AAAI 2022] Paper
2021
- Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [SIGGRAPH Asia 2021] Paper Code
- Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis [ACMMM 2021] Paper Code
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
- FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning [ICCV 2021] Paper Code
- Learned Spatial Representations for Few-shot Talking-Head Synthesis [ICCV 2021] Paper
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [CVPR 2021] Paper Code ProjectPage
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing [CVPR 2021] Paper
- Audio-Driven Emotional Video Portraits [CVPR 2021] Paper Code
- AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person [arXiv 2021] Paper
- Talking Head Generation with Audio and Speech Related Facial Action Units [BMVC 2021] Paper
- Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion [IJCAI 2021] Paper
- Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation [AAAI 2021] Paper
- Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [arXiv 2021] Paper Code
2020
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [arXiv 2020] Paper Code
- A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild [ACMMM 2020] Paper Code
- Talking Face Generation with Expression-Tailored Generative Adversarial Network [ACMMM 2020] Paper
- Speech Driven Talking Face Generation from a Single Image and an Emotion Condition [arXiv 2020] Paper Code
- A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [ICPR 2020] Paper
- Everybody's Talkin': Let Me Talk as You Want [arXiv 2020] Paper
- HeadGAN: Video-and-Audio-Driven Talking Head Synthesis [arXiv 2020] Paper
- Talking-head Generation with Rhythmic Head Motion [ECCV 2020] Paper
- Neural Voice Puppetry: Audio-driven Facial Reenactment [ECCV 2020] Paper ProjectPage Code
- Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis [CVPR 2020] Paper
- Robust One Shot Audio to Video Generation [CVPRW 2020] Paper
- MakeItTalk: Speaker-Aware Talking Head Animation [SIGGRAPH Asia 2020] Paper Code
- FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis [AAAI 2020] Paper
- Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose [AAAI 2020] Paper
- Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks [arXiv 2020] Paper
- Speech-Driven Facial Animation Using Polynomial Fusion of Features [arXiv 2020] Paper
- Animating Face using Disentangled Audio Representations [WACV 2020] Paper
Before 2020
- Realistic Speech-Driven Facial Animation with GANs [IJCV 2019] Paper ProjectPage
- Few-Shot Adversarial Learning of Realistic Neural Talking Head Models [ICCV 2019] Paper Code
- Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss [CVPR 2019] Paper Code
- Talking Face Generation by Adversarially Disentangled Audio-Visual Representation [AAAI 2019] Paper Code ProjectPage
- Lip Movements Generation at a Glance [ECCV 2018] Paper
- X2Face: A network for controlling face generation using images, audio, and pose codes [ECCV 2018] Paper Code ProjectPage
- Talking Face Generation by Conditional Recurrent Adversarial Network [IJCAI 2019] Paper Code
- Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks [arXiv 2018] Paper
- High-Resolution Talking Face Generation via Mutual Information Approximation [arXiv 2018] Paper
- Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network [arXiv 2018] Paper
- You said that? [BMVC 2017] Paper
2D Video - Person dependent
- Synthesizing Obama: Learning Lip Sync from Audio [SIGGRAPH 2017] Paper ProjectPage
- Photorealistic Adaptation and Interpolation of Facial Expressions Using HMMs and AAMs for Audio-Visual Speech Synthesis [ICIP 2017] Paper
- HMM-Based Photo-Realistic Talking Face Synthesis Using Facial Expression Parameter Mapping with Deep Neural Networks [Journal of Computer and Communications 2017] Paper
- ObamaNet: Photo-realistic lip-sync from text [arXiv 2017] Paper
- A deep bidirectional LSTM approach for video-realistic talking head [Multimedia Tools Appl 2015] Paper
- Photo-Realistic Expressive Text to Talking Head Synthesis [Interspeech 2013] Paper
- Photo-Real Talking Head with Deep Bidirectional LSTM [ICASSP 2015] Paper
- Expressive Speech-Driven Facial Animation [TOG 2005] Paper
3D Animation
- EmoTalk: Speech-driven emotional disentanglement for 3D face animation [ICCV 2023] Paper ProjectPage
- FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning [arXiv 2023] Paper Code ProjectPage
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertices Attention [arXiv 2023] Paper
- Learning Audio-Driven Viseme Dynamics for 3D Face Animation [arXiv 2023] Paper ProjectPage
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior [CVPR 2023] Paper ProjectPage
- Expressive Speech-driven Facial Animation with controllable emotions [arXiv 2023] Paper
- Imitator: Personalized Speech-driven 3D Facial Animation [arXiv 2022] Paper ProjectPage
- PV3D: A 3D Generative Model for Portrait Video Generation [arXiv 2022] Paper ProjectPage
- Neural Emotion Director: Speech-preserving semantic control of facial expressions in “in-the-wild” videos [CVPR 2022] Paper Code
- FaceFormer: Speech-Driven 3D Facial Animation with Transformers [CVPR 2022] Paper Code ProjectPage
- LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization [CVPR 2021] Paper
- MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [ICCV 2021] Paper
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [ICCV 2021] Paper Code
- 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head [arXiv 2021] Paper
- Modality Dropout for Improved Performance-driven Talking Faces [ICMI 2020] Paper
- Audio- and Gaze-driven Facial Animation of Codec Avatars [arXiv 2020] Paper
- Capture, Learning, and Synthesis of 3D Speaking Styles [CVPR 2019] Paper
- VisemeNet: Audio-Driven Animator-Centric Speech Animation [TOG 2018] Paper
- Speech-Driven Expressive Talking Lips with Conditional Sequential Generative Adversarial Networks [TAC 2018] Paper
- End-to-end Learning for 3D Facial Animation from Speech [ICMI 2018] Paper
- Visual Speech Emotion Conversion using Deep Learning for 3D Talking Head [MMAC 2018]
- A Deep Learning Approach for Generalized Speech Animation [SIGGRAPH 2017] Paper
- Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion [TOG 2017] Paper
- Speech-driven 3D Facial Animation with Implicit Emotional Awareness A Deep Learning Approach [CVPR 2017]
- Expressive Speech Driven Talking Avatar Synthesis with DBLSTM using Limited Amount of Emotional Bimodal Data [Interspeech 2016] Paper
- Real-Time Speech-Driven Face Animation With Expressions Using Neural Networks [TONN 2012] Paper
- Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar [SIST 2010] Paper
Datasets
- TalkingHead-1KH Link
- MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV 2020] ProjectPage
- VoxCeleb Link
- LRW Link
- LRS2 Link
- GRID Link
- CREMA-D Link
- MMFace4D Link
- DPCD Link Paper