Awesome Video Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, NeRF, etc.
(Source: Make-A-Video, Tune-A-Video, and FateZero.)
Table of Contents
- Open-source Toolboxes and Foundation Models
- Video Generation
- Video Editing
- Long-form Video Generation and Completion
- Human or Subject Motion
- Video Enhancement and Restoration
- 3D / NeRF
- Video Understanding
- Healthcare and Biology
Open-source Toolboxes and Foundation Models
Video Generation
- Dual-Stream Diffusion Net for Text-to-Video Generation (Aug., 2023)
- DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory (Aug., 2023)
- InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation (Jul., 2023)
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation (Jul., 2023)
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning (Jul., 2023)
- DisCo: Disentangled Control for Referring Human Dance Generation in Real World (Jul., 2023)
- VideoComposer: Compositional Video Synthesis with Motion Controllability (Jun., 2023)
- Probabilistic Adaptation of Text-to-Video Models (Jun., 2023)
- Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance (Jun., 2023)
- Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising (May, 2023)
- Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models (May, 2023)
- ControlVideo: Training-free Controllable Text-to-Video Generation (May, 2023)
- Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity (May, 2023)
- Any-to-Any Generation via Composable Diffusion (May, 2023)
- VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation (May, 2023)
- Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models (May, 2023)
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis (Apr., 2023)
- LaMD: Latent Motion Diffusion for Video Generation (Apr., 2023)
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (CVPR 2023)
- Text2Performer: Text-Driven Human Video Generation (Apr., 2023)
- Generative Disco: Text-to-Video Generation for Music Visualization (Apr., 2023)
- Latent-Shift: Latent Diffusion with Temporal Shift (Apr., 2023)
- DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion (Apr., 2023)
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos (Apr., 2023)
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos (CVPR 2023)
- Seer: Language Instructed Video Prediction with Latent Diffusion Models (Mar., 2023)
- Text2Video-Zero: Text-to-Image Diffusion Models Are Zero-Shot Video Generators (Mar., 2023)
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models (CVPR 2023)
- Decomposed Diffusion Models for High-Quality Video Generation (CVPR 2023)
- Video Probabilistic Diffusion Models in Projected Latent Space (CVPR 2023)
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images (Feb., 2023)
- Structure and Content-Guided Video Synthesis With Diffusion Models (Feb., 2023)
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation (Dec., 2022)
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation (CVPR 2023)
- MAGVIT: Masked Generative Video Transformer (Dec., 2022)
- VIDM: Video Implicit Diffusion Models (AAAI 2023)
- Latent Video Diffusion Models for High-Fidelity Video Generation With Arbitrary Lengths (Nov., 2022)
- SinFusion: Training Diffusion Models on a Single Image or Video (Nov., 2022)
- MagicVideo: Efficient Video Generation With Latent Diffusion Models (Nov., 2022)
- Imagen Video: High Definition Video Generation With Diffusion Models (Oct., 2022)
- Make-A-Video: Text-to-Video Generation without Text-Video Data (ICLR 2023)
- Diffusion Models for Video Prediction and Infilling (TMLR 2022)
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (NeurIPS 2022)
- Video Diffusion Models (Apr., 2022)
- Diffusion Probabilistic Modeling for Video Generation (Mar., 2022)
Video Editing
- CoDeF: Content Deformation Fields for Temporally Consistent Video Processing (Aug., 2023)
- TokenFlow: Consistent Diffusion Features for Consistent Video Editing (Jul., 2023)
- INVE: Interactive Neural Video Editing (Jul., 2023)
- VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing (Jun., 2023)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation (Jun., 2023)
- ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing (May, 2023)
- Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts (May, 2023)
- Soundini: Sound-Guided Diffusion for Natural Video Editing (Apr., 2023)
- Zero-Shot Video Editing Using Off-the-Shelf Image Diffusion Models (Mar., 2023)
- Edit-A-Video: Single Video Editing with Object-Aware Consistency (Mar., 2023)
- FateZero: Fusing Attentions for Zero-shot Text-based Video Editing (Mar., 2023)
- Pix2Video: Video Editing Using Image Diffusion (Mar., 2023)
- Video-P2P: Video Editing with Cross-attention Control (Mar., 2023)
- Dreamix: Video Diffusion Models Are General Video Editors (Feb., 2023)
- Shape-Aware Text-Driven Layered Video Editing (Jan., 2023)
- Speech Driven Video Editing via an Audio-Conditioned Diffusion Model (Jan., 2023)
- Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding (CVPR 2023)
Long-form Video Generation and Completion
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (NeurIPS 2022)
- NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation (Mar., 2023)
- Flexible Diffusion Modeling of Long Videos (May, 2022)
Human or Subject Motion
- Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model (CVPR 2023)
- InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions (Apr., 2023)
- ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model (Apr., 2023)
- Human Motion Diffusion as a Generative Prior (Mar., 2023)
- Can We Use Diffusion Probabilistic Models for 3D Motion Prediction? (Feb., 2023)
- Single Motion Diffusion (Feb., 2023)
- HumanMAC: Masked Motion Completion for Human Motion Prediction (Feb., 2023)
- DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model (Jan., 2023)
- Modiff: Action-Conditioned 3D Motion Generation With Denoising Diffusion Probabilistic Models (Jan., 2023)
- Unifying Human Motion Synthesis and Style Transfer With Denoising Diffusion Probabilistic Models (GRAPP 2023)
- Executing Your Commands via Motion Diffusion in Latent Space (CVPR 2023)
- Pretrained Diffusion Models for Unified Human Motion Synthesis (Dec., 2022)
- PhysDiff: Physics-Guided Human Motion Diffusion Model (Dec., 2022)
- BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction (Dec., 2022)
- Listen, Denoise, Action! Audio-Driven Motion Synthesis With Diffusion Models (Nov., 2022)
- Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model (ICASSP 2023)
- Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction (Oct., 2022)
- Human Motion Diffusion Model (ICLR 2023)
- FLAME: Free-form Language-based Motion Synthesis & Editing (AAAI 2023)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model (Aug., 2022)
- Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion (CVPR 2022)
Video Enhancement and Restoration
- LDMVFI: Video Frame Interpolation with Latent Diffusion Models (Mar., 2023)
- CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming (Nov., 2022)
3D / NeRF
- Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields (May, 2023)
- RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture (May, 2023)
- NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models (CVPR 2023)
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction (Apr., 2023)
- Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions (Mar., 2023)
- DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models (Feb., 2023)
- NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion (Feb., 2023)
- DiffRF: Rendering-guided 3D Radiance Field Diffusion (CVPR 2023)
Video Understanding
- Exploring Diffusion Models for Unsupervised Video Anomaly Detection (Apr., 2023)
- PDPP: Projected Diffusion for Procedure Planning in Instructional Videos (CVPR 2023)
- DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion (Mar., 2023)
- DiffusionRet: Generative Text-Video Retrieval with Diffusion Model (Mar., 2023)
- Refined Semantic Enhancement Towards Frequency Diffusion for Video Captioning (Nov., 2022)
- A Generalist Framework for Panoptic Segmentation of Images and Videos (Oct., 2022)