


awesome-conditional-content-generation

This repository collects resources and papers on conditional content generation, especially human motion generation, image and video generation, and editing. It is maintained by Haofan Wang.

If you are interested in controllable content generation (2D/3D), would like to collaborate with me on research more broadly, or are looking for an internship, and you have published at least one top-venue paper, feel free to email [email protected]. Candidates from both academia and industry are welcome.

Papers

Music-Driven motion generation

Taming Diffusion Models for Music-driven Conducting Motion Generation
NUS, AAAI 2023 Summer Symposium, [Code]

Music-Driven Group Choreography
AIOZ AI, CVPR'23

Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation
Illinois Institute of Technology, ICLR'23, [Code]

Magic: Multi Art Genre Intelligent Choreography Dataset and Network for 3D Dance Generation
Tsinghua University, 7 Dec 2022

Pretrained Diffusion Models for Unified Human Motion Synthesis
DAMO Academy, Alibaba Group, 6 Dec 2022

EDGE: Editable Dance Generation From Music
Stanford University, 19 Nov 2022

You Never Stop Dancing: Non-freezing Dance Generation via Bank-constrained Manifold Projection
MSRA, NeurIPS'22

GroupDancer: Music to Multi-People Dance Synthesis with Style Collaboration
Tsinghua University, ACMMM'22

A Brand New Dance Partner: Music-Conditioned Pluralistic Dancing Controlled by Multiple Dance Genres
Yonsei University, CVPR 2022, [Code]

Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory
NTU, CVPR 2022 (Oral), [Code]

Dance Style Transfer with Cross-modal Transformer
KTH, 22 Aug 2022, [Upcoming Code]

Music-driven Dance Regeneration with Controllable Key Pose Constraints
Tencent, 8 July 2022

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
USC, ICCV 2021, [Code]

Text-Driven motion generation

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
NTU, CVPR'23, [Code]

GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents
Peking University, CVPR'23

Human Motion Diffusion as a Generative Prior
Anonymous Authors, [Code]

T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations
Tencent AI Lab, 16 Jan 2023, [Code]

Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models
Beihang University, 10 Jan 2023

Executing your Commands via Motion Diffusion in Latent Space
Tencent, 8 Dec 2022, [Code]

MultiAct: Long-Term 3D Human Motion Generation from Multiple Action Labels
Seoul National University, AAAI 2023 Oral, [Code]

MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis
Max Planck Institute for Informatics, 8 Dec 2022

UDE: A Unified Driving Engine for Human Motion Generation
Xiaobing Inc, 29 Nov 2022, [Upcoming Code]

MotionBERT: Unified Pretraining for Human Motion Analysis
SenseTime Research, 12 Oct 2022, [Code]

Human Motion Diffusion Model
Tel Aviv University, 3 Oct 2022, [Code]

FLAME: Free-form Language-based Motion Synthesis & Editing
Korea University, 1 Sep 2022

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
NTU, 22 Aug 2022, [Code]

TEMOS: Generating diverse human motions from textual descriptions
MPI, ECCV 2022 (Oral), [Code]

GIMO: Gaze-Informed Human Motion Prediction in Context
Stanford University, ECCV 2022, [Code]

MotionCLIP: Exposing Human Motion Generation to CLIP Space
Tel Aviv University, ECCV 2022, [Code]

Generating Diverse and Natural 3D Human Motions from Text
University of Alberta, CVPR 2022, [Code]

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
NTU, SIGGRAPH 2022, [Code]

Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents
University of Maryland, VR 2021, [Code]

Audio-Driven motion generation

For more recent papers, see here.

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
NTU, CVPR'23, [Code]

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis
Zhejiang University, ICLR'23, [Code]

DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model
Macau University of Science and Technology, 24 Jan 2023

DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis
Tsinghua University, 10 Jan 2023

Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
University of Wrocław, 6 Jan 2023, [Upcoming Code]

Generating Holistic 3D Human Motion from Speech
Max Planck Institute for Intelligent Systems, 8 Dec 2022

Audio-Driven Co-Speech Gesture Video Generation
NTU, 5 Dec 2022

Listen, denoise, action! Audio-driven motion synthesis with diffusion models
KTH Royal Institute of Technology, 17 Nov 2022

ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
York University, 23 Sep 2022, [Code]

BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis
The University of Tokyo, ECCV 2022, [Code]

EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model
Nanjing University, SIGGRAPH 2022, [Code]

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
The Chinese University of Hong Kong, CVPR 2022, [Code]

SEEG: Semantic Energized Co-speech Gesture Generation
Alibaba DAMO Academy, CVPR 2022, [Code]

FaceFormer: Speech-Driven 3D Facial Animation with Transformers
The University of Hong Kong, CVPR 2022, [Code]

Freeform Body Motion Generation from Speech
JD AI Research, 4 Mar 2022, [Code]

Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders
Tencent AI Lab, ICCV 2021, [Code]

Learning Speech-driven 3D Conversational Gestures from Video
Max Planck Institute for Informatics, IVA 2021, [Code]

Learning Individual Styles of Conversational Gesture
UC Berkeley, CVPR 2019, [Code]

Human motion prediction

For more recent papers, see here.

HumanMAC: Masked Motion Completion for Human Motion Prediction
Tsinghua University, 7 Feb 2023, [Code]

BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction
University of Barcelona, 25 Nov 2022, [Upcoming Code]

PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
NAVER LABS, ECCV 2022, [Code]

NeMF: Neural Motion Fields for Kinematic Animation
Yale University, NeurIPS 2022 (Spotlight), [Code]

Multi-Person Extreme Motion Prediction
Inria, CVPR 2022, [Code]

MotionMixer: MLP-based 3D Human Body Pose Forecasting
Mercedes-Benz, IJCAI 2022 (Oral), [Code]

Multi-Person 3D Motion Prediction with Multi-Range Transformers
UCSD, NeurIPS 2021

Motion Applications

MIME: Human-Aware 3D Scene Generation
MPI

Scene Synthesis from Human Motion
Stanford University, SIGGRAPH Asia 2022, [Code]

TEACH: Temporal Action Compositions for 3D Humans
MPI, 3DV 2022, [Code]

Motion In-betweening via Two-stage Transformers
Zhejiang University, SIGGRAPH Asia 2022

Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening
Shanghai Jiaotong University, ACMMM 2022, [Upcoming Code]

Conditional Motion In-betweening
Korea University, 6 Oct 2022, [Code]

SkeletonMAE: Spatial-Temporal Masked Autoencoders for Self-supervised Skeleton Action Recognition
University of North Carolina, 1 Sep 2022

A Unified Framework for Real Time Motion Completion
NetEase Games AI Lab, AAAI 2022

Transformer based Motion In-betweening
National Institute of Technology - Tiruchirappalli, ACCV 2022 Workshop, [Code]

Generative Tweening: Long-term Inbetweening of 3D Human Motions
Adobe Research, 28 May 2020

Text-Image Generation

For more recent papers, see here.

Adding Conditional Control to Text-to-Image Diffusion Models
Stanford, Feb 2023

SpaText: Spatio-Textual Representation for Controllable Image Generation
Meta AI (FAIR), 25 Nov 2022

Sketch-Guided Text-to-Image Diffusion Models
Google Research, 24 Nov 2022

Make-A-Story: Visual Memory Conditioned Consistent Story Generation
University of British Columbia, 23 Nov 2022

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
University of Waterloo, 20 Nov 2022, [Upcoming Code]

InstructPix2Pix: Learning to Follow Image Editing Instructions
UC Berkeley, 17 Nov 2022

Null-text Inversion for Editing Real Images using Guided Diffusion Models
Google Research, 17 Nov 2022

HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation
University of Chinese Academy of Sciences, 11 Nov 2022

Imagic: Text-Based Real Image Editing with Diffusion Models
Google Research, 17 Oct 2022

Self-Guided Diffusion Models
University of Amsterdam, 12 Oct 2022

On Distillation of Guided Diffusion Models
Stanford University, NeurIPS 2022 Workshop, 6 Oct 2022

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Google Research, 25 Aug 2022, [Code]

Prompt-to-Prompt Image Editing with Cross Attention Control
Google Research, 2 Aug 2022, [Code]

Improved Vector Quantized Diffusion Models
University of Science and Technology of China, 31 May 2022, [Code]

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Meta AI Research, 24 Mar 2022

Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
Vidyasirimedhi Institute of Science and Technology, CVPR 2022 (Oral), [Code]

Vector Quantized Diffusion Model for Text-to-Image Synthesis
University of Science and Technology of China, CVPR 2022, [Code]

High-Resolution Image Synthesis with Latent Diffusion Models
Runway ML, CVPR 2022, [Code]

Text-Video Generation

Text-To-4D Dynamic Scene Generation
Meta AI, 2023, [Code]

Structure and Content-Guided Video Synthesis with Diffusion Models
Runway, 6 Feb 2023

Latent Video Diffusion Models for High-Fidelity Video Generation with Arbitrary Lengths
The Hong Kong University of Science and Technology, 23 Nov 2022, [Upcoming Code]

MagicVideo: Efficient Video Generation With Latent Diffusion Models
ByteDance Inc, 20 Nov 2022

Text2LIVE: Text-Driven Layered Image and Video Editing
NVIDIA Research, ECCV 2022 (Oral), [Code]

Text-3D Image Generation

Point-E: A System for Generating 3D Point Clouds from Complex Prompts
OpenAI, 16 Dec 2022

DreamFusion: Text-to-3D using 2D Diffusion
Google Research, 29 Sep 2022
