

💃 DreamWaltz: Make a Scene with Complex 3D Animatable Avatars

Project Page | arXiv | Video

This repository contains the official implementation of the NeurIPS 2023 paper:

DreamWaltz: Make a Scene with Complex 3D Animatable Avatars
Yukun Huang1,2, Jianan Wang1, Ailing Zeng1, He Cao1, Xianbiao Qi1, Yukai Shi1, Zheng-Jun Zha2, Lei Zhang1
1International Digital Economy Academy   2University of Science and Technology of China

Introduction

DreamWaltz is a learning framework for text-driven creation of 3D animatable avatars, built on the pretrained 2D diffusion model ControlNet and the human parametric model SMPL. The core idea is to optimize a deformable NeRF representation under skeleton-conditioned diffusion supervision, which ensures 3D consistency and generalization to arbitrary poses.


Figure 1. DreamWaltz can generate animatable avatars (a) and construct complex scenes (b)(c)(d).

Installation

This code is heavily based on the excellent latent-nerf and stable-dreamfusion projects. Please install the dependencies:

pip install -r requirements.txt

The CUDA extension for instant-ngp is built at runtime, as in stable-dreamfusion.
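
Since the extension is compiled on first use against your local CUDA toolkit, it can save a failed first run to confirm beforehand that PyTorch sees your GPU. A minimal sanity check (not part of this repository) might look like:

import torch

# Both of these should report a usable GPU and CUDA version before
# the first training run triggers the JIT build of the extension.
print("CUDA available:", torch.cuda.is_available())
print("CUDA version seen by PyTorch:", torch.version.cuda)
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))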

Prepare SMPL Weights

We use the SMPL and VPoser models for avatar creation and animation learning. Please follow the instructions in smplx and human_body_prior to download the model weights, and build a directory with the following structure:

smpl_models
├── smpl
│   ├── SMPL_FEMALE.pkl
│   ├── SMPL_MALE.pkl
│   └── SMPL_NEUTRAL.pkl
└── vposer
    └── v2.0
        ├── snapshots
        ├── V02_05.yaml
        └── V02_05.log

Then, update the model paths SMPL_ROOT and VPOSER_ROOT in configs/paths.py.
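
For reference, the corresponding entries in configs/paths.py might look like the sketch below. The variable names come from the repository; the values are placeholders, and whether each root should point at the parent folder or a subfolder depends on the repository's loaders:

# configs/paths.py -- illustrative values only; replace with your local paths
SMPL_ROOT = "./smpl_models/smpl"            # folder holding SMPL_*.pkl
VPOSER_ROOT = "./smpl_models/vposer/v2.0"   # folder holding V02_05.yaml etc.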

Prepare Motion Sequences

You might need to prepare SMPL-format human motion sequences to animate the generated avatars. Our code provides a data API for AIST++, a high-quality dance video database with SMPL annotations. Please download the SMPL annotations from this website and build a directory with the following structure:

aist
├── gWA_sFM_cAll_d26_mWA5_ch13.pkl
├── gWA_sFM_cAll_d27_mWA0_ch15.pkl
├── gWA_sFM_cAll_d27_mWA2_ch17.pkl
└── ...

and update the data path AIST_ROOT in configs/paths.py.
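
To sanity-check a downloaded motion file, note that the annotations are ordinary pickle files; a minimal inspection script (hypothetical, not part of this repository) could be:

import pickle

# Load one AIST++ SMPL annotation file and list its contents.
# The exact keys depend on the AIST++ release, so printing them
# is a safe first step before wiring the data into training.
with open("aist/gWA_sFM_cAll_d27_mWA2_ch17.pkl", "rb") as f:
    motion = pickle.load(f)

for key, value in motion.items():
    shape = getattr(value, "shape", None)
    print(key, type(value).__name__, shape)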

Getting Started

DreamWaltz mainly consists of two training stages: (I) Canonical Avatar Creation and (II) Animatable Avatar Learning.

The following commands are also provided in run.sh.

1. SMPL-Guided NeRF Initialization

To pretrain the NeRF using mask images rendered from a canonical-pose SMPL mesh:

python train.py \
  --log.exp_name "pretrained" \
  --log.pretrain_only True \
  --prompt.scene canonical-A \
  --prompt.smpl_prompt depth \
  --optim.iters 10000

The resulting pretrained checkpoint can be reused for different text prompts.

2. Canonical Avatar Creation

To learn a NeRF-based canonical avatar representation using ControlNet-based SDS:

text="a wooden robot"
avatar_name="wooden_robot"
pretrained_ckpt="./outputs/pretrained/checkpoints/step_010000.pth"
# the pretrained ckpt can be reused for different text prompts

python train.py \
  --guide.text "${text}" \
  --log.exp_name "canonical/${avatar_name}" \
  --optim.ckpt "${pretrained_ckpt}" \
  --optim.iters 30000 \
  --prompt.scene canonical-A

3. Animatable Avatar Learning

To learn a NeRF-based animatable avatar representation using ControlNet-based SDS:

text="a wooden robot"
avatar_name="wooden_robot"
canonical_ckpt="./outputs/canonical/${avatar_name}/checkpoints/step_030000.pth"

python train.py \
  --animation True \
  --guide.text "${text}" \
  --log.exp_name "animatable/${avatar_name}" \
  --optim.ckpt "${canonical_ckpt}" \
  --optim.iters 50000 \
  --prompt.scene random \
  --render.cuda_ray False

4. Make a Dancing Video

To render a dancing video from the trained animatable avatar representation and a target motion sequence:

scene="gWA_sFM_cAll_d27_mWA2_ch17,180-280"
# "gWA_sFM_cAll_d27_mWA2_ch17" is the filename of motion sequences in AIST++
# "180-280" is the range of video frame indices: [180, 280]

avatar_name="wooden_robot"
animatable_ckpt="./outputs/animatable/${avatar_name}/checkpoints/step_050000.pth"

python train.py \
    --animation True \
    --log.eval_only True \
    --log.exp_name "videos/${avatar_name}" \
    --optim.ckpt "${animatable_ckpt}" \
    --prompt.scene "${scene}" \
    --render.cuda_ray False \
    --render.eval_fix_camera True

The resulting video can be found in PROJECT_ROOT/outputs/videos/${avatar_name}/results/128x128/.
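
For clarity, the --prompt.scene argument above packs a motion filename and an inclusive frame range into a single string. A hypothetical parser illustrating the format (not the repository's actual code) could be:

def parse_scene(scene: str):
    """Split a scene spec like 'gWA_sFM_cAll_d27_mWA2_ch17,180-280'
    into (motion_name, start_frame, end_frame)."""
    motion_name, frame_range = scene.split(",")
    start, end = (int(x) for x in frame_range.split("-"))
    return motion_name, start, end

print(parse_scene("gWA_sFM_cAll_d27_mWA2_ch17,180-280"))
# -> ('gWA_sFM_cAll_d27_mWA2_ch17', 180, 280)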

Results

Canonical Avatars


Figure 2. DreamWaltz can create canonical avatars from textual descriptions.

Animatable Avatars


Figure 3. DreamWaltz can animate canonical avatars given motion sequences.

Complex Scenes


Figure 4. DreamWaltz can make complex 3D scenes with avatar-object interactions.


Figure 5. DreamWaltz can make complex 3D scenes with avatar-scene interactions.


Figure 6. DreamWaltz can make complex 3D scenes with avatar-avatar interactions.

Reference

If you find this repository useful for your work, please consider citing it as follows:

@article{huang2023dreamwaltz,
 title={DreamWaltz: Make a Scene with Complex 3D Animatable Avatars},
 author={Yukun Huang and Jianan Wang and Ailing Zeng and He Cao and Xianbiao Qi and Yukai Shi and Zheng-Jun Zha and Lei Zhang},
 year={2023},
 eprint={2305.12529},
 archivePrefix={arXiv},
 primaryClass={cs.CV}
}

@article{huang2023dreamtime,
 title={DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation},
 author={Yukun Huang and Jianan Wang and Yukai Shi and Xianbiao Qi and Zheng-Jun Zha and Lei Zhang},
 year={2023},
 eprint={2306.12422},
 archivePrefix={arXiv},
 primaryClass={cs.CV}
}
