Repository Details

Source code for the paper: "AutoDecoding Latent 3D Diffusion Models"

3D VADER - AutoDecoding Latent 3D Diffusion Models

Evangelos Ntavelis¹*, Aliaksandr Siarohin², Kyle Olszewski², Chaoyang Wang³, Luc Van Gool¹,⁴, Sergey Tulyakov²

¹Computer Vision Lab - ETH Zurich, ²Snap Inc., ³CI2CV Lab - CMU, ⁴ESAT - KULeuven

*Work done while interning at Snap.

Project Page - arXiv - Paper - Cite

TL;DR
We generate 3D assets from diverse 2D multi-view datasets by training a 3D Diffusion model on the intermediate features of a Volumetric AutoDecodER.

Abstract

We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations to learn a 3D diffusion from 2D images or monocular videos of rigid or articulated objects. Our approach is flexible enough to use either existing camera supervision or no camera information at all -- instead efficiently learning it during training. Our evaluations demonstrate that our generation results outperform state-of-the-art alternatives on various benchmark datasets and metrics, including multi-view image datasets of synthetic objects, real in-the-wild videos of moving people, and a large-scale, real video dataset of static objects.
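The abstract mentions robust normalization and de-normalization of the latent volumes but does not spell out the operations here. The following is a minimal sketch of one common robust choice, per-channel median and interquartile range (IQR); the statistic, the tensor layout `(N, C, D, H, W)`, and the function names are assumptions made for illustration, not the authors' released code.

```python
import numpy as np

def fit_robust_stats(latents):
    """Per-channel median and IQR over latent volumes of shape (N, C, D, H, W).

    Assumption: median/IQR is used as the robust statistic; the layout is hypothetical.
    """
    channels = latents.shape[1]
    flat = np.moveaxis(latents, 1, 0).reshape(channels, -1)  # (C, N*D*H*W)
    median = np.median(flat, axis=1)
    q75, q25 = np.percentile(flat, [75, 25], axis=1)
    iqr = np.maximum(q75 - q25, 1e-8)  # guard against division by zero
    return median, iqr

def normalize(x, median, iqr):
    """Map latents to a roughly unit-scale range before diffusion training."""
    shape = (1, -1, 1, 1, 1)
    return (x - median.reshape(shape)) / iqr.reshape(shape)

def denormalize(x, median, iqr):
    """Invert normalize() so sampled volumes can be passed to the decoder."""
    shape = (1, -1, 1, 1, 1)
    return x * iqr.reshape(shape) + median.reshape(shape)
```

Unlike mean and standard deviation, the median and IQR are barely affected by a few extreme feature values, which is the usual motivation for calling such a scheme robust.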

Method

Our proposed two-stage framework: Stage 1 trains an autodecoder with two generative components, G1 and G2. It learns to assign each training set object a 1D embedding that is processed by G1 into a latent volumetric space. G2 decodes these volumes into larger radiance volumes suitable for rendering. Note that we are using only 2D supervision to train the autodecoder. In Stage 2, the autodecoder parameters are frozen. Latent volumes generated by G1 are then used to train the 3D denoising diffusion process. At inference time, G1 is not used, as the generated volume is randomly sampled, denoised, and then decoded by G2 for rendering.
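The data flow of the two stages can be sketched with toy stand-ins. Everything below is hypothetical: the sizes, the linear `G1`, the upsampling `G2`, the zero-output `predict_noise` placeholder, and the standard DDPM noise schedule are assumptions chosen only to show which component is used where. Training loops and latent normalization are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; the real model's dimensions are not stated here.
NUM_OBJECTS, EMB_DIM = 8, 16
LATENT_SHAPE = (4, 4, 4, 4)      # (channels, depth, height, width)

# Stage 1: one 1D embedding per training object, plus generators G1 and G2.
embeddings = rng.normal(size=(NUM_OBJECTS, EMB_DIM))
W1 = rng.normal(size=(EMB_DIM, int(np.prod(LATENT_SHAPE)))) * 0.1

def G1(embedding):
    """Stand-in for G1: 1D embedding -> latent volume."""
    return np.tanh(embedding @ W1).reshape(LATENT_SHAPE)

def G2(latent_volume):
    """Stand-in for G2: latent volume -> larger volume (2x nearest-neighbour upsampling)."""
    v = latent_volume
    for axis in (1, 2, 3):
        v = np.repeat(v, 2, axis=axis)
    return v

# Stage 2: the autodecoder is frozen; G1 outputs become the diffusion training data.
latents = np.stack([G1(e) for e in embeddings])

T = 50
betas = np.linspace(1e-4, 0.02, T)   # assumed linear DDPM schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def add_noise(x0, t, noise):
    """Forward diffusion q(x_t | x_0), used to build denoiser training pairs."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def predict_noise(x_t, t):
    """Placeholder for the trained 3D denoiser; an untrained stub returning zeros."""
    return np.zeros_like(x_t)

def sample():
    """Inference: G1 is not used. Sample noise, denoise step by step, decode with G2."""
    x = rng.normal(size=LATENT_SHAPE)
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.normal(size=LATENT_SHAPE)
    return G2(x)
```

Calling `sample()` returns a decoded volume without touching `embeddings` or `G1`, mirroring the note that G1 is only needed to produce training latents.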

3D Assets Visualization

Please visit our Project Page.

Code

Source code will be available soon.

BibTeX

@misc{ntavelis2023autodecoding,
    title={AutoDecoding Latent 3D Diffusion Models},
    author={Evangelos Ntavelis and Aliaksandr Siarohin and Kyle Olszewski and Chaoyang Wang and Luc Van Gool and Sergey Tulyakov},
    year={2023},
    eprint={2307.05445},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgements

We would like to thank Michael Vasilkovsky for preparing the ObjaVerse renderings, and Colin Eles for his support with infrastructure. Moreover, we would like to thank Norman Müller, author of the DiffRF paper, for his invaluable help with setting up the DiffRF baseline, the ABO Tables and PhotoShape Chairs datasets, and the evaluation pipeline, as well as answering all related questions. A true marvel of a scientist. Finally, Evan would like to thank Claire and Gio for making the best cappuccinos and fueling up this research.
