GAUDI: A Neural Architect for Immersive 3D Scene Generation, arXiv.

Samples from GAUDI (Allow a couple minutes of loading time for videos.)

Miguel Angel Bautista*, Pengsheng Guo*, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Joshua M. Susskind
Apple (*equal contribution)

Summary

  • We introduce GAUDI, a generative model that captures the distribution of 3D scenes parametrized as radiance fields.
  • We decompose the generative model into two steps: (i) optimizing a latent representation of 3D radiance fields and corresponding camera poses; (ii) learning a powerful score-based generative model on the latent space.
  • GAUDI obtains state-of-the-art performance across multiple datasets for unconditional generation and enables conditional generation of 3D scenes from different modalities like text or RGB images.

Abstract

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes. Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples. We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables like sparse image observations or text that describes the scene.

Model

Our model is composed of two stages: latent representation optimization and generative modeling. Finding a powerful latent representation for scene radiance fields and camera poses is critical to obtaining good performance. To achieve this, we design a decoder with three modules:

  • A scene decoder $d$ that takes as input scene latents and outputs a tri-plane latent representation to condition the radiance field MLP.
  • A camera pose decoder $c$ that takes as input a camera pose latent and a timestamp and outputs a camera pose.
  • A radiance field $f$ that takes as input a 3D point and is conditioned on the tri-plane representation.

The parameters of all the modules and the latents for scene and camera poses are optimized in the first stage. In the second stage, we learn a score-based generative model in latent space.
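
To make the tri-plane conditioning more concrete, here is a minimal sketch of how a 3D point might look up features from three axis-aligned planes before being passed to a radiance field. Everything here is an assumption for illustration: the names `bilinear_sample` and `triplane_features` are hypothetical, and the choices of bilinear interpolation and of summing the three plane features are common in the tri-plane literature but are not confirmed by this page.

```python
from typing import List, Sequence, Tuple

# plane[row][col] is a feature vector with C channels.
Plane = List[List[List[float]]]


def bilinear_sample(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a feature plane at (u, v), with u and v in [-1, 1]."""
    rows, cols = len(plane), len(plane[0])
    # Clamp to [-1, 1], then map to continuous grid coordinates.
    x = (min(max(u, -1.0), 1.0) + 1.0) * 0.5 * (cols - 1)
    y = (min(max(v, -1.0), 1.0) + 1.0) * 0.5 * (rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    tx, ty = x - x0, y - y0
    channels = len(plane[0][0])
    return [
        (1 - ty) * ((1 - tx) * plane[y0][x0][c] + tx * plane[y0][x1][c])
        + ty * ((1 - tx) * plane[y1][x0][c] + tx * plane[y1][x1][c])
        for c in range(channels)
    ]


def triplane_features(
    planes: Tuple[Plane, Plane, Plane], point: Sequence[float]
) -> List[float]:
    """Project a 3D point onto the xy, xz and yz planes and sum the features.

    Summation is an assumed aggregation; concatenation is another common choice.
    """
    plane_xy, plane_xz, plane_yz = planes
    x, y, z = point
    f_xy = bilinear_sample(plane_xy, x, y)
    f_xz = bilinear_sample(plane_xz, x, z)
    f_yz = bilinear_sample(plane_yz, y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


if __name__ == "__main__":
    # A toy 2x2 plane with one channel, reused for all three planes.
    toy = [[[0.0], [1.0]], [[2.0], [3.0]]]
    print(triplane_features((toy, toy, toy), (0.0, 0.0, 0.0)))  # [4.5]
```

In a setup like the one described above, the returned feature vector would be the conditioning input to the radiance field $f$ for that 3D point.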

Results

We present qualitative results for both unconditional and conditional generative modeling. During inference, we sample latents from the generative model and feed them through the decoder to obtain a radiance field and camera path. In the conditional setting we train the generative model using pairs of latents and conditioning variables (like text or images) and sample latents given conditioning variables during inference.

Unconditional generation

Random samples from the unconditional version of GAUDI for four different datasets: Vizdoom, Replica, VLN-CE and ARKitScenes.


Text-conditional generation

Random samples from a text-conditional GAUDI model trained on VLN-CE.

Prompt: "go down the stairs"

Prompt: "go through the hallway"

Prompt: "go up the stairs"

Prompt: "walk into the kitchen"

Image-conditional generation

Random samples from an image-conditional GAUDI model trained on VLN-CE.

Each sample is shown alongside its image prompt.

Interpolation

We can linearly interpolate the latent representation of two scenes (leftmost and rightmost columns) and move the camera to explore the interpolated scene.
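
As a rough illustration of the interpolation described above, the sketch below blends two latent vectors with a weight `alpha` in [0, 1]. The function names `lerp_latents` and `interpolation_path` are hypothetical, and treating a scene latent as a flat list of floats is a simplifying assumption; the page does not say how the scene and camera latents are laid out or whether they are interpolated jointly.

```python
from typing import List, Sequence


def lerp_latents(
    z_a: Sequence[float], z_b: Sequence[float], alpha: float
) -> List[float]:
    """Return (1 - alpha) * z_a + alpha * z_b, element by element."""
    if len(z_a) != len(z_b):
        raise ValueError("latents must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


def interpolation_path(
    z_a: Sequence[float], z_b: Sequence[float], steps: int
) -> List[List[float]]:
    """Evenly spaced latents from z_a (first entry) to z_b (last entry)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp_latents(z_a, z_b, i / (steps - 1)) for i in range(steps)]


if __name__ == "__main__":
    # Two toy 2-dimensional latents; the midpoint is their average.
    print(lerp_latents([0.0, 2.0], [1.0, 4.0], 0.5))  # [0.5, 3.0]
```

Each latent along the path would then be decoded into a radiance field in the same way as a sampled latent.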

Citation

@article{bautista2022gaudi,
    title={GAUDI: A Neural Architect for Immersive 3D Scene Generation},
    author={Miguel Angel Bautista and Pengsheng Guo and Samira Abnar and Walter Talbott and Alexander Toshev and Zhuoyuan Chen and Laurent Dinh and Shuangfei Zhai and Hanlin Goh and Daniel Ulbricht and Afshin Dehghan and Josh Susskind},
    journal={arXiv},
    year={2022}
}

The videos provided here are copyrighted by the authors and licensed under the CC-BY-NC license.

Source code

Source code will be available in the coming weeks.

Related links

Check out recent related work on making radiance fields generalize to multiple objects/scenes:
