• Stars: 356
• Rank: 115,622 (top 3%)
• Language: Python
• License: MIT License
• Created: about 3 years ago
• Updated: about 1 year ago

Repository Details

Is a geometric model required to synthesize novel views from a single image?

Geometry-Free View Synthesis: Transformers and no 3D Priors

[teaser figure]

Robin Rombach*, Patrick Esser*, Björn Ommer
* equal contribution

arXiv | BibTeX | Colab

Interactive Scene Exploration Results

RealEstate10K:
[preview animation]
Videos: short (2min) / long (12min)

ACID:
[preview animation]
Videos: short (2min) / long (9min)

Demo

For a quickstart, you can try the Colab demo, but for a smoother experience we recommend installing the local demo as described below.

Installation

The demo requires building a PyTorch extension. If you have a working development environment with PyTorch, g++ and nvcc, you can simply run

pip install git+https://github.com/CompVis/geometry-free-view-synthesis#egg=geometry-free-view-synthesis

If you run into problems and have a GPU with compute capability below 8, you can also use the provided conda environment:

git clone https://github.com/CompVis/geometry-free-view-synthesis
conda env create -f geometry-free-view-synthesis/environment.yaml
conda activate geofree
pip install geometry-free-view-synthesis/
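
Either way, you can quickly check that the installation works before launching the demo (a minimal sketch; it assumes the package installs an importable module named geofree):

# sanity check: the import should succeed and CUDA should be visible
python -c "import torch, geofree; print('CUDA available:', torch.cuda.is_available())"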

Running

After installation, running

braindance.py

will start the demo on a sample scene. Explore the scene interactively using the WASD keys to move and arrow keys to look around. Once positioned, hit the space bar to render the novel view with GeoGPT.

After rendering, you can continue moving with the WASD keys. Mouse control can be activated with the m key. Run braindance.py <folder to select image from/path to image> to run the demo on your own images. By default it uses the re_impl_nodepth model (trained on RealEstate10K without explicit transformation and without depth input), which can be changed with the --model flag. The corresponding checkpoints are downloaded the first time they are required. Specify an output path via --video path/to/vid.mp4 to record a video.

> braindance.py -h
usage: braindance.py [-h] [--model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}] [--video [VIDEO]] [path]

What's up, BD-maniacs?

key(s)       action                  
=====================================
wasd         move around             
arrows       look around             
m            enable looking with mouse
space        render with transformer 
q            quit                    

positional arguments:
  path                  path to image or directory from which to select image. Default example is used if not specified.

optional arguments:
  -h, --help            show this help message and exit
  --model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}
                        pretrained model to use.
  --video [VIDEO]       path to write video recording to. (no recording if unspecified).
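
For example, to run the demo on one of your own images with the depth-conditioned RealEstate model and record the session (a sketch; the image and output paths are placeholders):

# custom image, depth-conditioned RealEstate model, with video recording
braindance.py --model re_impl_depth --video renders/session.mp4 path/to/your/image.png

# pick an image interactively from a folder, using the default model
braindance.py path/to/image_folder/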

Training

Data Preparation

We support training on RealEstate10K and ACID. Both come in the same format, as described here, and the preparation is identical for both. You will need to have colmap installed and available on your $PATH.
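
A quick way to confirm that colmap is visible before starting the preparation (a minimal sketch):

# should print the path to the colmap binary; if it prints nothing, fix your $PATH first
command -v colmap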

We assume that you have extracted the .txt files of the dataset you want to prepare into $TXT_ROOT, e.g. for RealEstate:

> tree $TXT_ROOT
├── test
│   ├── 000c3ab189999a83.txt
│   ├── ...
│   └── fff9864727c42c80.txt
└── train
    ├── 0000cc6d8b108390.txt
    ├── ...
    └── ffffe622a4de5489.txt

and that you have downloaded the frames (we downloaded them in resolution 640 x 360) into $IMG_ROOT, e.g. for RealEstate:

> tree $IMG_ROOT
├── test
│   ├── 000c3ab189999a83
│   │   ├── 45979267.png
│   │   ├── ...
│   │   └── 55255200.png
│   ├── ...
│   ├── 0017ce4c6a39d122
│   │   ├── 40874000.png
│   │   ├── ...
│   │   └── 48482000.png
├── train
│   ├── ...

To prepare the $SPLIT split of the dataset ($SPLIT being one of train, test for RealEstate and train, test, validation for ACID) in $SPA_ROOT, run the following within the scripts directory:

python sparse_from_realestate_format.py --txt_src ${TXT_ROOT}/${SPLIT} --img_src ${IMG_ROOT}/${SPLIT} --spa_dst ${SPA_ROOT}/${SPLIT}

You can also simply set TXT_ROOT, IMG_ROOT and SPA_ROOT as environment variables and run ./sparsify_realestate.sh or ./sparsify_acid.sh. Take a look into the sources to run with multiple workers in parallel.
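
For example, preparing both RealEstate splits could look like this (a sketch; the three root paths are placeholders for your local directories):

cd scripts
export TXT_ROOT=/data/realestate/txt IMG_ROOT=/data/realestate/frames SPA_ROOT=/data/realestate/sparse
./sparsify_realestate.sh

# equivalent explicit per-split invocation
for SPLIT in train test; do
    python sparse_from_realestate_format.py --txt_src ${TXT_ROOT}/${SPLIT} --img_src ${IMG_ROOT}/${SPLIT} --spa_dst ${SPA_ROOT}/${SPLIT}
done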

Finally, symlink $SPA_ROOT to data/realestate_sparse (for RealEstate) or data/acid_sparse (for ACID).
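
A sketch of this symlink step for RealEstate (use data/acid_sparse instead when preparing ACID):

# run from the repository root; use an absolute $SPA_ROOT so the link resolves from data/
mkdir -p data
ln -s "$SPA_ROOT" data/realestate_sparse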

First Stage Models

As described in our paper, we train the transformer models in a compressed, discrete latent space of pretrained VQGANs. These pretrained models can be conveniently downloaded by running

python scripts/download_vqmodels.py 

which will also create symlinks ensuring that the paths specified in the training configs (see configs/*) exist. In case some of the models have already been downloaded, the script will only create the symlinks.

For training custom first stage models, we refer to the taming transformers repository.

Running the Training

After both the preparation of the data and the first stage models are done, the experiments on ACID and RealEstate10K as described in our paper can be reproduced by running

python geofree/main.py --base configs/<dataset>/<dataset>_13x23_<experiment>.yaml -t --gpus 0,

where <dataset> is one of realestate/acid and <experiment> is one of expl_img/expl_feat/expl_emb/impl_catdepth/impl_depth/impl_nodepth/hybrid. These abbreviations correspond to the experiments listed in the following table (see also Fig. 2 in the main paper).

[table of experiment variants]

Note that each experiment was conducted on a GPU with 40 GB VRAM.
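
For example, reproducing the impl_nodepth variant on RealEstate10K would look like this (a sketch; the exact config filename is assumed to follow the pattern above):

python geofree/main.py --base configs/realestate/realestate_13x23_impl_nodepth.yaml -t --gpus 0,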

BibTeX

@misc{rombach2021geometryfree,
      title={Geometry-Free View Synthesis: Transformers and no 3D Priors}, 
      author={Robin Rombach and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2104.07652},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

More Repositories

1. stable-diffusion: A latent text-to-image diffusion model (Jupyter Notebook, 64,474 stars)
2. latent-diffusion: High-Resolution Image Synthesis with Latent Diffusion Models (Jupyter Notebook, 10,221 stars)
3. taming-transformers: Taming Transformers for High-Resolution Image Synthesis (Jupyter Notebook, 5,244 stars)
4. adaptive-style-transfer: Source code for the ECCV18 paper "A Style-Aware Content Loss for Real-time HD Style Transfer" (Python, 710 stars)
5. vunet: A generative model conditioned on shape and appearance (Python, 492 stars)
6. metric-learning-divide-and-conquer: Source code for the paper "Divide and Conquer the Embedding Space for Metric Learning", CVPR 2019 (Python, 262 stars)
7. net2net: Network-to-Network Translation with Conditional Invertible Neural Networks (Python, 217 stars)
8. image2video-synthesis-using-cINNs: Implementation of Stochastic Image-to-Video Synthesis using cINNs (Python, 179 stars)
9. brushstroke-parameterized-style-transfer: TensorFlow implementation of our CVPR 2021 paper "Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes" (Python, 158 stars)
10. imagebart: ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis (Python, 119 stars)
11. iin: A Disentangling Invertible Interpretation Network (Python, 119 stars)
12. content-style-disentangled-ST: Content and Style Disentanglement for Artistic Style Transfer [ICCV19] (89 stars)
13. retrieval-augmented-diffusion-models: Official codebase for the paper "Retrieval-Augmented Diffusion Models" (Jupyter Notebook, 83 stars)
14. fm-boosting: Boosting Latent Diffusion with Flow Matching (73 stars)
15. unsupervised-disentangling (Python, 54 stars)
16. invariances: Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with Invertible Neural Networks (Python, 52 stars)
17. interactive-image2video-synthesis (Python, 51 stars)
18. ipoke: iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis (Python, 46 stars)
19. unsupervised-part-segmentation: Code for GCPR 2020 Oral: "Unsupervised Part Discovery by Unsupervised Disentanglement" (Jupyter Notebook, 30 stars)
20. instant-lora-composition (29 stars)
21. behavior-driven-video-synthesis (Python, 26 stars)
22. content-targeted-style-transfer: Content Transformation Block For Image Style Transfer [CVPR19] (24 stars)
23. robust-disentangling: Unsupervised Robust Disentangling of Latent Characteristics for Image Synthesis (Python, 23 stars)
24. metric-learning-divide-and-conquer-improved: Source code for the paper "Improving Deep Metric Learning by Divide and Conquer" (Python, 19 stars)
25. cuneiform-sign-detection-dataset: Dataset provided with the article "Deep learning for cuneiform sign detection with weak supervision using transliteration alignment". It comprises image references, transliterations and sign annotations of clay tablets from the Neo-Assyrian epoch. (Jupyter Notebook, 11 stars)
26. visual-search: Visual search interface (10 stars)
27. magnify-posture-deviations: Unsupervised Magnification of Posture Deviations Across Subjects (8 stars)
28. cuneiform-sign-detection-code: Code for the article "Deep learning of cuneiform sign detection with weak supervision using transliteration alignment" (Jupyter Notebook, 7 stars)
29. hbugen2018: Towards Learning a Realistic Rendering of Human Behavior (7 stars)
30. zigma (7 stars)
31. cuneiform-sign-detection-webapp: Code for the demo web application of the article "Deep learning for cuneiform sign detection with weak supervision using transliteration alignment" (JavaScript, 4 stars)
32. Characterizing_Generalization_in_DML (Python, 3 stars)
33. AutomaticBehaviorAnalysis_NatureComm: Source Code + Documentation of our Automatic Behavior Analysis Software (MATLAB, 3 stars)
34. depth-fm: DepthFM: Fast Monocular Depth Estimation with Flow Matching (Jupyter Notebook, 3 stars)
35. network-fusion (1 star)