ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

ImageBART

NeurIPS 2021

Patrick Esser*, Robin Rombach*, Andreas Blattmann*, Björn Ommer
* equal contribution

arXiv | BibTeX | Poster

Requirements

A suitable conda environment named imagebart can be created and activated with:

conda env create -f environment.yaml
conda activate imagebart

Get the Models

We provide pretrained weights and hyperparameters for models trained on the following datasets:

Download the respective files and extract their contents to a directory ./models/.

Moreover, we provide all the required VQGANs as a .zip at https://ommer-lab.com/files/vqgan.zip, whose contents have to be extracted to ./vqgan/.
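
The extraction step can be scripted. The following is a minimal sketch, assuming the archive has already been downloaded to a local file; the helper name extract_archive is hypothetical and not part of this repository, and the internal layout of the archive is not assumed.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a .zip archive into target_dir and return the sorted member names.

    Hypothetical helper for illustration; it is not part of the ImageBART code.
    """
    target = Path(target_dir)
    # Create ./vqgan/ (or ./models/) if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
        return sorted(zf.namelist())
```

For example, extract_archive("vqgan.zip", "./vqgan/") unpacks the VQGAN archive, and the same call with "./models/" as the target handles a downloaded model archive.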

Get the Data

Running the training configs or the inpainting script requires a dataset available locally. For ImageNet and FFHQ, see this repo's parent directory taming-transformers. The LSUN datasets can be conveniently downloaded via the script available here. We performed a custom split into training and validation images, and provide the corresponding filenames at https://ommer-lab.com/files/lsun.zip. After downloading, extract them to ./data/lsun. The bedrooms/cats/churches subsets should also be placed or symlinked at ./data/lsun/bedrooms, ./data/lsun/cats, and ./data/lsun/churches, respectively.
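
If the LSUN subsets already live elsewhere on disk, the symlinks can be created with a short script. This is a sketch under the assumption that the downloaded subsets sit in one directory per subset under a common source_root; that layout and the helper name link_lsun_subsets are hypothetical. Only the target paths under ./data/lsun come from the description above.

```python
import os
from pathlib import Path

# Subset directory names expected under ./data/lsun.
LSUN_SUBSETS = ("bedrooms", "cats", "churches")


def link_lsun_subsets(source_root, data_root="./data/lsun"):
    """Symlink <source_root>/<subset> to <data_root>/<subset> for each subset found.

    Hypothetical helper; the <source_root>/<subset> layout is an assumption.
    Returns a dict mapping subset name to the path inside data_root.
    """
    data_root = Path(data_root)
    data_root.mkdir(parents=True, exist_ok=True)
    links = {}
    for name in LSUN_SUBSETS:
        src = Path(source_root) / name
        dst = data_root / name
        if not src.is_dir():
            continue  # this subset has not been downloaded
        # Leave existing directories or links untouched.
        if not (dst.exists() or dst.is_symlink()):
            os.symlink(src.resolve(), dst, target_is_directory=True)
        links[name] = dst
    return links
```

Subsets that are missing from source_root are skipped, so the function can be rerun after further downloads.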

Inference

Unconditional Sampling

We provide a script for sampling from unconditional models trained on the LSUN-{bedrooms,churches,cats} and FFHQ datasets.

FFHQ

On the FFHQ dataset, we provide two distinct pretrained models: one with a chain of length 4 and a geometric noise schedule as proposed by Sohl-Dickstein et al. [1], and another with a chain of length 2 and a custom schedule. These models can be started with

CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/sample_imagebart.py configs/sampling/ffhq/<config>
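
To make the notions of chain length and noise schedule concrete, here is a minimal sketch of a forward multinomial diffusion chain over discrete tokens. It assumes one common formulation, in which each token is replaced by a uniformly drawn codebook entry with probability beta at every step. The function names, the beta values, and the vocabulary size are illustrative assumptions; they are not taken from this repository or from the schedules used for the pretrained models.

```python
import random


def corrupt_tokens(tokens, beta, vocab_size, rng):
    """One forward step: with probability beta, resample a token uniformly."""
    return [
        rng.randrange(vocab_size) if rng.random() < beta else tok
        for tok in tokens
    ]


def forward_chain(tokens, betas, vocab_size, seed=0):
    """Apply one corruption step per entry of betas.

    Returns the list of states [x_0, x_1, ..., x_T], where T = len(betas)
    is the chain length.
    """
    rng = random.Random(seed)
    states = [list(tokens)]
    for beta in betas:
        states.append(corrupt_tokens(states[-1], beta, vocab_size, rng))
    return states


# Illustrative chain of length 4; these beta values are placeholders,
# not the geometric schedule of the pretrained FFHQ model.
example_states = forward_chain(list(range(16)), [0.1, 0.3, 0.6, 1.0], vocab_size=1024)
```

With a final beta of 1.0 every token is resampled, so the last state carries no information about the input. The reverse chain that the models learn would run from that state back toward x_0.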

LSUN

For the models trained on the LSUN-datasets, use

CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/sample_imagebart.py configs/sampling/lsun/<config>

Class Conditional Sampling on ImageNet

To sample from class-conditional ImageNet models, use

CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/sample_imagebart.py configs/sampling/imagenet/<config>

Image Editing with Unconditional Models

We also provide a script for image editing with our unconditional models. For our FFHQ-model with geometric schedule this can be started with

CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/inpaint_imagebart.py configs/sampling/ffhq/ffhq_4scales_geometric.yaml

resulting in samples similar to the following.

[Figure: teaser]

Training

In general, there are two options for training the autoregressive transition probabilities of the reverse Markov chain: (i) train them jointly, taking into account a weighting of the individual scale contributions, or (ii) train them independently, which means that each training process optimizes a single transition and the scales must be stacked after training. We conduct most of our experiments using the latter option, but provide configurations for both cases.

Training Scales Independently

For training scales independently, each transition requires a separate optimization process, which can be started via

CUDA_VISIBLE_DEVICES=<gpu_id> python main.py --base configs/<data>/<config>.yaml -t --gpus 0, 

We provide training configs for a four-scale training on FFHQ using a geometric schedule, a four-scale geometric training on ImageNet, and various three-scale experiments on LSUN. See also the overview of our pretrained models.
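
Because every transition is optimized in its own process, launching a full independent training amounts to issuing the command above once per scale config. The sketch below only builds those command lines; the helper names and the example config paths are hypothetical, while the main.py flags mirror the command shown above.

```python
def training_command(config_path, gpus="0,"):
    """Build the argument list for one training run of a single scale."""
    return ["python", "main.py", "--base", config_path, "-t", "--gpus", gpus]


def independent_training_commands(scale_configs, gpus="0,"):
    """One command per scale config, since each transition is trained separately."""
    return [training_command(cfg, gpus) for cfg in scale_configs]


# Hypothetical config paths, one per scale; substitute the actual files
# found under configs/<data>/.
commands = independent_training_commands(
    ["configs/example/scale_1.yaml", "configs/example/scale_2.yaml"]
)
```

Each entry of commands can be passed to subprocess.run together with an environment in which CUDA_VISIBLE_DEVICES is set to the desired GPU id.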

Training Scales Jointly

For completeness, we also provide a config to run a joint training with 4 scales on FFHQ. Training can be started by running

CUDA_VISIBLE_DEVICES=<gpu_id> python main.py --base configs/ffhq/ffhq_4_scales_joint-training.yaml -t --gpus 0, 

Shout-Outs

Many thanks to all who make their work and implementations publicly available. For this work, these were in particular:

References

[1] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep Unsupervised Learning using Nonequilibrium Thermodynamics. Proceedings of the 32nd International Conference on Machine Learning.

Bibtex

@article{DBLP:journals/corr/abs-2108-08827,
  author    = {Patrick Esser and
               Robin Rombach and
               Andreas Blattmann and
               Bj{\"{o}}rn Ommer},
  title     = {ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive
               Image Synthesis},
  journal   = {CoRR},
  volume    = {abs/2108.08827},
  year      = {2021}
}
