• Stars
    star
    207
  • Rank 189,769 (Top 4 %)
  • Language
    Python
  • Created over 1 year ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

3D generation on ImageNet [ICLR 2023]

3D generation on ImageNet [ICLR 2023]

[Project page] [Paper] [OpenReview]

Samples on ImageNet 256x256

Release progress:

  • Installation guide
  • Training code
  • Inference/visualization code
  • Datasets
  • Data preprocessing scripts
  • ImageNet checkpoint (there is our old one available here, but it is only compatible with our old code version).
  • Checkpoints for satellite datasets
  • Docker container

Installation

To install and activate the environment, run the following command:

conda env create -f environment.yml -p env
conda activate ./env

This repo is built on top of StyleGAN3, so make sure that it runs on your system.

Training

Dataset preparation

First, download ImageNet, and then filter it out to remove the outliers with InceptionV3 with the procedure proposed by Instance Selection for GANs. You can use our scripts/data_scripts/run_instance_selection.py script for this (for ImageNet we do this class-wise in the same manner as in Instance Selection for GANs). Note, that we always compute FID on the full ImageNet.

Now, you need to extract the depths with LeReS. We refer to the LeReS repo for this. The depth maps should be saved under the same names as the images, but with the ._depth.png extension at the end (intead of .jpg or .png in the image file name --- see our provided datasets). You might find the script scripts/data_scripts/merge_depth_data.py useful since it merges the image files with depth files into a single directory, taking into account that LeReS failed to produce depth maps for some of the images.

After that, you need to resize the dataset into 256x256 (and also the depth maps). You can do this by running the following command:

python scripts/data_scripts/resize_dataset.py -d /path/to/imagenet  -t /path/to/imagenet_256 -s 256 -j <NUM_JOBS> -f '.png'

Finally, you need to extract the ResNet-50 features for the dataset. We do this via the following command:

python scripts/data_scripts/extract_features.py dataset_path=/path/to/dataset embedder_name=timm +embedder_kwargs.model_name=resnet50 trg_path=/path/to/save/embeddings_resnet50.memmap

Commands

For ImageNet 256x256 training, we run the following command:

python src/infra/launch.py hydra.run.dir=. desc=MY_EXP_NAME dataset=imagenet dataset.resolution=256 model.loss_kwargs.gamma=0.05 training.resume=null num_gpus=4 model.generator.cmax=1024 model.discriminator.cmax=1024 model.generator.cbase=65536 model.discriminator.cbase=65536

Note, that the training might diverge at some point (in our experience, it diverges 1-2 times during the first 1-5K kimgs) — if that happens, continue training from the latest non-diverged checkpoint. To make the training more stable, you can use a more narrow prior for field-of-view: camera.fov.min=10 camera.fov.max=21 camera.cube_scale=0.35 For the satellite datasets (dogs/horses/elephants), we use normal sizes for generator/discriminator (e.g., running them without the model.generator.cmax=1024 model.discriminator.cmax=1024 model.generator.cbase=65536 model.discriminator.cbase=65536 arguments).

To continue training, launch:

python src/infra/launch.py hydra.run.dir=. experiment_dir=<PATH_TO_EXPERIMENT> training.resume=latest

Training on a cluster or with slurm

If you use slurm or some cluster training, you might be interested in our cluster training infrastructure. We leave our A100 cluster config in configs/env/raven-local.yaml as an example on how to structure the config environment in your own case. In principle, we provide two ways to train: locally and on cluster via slurm (by passing slurm=true when launching training). By default, the simple local environment configs/env/local.yaml is used, but you can switch to your custom one by specifying env=my_env argument (after your created my_env.yaml config in configs/env/).

Evalution

During training, we track the progress by regularly computing FID on 2,048 fake images (versus all the available real images), since generating 50,000 images takes too long. To compute FID for 50k fake images after the training is done, run:

python scripts/calc_metrics.py hydra.run.dir=. ckpt.network_pkl=<CKPT_PATH> data=<PATH_TO_DATA> mirror=true gpus=4 metrics=fid50k_full img_resolution=<IMG_RESOLUTION>

If you have several checkpoints for the same experiment, you can alternatively pass ckpt.networks_dir=<CKPTS_DIR> instead of ckpt.network_pkl=<CKPT_PATH>. In this case, the script will find the best checkpoint out of all the available ones (measured by FID@2k) and computes the metrics for it.

Inference and visualization

Doing visualizations for a 3D GANs paper is pretty tedious, and we tried to structure/simplify this process as much as we could. We created a scripts which runs the necessary visualization types, where each visualization is defined by its own config. Below, we will provide several visualization types, the rest of them can be found in scripts/inference.py. Everywhere we use a direct path to a checkpoint via ckpt.network_pkl, but often it is easier to pass ckpt.networks_dir which should lead to a directory with checkpoints of your experiment --- the script will then take the best checkpoint based on the fid2k_full metric. Please see configs/scripts/inference.yaml for the available parameters and their desciptions.

To generate multi-view videos with a flying camera, run this command:

python scripts/inference.py hydra.run.dir=. ckpt.network_pkl=/path/to/checkpoint.pkl batch_size=4 num_seeds=1 classes=1-16 vis=video_grid trajectory=front_circle output_dir=/path/to/output img_resolution=256 vis.num_videos_per_grid=4

To generate the multi-view image grids for some camera trajectory, run this command:

python scripts/inference.py hydra.run.dir=. ckpt.network_pkl=/path/to/checkpoint.pkl batch_size=4 num_seeds=2 vis=image_grid output_dir=/path/to/output/dir img_resolution=256 classes=1-4 trajectory=points vis.nrow=2

There are different flying camera trajectories available, specified in configs/scripts/trajectory. Each of them have its own hyperparameters, which can be found in the corresponding config.

Bibtex

@inproceedings{3DGP,
    title={3D generation on ImageNet},
    author={Ivan Skorokhodov and Aliaksandr Siarohin and Yinghao Xu and Jian Ren and Hsin-Ying Lee and Peter Wonka and Sergey Tulyakov},
    booktitle={International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=U2WjB9xxZ9q}
}

Bibtex

@inproceedings{3dgp,
    title={3D generation on ImageNet},
    author={Ivan Skorokhodov and Aliaksandr Siarohin and Yinghao Xu and Jian Ren and Hsin-Ying Lee and Peter Wonka and Sergey Tulyakov},
    booktitle={International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=U2WjB9xxZ9q}
}

More Repositories

1

articulated-animation

Code for Motion Representations for Articulated Animation paper
Jupyter Notebook
1,233
star
2

EfficientFormer

EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPs 2022]
Python
972
star
3

NeROIC

Python
909
star
4

Panda-70M

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Python
516
star
5

HyperHuman

[ICLR 2024] Github Repo for "HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion"
HTML
489
star
6

MoCoGAN-HD

[ICLR 2021 Spotlight] A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Python
242
star
7

MMVID

[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
Python
194
star
8

MobileR2L

[CVPR 2023] Real-Time Neural Light Field on Mobile Devices
Python
192
star
9

R2L

[ECCV 2022] R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
Python
189
star
10

CAT

[CVPR 2021] Teachers Do More Than Teach: Compressing Image-to-Image Models (CAT)
Python
180
star
11

discoscene

CVPR 2023 Highlight: DiscoScene
Python
143
star
12

3DVADER

Source code for the paper: "AutoDecoding Latent 3D Diffusion Models"
133
star
13

BitsFusion

118
star
14

weights2weights

Official Implementation of weights2weights
Jupyter Notebook
115
star
15

SnapFusion

HTML
95
star
16

F8Net

[ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
Python
95
star
17

SF-V

This respository contains the code for SF-V: Single Forward Video Generation Model.
82
star
18

AToM

Official implementation of `AToM: Amortized Text-to-Mesh using 2D Diffusion`
82
star
19

graphless-neural-networks

[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)
Python
75
star
20

MLPInit-for-GNNs

[ICLR 2023] MLPInit: Embarrassingly Simple GNN Training Acceleration with MLP Initialization
Jupyter Notebook
69
star
21

unsupervised-volumetric-animation

The repository for paper Unsupervised Volumetric Animation
Python
68
star
22

non-contrastive-link-prediction

[ICLR 2023] Link Prediction with Non-Contrastive Learning
Python
26
star
23

linkless-link-prediction

[ICML 2023] Linkless Link Prediction via Relational Distillation
Python
18
star
24

locomo

Python
15
star
25

LargeGT

Graph Transformers for Large Graphs
Python
13
star
26

efficient-nn-tutorial

Page for the CVPR 2023 Tutorial - Efficient Neural Networks: From Algorithm Design to Practical Mobile Deployments
HTML
13
star
27

SpFDE

[NeurIPs 2022] Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training
11
star
28

GenAU

Jupyter Notebook
7
star
29

representations-for-creativity

HTML
7
star
30

hpdm

Hierarchical Patch Diffusion Models for High-Resolution Video Synthesis [CVPR 2024]
HTML
7
star
31

video-synthesis-tutorial

HTML
5
star
32

promptable-game-models

4
star
33

snap-research-website

https://research.snap.com/
HTML
2
star
34

NeurT-FDR

NeurT-FDR, a method for controlling false discovery rate by incorporating feature hierarchy
Python
2
star
35

qfar

Official implementation of MobiCom 2023 paper "QfaR: Location-Guided Scanning of Visual Codes from Long Distances"
Python
1
star
36

cabam-graph-generation

[KDD MLG'20] Class-Assortative Barabasi Albert Model for Graph Generation
Jupyter Notebook
1
star
37

cv-call-for-interns-2022

HTML
1
star
38

NodeDup

Node Duplication Improves Cold-start Link Prediction
Python
1
star
39

SPAD

Source code for paper "SPAD: Spatially Aware Multi-View Diffusers"
1
star
40

snapvideo

HTML
1
star