• Stars
    star
    3,866
  • Rank 10,811 (Top 0.3 %)
  • Language
    Python
  • License
    Other
  • Created over 3 years ago
  • Updated 4 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

StyleGAN2-ADA - Official PyTorch implementation

StyleGAN2-ADA โ€” Official PyTorch implementation

Teaser image

Training Generative Adversarial Networks with Limited Data
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila
https://arxiv.org/abs/2006.06676

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

Release notes

This repository is a faithful reimplementation of StyleGAN2-ADA in PyTorch, focusing on correctness, performance, and compatibility.

Correctness

  • Full support for all primary training configurations.
  • Extensive verification of image quality, training curves, and quality metrics against the TensorFlow version.
  • Results are expected to match in all cases, excluding the effects of pseudo-random numbers and floating-point arithmetic.

Performance

  • Training is typically 5%โ€“30% faster compared to the TensorFlow version on NVIDIA Tesla V100 GPUs.
  • Inference is up to 35% faster in high resolutions, but it may be slightly slower in low resolutions.
  • GPU memory usage is comparable to the TensorFlow version.
  • Faster startup time when training new networks (<50s), and also when using pre-trained networks (<4s).
  • New command line options for tweaking the training performance.

Compatibility

  • Compatible with old network pickles created using the TensorFlow version.
  • New ZIP/PNG based dataset format for maximal interoperability with existing 3rd party tools.
  • TFRecords datasets are no longer supported โ€” they need to be converted to the new format.
  • New JSON-based format for logs, metrics, and training curves.
  • Training curves are also exported in the old TFEvents format if TensorBoard is installed.
  • Command line syntax is mostly unchanged, with a few exceptions (e.g., dataset_tool.py).
  • Comparison methods are not supported (--cmethod, --dcap, --cfg=cifarbaseline, --aug=adarv)
  • Truncation is now disabled by default.

Data repository

Path Description
stylegan2-ada-pytorch Main directory hosted on Amazon S3
โ€‚โ€‚โ”œย  ada-paper.pdf Paper PDF
โ€‚โ€‚โ”œย  images Curated example images produced using the pre-trained models
โ€‚โ€‚โ”œย  videos Curated example interpolation videos
โ€‚โ€‚โ””ย  pretrained Pre-trained models
โ€‚โ€‚โ€‚โ€‚โ”œย  ffhq.pkl FFHQ at 1024x1024, trained using original StyleGAN2
โ€‚โ€‚โ€‚โ€‚โ”œย  metfaces.pkl MetFaces at 1024x1024, transfer learning from FFHQ using ADA
โ€‚โ€‚โ€‚โ€‚โ”œย  afhqcat.pkl AFHQ Cat at 512x512, trained from scratch using ADA
โ€‚โ€‚โ€‚โ€‚โ”œย  afhqdog.pkl AFHQ Dog at 512x512, trained from scratch using ADA
โ€‚โ€‚โ€‚โ€‚โ”œย  afhqwild.pkl AFHQ Wild at 512x512, trained from scratch using ADA
โ€‚โ€‚โ€‚โ€‚โ”œย  cifar10.pkl Class-conditional CIFAR-10 at 32x32
โ€‚โ€‚โ€‚โ€‚โ”œย  brecahad.pkl BreCaHAD at 512x512, trained from scratch using ADA
โ€‚โ€‚โ€‚โ€‚โ”œย  paper-fig7c-training-set-sweeps Models used in Fig.7c (sweep over training set size)
โ€‚โ€‚โ€‚โ€‚โ”œย  paper-fig11a-small-datasets Models used in Fig.11a (small datasets & transfer learning)
โ€‚โ€‚โ€‚โ€‚โ”œย  paper-fig11b-cifar10 Models used in Fig.11b (CIFAR-10)
โ€‚โ€‚โ€‚โ€‚โ”œย  transfer-learning-source-nets Models used as starting point for transfer learning
โ€‚โ€‚โ€‚โ€‚โ””ย  metrics Feature detectors used by the quality metrics

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 1โ€“8 high-end NVIDIA GPUs with at least 12 GB of memory. We have done all testing and development using NVIDIA DGX-1 with 8 Tesla V100 GPUs.
  • 64-bit Python 3.7 and PyTorch 1.7.1. See https://pytorch.org/ for PyTorch install instructions.
  • CUDA toolkit 11.0 or later. Use at least version 11.1 if running on RTX 3090. (Why is a separate CUDA toolkit installation required? See comments in #2.)
  • Python libraries: pip install click requests tqdm pyspng ninja imageio-ffmpeg==0.4.3. We use the Anaconda3 2020.11 distribution which installs most of these by default.
  • Docker users: use the provided Dockerfile to build an image with the required library dependencies.

The code relies heavily on custom PyTorch extensions that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio. We recommend installing Visual Studio Community Edition and adding it into PATH using "C:\Program Files (x86)\Microsoft Visual Studio\<VERSION>\Community\VC\Auxiliary\Build\vcvars64.bat".

Getting started

Pre-trained networks are stored as *.pkl files that can be referenced using local filenames or URLs:

# Generate curated MetFaces images without truncation (Fig.10 left)
python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

# Generate uncurated MetFaces images with truncation (Fig.12 upper left)
python generate.py --outdir=out --trunc=0.7 --seeds=600-605 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

# Generate class conditional CIFAR-10 images (Fig.17 left, Car)
python generate.py --outdir=out --seeds=0-35 --class=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/cifar10.pkl

# Style mixing example
python style_mixing.py --outdir=out --rows=85,100,75,458,1500 --cols=55,821,1789,293 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

Outputs from the above commands are placed under out/*.png, controlled by --outdir. Downloaded network pickles are cached under $HOME/.cache/dnnlib, which can be overridden by setting the DNNLIB_CACHE_DIR environment variable. The default PyTorch extension build directory is $HOME/.cache/torch_extensions, which can be overridden by setting TORCH_EXTENSIONS_DIR.

Docker: You can run the above curated image example using Docker as follows:

docker build --tag sg2ada:latest .
./docker_run.sh python3 generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

Note: The Docker image requires NVIDIA driver release r455.23 or later.

Legacy networks: The above commands can load most of the network pickles created using the previous TensorFlow versions of StyleGAN2 and StyleGAN2-ADA. However, for future compatibility, we recommend converting such legacy pickles into the new format used by the PyTorch version:

python legacy.py \
    --source=https://nvlabs-fi-cdn.nvidia.com/stylegan2/networks/stylegan2-cat-config-f.pkl \
    --dest=stylegan2-cat-config-f.pkl

Projecting images to latent space

To find the matching latent vector for a given image file, run:

python projector.py --outdir=out --target=~/mytargetimg.png \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

For optimal results, the target image should be cropped and aligned similar to the FFHQ dataset. The above command saves the projection target out/target.png, result out/proj.png, latent vector out/projected_w.npz, and progression video out/proj.mp4. You can render the resulting latent vector by specifying --projected_w for generate.py:

python generate.py --outdir=out --projected_w=out/projected_w.npz \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

Using networks from Python

You can use pre-trained networks in your own Python code as follows:

with open('ffhq.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()  # torch.nn.Module
z = torch.randn([1, G.z_dim]).cuda()    # latent codes
c = None                                # class labels (not used in this example)
img = G(z, c)                           # NCHW, float32, dynamic range [-1, +1]

The above code requires torch_utils and dnnlib to be accessible via PYTHONPATH. It does not need source code for the networks themselves โ€” their class definitions are loaded from the pickle via torch_utils.persistence.

The pickle contains three networks. 'G' and 'D' are instantaneous snapshots taken during training, and 'G_ema' represents a moving average of the generator weights over several training steps. The networks are regular instances of torch.nn.Module, with all of their parameters and buffers placed on the CPU at import and gradient computation disabled by default.

The generator consists of two submodules, G.mapping and G.synthesis, that can be executed separately. They also support various additional options:

w = G.mapping(z, c, truncation_psi=0.5, truncation_cutoff=8)
img = G.synthesis(w, noise_mode='const', force_fp32=True)

Please refer to generate.py, style_mixing.py, and projector.py for further examples.

Preparing datasets

Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels.

Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information. Alternatively, the folder can also be used directly as a dataset, without running it through dataset_tool.py first, but doing so may lead to suboptimal performance.

Legacy TFRecords datasets are not supported โ€” see below for instructions on how to convert them.

FFHQ:

Step 1: Download the Flickr-Faces-HQ dataset as TFRecords.

Step 2: Extract images from TFRecords using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:

# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py unpack \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked

Step 3: Create ZIP archive using dataset_tool.py from this repository:

# Original 1024x1024 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq.zip

# Scaled down 256x256 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq256x256.zip \
    --width=256 --height=256

MetFaces: Download the MetFaces dataset and create ZIP archive:

python dataset_tool.py --source=~/downloads/metfaces/images --dest=~/datasets/metfaces.zip

AFHQ: Download the AFHQ dataset and create ZIP archive:

python dataset_tool.py --source=~/downloads/afhq/train/cat --dest=~/datasets/afhqcat.zip
python dataset_tool.py --source=~/downloads/afhq/train/dog --dest=~/datasets/afhqdog.zip
python dataset_tool.py --source=~/downloads/afhq/train/wild --dest=~/datasets/afhqwild.zip

CIFAR-10: Download the CIFAR-10 python version and convert to ZIP archive:

python dataset_tool.py --source=~/downloads/cifar-10-python.tar.gz --dest=~/datasets/cifar10.zip

LSUN: Download the desired categories from the LSUN project page and convert to ZIP archive:

python dataset_tool.py --source=~/downloads/lsun/raw/cat_lmdb --dest=~/datasets/lsuncat200k.zip \
    --transform=center-crop --width=256 --height=256 --max_images=200000

python dataset_tool.py --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/lsuncar200k.zip \
    --transform=center-crop-wide --width=512 --height=384 --max_images=200000

BreCaHAD:

Step 1: Download the BreCaHAD dataset.

Step 2: Extract 512x512 resolution crops using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:

# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python dataset_tool.py extract_brecahad_crops --cropsize=512 \
    --output_dir=/tmp/brecahad-crops --brecahad_dir=~/downloads/brecahad/images

Step 3: Create ZIP archive using dataset_tool.py from this repository:

python dataset_tool.py --source=/tmp/brecahad-crops --dest=~/datasets/brecahad.zip

Training new networks

In its most basic form, training new networks boils down to:

python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1 --dry-run
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1

The first command is optional; it validates the arguments, prints out the training configuration, and exits. The second command kicks off the actual training.

In this example, the results are saved to a newly created directory ~/training-runs/<ID>-mydataset-auto1, controlled by --outdir. The training exports network pickles (network-snapshot-<INT>.pkl) and example images (fakes<INT>.png) at regular intervals (controlled by --snap). For each pickle, it also evaluates FID (controlled by --metrics) and logs the resulting scores in metric-fid50k_full.jsonl (as well as TFEvents if TensorBoard is installed).

The name of the output directory reflects the training configuration. For example, 00000-mydataset-auto1 indicates that the base configuration was auto1, meaning that the hyperparameters were selected automatically for training on one GPU. The base configuration is controlled by --cfg:

Base config Description
autoย (default) Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets but does not necessarily lead to optimal results.
stylegan2 Reproduce results for StyleGAN2 config F at 1024x1024 using 1, 2, 4, or 8 GPUs.
paper256 Reproduce results for FFHQ and LSUN Cat at 256x256 using 1, 2, 4, or 8 GPUs.
paper512 Reproduce results for BreCaHAD and AFHQ at 512x512 using 1, 2, 4, or 8 GPUs.
paper1024 Reproduce results for MetFaces at 1024x1024 using 1, 2, 4, or 8 GPUs.
cifar Reproduce results for CIFAR-10 (tuned configuration) using 1 or 2 GPUs.

The training configuration can be further customized with additional command line options:

  • --aug=noaug disables ADA.
  • --cond=1 enables class-conditional training (requires a dataset with labels).
  • --mirror=1 amplifies the dataset with x-flips. Often beneficial, even with ADA.
  • --resume=ffhq1024 --snap=10 performs transfer learning from FFHQ trained at 1024x1024.
  • --resume=~/training-runs/<NAME>/network-snapshot-<INT>.pkl resumes a previous training run.
  • --gamma=10 overrides R1 gamma. We recommend trying a couple of different values for each new dataset.
  • --aug=ada --target=0.7 adjusts ADA target value (default: 0.6).
  • --augpipe=blit enables pixel blitting but disables all other augmentations.
  • --augpipe=bgcfnc enables all available augmentations (blit, geom, color, filter, noise, cutout).

Please refer to python train.py --help for the full list.

Expected training time

The total training time depends heavily on resolution, number of GPUs, dataset, desired quality, and hyperparameters. The following table lists expected wallclock times to reach different points in the training, measured in thousands of real images shown to the discriminator ("kimg"):

Resolution GPUs 1000 kimg 25000 kimg sec/kimg GPU mem CPU mem
128x128 1 4h 05m 4d 06h 12.8โ€“13.7 7.2 GB 3.9 GB
128x128 2 2h 06m 2d 04h 6.5โ€“6.8 7.4 GB 7.9 GB
128x128 4 1h 20m 1d 09h 4.1โ€“4.6 4.2 GB 16.3 GB
128x128 8 1h 13m 1d 06h 3.9โ€“4.9 2.6 GB 31.9 GB
256x256 1 6h 36m 6d 21h 21.6โ€“24.2 5.0 GB 4.5 GB
256x256 2 3h 27m 3d 14h 11.2โ€“11.8 5.2 GB 9.0 GB
256x256 4 1h 45m 1d 20h 5.6โ€“5.9 5.2 GB 17.8 GB
256x256 8 1h 24m 1d 11h 4.4โ€“5.5 3.2 GB 34.7 GB
512x512 1 21h 03m 21d 22h 72.5โ€“74.9 7.6 GB 5.0 GB
512x512 2 10h 59m 11d 10h 37.7โ€“40.0 7.8 GB 9.8 GB
512x512 4 5h 29m 5d 17h 18.7โ€“19.1 7.9 GB 17.7 GB
512x512 8 2h 48m 2d 22h 9.5โ€“9.7 7.8 GB 38.2 GB
1024x1024 1 1d 20h 46d 03h 154.3โ€“161.6 8.1 GB 5.3 GB
1024x1024 2 23h 09m 24d 02h 80.6โ€“86.2 8.6 GB 11.9 GB
1024x1024 4 11h 36m 12d 02h 40.1โ€“40.8 8.4 GB 21.9 GB
1024x1024 8 5h 54m 6d 03h 20.2โ€“20.6 8.3 GB 44.7 GB

The above measurements were done using NVIDIA Tesla V100 GPUs with default settings (--cfg=auto --aug=ada --metrics=fid50k_full). "sec/kimg" shows the expected range of variation in raw training performance, as reported in log.txt. "GPU mem" and "CPU mem" show the highest observed memory consumption, excluding the peak at the beginning caused by torch.backends.cudnn.benchmark.

In typical cases, 25000 kimg or more is needed to reach convergence, but the results are already quite reasonable around 5000 kimg. 1000 kimg is often enough for transfer learning, which tends to converge significantly faster. The following figure shows example convergence curves for different datasets as a function of wallclock time, using the same settings as above:

Training curves

Note: --cfg=auto serves as a reasonable first guess for the hyperparameters but it does not necessarily lead to optimal results for a given dataset. For example, --cfg=stylegan2 yields considerably better FID for FFHQ-140k at 1024x1024 than illustrated above. We recommend trying out at least a few different values of --gamma for each new dataset.

Quality metrics

By default, train.py automatically computes FID for each network pickle exported during training. We recommend inspecting metric-fid50k_full.jsonl (or TensorBoard) at regular intervals to monitor the training progress. When desired, the automatic computation can be disabled with --metrics=none to speed up the training slightly (3%โ€“9%).

Additional quality metrics can also be computed after the training:

# Previous training run: look up options automatically, save result to JSONL file.
python calc_metrics.py --metrics=pr50k3_full \
    --network=~/training-runs/00000-ffhq10k-res64-auto1/network-snapshot-000000.pkl

# Pre-trained network pickle: specify dataset explicitly, print result to stdout.
python calc_metrics.py --metrics=fid50k_full --data=~/datasets/ffhq.zip --mirror=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

The first example looks up the training configuration and performs the same operation as if --metrics=pr50k3_full had been specified during training. The second example downloads a pre-trained network pickle, in which case the values of --mirror and --data must be specified explicitly.

Note that many of the metrics have a significant one-off cost when calculating them for the first time for a new dataset (up to 30min). Also note that the evaluation is done using a different random seed each time, so the results will vary if the same metric is computed multiple times.

We employ the following metrics in the ADA paper. Execution time and GPU memory usage is reported for one NVIDIA Tesla V100 GPU at 1024x1024 resolution:

Metric Time GPU mem Description
fid50k_full 13 min 1.8 GB Frรฉchet inception distance[1] against the full dataset
kid50k_full 13 min 1.8 GB Kernel inception distance[2] against the full dataset
pr50k3_full 13 min 4.1 GB Precision and recall[3] againt the full dataset
is50k 13 min 1.8 GB Inception score[4] for CIFAR-10

In addition, the following metrics from the StyleGAN and StyleGAN2 papers are also supported:

Metric Time GPU mem Description
fid50k 13 min 1.8 GB Frรฉchet inception distance against 50k real images
kid50k 13 min 1.8 GB Kernel inception distance against 50k real images
pr50k3 13 min 4.1 GB Precision and recall against 50k real images
ppl2_wend 36 min 2.4 GB Perceptual path length[5] in W, endpoints, full image
ppl_zfull 36 min 2.4 GB Perceptual path length in Z, full paths, cropped image
ppl_wfull 36 min 2.4 GB Perceptual path length in W, full paths, cropped image
ppl_zend 36 min 2.4 GB Perceptual path length in Z, endpoints, cropped image
ppl_wend 36 min 2.4 GB Perceptual path length in W, endpoints, cropped image

References:

  1. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Heusel et al. 2017
  2. Demystifying MMD GANs, Biล„kowski et al. 2018
  3. Improved Precision and Recall Metric for Assessing Generative Models, Kynkรครคnniemi et al. 2019
  4. Improved Techniques for Training GANs, Salimans et al. 2016
  5. A Style-Based Generator Architecture for Generative Adversarial Networks, Karras et al. 2018

License

Copyright ยฉ 2021, NVIDIA Corporation. All rights reserved.

This work is made available under the Nvidia Source Code License.

Citation

@inproceedings{Karras2020ada,
  title     = {Training Generative Adversarial Networks with Limited Data},
  author    = {Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila},
  booktitle = {Proc. NeurIPS},
  year      = {2020}
}

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

Acknowledgements

We thank David Luebke for helpful comments; Tero Kuosmanen and Sabu Nadarajan for their support with compute infrastructure; and Edgar Schรถnfeld for guidance on setting up unconditional BigGAN.

More Repositories

1

instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more
Cuda
15,102
star
2

stylegan

StyleGAN - Official TensorFlow Implementation
Python
13,882
star
3

stylegan2

StyleGAN2 - Official TensorFlow Implementation
Python
10,740
star
4

SPADE

Semantic Image Synthesis with SPADE
Python
7,518
star
5

stylegan3

Official PyTorch implementation of StyleGAN3
Python
6,108
star
6

neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)
Python
4,125
star
7

imaginaire

NVIDIA's Deep Imagination Team's PyTorch Library
Python
3,941
star
8

ffhq-dataset

Flickr-Faces-HQ Dataset (FFHQ)
Python
3,483
star
9

tiny-cuda-nn

Lightning fast C++/CUDA neural network framework
C++
3,286
star
10

eg3d

Python
3,089
star
11

MUNIT

Multimodal Unsupervised Image-to-Image Translation
Python
2,564
star
12

SegFormer

Official PyTorch implementation of SegFormer
Python
2,252
star
13

nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".
Python
2,019
star
14

few-shot-vid2vid

Pytorch implementation for few-shot photorealistic video-to-video translation.
Python
1,780
star
15

stylegan2-ada

StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation
Python
1,778
star
16

FUNIT

Translate images to unseen domains in the test time with few example images.
Python
1,545
star
17

PWC-Net

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume, CVPR 2018 (Oral)
Python
1,512
star
18

noise2noise

Noise2Noise: Learning Image Restoration without Clean Data - Official TensorFlow implementation of the ICML 2018 paper
Python
1,356
star
19

alias-free-gan

Alias-Free GAN project website and code
1,320
star
20

prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Python
1,287
star
21

DG-Net

๐Ÿ‘ซ Joint Discriminative and Generative Learning for Person Re-identification. CVPR'19 (Oral) ๐Ÿ‘ซ
Python
1,268
star
22

nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering
C++
1,137
star
23

edm

Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
Python
1,014
star
24

Deep_Object_Pose

Deep Object Pose Estimation (DOPE) โ€“ ROS inference (CoRL 2018)
Python
955
star
25

VoxFormer

Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]
Python
937
star
26

NVAE

The Official PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder" (NeurIPS 2020 spotlight paper)
Python
889
star
27

BundleSDF

[CVPR 2023] BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
Python
842
star
28

ODISE

Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
Python
779
star
29

GroupViT

Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
Python
679
star
30

FasterViT

[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
Python
664
star
31

GA3C

Hybrid CPU/GPU implementation of the A3C algorithm for deep reinforcement learning.
Python
641
star
32

denoising-diffusion-gan

Tackling the Generative Learning Trilemma with Denoising Diffusion GANs https://arxiv.org/abs/2112.07804
Python
634
star
33

genvs

610
star
34

sionna

Sionna: An Open-Source Library for Next-Generation Physical Layer Research
Jupyter Notebook
580
star
35

curobo

CUDA Accelerated Robot Library
Python
545
star
36

FB-BEV

Official PyTorch implementation of FB-BEV & FB-OCC - Forward-backward view transformation for vision-centric autonomous driving perception
Python
518
star
37

Dancing2Music

Python
513
star
38

planercnn

PlaneRCNN detects and reconstructs piece-wise planar surfaces from a single RGB image
Python
502
star
39

pacnet

Pixel-Adaptive Convolutional Neural Networks (CVPR '19)
Python
490
star
40

CALM

Python
486
star
41

DeepInversion

Official PyTorch implementation of Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion (CVPR 2020)
Python
474
star
42

EmerNeRF

PyTorch Implementation of EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
Python
456
star
43

FAN

Official PyTorch implementation of Fully Attentional Networks
Python
454
star
44

FourCastNet

Initial public release of code, data, and model weights for FourCastNet
Python
421
star
45

GCVit

[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers
Python
414
star
46

intrinsic3d

Intrinsic3D - High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting (ICCV 2017)
C++
411
star
47

nvdiffmodeling

Differentiable rasterization applied to 3D model simplification tasks
Python
404
star
48

flip

A tool for visualizing and communicating the errors in rendered images.
C++
375
star
49

wetectron

Weakly-supervised object detection.
Python
355
star
50

FoundationPose

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
JavaScript
349
star
51

nvdiffrecmc

Official code for the NeurIPS 2022 paper "Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising".
C
341
star
52

geomapnet

Geometry-Aware Learning of Maps for Camera Localization (CVPR2018)
Python
338
star
53

GLAMR

[CVPR 2022 Oral] Official PyTorch Implementation of "GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Camerasโ€.
Python
329
star
54

LSGM

The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021)
Python
326
star
55

ssn_superpixels

Superpixel Sampling Networks (ECCV2018)
Python
323
star
56

DiffiT

Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation
315
star
57

FreeSOLO

FreeSOLO for unsupervised instance segmentation, CVPR 2022
Python
307
star
58

long-video-gan

Official PyTorch implementation of LongVideoGAN
Python
297
star
59

neuralrgbd

Neural RGBโ†’D Sensing: Per-pixel depth and its uncertainty estimation from a monocular RGB video
Python
294
star
60

selfsupervised-denoising

High-Quality Self-Supervised Deep Image Denoising - Official TensorFlow implementation of the NeurIPS 2019 paper
Python
293
star
61

Taylor_pruning

Pruning Neural Networks with Taylor criterion in Pytorch
Python
279
star
62

timeloop

Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures.
C++
278
star
63

metfaces-dataset

Python
272
star
64

few_shot_gaze

Pytorch implementation and demo of FAZE: Few-Shot Adaptive Gaze Estimation (ICCV 2019, oral)
Python
272
star
65

splatnet

SPLATNet: Sparse Lattice Networks for Point Cloud Processing (CVPR2018)
Python
268
star
66

MinVIS

Python
261
star
67

edm2

Analyzing and Improving the Training Dynamics of Diffusion Models (EDM2)
Python
261
star
68

contact_graspnet

Efficient 6-DoF Grasp Generation in Cluttered Scenes
Python
260
star
69

CenterPose

Single-Stage Keypoint-based Category-level Object Pose Estimation from an RGB Image (ICRA 2022)
Python
251
star
70

trajdata

A unified interface to many trajectory forecasting datasets.
Python
245
star
71

STEP

STEP: Spatio-Temporal Progressive Learning for Video Action Detection. CVPR'19 (Oral)
Python
244
star
72

matchlib

SystemC/C++ library of commonly-used hardware functions and components for HLS.
C++
235
star
73

sim-web-visualizer

Web Based Visualizer for Simulation Environments
Python
231
star
74

SCOPS

SCOPS: Self-Supervised Co-Part Segmentation (CVPR'19)
Python
221
star
75

UMR

Self-supervised Single-view 3D Reconstruction
Python
221
star
76

DiffRL

[ICLR 2022] Accelerated Policy Learning with Parallel Differentiable Simulation
Python
220
star
77

cule

CuLE: A CUDA port of the Atari Learning Environment (ALE)
C++
216
star
78

SSV

Pytorch implementation of SSV: Self-Supervised Viewpoint Learning from Image Collections (CVPR 2020)
Python
214
star
79

DiffPure

A new adversarial purification method that uses the forward and reverse processes of diffusion models to remove adversarial perturbations.
Python
210
star
80

latentfusion

LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation
Python
197
star
81

I2SB

Python
194
star
82

nvbio

NVBIO is a library of reusable components designed to accelerate bioinformatics applications using CUDA.
C++
193
star
83

6dof-graspnet

Implementation of 6-DoF GraspNet with tensorflow and python. This repo has been tested with python 2.7 and tensorflow 1.12.
Python
186
star
84

NVBit

183
star
85

AFNO-transformer

Adaptive FNO transformer - official Pytorch implementation
Python
174
star
86

UnseenObjectClustering

Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation
Python
166
star
87

AL-MDN

Official pytorch implementation of Active Learning for deep object detection via probabilistic modeling (ICCV 2021)
Python
159
star
88

fermat

Fermat is a high performance research oriented physically based rendering system, trying to produce beautiful pictures following the mathematicianโ€™s principle of least time
C++
158
star
89

PoseCNN-PyTorch

PyTorch implementation of the PoseCNN framework
C
156
star
90

mask-auto-labeler

Python
153
star
91

mimicgen_environments

This code corresponds to simulation environments used as part of the MimicGen project.
Python
153
star
92

Bi3D

Python
150
star
93

RVT

Official Code for RVT: Robotic View Transformer for 3D Object Manipulation
Python
147
star
94

condensa

Programmable Neural Network Compression
Python
146
star
95

traffic-behavior-simulation

Python
145
star
96

learningrigidity

Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation (ECCV 2018)
Python
144
star
97

ocrodeg

document image degradation
Jupyter Notebook
142
star
98

ocropus3

Repository collecting all the submodules for the new PyTorch-based OCR System.
Shell
141
star
99

CGBN

CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups
Cuda
139
star
100

PL4NN

Perceptual Losses for Neural Networks: Caffe implementation of loss layers based on perceptual image quality metrics.
Python
138
star