  • Stars: 10,908
  • Rank: 3,111 (top 0.07%)
  • Created: about 8 years ago
  • Updated: almost 2 years ago

Repository Details

Starter from "How to Train a GAN?" at NIPS 2016

(this list is no longer maintained, and I am not sure how relevant it is in 2020)

How to Train a GAN? Tips and tricks to make GANs work

While research in Generative Adversarial Networks (GANs) continues to improve the fundamental stability of these models, we use a bunch of tricks to train them and make them stable day to day.

Here is a summary of some of these tricks.

The authors of this document are listed in the Authors section at the end.

If you find a trick that is particularly useful in practice, please open a Pull Request to add it to the document. If we find it to be reasonable and verified, we will merge it in.

1: Normalize the inputs

  • normalize the images between -1 and 1
  • use Tanh as the last layer of the generator output (see the sketch below)
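
For example, a minimal PyTorch sketch (the channel sizes of the final layer are placeholders):

import torch.nn as nn
import torchvision.transforms as T

# scale images from [0, 1] to [-1, 1] so they match the Tanh output range
transform = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# end the generator with Tanh so outputs also land in [-1, 1]
generator_head = nn.Sequential(
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),
)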

2: A modified loss function

In GAN papers, the loss function to optimize G is min log(1 - D), but in practice people use max log D

  • because the first formulation has vanishing gradients early on
  • Goodfellow et al. (2014)

In practice, this works well:

  • Flip labels when training generator: real = fake, fake = real
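
A minimal sketch of the non-saturating trick in PyTorch, assuming G, D and z are your generator, discriminator (ending in a sigmoid) and latent batch:

import torch
import torch.nn.functional as F

# paper formulation: minimize log(1 - D(G(z))), which has weak gradients early on
# non-saturating trick: label the fakes as "real" when updating G,
# which amounts to maximizing log D(G(z))
fake = G(z)
d_out = D(fake)
g_loss = F.binary_cross_entropy(d_out, torch.ones_like(d_out))
g_loss.backward()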

3: Use a spherical Z

  • Don't sample z from a uniform distribution (a cube)
  • Sample z from a Gaussian distribution (a sphere)
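
For instance, in PyTorch:

import torch

batch_size, z_dim = 64, 100

# uniform samples fill a cube:
# z = torch.rand(batch_size, z_dim) * 2 - 1

# Gaussian samples concentrate near a sphere in high dimensions:
z = torch.randn(batch_size, z_dim)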

4: BatchNorm

  • Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images.
  • When batchnorm is not an option, use instance normalization (for each sample, subtract the mean and divide by the standard deviation).

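A sketch of the separate real/fake batches, reusing the G, D and z names assumed earlier (real_images stands in for a batch from your dataset):

import torch
import torch.nn.functional as F

# one all-real batch, then one all-fake batch, so batchnorm statistics
# are never computed over a real/fake mixture
d_real = D(real_images)
loss_real = F.binary_cross_entropy(d_real, torch.ones_like(d_real))

d_fake = D(G(z).detach())
loss_fake = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))

(loss_real + loss_fake).backward()

# if batchnorm is not an option, swap nn.BatchNorm2d for nn.InstanceNorm2d in D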

5: Avoid Sparse Gradients: ReLU, MaxPool

  • the stability of the GAN game suffers if you have sparse gradients
  • LeakyReLU = good (in both G and D)
  • For Downsampling, use: Average Pooling, Conv2d + stride
  • For Upsampling, use: PixelShuffle, ConvTranspose2d + stride
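
For example, blocks along these lines (channel sizes are placeholders):

import torch.nn as nn

# downsampling block for D: strided conv (or nn.AvgPool2d) + LeakyReLU
down = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
)

# upsampling block for G: ConvTranspose2d (or a conv followed by nn.PixelShuffle(2))
up = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
)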

6: Use Soft and Noisy Labels

  • Label smoothing, i.e. if you have two target labels, Real=1 and Fake=0, then for each incoming sample, if it is real, replace the label with a random number between 0.7 and 1.2, and if it is fake, replace it with a random number between 0.0 and 0.3 (for example).
    • Salimans et al. (2016)
  • Make the labels noisy for the discriminator: occasionally flip the labels when training the discriminator
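
A sketch of both ideas, using the smoothing ranges above and an illustrative flip probability:

import torch

def soft_label(batch_size, real):
    # real labels in [0.7, 1.2], fake labels in [0.0, 0.3]
    if real:
        return 0.7 + 0.5 * torch.rand(batch_size)
    return 0.3 * torch.rand(batch_size)

def noisy_label(batch_size, real, flip_prob=0.05):
    labels = soft_label(batch_size, real)
    # occasionally flip labels when training the discriminator
    flip = torch.rand(batch_size) < flip_prob
    labels[flip] = soft_label(int(flip.sum()), not real)
    return labels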

7: DCGAN / Hybrid Models

  • Use DCGAN when you can. It works!
  • if you can't use DCGAN and no model is stable, use a hybrid model: KL + GAN or VAE + GAN

8: Use stability tricks from RL

  • Experience Replay
    • Keep a replay buffer of past generations and occasionally show them to the discriminator again
    • Keep checkpoints from the past of G and D and occasionally swap them out for a few iterations
  • All stability tricks that work for deep deterministic policy gradients
  • See Pfau & Vinyals (2016)
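
A minimal replay-buffer sketch (the capacity is arbitrary):

import random
import torch

class ReplayBuffer:
    # keep a pool of past generated images and occasionally show them to D again
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.pool = []

    def push(self, fake_batch):
        for img in fake_batch.detach().cpu():
            if len(self.pool) < self.capacity:
                self.pool.append(img)
            else:
                self.pool[random.randrange(self.capacity)] = img

    def sample(self, batch_size):
        return torch.stack(random.sample(self.pool, batch_size))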

9: Use the ADAM Optimizer

  • optim.Adam rules!
    • See Radford et al. (2015)
  • Use SGD for discriminator and ADAM for generator
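
For example (G and D assumed as before; the learning rate and betas follow Radford et al. and are only a starting point):

import torch.optim as optim

# Adam for the generator
opt_G = optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))

# optionally plain SGD for the discriminator
opt_D = optim.SGD(D.parameters(), lr=2e-4)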

10: Track failures early

  • D loss goes to 0: failure mode
  • check norms of gradients: if they are over 100, things are going wrong
  • when things are working, D loss has low variance and goes down over time vs having huge variance and spiking
  • if the loss of the generator steadily decreases, it is fooling D with garbage (says Martin)
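
One way to watch the gradient norms, called after loss.backward() on either network (a sketch):

def grad_norm(model):
    # total L2 norm of all parameter gradients; values far above ~100
    # usually mean training is going off the rails
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.norm().item() ** 2
    return total ** 0.5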

11: Don't balance loss via statistics (unless you have a good reason to)

  • Don't try to find a (number of G / number of D) schedule to uncollapse training
  • It's hard and we've all tried it.
  • If you do try it, have a principled approach to it, rather than intuition

For example:

# A and B are fixed loss thresholds chosen in advance, not tuned on the fly
while lossD > A:
  train D
while lossG > B:
  train G

12: If you have labels, use them

  • if you have labels available, train the discriminator to also classify the samples: auxiliary GANs
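
A sketch of a discriminator with an auxiliary classification head (the input and layer sizes are placeholders):

import torch.nn as nn

class AuxDiscriminator(nn.Module):
    # shared trunk, a real/fake head and a class-label head
    def __init__(self, in_dim=784, feat_dim=128, n_classes=10):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.LeakyReLU(0.2))
        self.adv_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())
        self.cls_head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        h = self.trunk(x)
        return self.adv_head(h), self.cls_head(h)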

13: Add noise to inputs, decay over time
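
For example, Gaussian noise on the discriminator inputs with a linear decay (the starting standard deviation is illustrative):

import torch

def add_input_noise(images, step, total_steps, start_std=0.1):
    # noise decays linearly from start_std to zero over training
    std = start_std * max(0.0, 1.0 - step / total_steps)
    return images + std * torch.randn_like(images)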

14: [notsure] Train discriminator more (sometimes)

  • especially when you have noise
  • hard to find a schedule of number of D iterations vs G iterations

15: [notsure] Batch Discrimination

  • Mixed results

16: Discrete variables in Conditional GANs

  • Use an Embedding layer
  • Add as additional channels to images
  • Keep the embedding dimensionality low and upsample it spatially to match the image size
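
A sketch of this conditioning, assuming integer class labels:

import torch
import torch.nn as nn

n_classes, emb_dim = 10, 4          # keep the embedding dimensionality low
embed = nn.Embedding(n_classes, emb_dim)

def condition_on_label(images, labels):
    # images: (N, C, H, W); labels: (N,) integer class ids
    n, _, h, w = images.shape
    e = embed(labels)                              # (N, emb_dim)
    e = e.view(n, emb_dim, 1, 1).expand(n, emb_dim, h, w)
    return torch.cat([images, e], dim=1)           # appended as extra channels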

17: Use Dropouts in G in both train and test phase
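
One way to keep dropout active at test time in PyTorch (layer sizes are placeholders):

import torch.nn as nn
import torch.nn.functional as F

class GBlock(nn.Module):
    def __init__(self, in_ch, out_ch, p=0.5):
        super().__init__()
        self.conv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
        self.p = p

    def forward(self, x):
        x = F.relu(self.conv(x))
        # training=True keeps dropout stochastic even when the model is in eval mode
        return F.dropout(x, p=self.p, training=True)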

Authors

  • Soumith Chintala
  • Emily Denton
  • Martin Arjovsky
  • Michael Mathieu

More Repositories

  • convnet-benchmarks: Easy benchmarking of all publicly accessible implementations of convnets (Python, 2,675 stars)
  • dcgan.torch: A torch implementation of http://arxiv.org/abs/1511.06434 (Lua, 1,427 stars)
  • cvpr2015 (Jupyter Notebook, 869 stars)
  • cudnn.torch: Torch-7 FFI bindings for NVIDIA CuDNN (Lua, 408 stars)
  • imagenet-multiGPU.torch: an imagenet example in torch (Lua, 395 stars)
  • torch-android: Torch-7 for Android (CMake, 275 stars)
  • talks (Jupyter Notebook, 261 stars)
  • net2net.torch: Implementation of http://arxiv.org/abs/1511.05641 that lets one build a larger net starting from a smaller one (Lua, 159 stars)
  • imagenetloader.torch: some old code that i wrote, might be useful to others (Shell, 88 stars)
  • deepmind-atari (Lua, 67 stars)
  • lua---audio: Module for torch to support audio i/o as well as do common operations like dFFT, generate spectrograms etc. (C, 67 stars)
  • inception.torch: Torch port of https://github.com/google/inception (Jupyter Notebook, 66 stars)
  • torch-signal: Signal processing toolbox for Torch 7 (Lua, 48 stars)
  • cuda-convnet2.torch: Torch7 bindings for cuda-convnet2 kernels! (Cuda, 40 stars)
  • matio-ffi.torch: A LuaJIT FFI interface to MATIO and simple bindings for torch (Lua, 39 stars)
  • galaxyzoo: Entry for GalaxyZoo challenge (Lua, 35 stars)
  • eyescream (JavaScript, 35 stars)
  • nextml (35 stars)
  • examplepackage.torch: A hello-world for torch packages (CMake, 23 stars)
  • sunfish.lua: tiny and basic chess engine for lua. Port of https://github.com/thomasahle/sunfish (Lua, 20 stars)
  • kaggle_retinopathy_starter.torch: A starter kit in Torch for Kaggle Diabetic Retinopathy Detection (Lua, 19 stars)
  • neon.torch: Nervana Neon kernels in Torch (Lua, 18 stars)
  • torch-ship-binaries: A page describing how to ship torch binaries without sharing the source code of your scripts (17 stars)
  • nnjs (JavaScript, 16 stars)
  • deep_gitstats: Based on SciPy's normalized git stats, adapted for Deep Learning frameworks (Jupyter Notebook, 16 stars)
  • cifar.torch (Lua, 15 stars)
  • torch.js: nodejs bindings for libTH (tensor library that powers torch), for fun! (JavaScript, 14 stars)
  • fakecuda: A convenient package for the lazy torch programmer to leave all your :cuda() calls as-is when running on CPU (Lua, 14 stars)
  • rgbd_streamer (Python, 12 stars)
  • mscoco.torch (Lua, 11 stars)
  • torch-docker: Dockerfile to create an image for Torch7 (Shell, 10 stars)
  • NeuralNetworks.jl: hacking torch-like neural networks in Julia (Julia, 10 stars)
  • torch-cheatsheet: A quick page for everything Torch (9 stars)
  • fftw3-ffi: A LuaJIT FFI interface to FFTW3 (Lua, 5 stars)
  • thnb: iTorch notebooks (4 stars)
  • lzmqstatic: Self-contained statically linked zeromq bindings for lua (C++, 3 stars)
  • nvblog_rnnlstm (HTML, 3 stars)
  • fairmark1 (Lua, 2 stars)
  • cunnsparse (Lua, 2 stars)
  • yasa: Yet another Sentiment analyzer. This one uses convolution networks. (Lua, 1 star)
  • cunnCUDA: some deprecated, ugly and old modules (Cuda, 1 star)
  • housenumbers_classifier: An attempt on the Stanford Housenumbers dataset (Lua, 1 star)
  • Bar__ZEbulLonX22L.torch: wtf (1 star)