A-NICE-MC: Adversarial Training for MCMC

TensorFlow implementation for the paper A-NICE-MC: Adversarial Training for MCMC, NIPS 2017.

by Jiaming Song, Shengjia Zhao and Stefano Ermon, Stanford Artificial Intelligence Laboratory


A-NICE-MC is a framework that trains a parametric Markov Chain Monte Carlo proposal. It achieves higher performance than traditional nonparametric proposals, such as Hamiltonian Monte Carlo (HMC). This repository provides code to replicate the experiments, as well as a basis for further research.

A-NICE-MC stands for Adversarial Non-linear Independent Component Estimation Monte Carlo, in that:

  • The framework utilizes a parametric proposal for Markov Chain Monte Carlo (MC).
  • The proposal is represented through Non-linear Independent Component Estimation (NICE).
  • The NICE network is trained through adversarial methods (A); see jiamings/markov-chain-gan.

Running the Experiments

The code depends on tensorflow >= 1.0, numpy, scipy, matplotlib, and pandas. It has been tested on both Python 2 and Python 3.

The Effective Sample Size (ESS) metric for evaluating MCMC algorithms is printed on screen and stored in logs/[experiment_name]/ess.csv.
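For post-hoc analysis, the logged ESS values can be loaded with pandas (already a dependency). A minimal sketch, assuming a simple tabular layout for ess.csv and using a hypothetical experiment name:

# Minimal sketch: load the logged ESS values for one experiment.
# The experiment name and the CSV column layout are assumptions;
# adjust them to match the actual contents of ess.csv.
import pandas as pd

ess = pd.read_csv("logs/nice_ring2d/ess.csv")  # hypothetical experiment name
print(ess.tail())  # most recent ESS entries logged during training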

Analytical Expression Targets

To run the Ring experiments:

python examples/nice_ring2d.py
python examples/nice_lord_of_rings.py

To run the Mixture of Gaussian experiments:

python examples/nice_mog2.py
python examples/nice_mog6.py

Bayesian Logistic Regression Posterior Inference

To run the experiment on Australian dataset:

python examples/nice_australian.py

To run the experiment on the German dataset:

python examples/nice_german.py

To run the experiment on the Heart dataset:

python examples/nice_heart.py

The resulting ESS should be at least as good as reported in the paper (if not better; training for more iterations should improve it further).

The running time depends on the machine, so only the ratio between the running times of A-NICE-MC and HMC is particularly meaningful. Sanity check: during one update, HMC passes over the entire dataset 40 + 1 times (40 HMC steps + 1 MH step), while A-NICE-MC passes over it only once (for the MH step); so A-NICE-MC should not be 40x faster at this stage, but a roughly 10x speedup is reasonable.
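A back-of-the-envelope version of that sanity check (just the arithmetic from the paragraph above, not code from this repository):

# Full-data passes per update, from the sanity check above.
hmc_passes = 40 + 1    # 40 HMC steps + 1 MH step per update
a_nice_mc_passes = 1   # only the MH step needs the full dataset
print(hmc_passes / a_nice_mc_passes)  # 41.0: an upper bound on the speedup if data
                                      # passes dominated; overhead makes ~10x plausible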

Visualization

To visualize samples from a single chain (in the 2d case), see figs/animation.ipynb (install ffmpeg if necessary).

How A-NICE-MC Works

In general, Markov Chain Monte Carlo (MCMC) methods draw samples from a density p(x) by simulating a Markov chain whose transition kernel has two components (a generic sketch follows the list below):

  • A proposal p(x_|x) that proposes a new x_ given the previous x. The proposal should satisfy detailed balance.
  • A Metropolis-Hastings acceptance step (MH step), which accepts or rejects x_ according to p(x) and p(x_).
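A generic NumPy sketch of such a two-component kernel, with a simple symmetric random-walk proposal for illustration only (this is not the code used in this repository):

import numpy as np

def mh_step(x, log_p, propose, rng):
    # One transition: propose a new state, then accept or reject it (MH step).
    x_new = propose(x, rng)
    log_alpha = log_p(x_new) - log_p(x)  # log acceptance ratio for a symmetric proposal
    if np.log(rng.uniform()) < log_alpha:
        return x_new   # accept the proposed state
    return x           # reject: stay at the current state

# Toy example: target is a standard 2D Gaussian, proposal is a Gaussian random walk.
rng = np.random.default_rng(0)
log_p = lambda x: -0.5 * np.sum(x ** 2)
propose = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)

x = np.zeros(2)
samples = []
for _ in range(1000):
    x = mh_step(x, log_p, propose, rng)
    samples.append(x)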

It might be tempting to use any generative model as the proposal; however, training is difficult because the kernel is non-differentiable, and score-based gradient estimators are not effective when the rejection rate is initially high.

Therefore, we draw ideas from Hamiltonian Monte Carlo and NICE. NICE is a deterministic, invertible transformation that preserves volume; HMC introduces an auxiliary variable that reduces random walk behavior.

A NICE Proposal

We can therefore use a NICE network x_, v_ = f(x, v) as our proposal, where v is an auxiliary variable sampled independently of x at every step. Hence, we can treat f(x, v) as an "implicit generative model", which can be used to construct p(x_|x).

We use the following proposal to ensure p(x_, v_|x, v) = p(x, v|x_, v_) for all (x, v) and (x_, v_) pairs, thereby satisfying the detailed balance condition directly (a sketch follows the list below).

  • With probability 0.5, x_, v_ = f(x, v)
  • With probability 0.5, x_, v_ = f^{-1}(x, v)
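A minimal NumPy sketch of this symmetric proposal, assuming f and its inverse are available as plain functions (the repository's actual NICE network is a TensorFlow model; the toy f below is just an additive coupling for illustration):

import numpy as np

def nice_proposal(x, f, f_inv, rng):
    # Resample the auxiliary variable v independently of x at every step,
    # then apply f or f^{-1} with equal probability. Because each direction
    # is chosen with probability 0.5 and f is invertible, the proposal
    # satisfies p(x_, v_|x, v) = p(x, v|x_, v_).
    v = rng.standard_normal(x.shape)
    if rng.uniform() < 0.5:
        return f(x, v)      # forward direction
    return f_inv(x, v)      # backward direction

# Toy volume-preserving invertible map (an additive coupling, as in NICE):
f = lambda x, v: (x + v, v)
f_inv = lambda x, v: (x - v, v)

rng = np.random.default_rng(0)
x_new, v_new = nice_proposal(np.zeros(2), f, f_inv, rng)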

Training

We can then use adversarial training on the Markov chain defined by f(x, v) (rather than the actual MCMC kernel), thereby making the entire objective differentiable.

Wait! How can you train on a differentiable model that is totally different from the MCMC kernel that you sample from?

Due to the invertibility of the NICE network, if the forward operation transforms a point on the (x, v) manifold to another point on the (x, v) manifold, then the backward operation will do the same. Meanwhile, the forward operation encourages points to move toward p(x, v), and the MH step tends to reject backward operations, thereby removing random-walk behavior.

Increasing ESS with Pairwise Discriminator

Ideally, we would like to reduce the autocorrelation between samples from the chain. This can be done by simply providing pairs of correlated samples to the discriminator as generated data, so that the generator has an incentive to generate samples that are less correlated.

Consider two modes of generation:

  • x -> z1
  • z -> z2 -> stop_gradient(z2) -> z3

where x is the "true data", z is the starting distribution, and z1, z2, and z3 are the distributions generated by the model. In the case of the pairwise discriminator, we consider two types of pairs: (x, z1) and (z2, z3). The optimal solution for the generator (given a perfect discriminator) is for p(z1), p(z2), and p(z3) to match the data distribution while z2 and z3 are uncorrelated.
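A schematic sketch of how these two generated pair types might be assembled for a pairwise discriminator; T, x_data, and z are hypothetical placeholders for one step of the learned transition, a batch of true samples, and a batch of starting states (this is not the repository's actual graph construction):

import tensorflow as tf

def make_generated_pairs(T, x_data, z):
    z1 = T(x_data)                 # x -> z1: one step starting from true data
    z2 = T(z)                      # z -> z2: one step from the starting distribution
    z3 = T(tf.stop_gradient(z2))   # z2 -> z3, with the gradient blocked between steps
    # (x, z1) and (z2, z3) are the two generated pair types fed to the pairwise
    # discriminator; at the generator's optimum p(z1), p(z2), p(z3) match the data
    # distribution and z2, z3 are uncorrelated.
    pair_a = tf.concat([x_data, z1], axis=-1)
    pair_b = tf.concat([z2, z3], axis=-1)
    return pair_a, pair_b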

This is illustrated in the following figure:

Bootstrap

To obtain samples to begin training, we adopt a bootstrap technique that draws samples from our own model, which allows us to improve the sample quality and the model quality iteratively.

Currently, we draw the initial samples from the untrained, randomly initialized model. This sounds a bit crazy, but it works in our experiments. For domains with higher dimensions, it might be better to start with a chain that has a higher acceptance rate.
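A high-level sketch of this bootstrap loop, with hypothetical callables standing in for the repository's actual sampling and training routines:

def bootstrap(sample_chain, adversarial_update, n_rounds):
    # sample_chain(): draw a batch of samples from the current (possibly untrained) model.
    # adversarial_update(samples): run adversarial training using those samples.
    samples = sample_chain()            # round 0: samples from the untrained model
    for _ in range(n_rounds):
        adversarial_update(samples)     # improve the model with its own samples
        samples = sample_chain()        # re-draw with the improved model
    return samples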

Citation

If you use this code for your research, please cite our paper:

@article{song2017nice,
  title={A-NICE-MC: Adversarial Training for MCMC},
  author={Song, Jiaming and Zhao, Shengjia and Ermon, Stefano},
  journal={arXiv preprint arXiv:1706.07561},
  year={2017}
}

Related Projects

markov-chain-gan: training a transition operator for a Markov chain. This contains part of the image generation experiments for this paper.

Contact

[email protected]

This method is very new and experimental, so there might be cases where it fails (possibly because of poor parameter choices). We welcome all kinds of suggestions, including but not limited to:

  • improving the method (an MMD loss for v? other bootstrap techniques?)
  • additional experiments in other domains (other applications where this method would shine?)
  • improvements to the current code to make experiments more scalable (a save-and-load feature?)

If something does not work as you would expect, please let me know. It helps everyone to know the strengths as well as the weaknesses of the method.
