Deep Convolutional Inverse Graphics Network

This repository contains the code for the network described in http://arxiv.org/abs/1503.03167.

Use Cases:

Unsupervised Feature Learning
Neural 3D graphics engine: Given a static face image, our model can re-render (hallucinate) the face with arbitrary light and viewpoint transformations. For example, a movie can be generated from a single face photograph by varying the light neuron and obtaining the predicted image frame at each time step. The same can be done for pose variations (see the paper or project website).
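The light-sweep idea can be sketched as follows. This is a minimal illustration, assuming a trained model exposes an encoder that maps an image to a flat list of latent values and a decoder that maps such a list back to an image. The names linspace, sweep_latent, render_sweep, encode, decode and LIGHT_INDEX are hypothetical and are not functions from this repository, which is written in Lua/Torch7. The sketch encodes the photograph once, changes a single latent coordinate per frame, and decodes each modified code.

```python
from typing import Callable, List, Sequence

# Hypothetical helpers for illustration; not part of the dc-ign codebase.


def linspace(start: float, stop: float, n: int) -> List[float]:
    """Return n evenly spaced values from start to stop, inclusive (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def sweep_latent(code: Sequence[float], index: int,
                 values: Sequence[float]) -> List[List[float]]:
    """Return one copy of `code` per value, with only code[index] replaced."""
    if not 0 <= index < len(code):
        raise IndexError("latent index out of range")
    frames = []
    for v in values:
        frame = list(code)  # copy so the original code is left untouched
        frame[index] = v
        frames.append(frame)
    return frames


def render_sweep(encode: Callable, decode: Callable, image,
                 index: int, values: Sequence[float]) -> list:
    """Encode the image once, then decode each modified code into one frame."""
    code = encode(image)
    return [decode(c) for c in sweep_latent(code, index, values)]


if __name__ == "__main__":
    LIGHT_INDEX = 0  # hypothetical position of the "light" latent

    def encode(img):
        # Toy stand-in for a trained encoder.
        return [0.0, 0.5, -0.5]

    def decode(code):
        # Toy stand-in for a trained decoder; a real one returns an image.
        return round(sum(code), 3)

    frames = render_sweep(encode, decode, None, LIGHT_INDEX,
                          linspace(-1.0, 1.0, 5))
    print(frames)
```

Keeping all other latents fixed is what makes each frame differ only in the property the swept neuron represents, assuming the network has learned a disentangled code.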

Project Website: http://willwhitney.github.io/dc-ign/www/

Citation

@article{kulkarni2015deep,
  title={Deep Convolutional Inverse Graphics Network},
  author={Kulkarni, Tejas D and Whitney, Will and Kohli, Pushmeet and Tenenbaum, Joshua B},
  journal={arXiv preprint arXiv:1503.03167},
  year={2015}
}

Running

Requirements

A CUDA-capable GPU
Torch7
The CUDA Toolkit
cuDNN: NVIDIA's neural network library
cudnn.torch: Torch bindings to cuDNN.

Facebook has some great instructions for installing these over at https://github.com/facebook/fbcunn/blob/master/INSTALL.md

Instructions

Dataset and pre-trained network: The train/test dataset can be downloaded from Dropbox or Amazon S3.

A pretrained network is also available if you just want to see the results: Dropbox, Amazon S3

Update 06/23/16: We've been getting a bunch of traffic due to the (highly recommended!) InfoGAN paper, so I've mirrored the files on S3. If neither Dropbox nor S3 works, please email me and I'll get it to you another way.

Training a network with separated pose/light/shape, etc. (disentangled representations)

Clone this repo with git clone.
Download the dataset and unzip it.
Grab a coffee while you wait for that to happen. It's pretty big.
Run th monovariant_main.lua --help to see the available options.
To train from scratch:
1. Run something like th monovariant_main.lua --no_load --name my_first_dcign --datasetdir <path_to_dataset>
2. The network will save itself to networks/<name> after each epoch.
3. After a couple of epochs, open visualize_networks.lua and set network_search_str to your network's name. Then run th visualize_networks.lua; it will create a folder called renderings containing visualizations of the kinds of faces your network generates.
To use a pretrained network:
1. Download the pretrained network and unzip it.
2. More coffee while you wait.
3. Run a command like th monovariant_main.lua --import <path/to/unzipped/network/dir> --name my_first_dcign --datasetdir <path_to_dataset> to import the directory of the pretrained network.
4. Alternatively, run visualize_networks.lua as described above with the pretrained network to see what it generates.
By default, training runs on the CPU. To enable CUDA, pass the following options:
1. --useCuda --deviceId deviceToUse (the default deviceId is 1).
2. To also use cuDNN: --useCuda --useCudnn --deviceId deviceToUse.
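The flag combinations listed in these steps can be assembled programmatically. The sketch below is a hypothetical Python helper (build_dcign_command is not part of this repository). It uses only the flags that appear in this README (--no_load, --import, --name, --datasetdir, --useCuda, --useCudnn, --deviceId) and assumes monovariant_main.lua does not care about their order. It encodes the two rules stated above: --no_load for training from scratch versus --import for a pretrained network, and --useCudnn only together with --useCuda.

```python
import shlex
from typing import List, Optional


def build_dcign_command(name: str, dataset_dir: str,
                        import_dir: Optional[str] = None,
                        use_cuda: bool = False, use_cudnn: bool = False,
                        device_id: int = 1) -> List[str]:
    """Return the argv list for th monovariant_main.lua (hypothetical helper)."""
    if use_cudnn and not use_cuda:
        # The README only shows --useCudnn alongside --useCuda.
        raise ValueError("--useCudnn must be combined with --useCuda")

    cmd = ["th", "monovariant_main.lua"]
    if import_dir is None:
        cmd.append("--no_load")            # train from scratch
    else:
        cmd += ["--import", import_dir]    # start from a pretrained network
    cmd += ["--name", name, "--datasetdir", dataset_dir]

    if use_cuda:                           # otherwise the default is CPU
        cmd.append("--useCuda")
        if use_cudnn:
            cmd.append("--useCudnn")
        cmd += ["--deviceId", str(device_id)]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command line instead of running it.
    print(shlex.join(build_dcign_command(
        "my_first_dcign", "/path/to/dataset", use_cuda=True, use_cudnn=True)))
```

To actually launch training from the repository directory, the returned list could be passed to subprocess.run(cmd, check=True); printing it first makes it easy to check the flags before starting a long run.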

Training a network with undifferentiated latents

Instructions coming soon, but if you're not afraid of code that hasn't been cleaned up yet, check out main.lua.

Paper abstract

This paper presents the Deep Convolution Inverse Graphics Network (DC-IGN) that aims to learn an interpretable representation of images that is disentangled with respect to various transformations such as object out-of-plane rotations, lighting variations, and texture. The DC-IGN model is composed of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm. We propose training procedures to encourage neurons in the graphics code layer to have semantic meaning and force each group to distinctly represent a specific transformation (pose, light, texture, shape etc.). Given a static face image, our model can re-generate the input image with different pose, lighting or even texture and shape variations from the base face. We present qualitative and quantitative results of the model’s efficacy to learn a 3D rendering engine. Moreover, we also utilize the learnt representation for two important visual recognition tasks: (1) an invariant face recognition task and (2) using the representation as a summary statistic for generative modeling.
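The two ingredients the abstract names, SGVB training and forcing groups of latents to represent one transformation each, can be illustrated with a small pure-Python sketch. This is not the repository's Lua/Torch7 code, and the function names are hypothetical. reparameterize and gaussian_kl are the standard variational-autoencoder pieces of SGVB: sampling z = mu + sigma * eps keeps the sample differentiable with respect to mu and sigma, and the KL term pulls the approximate posterior toward a unit Gaussian. clamp_inactive shows one assumed way to tie a latent group to a transformation: in a minibatch where only that transformation varies, every other latent is replaced by its batch mean, so only the active group can explain the differences between images. See the paper for the authors' exact training procedure.

```python
import math
import random
from typing import List, Sequence, Set


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   eps: Sequence[float]) -> List[float]:
    """z = mu + sigma * eps, with sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]


def sample_latent(mu: Sequence[float], logvar: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Draw eps ~ N(0, 1) per dimension and apply the reparameterization."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return reparameterize(mu, logvar, eps)


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def clamp_inactive(batch_codes: Sequence[Sequence[float]],
                   active: Set[int]) -> List[List[float]]:
    """Replace every latent index not in `active` with its mean over the batch."""
    n = len(batch_codes)
    dim = len(batch_codes[0])
    means = [sum(code[j] for code in batch_codes) / n for j in range(dim)]
    return [[code[j] if j in active else means[j] for j in range(dim)]
            for code in batch_codes]


if __name__ == "__main__":
    rng = random.Random(0)
    mu, logvar = [0.2, -0.1, 0.4], [0.0, -1.0, 0.5]
    z = sample_latent(mu, logvar, rng)
    print("z =", z, "KL =", gaussian_kl(mu, logvar))
    # Batch where only "light" (hypothetically latent 0) varies.
    print(clamp_inactive([[1.0, 10.0, 3.0], [3.0, 20.0, 5.0]], active={0}))
```

In a full SGVB objective, the KL term is added to a reconstruction loss computed by decoding z; that part depends on the decoder architecture and is omitted here.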

Acknowledgements

A big shout-out to all the Torch developers. Torch is simply awesome. We thank Thomas Vetter for giving us access to the Basel face model. T. Kulkarni was graciously supported by the Leventhal Fellowship. This research was supported by ONR award N000141310333, ARO MURI W911NF-13-1-2012 and CBMM. We would also like to thank y0ast (https://github.com/y0ast) for making the variational autoencoder code available online.

willwhitney/dc-ign
