Deep Convolutional Inverse Graphics Network
This repository contains the code for the network described in http://arxiv.org/abs/1503.03167.
Use Cases:
- Unsupervised Feature Learning
- Neural 3D graphics engine: Given a static face image, our model can re-render (hallucinate) the face with arbitrary light and viewpoint transformations. For example, starting from a single face photograph, our model can generate a movie by varying the light neuron and predicting an image frame at each time step. The same can be done for pose variations (see the paper or the project website).
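The light sweep described above can be written down roughly as follows. The notation is ours, not the paper's or the code's: $E$ is the encoder, $D$ the decoder, $z$ the graphics code inferred from the photograph $x$, $\ell$ the index of the light neuron, and $\delta$ an assumed per-frame step size (a linear sweep is only one possible schedule). Using a pose unit in place of $\ell$ gives the pose variant.

```latex
\begin{aligned}
z &= E(x), \\
z^{(t)}_i &=
  \begin{cases}
    z_\ell + \delta\, t, & i = \ell, \\
    z_i, & i \neq \ell,
  \end{cases}
  \qquad t = 0, 1, \dots, T, \\
\hat{x}^{(t)} &= D\!\left(z^{(t)}\right).
\end{aligned}
```

Each frame $\hat{x}^{(t)}$ differs from its neighbours only through the one latent that was changed, which is what makes the resulting movie a pure lighting (or pose) change if the representation is well disentangled.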
Project Website: http://willwhitney.github.io/dc-ign/www/
Citation
@article{kulkarni2015deep,
  title={Deep Convolutional Inverse Graphics Network},
  author={Kulkarni, Tejas D and Whitney, Will and Kohli, Pushmeet and Tenenbaum, Joshua B},
  journal={arXiv preprint arXiv:1503.03167},
  year={2015}
}
Running
Requirements
- A CUDA-capable GPU
- Torch7
- The CUDA Toolkit
- cuDNN: NVIDIA's deep neural network library
- cudnn.torch: Torch bindings to cuDNN.
Facebook has some great instructions for installing these over at https://github.com/facebook/fbcunn/blob/master/INSTALL.md
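Before following those instructions, a quick way to see which of the command-line tools above are already installed is a small sh function like the one below. This is a hypothetical helper, not part of the repository: missing_cmds is a name made up for this sketch, and it assumes that having th (Torch7), nvcc (CUDA Toolkit, with its bin directory on your PATH) and nvidia-smi (NVIDIA driver) available is a reasonable proxy for those requirements. It cannot check cuDNN or cudnn.torch, which are libraries rather than commands.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the DC-IGN repo): prints the
# commands from its argument list that cannot be found on PATH.
set -eu

missing_cmds() {
  local cmd missing=""
  for cmd in "$@"; do
    # command -v succeeds only if the name resolves to something runnable.
    command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  return 0
}

# th comes with Torch7, nvcc with the CUDA Toolkit, and nvidia-smi with the
# NVIDIA driver (used here only as a rough proxy for a CUDA-capable GPU).
# cuDNN and cudnn.torch are libraries, so this check cannot see them.
# Example: missing_cmds th nvcc nvidia-smi
```

Running missing_cmds th nvcc nvidia-smi prints something like "missing: nvcc" and returns 1 if anything is absent, and prints nothing and returns 0 otherwise.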
Instructions
Dataset and pre-trained network: The train/test dataset can be downloaded from Dropbox or Amazon S3.
A pretrained network is also available if you just want to see the results: Dropbox, Amazon S3
Update 06/23/16: We've been getting a bunch of traffic due to the (highly recommended!) InfoGAN paper, so I've mirrored the files on S3. If neither Dropbox nor S3 works, please email me and I'll get it to you another way.
Training a network with separated pose/light/shape etc. (disentangled representations)
- git clone this repo.
- Download the dataset and unzip it.
- Grab a coffee while you wait for that to happen. It's pretty big.
- Run
    th monovariant_main.lua --help
  to see the available options.
- To train from scratch:
  - Run something like
      th monovariant_main.lua --no_load --name my_first_dcign --datasetdir <path_to_dataset>
  - The network will save itself to networks/<name> after each epoch.
  - After a couple of epochs, open up visualize_networks.lua and set network_search_str to your network's name. Then you can run
      th visualize_networks.lua
    and it will create a folder called renderings with some visualizations of the kinds of faces your network generates.
- To use a pretrained network:
  - Download the pretrained network and unzip it.
  - More coffee while you wait.
  - Run a command like
      th monovariant_main.lua --import <path/to/unzipped/network/dir> --name my_first_dcign --datasetdir <path_to_dataset>
    to import the directory of that pretrained net.
  - Or, just do the visualize_networks thing from above with the pretrained network to see what it makes.
- By default the code runs on the CPU. To enable CUDA, add
    --useCuda --deviceId deviceToUse
  (the default deviceId is 1).
  - To use cuDNN, add
      --useCuda --useCudnn --deviceId deviceToUse
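The command lines above can be wrapped in a small sh function so that the from-scratch, pretrained, CPU, CUDA and cuDNN variants stay consistent. This is a hypothetical sketch rather than anything shipped with the repo: the function name dcign_run and the DRY_RUN and IMPORT_DIR environment variables are invented here; only the monovariant_main.lua flags (--no_load, --import, --name, --datasetdir, --useCuda, --useCudnn, --deviceId) come from the instructions above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the DC-IGN repo) around monovariant_main.lua.
# Only the flags come from the README; dcign_run, DRY_RUN and IMPORT_DIR are
# names invented for this sketch.
set -eu

# Usage: dcign_run NAME DATASETDIR [cpu|cuda|cudnn] [DEVICE_ID]
#   IMPORT_DIR=dir  start from a pretrained network instead of --no_load
#   DRY_RUN=1       print the command instead of running it
dcign_run() {
  if [ "$#" -lt 2 ]; then
    echo "usage: dcign_run NAME DATASETDIR [cpu|cuda|cudnn] [DEVICE_ID]" >&2
    return 2
  fi
  local name="$1" datasetdir="$2" mode="${3:-cpu}" device="${4:-1}"

  # Train from scratch (--no_load) unless a pretrained network dir is given.
  if [ -n "${IMPORT_DIR:-}" ]; then
    set -- th monovariant_main.lua --import "$IMPORT_DIR"
  else
    set -- th monovariant_main.lua --no_load
  fi
  set -- "$@" --name "$name" --datasetdir "$datasetdir"

  # CPU is the default; CUDA and cuDNN are opt-in (deviceId defaults to 1).
  case "$mode" in
    cpu) ;;
    cuda) set -- "$@" --useCuda --deviceId "$device" ;;
    cudnn) set -- "$@" --useCuda --useCudnn --deviceId "$device" ;;
    *)
      echo "unknown mode: $mode (expected cpu, cuda or cudnn)" >&2
      return 1
      ;;
  esac

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 dcign_run my_first_dcign <path_to_dataset> cudnn 1 prints the cuDNN variant of the from-scratch command, and IMPORT_DIR=<path/to/unzipped/network/dir> dcign_run my_first_dcign <path_to_dataset> runs the pretrained-network import on the CPU. Building the argument list with set -- keeps paths containing spaces intact. Visualization still goes through th visualize_networks.lua after setting network_search_str.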
Training a network with undifferentiated latents
Instructions coming soon, but if you're not afraid of code that hasn't been cleaned up yet, check out main.lua.
Paper abstract
This paper presents the Deep Convolution Inverse Graphics Network (DC-IGN) that aims to learn an interpretable representation of images that is disentangled with respect to various transformations such as object out-of-plane rotations, lighting variations, and texture. The DC-IGN model is composed of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm. We propose training procedures to encourage neurons in the graphics code layer to have semantic meaning and force each group to distinctly represent a specific transformation (pose, light, texture, shape etc.). Given a static face image, our model can re-generate the input image with different pose, lighting or even texture and shape variations from the base face. We present qualitative and quantitative results of the model's efficacy to learn a 3D rendering engine. Moreover, we also utilize the learnt representation for two important visual recognition tasks: (1) an invariant face recognition task and (2) using the representation as a summary statistic for generative modeling.
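The abstract names SGVB without spelling out its objective. For reference, SGVB maximizes a Monte Carlo estimate of the standard variational lower bound below, written in generic notation rather than notation taken from the paper or the code: $q_\phi(z \mid x)$ is the encoder, $p_\theta(x \mid z)$ the decoder, and $p(z)$ the prior (commonly a unit Gaussian). The low-variance gradient estimate comes from the reparameterization on the right, which lets gradients flow through the sampled $z$.

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
  \;\le\; \log p_\theta(x),
\qquad
z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon,\quad \epsilon \sim \mathcal{N}(0, I).
```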
Acknowledgements
A big shout-out to all the Torch developers. Torch is simply awesome. We thank Thomas Vetter for giving us access to the Basel face model. T. Kulkarni was graciously supported by the Leventhal Fellowship. This research was supported by ONR award N000141310333, ARO MURI W911NF-13-1-2012 and CBMM. We would also like to thank y0ast (https://github.com/y0ast) for making the variational autoencoder code available online.