  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created about 6 years ago
  • Updated almost 6 years ago

Measuring the Intrinsic Dimension of Objective Landscapes

This repository contains source code necessary to reproduce the results presented in the paper Measuring the Intrinsic Dimension of Objective Landscapes (ICLR 2018):

@inproceedings{li_id_2018_ICLR,
  title={Measuring the Intrinsic Dimension of Objective Landscapes},
  author={Li, Chunyuan and Farkhoor, Heerad and Liu, Rosanne and Yosinski, Jason},
  booktitle={International Conference on Learning Representations},
  year={2018}
}

For more on this project, see the Uber AI Labs Blog post.

Contents

There are four steps to use this codebase to reproduce the results in the paper.

  1. Dependencies
  2. Prepare datasets
  3. Subspace training
    1. Subspace training on image classification tasks
    2. Subspace training on reinforcement learning tasks
    3. Subspace training of ImageNet classification on distributed GPUs
  4. Collect and plot results

Dependencies

This code is based on Python 2.7, with the main dependencies being TensorFlow==1.7.0 and Keras==2.1.5. Additional dependencies for running experiments are: numpy, h5py, IPython, colorama, scikit-learn. To install all requirements, run

pip install -r requirements.txt

Prepare datasets

We consider the following datasets: MNIST (Standard, Shuffled-Pixel and Shuffled-Label versions), CIFAR-10, and ImageNet. For convenience, we provide pre-processed and pre-shuffled versions of all datasets (except ImageNet) in one download file. Data are prepared in hdf5 format, with train.h5 and test.h5 holding the training and test sets, respectively. Each .h5 file has the same two fields: images and labels.

Datasets can be downloaded here (the compressed archive is 347 MB; the full size is 1.5 GB). To extract:

tar xvzf dataset.tar.gz

Put the downloaded and extracted data in any directory and supply the relative paths to the *.h5 files to the Python script when executing (see Subspace training for examples). The examples below assume the untarred dataset directory is a (possibly symlinked) subdirectory of intrinsic_dim, so that, e.g., ls intrinsic_dim/dataset/mnist/train.h5 works.

Subspace training

We employ custom Keras layers for the special projection from subspace to full parameter space. The custom random projection layers (layer objects starting with RProj) are in keras-ext and used to construct models, e.g. in intrinsic_dim/model_builders.py. The main training script is train.py and conducts the training loop, taking the following options (among others) as arguments:

  • The two positional arguments specify the paths for training and validation sets (two hdf5 files), respectively; these arguments are required.
  • --vsize: subspace dimension, i.e., number of trainable parameters in the low-dimensional space.
  • --epochs: shortened as -E, number of training epochs (type=int); default 5.
  • --opt: optimization method to be used: e.g. adam (tf.train.AdamOptimizer) or sgd (tf.train.MomentumOptimizer); default sgd.
  • --lr: learning rate; default=.001.
  • --l2: L2 regularization to apply to direct parameters (type=float); default 0.0.
  • --arch: which architecture to use from arch_choices (type=str); default mnistfc_dir. Example architecture choices for direct training include mnistfc_dir, cifarfc_dir, mnistlenet_dir, cifarlenet_dir; example architecture choices for subspace training include mnistfc, cifarfc, mnistlenet, cifarlenet.
  • --output: directory to save network checkpoints, tfevent files, etc.
  • projection type: when training a model in a subspace, exactly one of three methods must be specified to generate the random projection matrix: --denseproj, --sparseproj, --fastfoodproj.
  • --depth and --width: hyperparameters of the fully connected networks, i.e., the number and width of layers in FC networks; default: depth=2 and width=200.
  • --minibatch: shortened as -mb, batch size for training; default 128.
  • --d_rate: dropout rate to apply to certain direct parameters (type=float); default 0.0.
  • --c1, --c2, --d1 and --d2: hyperparameters of LeNet, i.e., the number of channels in the first/second conv layer and the width of the first/second dense layer; default: c1=6, c2=16, d1=120, d2=84.
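
As a reading aid, the options listed above can be mirrored in a small argparse parser. This is a hypothetical sketch and not the parser in train.py or standard_parser.py: the function name build_parser and the destination names train_h5 and val_h5 are assumptions, only a subset of options is shown, and the mutually exclusive group only enforces "at most one" projection flag (the subspace-training requirement of exactly one would need an extra check).

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options (not the real train.py)."""
    p = argparse.ArgumentParser(description="Sketch of the documented train.py options")
    # Two required positional arguments: training and validation hdf5 files.
    p.add_argument("train_h5", help="path to the training set (hdf5)")
    p.add_argument("val_h5", help="path to the validation set (hdf5)")
    p.add_argument("--vsize", type=int, help="subspace dimension")
    p.add_argument("--epochs", "-E", type=int, default=5)
    p.add_argument("--opt", default="sgd")
    p.add_argument("--lr", type=float, default=0.001)
    p.add_argument("--l2", type=float, default=0.0)
    p.add_argument("--arch", default="mnistfc_dir")
    p.add_argument("--depth", type=int, default=2)
    p.add_argument("--width", type=int, default=200)
    p.add_argument("--minibatch", "-mb", type=int, default=128)
    # At most one projection type may be given; direct (_dir) training gives none.
    proj = p.add_mutually_exclusive_group()
    proj.add_argument("--denseproj", action="store_true")
    proj.add_argument("--sparseproj", action="store_true")
    proj.add_argument("--fastfoodproj", action="store_true")
    return p


args = build_parser().parse_args(
    ["train.h5", "test.h5", "-E", "100", "--vsize", "1000", "--fastfoodproj"])
```

Passing two projection flags at once makes argparse exit with a usage error, which matches the "one and only one" rule for the subspace case.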

For more options, please see standard_parser.py and train.py, or just run ./train.py -h.
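
To make the projection from subspace to full parameter space concrete, here is a minimal pure-Python sketch of the dense case. It assumes the formulation from the paper, theta_D = theta0_D + P · theta_d, with a random D×d matrix P whose columns are normalized to unit length and theta_d initialized to zero. The function names make_dense_projection and project are illustrative only; the actual RProj layers in keras-ext are implemented as Keras layers and are not reproduced here.

```python
import math
import random


def make_dense_projection(D, d, seed=0):
    """Return a D x d random matrix (list of rows) whose columns have unit length."""
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        col = [rng.gauss(0.0, 1.0) for _ in range(D)]
        norm = math.sqrt(sum(x * x for x in col))
        cols.append([x / norm for x in col])
    # Transpose the list of columns into row-major form: P[i][j].
    return [[cols[j][i] for j in range(d)] for i in range(D)]


def project(theta0, P, theta_d):
    """Map subspace parameters theta_d (length d) to full parameters (length D)."""
    return [t0 + sum(p_ij * t for p_ij, t in zip(row, theta_d))
            for t0, row in zip(theta0, P)]


D, d = 8, 3
theta0 = [0.1 * i for i in range(D)]  # frozen initial full parameters (illustrative values)
P = make_dense_projection(D, d)       # fixed after creation, never trained
theta_d = [0.0] * d                   # the only trainable parameters
theta_full = project(theta0, P, theta_d)  # equals theta0 because theta_d is all zeros
```

Only theta_d would receive gradient updates during training; theta0 and P stay fixed, so the optimizer explores a d-dimensional slice of the D-dimensional parameter space.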

1. Subspace training on image classification tasks

First, to run direct training in the full parameter space as the baseline, select an architecture whose name ends in _dir and do not add a projection type. For example, to train an MNIST MLP network 784-200-200-10 (full parameter size: 199,210):

python ./train.py path-to-mnist-data/train.h5 path-to-mnist-data/test.h5 \
    -E 100 --opt adam --lr 0.001 --l2 1e-05 --arch mnistfc_dir --depth 2 --width 200
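
The full parameter size of a fully connected network can be checked by hand: each layer contributes (inputs × outputs) weights plus one bias per output. The helper below is a small illustrative sketch (count_fc_params is not a function from this repository) that assumes every layer has a bias vector.

```python
def count_fc_params(layer_sizes):
    """Count weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


# 784-200-200-10: 157,000 + 40,200 + 2,010 = 199,210
print(count_fc_params([784, 200, 200, 10]))
```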

To train the same network, but in a subspace of dimension 1000, with various projection methods:

python ./train.py path-to-mnist-data/train.h5 path-to-mnist-data/test.h5 \
    -E 100 --opt adam --lr 0.001 --l2 1e-05 --arch mnistfc --depth 2 --width 200 \
    --vsize 1000 --fastfoodproj

python ./train.py path-to-mnist-data/train.h5 path-to-mnist-data/test.h5 \
    -E 100 --opt adam --lr 0.001 --l2 1e-05 --arch mnistfc --depth 2 --width 200 \
    --vsize 1000 --denseproj

python ./train.py path-to-mnist-data/train.h5 path-to-mnist-data/test.h5 \
    -E 100 --opt adam --lr 0.001 --l2 1e-05 --arch mnistfc --depth 2 --width 200 \
    --vsize 1000 --sparseproj
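
Measuring an intrinsic dimension requires repeating such a run for several values of --vsize. The sketch below only builds the command lines for a sweep and executes nothing. The helper name sweep_commands, the particular dimensions, and the results/mnistfc_d... output directories are assumptions for illustration; the flags themselves are the ones documented above.

```python
def sweep_commands(train_h5, val_h5, vsizes, proj="--fastfoodproj"):
    """Build one train.py argv list per subspace dimension (nothing is executed)."""
    base = ["python", "./train.py", train_h5, val_h5,
            "-E", "100", "--opt", "adam", "--lr", "0.001", "--l2", "1e-05",
            "--arch", "mnistfc", "--depth", "2", "--width", "200"]
    return [base + ["--vsize", str(d), proj,
                    "--output", "results/mnistfc_d%d" % d]  # hypothetical directory layout
            for d in vsizes]


for cmd in sweep_commands("path-to-mnist-data/train.h5",
                          "path-to-mnist-data/test.h5",
                          [100, 300, 1000]):
    print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run if you want to launch the runs from Python instead of copying the printed lines into a shell.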

To further explore the toy problem in Section 2 of the paper, see train_toy.py.

2. Subspace training on reinforcement learning tasks

For example, to train a 2-layer FC network of width 200 on CartPole with a subspace dimension of 20:

python ./train_dqn.py --vsize 20 --opt adam --lr 0.0001 --l2 0.0001 \
    --env_name 'CartPole-v0' --arch fc --width 200 --output results/rl_results/fnn_cartpole

3. Subspace training of ImageNet classification on distributed GPUs

Adopting the Horovod software package makes it easy to run distributed training on many GPUs, which is helpful for large-scale tasks like ImageNet. See train_distributed.py for details and to see how few changes it requires relative to train.py.

Follow the Horovod documentation for MPI and NCCL setup. Once they are set up, the script is executed like this:

mpirun -np 4 ./train_distributed.py path-to-imagenet-data/train.h5 path-to-imagenet-data/test.h5 -E 100

Collect and plot results

Once the networks are trained and the results are saved, we extract key results using a Python script. We scan the performance across different subspace dimensions, find the intrinsic dimension, and plot the results.
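
The step of finding the intrinsic dimension from a sweep can be sketched as follows, assuming the 90%-of-baseline criterion (d_int90) used in the paper: the intrinsic dimension is the smallest subspace dimension whose performance reaches 90% of the direct-training baseline. The function name intrinsic_dim and the numbers in the example are illustrative placeholders, not results from the paper or output of the repository's extraction script.

```python
def intrinsic_dim(results, baseline, fraction=0.9):
    """Smallest subspace dimension whose score reaches fraction * baseline.

    results maps subspace dimension -> validation score; returns None if no
    dimension in the sweep reaches the threshold.
    """
    threshold = fraction * baseline
    reached = [d for d, score in results.items() if score >= threshold]
    return min(reached) if reached else None


# Illustrative numbers only, not results from the paper.
sweep = {100: 0.55, 300: 0.78, 500: 0.87, 750: 0.91, 1000: 0.93}
print(intrinsic_dim(sweep, baseline=0.98))  # threshold is 0.882, so this prints 750
```

The estimate is only as fine-grained as the sweep: a denser set of subspace dimensions around the threshold crossing gives a tighter value.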

The results can be plotted using the included IPython notebook plots/main_plots.ipynb. Start the IPython Notebook server:

$ cd plots
$ ipython notebook

Select the main_plots.ipynb notebook and execute the included code. Note that we have copied our extracted results into the notebook, so without modification it will output the figures in the paper. If you've run your own training and wish to plot your results, you'll have to organize them in the same format.

Shortcut: to skip all the work and just see the results, take a look at this notebook with cached plots.

Questions?

Please drop us (Chunyuan, Rosanne or Jason) a line if you have any questions.
