# Discovering Neural Wirings
By Mitchell Wortsman, Ali Farhadi and Mohammad Rastegari.
In this work we propose a method for discovering neural wirings. We relax the typical notion of layers and instead enable channels to form connections independent of each other. This allows for a much larger space of possible networks. The wiring of our network is not fixed during training -- as we learn the network parameters we also learn the structure itself.
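The core mechanic, paraphrasing the paper rather than this repo's exact code: every possible edge carries a weight, the forward pass uses only the k highest-magnitude edges, and the backward pass updates all candidate edges so that edges outside the current wiring can strengthen and swap in. A minimal PyTorch sketch of that idea (the names here are ours, not the repo's):

```python
import torch

class TopKEdges(torch.autograd.Function):
    """Use only the k largest-magnitude edge weights in the forward pass,
    but pass gradients to every candidate edge in the backward pass, so
    edges outside the current wiring can strengthen and swap in."""

    @staticmethod
    def forward(ctx, weights, k):
        out = torch.zeros_like(weights)
        idx = weights.abs().flatten().topk(k).indices
        out.view(-1)[idx] = weights.view(-1)[idx]
        return out

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # straight-through to all candidate edges

# A "layer-free" linear map: 64 * 32 candidate edges, only k = 256 active.
weights = torch.nn.Parameter(torch.randn(64, 32))
x = torch.randn(8, 32)
y = x @ TopKEdges.apply(weights, 256).t()  # forward uses the sparse wiring
```

Because the top-k selection is recomputed at every forward pass, the wiring evolves jointly with the weights during training.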
The folder `imagenet_sparsity_experiments` contains the code for training sparse neural networks.
## Citing
If you find this project useful in your research, please consider citing:
```
@article{Wortsman2019DiscoveringNW,
  title={Discovering Neural Wirings},
  author={Mitchell Wortsman and Ali Farhadi and Mohammad Rastegari},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.00586}
}
```
## Set Up
- Clone this repository.
- Using `python 3.6`, create a `venv` with `python -m venv venv` and run `source venv/bin/activate`.
- Install requirements with `pip install -r requirements.txt`.
- Create a data directory `<data-dir>`. If you wish to run ImageNet experiments, there must be a folder `<data-dir>/imagenet` that contains the ImageNet `train` and `val` folders. When you run experiments on CIFAR-10, a folder `<data-dir>/cifar10` will automatically be created and populated with the dataset.
## Small Scale Experiments
To test a tiny classifier (41k parameters) on CIFAR-10 in static and dynamic settings, see `apps/small_scale`.
There are 6 experiment files in total -- 3 for random graphs and 3 for discovering neural wirings (DNW).
You may run an experiment with
```
python runner.py app:apps/small_scale/<experiment-file> --gpus 0 --data-dir <data-dir>
```
We recommend running the static and discrete time experiments on a single GPU (as above), though you will need to use multiple GPUs for the continuous time experiments. To do this you may use `--gpus 0 1`.
You should expect the following results:

| Model | Accuracy (CIFAR-10) |
|---|---|
| Static (Random Graph) | 76.1 ± 0.5 |
| Static (DNW) | 80.9 ± 0.6 |
| Discrete Time (Random Graph) | 77.3 ± 0.7 |
| Discrete Time (DNW) | 82.3 ± 0.6 |
| Continuous (Random Graph) | 78.5 ± 1.2 |
| Continuous (DNW) | 83.1 ± 0.3 |
## ImageNet Experiments and Pretrained Models
The experiment files for the ImageNet experiments in the paper may be found in `apps/large_scale`.
To train your own model, you may run
```
python runner.py app:apps/large_scale/<experiment-file> --gpus 0 1 2 3 --data-dir <data-dir>
```
To evaluate a pretrained model which matches the experiment file, use
```
python runner.py app:apps/large_scale/<experiment-file> --gpus 0 1 --data-dir <data-dir> --resume <path-to-pretrained-model> --evaluate
```
| Model | Params | FLOPs | Accuracy (ImageNet) |
|---|---|---|---|
| MobileNet V1 (x 0.25) | 0.5M | 41M | 50.6 |
| ShuffleNet V2 (x 0.5) | 1.4M | 41M | 60.3 |
| MobileNet V1 (x 0.5) | 1.3M | 149M | 63.7 |
| ShuffleNet V2 (x 1) | 2.3M | 146M | 69.4 |
| MobileNet V1 Random Graph (x 0.225) | 1.2M | 55.7M | 53.3 |
| MobileNet V1 DNW Small (x 0.15) | 0.24M | 22.1M | 50.3 |
| MobileNet V1 DNW Small (x 0.225) | 0.4M | 41.2M | 59.9 |
| MobileNet V1 DNW (x 0.225) | 1.1M | 42.1M | 60.9 |
| MobileNet V1 DNW (x 0.3) | 1.3M | 66.7M | 65.0 |
| MobileNet V1 Random Graph (x 0.49) | 1.8M | 170M | 64.1 |
| MobileNet V1 DNW (x 0.49) | 1.8M | 154M | 70.4 |
You may also add the flag `--fast_eval` to make the model smaller and speed up inference. Adding `--fast_eval` removes the neurons which die; as a result, the first conv, the last linear layer, and all operations throughout have far fewer input and output channels.
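For intuition, here is a hedged sketch of the dead-channel removal this describes, in plain PyTorch with made-up names (the repo's `--fast_eval` code path is the authoritative version): a channel whose incoming edges were all pruned can never activate, so it, along with every edge leaving it, can be dropped.

```python
import torch

def remove_dead_channels(w1, w2):
    """Illustrative only. w1 maps inputs -> hidden channels, w2 maps
    hidden channels -> outputs. A hidden channel whose row of w1 is all
    zero has no incoming edges: drop that row of w1 and the matching
    column of w2, shrinking both operations."""
    alive = (w1 != 0).any(dim=1)  # hidden channels with >= 1 incoming edge
    return w1[alive], w2[:, alive]

w1 = torch.tensor([[0.5, 0.0], [0.0, 0.0], [0.2, -0.3]])  # channel 1 is dead
w2 = torch.randn(4, 3)
w1_small, w2_small = remove_dead_channels(w1, w2)
print(w1_small.shape, w2_small.shape)  # torch.Size([2, 2]) torch.Size([4, 2])
```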
You may add both `--fast_eval` and `--use_dgl` to obtain a model for evaluation that matches the theoretical FLOPs by using a graph implementation via https://www.dgl.ai/. You must then install the version of `dgl` which matches your CUDA and Python versions (see the DGL site for more details). For example, we run
```
pip uninstall dgl
pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda9.2/dgl-0.3-cp36-cp36m-manylinux1_x86_64.whl
```
and finally
```
python runner.py app:apps/large_scale/<experiment-file> --gpus 0 --data-dir <data-dir> --resume <path-to-pretrained-model> --evaluate --fast_eval --use_dgl --batch_size 256
```
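To see why a graph implementation can match the theoretical FLOPs: a dense (masked) matrix multiply still performs in × out multiplications even when most weights are zero, while gathering along an explicit edge list performs one multiplication per surviving edge. A hedged sketch in plain PyTorch rather than `dgl`, with made-up edge data:

```python
import torch

# Three surviving edges of a wiring with 32 input and 4 output channels.
src = torch.tensor([0, 0, 2])       # source channel of each edge
dst = torch.tensor([1, 3, 3])       # destination channel of each edge
w = torch.tensor([0.5, -0.2, 0.7])  # learned edge weights

x = torch.randn(32)                 # input channel activations
# Gather-multiply-scatter: 3 multiplies, versus the 32 * 4 a dense
# masked matrix multiply would perform.
y = torch.zeros(4).index_add_(0, dst, w * x[src])
```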
## Other Methods of Discovering Neural Wirings
To explore other methods of discovering neural wirings, see `apps/medium_scale`.
You may run an experiment with
```
python runner.py app:apps/medium_scale/<experiment-file> --gpus 0 --data-dir <data-dir>
```
To replicate the one-shot pruning or fine-tuning experiments, you must first use `mobilenetv1_complete_graph.yml` to obtain the initialization `init.pt` and the checkpoint from the final epoch.
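As a rough illustration of what a one-shot pruning experiment of this kind involves (an assumption on our part; the experiment files define the actual procedure): train the complete graph, keep the k largest-magnitude edges, and rewind the survivors to their values from `init.pt` before fine-tuning.

```python
import torch

def one_shot_prune(trained_w, init_w, k):
    """Hypothetical helper: keep the k largest-magnitude edges found by
    training the complete graph, rewound to their initial values (the
    role init.pt plays); fine-tuning would resume from the result."""
    keep = trained_w.abs().flatten().topk(k).indices
    mask = torch.zeros_like(trained_w).view(-1)
    mask[keep] = 1.0
    return init_w * mask.view_as(trained_w)

trained_w = torch.randn(64, 32)  # stand-ins for the real checkpoints
init_w = torch.randn(64, 32)
pruned = one_shot_prune(trained_w, init_w, k=256)
```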