
Sparse Blocks Network (SBNet)

This repository releases code for our paper SBNet: Sparse Blocks Network for Fast Inference. Please refer to our blog post for more context. Note that benchmarking in the paper was performed with an older version of this repo using TensorFlow 1.2, cuDNN 6.1 and commit cf8ea06.

This repository contains

  1. a TensorFlow custom operations library that implements SBNet,
  2. a Python implementation of sparse ResNet blocks, and
  3. a benchmark for performance comparison with Submanifold Sparse Convolutional Networks.

Prerequisites

Installation was tested under Ubuntu 14.04 and 16.04 with TensorFlow 1.8, CUDA 9.0 and cuDNN 7.1.

Hardware requirements

Code was tested on and compiled for NVIDIA compute architectures 6.1, 6.0, 5.2 and 7.0 (Titan XP, GTX 1080Ti, GTX 1080, P100, V100, TitanV, and most Maxwell cards). To compile for an older architecture, modify the Makefile and add the corresponding -gencode line, such as -gencode arch=compute_50,code=sm_50 for older cards such as laptop Maxwell parts. Please refer to the CUDA Wikipedia page to look up the compute architecture code for your graphics card.

Setup

To build a release version of the library, run

cd sbnet_tensorflow/sbnet_ops && make

To run tests:

cd sbnet_tensorflow/sbnet_ops && make test

The library will be built at sbnet_tensorflow/sbnet_ops/build/libsbnet.so and symlinked to sbnet_tensorflow/sbnet_ops/libsbnet.so. To import the library into your TensorFlow Python code, use the following command:

sbnet_module = tf.load_op_library('path_to_library/libsbnet.so')

The following TensorFlow ops are implemented in the op library:

  1. sbnet_module.reduce_mask
  2. sbnet_module.sparse_gather
  3. sbnet_module.sparse_scatter

The reduce_mask op converts a dense mask to a list of active block indices.

In the following snippet, mask is expected to be a tensor of dimensions [N,H,W,1]:

    indices = sbnet_module.reduce_mask(
        mask,
        tf.constant([BCH, BCW], dtype=tf.int32), # block counts in H and W
        bsize=[BSZH, BSZW],       # block size
        boffset=[BOFFSH, BOFFSW], # block offset
        bstride=[BSTRH, BSTRW],   # block stride
        tol=0.5,      # pooling threshold to consider a block as active
        avgpool=True) # use average pooling; max pooling is the default

[BCH, BCW] are the block counts in the height and width dimensions. [BSZH, BSZW], [BOFFSH, BOFFSW] and [BSTRH, BSTRW] are the block sizes, offsets and strides in the H and W dimensions. reduce_mask performs a max-pooling (or average-pooling) operation localized to each block, then generates a list of index triples [(ni, hi, wi)] for every block whose pooled value exceeds the specified tolerance tol. In NumPy terms, each block is a slice of the input mask of dimensions [N,H,W,1] given by [ni, BOFFSH+BSTRH*hi : BOFFSH+BSTRH*hi+BSZH, BOFFSW+BSTRW*wi : BOFFSW+BSTRW*wi+BSZW, :].
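For reference, a minimal NumPy sketch of these semantics could look like the following. The function name reduce_mask_reference and the clipping behaviour at the tensor borders are illustrative assumptions, not part of the library:

    import numpy as np

    def reduce_mask_reference(mask, bcount, bsize, boffset, bstride, tol=0.5, avgpool=False):
        # mask: [N, H, W, 1]; returns a list of (n, hi, wi) triples for active blocks.
        N = mask.shape[0]
        BCH, BCW = bcount
        BSZH, BSZW = bsize
        BOFFSH, BOFFSW = boffset
        BSTRH, BSTRW = bstride
        active = []
        for n in range(N):
            for hi in range(BCH):
                for wi in range(BCW):
                    h0, h1 = BOFFSH + BSTRH * hi, BOFFSH + BSTRH * hi + BSZH
                    w0, w1 = BOFFSW + BSTRW * wi, BOFFSW + BSTRW * wi + BSZW
                    # Clip to the mask bounds; fully out-of-range blocks are skipped.
                    block = mask[n, max(h0, 0):max(h1, 0), max(w0, 0):max(w1, 0), 0]
                    if block.size == 0:
                        continue
                    pooled = block.mean() if avgpool else block.max()
                    if pooled > tol:
                        active.append((n, hi, wi))
        return active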

The resulting list of indices can then be passed to two other operations: sbnet_module.sparse_scatter and sbnet_module.sparse_gather.

The following snippets illustrate the use of these operations:

    blockStack = sbnet_module.sparse_gather(
        x,
        indices.bin_counts,
        indices.active_block_indices,
        bsize=[BSZH, BSZW], # block size
        boffset=[BOFFSH, BOFFSW], # block offset
        bstride=[BSTRH, BSTRW], # block stride
        transpose=do_transpose)

This operation uses the indices generated by reduce_mask and slices blocks of full channel depth C out of the input tensor x of dimensions [N,H,W,C], as illustrated in the following pseudo-code snippet:

    for bi, (ni, hi, wi) in enumerate(indices.active_block_indices):
        channel_slice = x[ni, BOFFSH+BSTRH*hi : BOFFSH+BSTRH*hi+BSZH, BOFFSW+BSTRW*wi : BOFFSW+BSTRW*wi+BSZW, :]
        blockStack[bi, :, :, :] = channel_slice

If do_transpose is true, a fused transpose operation is also performed and the resulting tensor has dimensions [nBlocks, C, BSZH, BSZW] instead of [nBlocks, BSZH, BSZW, C]. Any out-of-range values are padded with zeroes.
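Expanding the pseudo-code above into a runnable NumPy sketch that also covers zero padding and the optional transpose (the name sparse_gather_reference is illustrative, not part of the library):

    import numpy as np

    def sparse_gather_reference(x, active_block_indices, bsize, boffset, bstride, transpose=False):
        # x: [N, H, W, C]; returns [nBlocks, BSZH, BSZW, C], or [nBlocks, C, BSZH, BSZW] if transposed.
        N, H, W, C = x.shape
        BSZH, BSZW = bsize
        BOFFSH, BOFFSW = boffset
        BSTRH, BSTRW = bstride
        blockStack = np.zeros((len(active_block_indices), BSZH, BSZW, C), dtype=x.dtype)
        for bi, (ni, hi, wi) in enumerate(active_block_indices):
            h0 = BOFFSH + BSTRH * hi
            w0 = BOFFSW + BSTRW * wi
            for dh in range(BSZH):
                for dw in range(BSZW):
                    h, w = h0 + dh, w0 + dw
                    if 0 <= h < H and 0 <= w < W:  # out-of-range positions stay zero
                        blockStack[bi, dh, dw, :] = x[ni, h, w, :]
        return blockStack.transpose(0, 3, 1, 2) if transpose else blockStack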

The inverse operation is sbnet_module.sparse_scatter. The following snippet illustrates its use:

    y = sbnet_module.sparse_scatter(
        blockStack,
        indices.bin_counts,
        indices.active_block_indices,
        x, # base tensor to copy to output and overwrite on top of
        bsize=[BSZH, BSZW],
        boffset=[BOFFSH, BOFFSW],
        bstride=[BSTRH, BSTRW],
        add=do_add,
        atomic=False, # use atomic or regular adds
        transpose=do_transpose)

Note that due to a limitation of the TensorFlow API, an intermediate tensor cannot be modified in place unless it is specified to be a tf.Variable. This forces the op to create an intermediate tensor and perform a copy, which has negative implications for performance. We therefore provide a second version of the op, sbnet_module.sparse_scatter_var, which expects x to be a tf.Variable and modifies it in place. Using sparse_scatter_var is strongly recommended for maximum performance.
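A minimal sketch of the in-place variant, assuming sparse_scatter_var takes the same arguments as sparse_scatter with the base tensor passed as a tf.Variable (the variable name x_var and the shape constants N, H, W, C below are illustrative):

    # x_var holds the dense base tensor and is overwritten in place.
    x_var = tf.get_variable("x_var", shape=[N, H, W, C], dtype=tf.float32, trainable=False)

    y = sbnet_module.sparse_scatter_var(
        blockStack,
        indices.bin_counts,
        indices.active_block_indices,
        x_var, # tf.Variable base tensor, modified in place
        bsize=[BSZH, BSZW],
        boffset=[BOFFSH, BOFFSW],
        bstride=[BSTRH, BSTRW],
        add=do_add,
        atomic=False,
        transpose=do_transpose)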

The effect of this operation is the opposite of sparse_gather: the input blocks are written on top of the base tensor x, or added to its contents if do_add is True. The following pseudo-code snippet illustrates the semantics of sparse_scatter:

    for bi, (ni, hi, wi) in enumerate(indices.active_block_indices):
        if do_add:
            x[ni, BOFFSH+BSTRH*hi : BOFFSH+BSTRH*hi+BSZH, BOFFSW+BSTRW*wi : BOFFSW+BSTRW*wi+BSZW, :]\
                += blockStack[bi, :, :, :]
        else:
            x[ni, BOFFSH+BSTRH*hi : BOFFSH+BSTRH*hi+BSZH, BOFFSW+BSTRW*wi : BOFFSW+BSTRW*wi+BSZW, :]\
                = blockStack[bi, :, :, :]

So the blocks are 'put back in place', but the sizes and strides can differ from those passed to sparse_gather. This enables the implementation of sparse ResNet blocks where the output resolution is reduced after a 'VALID' convolution. Similar to sparse_gather, if do_transpose is true, a fused transpose operation is also performed by sparse_scatter, permuting the input [N,C,H,W] dimensions to [N,H,W,C] in the output. Typically the block size for a 'VALID' convolution is reduced by 2 in each spatial dimension for each 3x3 convolution, thus creating non-overlapping outputs. Note that even though we currently support atomic adds in scatter with add=True, the gradient is not implemented at this time if overlapping scatters are used in the forward pass. A sketch combining the three ops into a single sparse convolution is shown below.
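Putting the three ops together, a single sparse 3x3 convolution could look roughly like the following sketch. This is not the library's reference implementation: the weight variable w, the output variable y_var, the channel count C and the 1-pixel halo convention around each block are illustrative assumptions that would need to be adapted to your network:

    # Gather blocks with a 1-pixel halo so a 'VALID' 3x3 convolution shrinks
    # them back to the output block size [BSZH, BSZW].
    indices = sbnet_module.reduce_mask(
        mask, tf.constant([BCH, BCW], dtype=tf.int32),
        bsize=[BSZH + 2, BSZW + 2],
        boffset=[BOFFSH - 1, BOFFSW - 1],
        bstride=[BSTRH, BSTRW],
        tol=0.5)

    blockStack = sbnet_module.sparse_gather(
        x, indices.bin_counts, indices.active_block_indices,
        bsize=[BSZH + 2, BSZW + 2],
        boffset=[BOFFSH - 1, BOFFSW - 1],
        bstride=[BSTRH, BSTRW],
        transpose=False)

    # Dense convolution applied only to the gathered blocks.
    w = tf.get_variable("w", shape=[3, 3, C, C], dtype=tf.float32)
    convBlocks = tf.nn.conv2d(blockStack, w, strides=[1, 1, 1, 1], padding='VALID')

    # Scatter the shrunken outputs back using the halo-free block geometry,
    # so the written blocks do not overlap in the output tensor.
    y = sbnet_module.sparse_scatter_var(
        convBlocks, indices.bin_counts, indices.active_block_indices,
        y_var, # tf.Variable of shape [N, H, W, C]
        bsize=[BSZH, BSZW],
        boffset=[BOFFSH, BOFFSW],
        bstride=[BSTRH, BSTRW],
        add=False, atomic=False, transpose=False)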

Benchmarks and tests

Benchmarks for SBNet are located in sbnet_tensorflow/benchmarks/ subdirectory.

To run benchmarks execute:

cd sbnet_tensorflow/benchmarks && ./run_all_benchmarks.bash

Note that we average over a number of runs and test many permutations of parameters, so this may take about 20 minutes (on a Titan XP) and will produce a number of .csv files in your home directory. We benchmark individual sparse convolutions and entire sparse ResNet blocks on a synthetic mask with variable sparsity.

To run unit tests execute:

cd sbnet_tensorflow/sbnet_ops && make tests

Submanifold Sparse Convolutional Networks Benchmark

For comparison, we implemented benchmarking code for Submanifold Sparse Convolutional Networks. Running this benchmark requires the Submanifold Sparse Convolutions Python package to be installed:

git clone https://github.com/facebookresearch/SparseConvNet.git 

Follow the setup instructions in SparseConvNet repo.

Code integration with Submanifold Sparse Convolutions was tested with git SHA 609224df3c0e42b8a1dd4073aaa56fab805096c6. To reset the repo to this SHA, use the following commands:

cd SparseConvNet
git checkout 609224df3c0e42b8a1dd4073aaa56fab805096c6

The benchmark code is located in the sbnet_tensorflow/benchmark_submanifold directory.

Other notes

The current code is not tuned for performance with non-square block sizes and has specialized implementations only for a specific list of block sizes, including square blocks of sizes 1 to 34 and a few others. To achieve maximum performance for sizes outside this list, you would need to add custom template instantiations by modifying the SIZE_TEMPLATES macro in sparse_gather.cu.

Contributing to this repository

For now, we do not accept pull requests to this repo, as we are currently setting up automated CI. If you would like to contribute to this repository, feel free to create a GitHub issue.

Citation

If you use our code, please consider citing the following: M. Ren, A. Pokrovsky, B. Yang, and R. Urtasun. SBNet: Sparse Blocks Network for Fast Inference. CoRR, abs/1801.02108, 2018.

@article{ren18sbnet,
  author    = {Mengye Ren and 
               Andrei Pokrovsky and
               Bin Yang and
               Raquel Urtasun},
  title     = {SBNet: Sparse Blocks Network for Fast Inference},
  journal   = {CoRR},
  volume    = {abs/1801.02108},
  year      = {2018},
}
