• Stars: 408
  • Rank: 105,946 (Top 3%)
  • Language: Lua
  • License: BSD 2-Clause "Simplified"
  • Created: about 10 years ago
  • Updated: about 6 years ago


Repository Details

Torch-7 FFI bindings for NVIDIA CuDNN

cudnn.torch

Torch7 FFI bindings for NVIDIA cuDNN (R5) kernels!

Modules are API compatible with their nn equivalents. Fully unit-tested against nn implementations. Conversion between nn and cudnn is available through the cudnn.convert function.

Installation

  • Install cuDNN (version R5 EA)
  • Have at least CUDA 7.0
  • Have libcudnn.so in your library path ($LD_LIBRARY_PATH); you can install cuDNN from https://developer.nvidia.com/cuDNN
  • Alternatively, copy the library files into /usr/local/cuda/lib64/ or the corresponding folders in your CUDA directory (a quick sanity check follows this list)
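
To verify the setup, you can load the package and print the cuDNN version it linked against; a minimal check, assuming the cudnn luarocks package is installed:

require 'cudnn'
print(cudnn.version)  -- e.g. 5005 for R5; an error here usually means libcudnn.so was not found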

Modules

-- All inputs have to be 3D or 4D (batch mode), except for ReLU, Tanh, Sigmoid, and BatchNormalization
cudnn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW = 1], [dH = 1], [padW = 0], [padH = 0], [groups = 1])
cudnn.SpatialMaxPooling(kW, kH, dW, dH, padW, padH)
cudnn.SpatialAveragePooling(kW, kH, dW, dH, padW, padH)

-- The pointwise functions take an additional optional argument: if inplace=true, they operate in-place without allocating any extra memory for themselves
cudnn.ReLU(inplace[=false])
cudnn.ClippedReLU(ceiling, inplace[=false])
cudnn.Tanh(inplace[=false])
cudnn.Sigmoid(inplace[=false])

-- SoftMax can be run in fast mode or accurate mode. Default is accurate mode.
cudnn.SoftMax(fastMode [= false])          -- SoftMax across each image (just like nn.SoftMax)
cudnn.LogSoftMax()                         -- LogSoftMax across each image (just like nn.LogSoftMax)
cudnn.SpatialSoftMax(fastMode [= false])   -- SoftMax across feature-maps (per spatial location)
cudnn.SpatialLogSoftMax()                  -- LogSoftMax across feature-maps (per spatial location)
cudnn.VolumetricSoftMax(fastMode [= false])   -- SoftMax across feature-maps (per volumetric location)
cudnn.VolumetricLogSoftMax()                  -- LogSoftMax across feature-maps (per volumetric location)

cudnn.SpatialCrossEntropyCriterion()       -- A spatial version of LogSoftMax + ClassNLLCriterion in one shot
cudnn.VolumetricCrossEntropyCriterion()       -- A volumetric version of LogSoftMax + ClassNLLCriterion in one shot

-- Batch Normalization
cudnn.BatchNormalization(nFeature, eps, momentum, affine) -- same arguments as https://github.com/torch/nn/blob/master/doc/simple.md#nn.BatchNormalization
cudnn.SpatialBatchNormalization(nFeature, eps, momentum, affine)
cudnn.VolumetricBatchNormalization(nFeature, eps, momentum, affine)


-- Volumetric inputs (4D, or 5D in batch mode)
cudnn.VolumetricConvolution(nInputPlane, nOutputPlane, kT, kW, kH, dT, dW, dH, padT, padW, padH)
cudnn.VolumetricMaxPooling(kT, kW, kH, dT, dW, dH, padT, padW, padH)
cudnn.VolumetricAveragePooling(kT, kW, kH, dT, dW, dH, padT, padW, padH)

-- Recurrent Modules

-- All inputs have to be 3D. Accepts input of seqLength x batch x inputDim, or batch x seqLength x inputDim if batchFirst is set to true.
cudnn.RNNReLU(inputDim, outputDim, numberOfLayers, [batchFirst = false])
cudnn.RNNTanh(inputDim, outputDim, numberOfLayers, [batchFirst = false])
cudnn.LSTM(inputDim, outputDim, numberOfLayers, [batchFirst = false])
cudnn.GRU(inputDim, outputDim, numberOfLayers, [batchFirst = false])
cudnn.BLSTM(inputDim, outputDim, numberOfLayers, [batchFirst = false])
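
As a usage sketch (the layer sizes and tensor shapes below are illustrative assumptions, not prescribed by the API):

require 'cudnn'

-- a small convolutional stack in 4D batch mode: batch x plane x height x width
net = nn.Sequential()
net:add(cudnn.SpatialConvolution(3, 16, 5, 5, 1, 1, 2, 2))
net:add(cudnn.ReLU(true))                       -- in-place pointwise op
net:add(cudnn.SpatialMaxPooling(2, 2, 2, 2))
net:cuda()
print(net:forward(torch.CudaTensor(8, 3, 32, 32)):size())   -- 8 x 16 x 16 x 16

-- a recurrent module: input is seqLength x batch x inputDim (batchFirst = false)
rnn = cudnn.LSTM(128, 256, 2):cuda()
print(rnn:forward(torch.CudaTensor(10, 8, 128)):size())     -- 10 x 8 x 256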

Modes

There are two globally available modes useful for tuning performance:

require 'cudnn'
cudnn.benchmark = true -- uses the in-built cudnn auto-tuner to find the fastest convolution algorithms.
                       -- If this is set to false, some in-built heuristics are used instead, which might not always be fastest.

By default, cudnn.benchmark is set to false. Setting it to true will improve performance at the expense of using more memory. The input shape should be the same for each batch; otherwise the auto-tuner re-runs for each new shape, causing a huge slowdown.

cudnn.fastest = true -- this is like the :fastest() mode for the Convolution modules,
                     -- simply picks the fastest convolution algorithm, rather than tuning for workspace size

By default, cudnn.fastest is set to false. You should set it to true if memory is not an issue and you want the fastest performance.

cudnn.verbose = true -- this prints out some more verbose information useful for debugging

By default, cudnn.verbose is set to false.
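
Putting the flags together, a typical setup (a sketch; the convolution parameters are illustrative) sets them once after requiring the package; the first forward pass then triggers auto-tuning for that input shape:

require 'cudnn'
cudnn.benchmark = true  -- auto-tune once per distinct input shape
cudnn.fastest = true    -- prefer speed over workspace memory
cudnn.verbose = false

conv = cudnn.SpatialConvolution(3, 64, 3, 3, 1, 1, 1, 1):cuda()
out = conv:forward(torch.CudaTensor(16, 3, 224, 224))  -- first call runs the auto-tuner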

Conversion between cudnn and nn

Conversion is done by the cudnn.convert function, which takes a network and a backend as arguments and recursively goes over the network's modules, substituting equivalents. No memory copy is done; just the metatables are swapped. If you don't want to convert all modules, you can pass a function as the third argument to cudnn.convert. It is called at each step with the module currently being converted, and is meant to exclude modules: if it returns true, the module is left untouched, otherwise it is subject to conversion.

Note that you cannot do a backward pass with cuDNN when your model contains batch normalization layers and is in evaluate mode.

net = nn.Sequential()
net:add(nn.SpatialConvolution(3,96,11,11,3,3))
net:add(nn.ReLU())
cudnn.convert(net, cudnn)  -- convert every supported module to its cudnn equivalent
print(net)

net = nn.Sequential()
net:add(nn.SpatialConvolution(3,96,11,11,3,3))
net:add(nn.ReLU())
cudnn.convert(net, cudnn, function(module)
   return torch.type(module):find('ReLU')  -- exclude ReLU modules from conversion
end)
print(net)

This will result in:

nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): cudnn.SpatialConvolution(3 -> 96, 11x11, 3,3)
  (2): cudnn.ReLU
}
nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): cudnn.SpatialConvolution(3 -> 96, 11x11, 3,3)
  (2): nn.ReLU
}
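
The same function also converts in the other direction; passing nn as the backend swaps the cudnn modules back (a minimal sketch):

cudnn.convert(net, nn)  -- swap cudnn modules back to their nn equivalents
print(net)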

Older versions

  • For cuDNN R1, check out the branch R1
  • For cuDNN R2, check out the branch R2
  • For cuDNN R3, check out the branch R3
  • For cuDNN R4, check out the branch R4

More Repositories

 1. ganhacks - starter from "How to Train a GAN?" at NIPS2016 (10,908 stars)
 2. convnet-benchmarks - Easy benchmarking of all publicly accessible implementations of convnets (Python, 2,675 stars)
 3. dcgan.torch - A torch implementation of http://arxiv.org/abs/1511.06434 (Lua, 1,427 stars)
 4. cvpr2015 (Jupyter Notebook, 869 stars)
 5. imagenet-multiGPU.torch - an imagenet example in torch (Lua, 395 stars)
 6. torch-android - Torch-7 for Android (CMake, 275 stars)
 7. talks (Jupyter Notebook, 261 stars)
 8. net2net.torch - Implementation of http://arxiv.org/abs/1511.05641 that lets one build a larger net starting from a smaller one (Lua, 159 stars)
 9. imagenetloader.torch - some old code that i wrote, might be useful to others (Shell, 88 stars)
10. deepmind-atari (Lua, 67 stars)
11. lua---audio - Module for torch to support audio i/o as well as do common operations like dFFT, generate spectrograms etc. (C, 67 stars)
12. inception.torch - Torch port of https://github.com/google/inception (Jupyter Notebook, 66 stars)
13. torch-signal - Signal processing toolbox for Torch 7 (Lua, 48 stars)
14. cuda-convnet2.torch - Torch7 bindings for cuda-convnet2 kernels! (Cuda, 40 stars)
15. matio-ffi.torch - A LuaJIT FFI interface to MATIO and simple bindings for torch (Lua, 39 stars)
16. galaxyzoo - Entry for GalaxyZoo challenge (Lua, 35 stars)
17. eyescream (JavaScript, 35 stars)
18. nextml (35 stars)
19. examplepackage.torch - A hello-world for torch packages (CMake, 23 stars)
20. sunfish.lua - tiny and basic chess engine for lua. Port of https://github.com/thomasahle/sunfish (Lua, 20 stars)
21. kaggle_retinopathy_starter.torch - A starter kit in Torch for Kaggle Diabetic Retinopathy Detection (Lua, 19 stars)
22. neon.torch - Nervana Neon kernels in Torch (Lua, 18 stars)
23. torch-ship-binaries - A page describing how to ship torch binaries without sharing the source code of your scripts (17 stars)
24. nnjs (JavaScript, 16 stars)
25. deep_gitstats - Based on SciPy's normalized git stats, adapted for Deep Learning frameworks (Jupyter Notebook, 16 stars)
26. cifar.torch (Lua, 15 stars)
27. torch.js - nodejs bindings for libTH (tensor library that powers torch). for fun! (JavaScript, 14 stars)
28. fakecuda - A convenient package for the lazy torch programmer to leave all your :cuda() calls as-is when running on CPU (Lua, 14 stars)
29. rgbd_streamer (Python, 12 stars)
30. mscoco.torch (Lua, 11 stars)
31. torch-docker - Dockerfile to create an image for Torch7 (Shell, 10 stars)
32. NeuralNetworks.jl - hacking torch-like neural networks in Julia (Julia, 10 stars)
33. torch-cheatsheet - A quick page for everything Torch (9 stars)
34. fftw3-ffi - A LuaJIT FFI interface to FFTW3 (Lua, 5 stars)
35. thnb - iTorch notebooks (4 stars)
36. lzmqstatic - Self-contained statically linked zeromq bindings for lua (C++, 3 stars)
37. nvblog_rnnlstm (HTML, 3 stars)
38. fairmark1 (Lua, 2 stars)
39. cunnsparse (Lua, 2 stars)
40. yasa - Yet another Sentiment analyzer. This one uses convolution networks (Lua, 1 star)
41. cunnCUDA - some deprecated, ugly and old modules (Cuda, 1 star)
42. housenumbers_classifier - An attempt on the Stanford Housenumbers dataset (Lua, 1 star)
43. Bar__ZEbulLonX22L.torch - wtf (1 star)