# CUDA backend for the Neural Network Package #

This package provides a CUDA implementation for many of the modules in the base nn package: nn

  • Modules: There are also additional GPU-related modules not found in the nn package.

Installing from source

git clone https://github.com/torch/cunn
cd cunn
luarocks make rocks/cunn-scm-1.rockspec

To use

Simply convert your network model to CUDA by calling :cuda():

require 'cunn'  -- also loads nn and cutorch

local model = nn.Sequential()
model:add(nn.Linear(2,2))
model:add(nn.LogSoftMax())

model:cuda()  -- convert model to CUDA

... and similarly for your tensors:

local input = torch.Tensor(32,2):uniform()
input = input:cuda()
local output = model:forward(input)

... or create them directly as CudaTensors:

local input = torch.CudaTensor(32,2):uniform()
local output = model:forward(input)
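
The snippets above only show a forward pass. As a rough sketch of how the same pieces might fit into one training step, the example below also moves a criterion to the GPU and copies the output back to main memory. The batch contents, the class labels, and the learning rate of 0.01 are assumptions made up for illustration; they do not come from this README.

```lua
require 'cunn'  -- also loads nn and cutorch

-- Same toy model as above, moved to the GPU.
local model = nn.Sequential()
model:add(nn.Linear(2, 2))
model:add(nn.LogSoftMax())
model:cuda()

-- The criterion has to be converted to CUDA as well.
local criterion = nn.ClassNLLCriterion():cuda()

-- Hypothetical mini-batch: 32 samples, 2 features, class labels in {1, 2}.
local input = torch.CudaTensor(32, 2):uniform()
local target = torch.Tensor(32):random(1, 2):cuda()

-- One forward/backward pass followed by a plain SGD update.
model:zeroGradParameters()
local output = model:forward(input)
local loss = criterion:forward(output, target)
model:backward(input, criterion:backward(output, target))
model:updateParameters(0.01)  -- learning rate chosen arbitrarily

-- Copy the result back to main memory when the CPU needs it.
local outputCPU = output:float()
print(loss, outputCPU:size(1), outputCPU:size(2))
```

Converting the output with :float() is itself a GPU-to-CPU transfer, so it is best done once per batch rather than once per sample.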

To run unit-tests

luajit -l cunn -e 'cunn.test()'

GPU Training Concepts

Performance

  • data should be transferred between main memory and the GPU in batches; otherwise the transfer time is dominated by fixed per-transfer costs (latency and execution overheads) rather than by bandwidth
  • therefore, train and predict using mini-batches
  • allocating GPU memory causes a synchronization point, which noticeably affects performance
    • therefore, try to allocate any CudaTensors once, at the start of the program, and then simply copy data back and forth between main memory and the existing CudaTensors
  • similarly, try to avoid any operations that implicitly allocate new tensors. For example, if you write:
require 'cutorch'

local a = torch.CudaTensor(1000):uniform()
for it=1,1000 do
  local b = torch.add(a, 1)
end

... this will allocate one thousand new CudaTensors, one for each call to torch.add(a, 1).

Use instead this form:

require 'cutorch'

local a = torch.CudaTensor(1000):uniform()
local b = torch.CudaTensor(1000):uniform()
for it=1,1000 do
  b:add(a, 1)
end

In this form, b is allocated only once, before the loop. Then the b:add(a,1) operation will perform the add inside the GPU kernel, and store the result into the original b CudaTensor. This will run noticeably faster, in general. It's also a lot less likely to eat up arbitrary amounts of memory, and less likely to need frequent calls to collectgarbage(); collectgarbage().
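
The two recommendations above (transfer in mini-batches, allocate once) can be combined. The following is a minimal sketch, assuming a hypothetical dataset of 1000 samples with 10 features that lives in main memory; the tensor names, sizes, and the multiply-by-two operation are placeholders for illustration only.

```lua
require 'cutorch'

-- Hypothetical dataset held in main memory: 1000 samples, 10 features each.
local data = torch.FloatTensor(1000, 10):uniform()
local batchSize = 100

-- Allocate the GPU buffers once, before the loop.
local gpuBatch = torch.CudaTensor(batchSize, 10)
local gpuResult = torch.CudaTensor(batchSize, 10)

local total = 0
for first = 1, data:size(1), batchSize do
  -- narrow() returns a view on the CPU tensor; no data is copied yet.
  local cpuBatch = data:narrow(1, first, batchSize)
  -- copy() transfers the whole mini-batch into the existing GPU buffer.
  gpuBatch:copy(cpuBatch)
  -- In-place form: the result is written into the preallocated gpuResult.
  gpuResult:mul(gpuBatch, 2)
  total = total + gpuResult:sum()
end
print(total)
```

Note that gpuResult:sum() returns a Lua number, which forces the CPU to wait for the GPU on every iteration; it is used here only to make the result observable.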

Benchmarking

  • GPU operations are issued asynchronously: a call typically returns before the GPU has finished the work
  • e.g., if you do:
require 'cutorch'
local a = torch.CudaTensor(1000,1000):uniform()
a:add(1)

... the GPU kernel to add 1 will only be scheduled for launch by a:add(1). It might not have completed yet, or even have reached the GPU, at the time that a:add(1) returns.

  • therefore, for wall-clock timings, you should call cutorch.synchronize() before each timing checkpoint:
require 'cutorch'
require 'sys'

local a = torch.CudaTensor(1000,1000):uniform()
cutorch.synchronize()  -- wait for the uniform() kernel before starting the timer
sys.tic()              -- start the timer
a:add(1)
cutorch.synchronize()  -- wait for the add kernel to finish
print(sys.toc())       -- elapsed seconds since sys.tic()
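
A single kernel launch is often too short to time reliably, so one option is to average over many calls. The sketch below wraps the synchronize-then-time pattern in a helper; the name timeGPU, the warm-up call, and the iteration count of 100 are assumptions for illustration and are not part of cutorch.

```lua
require 'cutorch'

-- Hypothetical helper: average wall-clock seconds per call of fn.
local function timeGPU(fn, iterations)
  fn()                    -- warm-up call, excluded from the timing
  cutorch.synchronize()   -- make sure nothing is still pending
  local timer = torch.Timer()
  for i = 1, iterations do
    fn()
  end
  cutorch.synchronize()   -- wait for all queued kernels to finish
  return timer:time().real / iterations
end

local a = torch.CudaTensor(1000, 1000):uniform()
local seconds = timeGPU(function() a:add(1) end, 100)
print(string.format('a:add(1): %.6f s per call', seconds))
```

Synchronizing only once after the loop keeps the synchronization cost out of the per-call average.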
