
Autograd

Autograd automatically differentiates native Torch code. Inspired by the original Python version.

Scope

Autograd has multiple goals:

  • provide automatic differentiation of Torch expressions
  • support arbitrary Torch types (e.g. transparent and full support for CUDA-backed computations)
  • full integration with nn modules: mix and match auto-differentiation with user-provided gradients
  • the ability to define any new nn compliant Module with automatic differentiation
  • represent complex evaluation graphs, which is very useful for describing models with multiple loss functions and/or inputs
  • graphs are dynamic, i.e. they can differ at each function call: for loops or conditionals can depend on intermediate results or on input parameters (see the short example after this list)
  • enable gradients of gradients for transparent computation of Hessians
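
For example, here is a minimal sketch of a dynamic graph, where the shape of the computation depends on an input parameter (the function and tensor sizes are illustrative, not taken from the repo):

require 'torch'
local autograd = require 'autograd'

-- a toy loss whose graph depends on an input parameter (nSteps):
local f = function(params, nSteps)
   local acc = torch.sum(params.x)
   for i = 1, nSteps do
      -- the loop length differs from call to call, so the graph does too
      acc = acc + torch.sum(torch.cmul(params.x, params.x))
   end
   return acc
end

local df = autograd(f)                         -- direct (non-optimized) mode
local grads, loss = df({x = torch.randn(4)}, 3)
grads, loss = df({x = torch.randn(4)}, 7)      -- different graph, same wrapped function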

Updates

Jan 21, 2016: Two big new user-facing features:

  • First, we now support direct assignment, so you can now do x[k] = v inside optimize=true autograd code, where k can be a number, table or LongTensor, and v can be a tensor or number, whichever is appropriate (see the sketch after this list).
  • Second, you can now take 2nd-order and higher gradients (supported in optimized mode). Either run autograd.optimize(true), or take the derivative of your function using df = autograd(f, {optimize = true}). Check out a simple example in our tests.
  • Plus, lots of misc bugfixes and new utilities to help with tensor manipulation (autograd.util.cat can work with numbers or tensors of any type; autograd.util.cast can cast a nested table of tensors to any type you like).
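
A minimal sketch of both features, under the assumptions above (tensor sizes and names are illustrative):

require 'torch'
local autograd = require 'autograd'
autograd.optimize(true)

-- direct assignment: write into (a copy of) a tensor with x[k] = v
local f = function(params)
   local x = params.x * 1                  -- work on a copy of the input
   x[1] = 0                                -- direct assignment: k is a number, v is a number
   return torch.sum(torch.cmul(x, x))
end
local df = autograd(f, {optimize = true})
local grads, loss = df({x = torch.randn(5)})

-- 2nd-order gradients: differentiate through the gradient itself
local ddf = autograd(function(params)
   local g = autograd(f)(params)           -- first-order gradients of f
   return torch.sum(torch.cmul(g.x, g.x))  -- reduce to a scalar
end, {optimize = true})
local gradGrads = ddf({x = torch.randn(5)})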

Nov 16, 2015: Runtime performance was improved dramatically, along with ease of use thanks to better debugging tools. Performance is now within 30% of a statically described version of an equivalent model (nn and nngraph).

  • a compute DAG is now generated and cached based on the input tensors' dimensions
  • the DAG is compiled into Lua code, with several optimizations
  • all intermediate states (tensors) are saved and re-used in a tensor pool
  • debugging facilities have been added: when debugging is enabled, a nan or inf triggers a callback that can be used to render a DOT representation of the graph (see debugging)
  • user code is now restricted to the functional API of Torch (a:add(b) is forbidden; use res = torch.add(a,b) instead; see the short sketch after this list)
  • additional control flags can be passed to d(f, {...}) to compute subparts of the graph (fprop or bprop), useful to generate a compiled fprop (see fine grained control)
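
As a short sketch of the functional-API restriction (illustrative code, not from the repo):

-- forbidden inside autograd code:  a:add(b)
-- use the functional form instead:
local f = function(params)
   local res = torch.add(params.a, params.b)
   return torch.sum(res)
end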

Nov 6, 2015: initial release.

Install

  • Install Torch (instructions here).
  • Retrieve this repo
  • Run: luarocks make

Examples

Autograd example

A simple neural network with a multinomial logistic loss:

-- libraries:
t = require 'torch'
grad = require 'autograd'

-- define trainable parameters:
params = {
   W = {
      t.randn(100,50),
      t.randn(50,10),
   },
   b = {
      t.randn(50),
      t.randn(10),
   }
}

-- define model
neuralNet = function(params, x, y)
   local h1 = t.tanh(x * params.W[1] + params.b[1])
   local h2 = t.tanh(h1 * params.W[2] + params.b[2])
   local yHat = h2 - t.log(t.sum(t.exp(h2)))
   local loss = - t.sum(t.cmul(yHat, y))
   return loss
end

-- gradients:
dneuralNet = grad(neuralNet)

-- some data:
x = t.randn(1,100)
y = t.Tensor(1,10):zero() y[1][3] = 1

-- compute loss and gradients wrt all parameters in params:
dparams, loss = dneuralNet(params, x, y)

-- in this case:
--> loss: is a scalar (Lua number)
--> dparams: is a table that mimics the structure of params; for
--  each Tensor in params, dparams provides the derivatives of the
--  loss wrt to that Tensor.

Important note: only variables packed in the first argument of the eval function will have their gradients computed. In the example above, if the gradients wrt x are needed, then x simply has to be moved into params (see the sketch below). The params table can be arbitrarily nested.
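
A minimal sketch of this, reusing the model above (only the argument packing changes):

-- same model, but with the input packed into the first argument:
neuralNet2 = function(params, y)
   local h1 = t.tanh(params.x * params.W[1] + params.b[1])
   local h2 = t.tanh(h1 * params.W[2] + params.b[2])
   local yHat = h2 - t.log(t.sum(t.exp(h2)))
   return - t.sum(t.cmul(yHat, y))
end
dneuralNet2 = grad(neuralNet2)

-- pack the input alongside the trainable parameters:
params2 = {W = params.W, b = params.b, x = x}
dparams, loss = dneuralNet2(params2, y)
-- dparams.x now holds the gradient of the loss wrt x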

See more complete examples in examples.

Assuming the model defined above, and a training set of {x,y} pairs, the model can easily be optimized using SGD:

for i,sample in datasetIterator() do
   -- estimate gradients wrt params:
   local grads, loss = dneuralNet(params, sample.x, sample.y)

   -- SGD step:
   for i = 1,#params.W do
      -- update params with an arbitrary learning rate:
      params.W[i]:add(-.01, grads.W[i])
      params.b[i]:add(-.01, grads.b[i])
   end
end

Optimization

To enable the optimizer, which produces optimized representations of your loss and gradient functions (as generated lua code):

grad = require 'autograd'
grad.optimize(true) -- global
local df = grad(f, { optimize = true }) -- for this function only
local grads = df(params)

Benefits:

  • Intermediate tensors are re-used between invocations of df(), dramatically reducing the amount of garbage produced.
  • Zero overhead from autograd itself, once the code for computing your gradients has been generated.
  • On average, a 2-3x overall performance improvement.

Caveats:

  • The generated code is cached based on the dimensions of the input tensors. If your problem is such that you have thousands of unique tensor configurations, you won't see any benefit.
  • Each invocation of grad(f) produces a new context for caching, so be sure to only call this once.
  • WARNING: Variables that you close over in an autograd function in optimize mode will never be updated: they are treated as static as soon as the function is defined (see the sketch after this list).
  • WARNING: If you make extensive use of control flow (any if-statements, for-loops or while-loops), you're better off using direct mode. In the best case, the variables used for control flow are passed in as arguments and trigger recompilation for as many branches as exist in your code. In the worst case, the variables used for control flow are computed internally, closed over, or don't change in size or rank, and control-flow changes will be completely ignored.
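
A minimal sketch of the closure caveat (illustrative code): a branch flag that is closed over versus one passed in explicitly:

grad = require 'autograd'
grad.optimize(true)

-- closed over: 'useTanh' is frozen when the function is defined
local useTanh = true
local badF = function(params)
   if useTanh then
      return torch.sum(torch.tanh(params.x))
   else
      return torch.sum(params.x)
   end
end

-- better: pass the flag as an argument (or use direct mode for heavy control flow)
local goodF = function(params, useTanh)
   if useTanh then
      return torch.sum(torch.tanh(params.x))
   else
      return torch.sum(params.x)
   end
end
local dGood = grad(goodF, {optimize = true})
local grads, loss = dGood({x = torch.randn(3)}, true)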

Wrapping nn modules

The nn library provides all sorts of highly optimized primitives, with gradient code written and optimized manually. Sometimes it's useful to rely on these for maximum performance.

Here we rewrite the neural net example from above, but this time relying on a mix of nn primitives and autograd-inferred gradients:

-- libraries:
t = require 'torch'
grad = require 'autograd'

-- define trainable parameters:
params = {
   linear1 = {
      t.randn(50,100), -- note that parameters are transposed (nn convention for nn.Linear)
      t.randn(50),
   },
   linear2 = {
      t.randn(10,50),
      t.randn(10),
   }
}

-- instantiate nn primitives:
-- Note: we do this outside of the eval function, so that memory
-- is only allocated once; moving these calls into the body
-- of neuralNet would work too, but would be considerably slower.
linear1 = grad.nn.Linear(100, 50)
acts1 = grad.nn.Tanh()
linear2 = grad.nn.Linear(50, 10)
acts2 = grad.nn.Tanh()

-- define model
neuralNet = function(params, x, y)
   local h1 = acts1(linear1(params.linear1, x))
   local h2 = acts2(linear2(params.linear2, h1))
   local yHat = h2 - t.log(t.sum(t.exp(h2)))
   local loss = - t.sum(t.cmul(yHat, y))
   return loss
end

-- gradients:
dneuralNet = grad(neuralNet)

-- some data:
x = t.randn(1,100)
y = t.Tensor(1,10):zero() y[1][3] = 1

-- compute loss and gradients wrt all parameters in params:
dparams, loss = dneuralNet(params, x, y)

This code is strictly equivalent to the code above, but will be more efficient (especially for more complex primitives like convolutions).

3rd party libraries that provide a similar API to nn can be registered like this:

local customnnfuncs = grad.functionalize('customnn')  -- requires 'customnn' and wraps it
module = customnnfuncs.MyNnxModule(...)

-- under the hood, this is already done for nn:
grad.nn = grad.functionalize('nn')

On top of this functional API, existing nn modules and containers, with arbitrarily nested parameters, can also be wrapped into functions. This is particularly handy when doing transfer learning from existing models:

-- Define a standard nn model:
local model = nn.Sequential()
model:add(nn.SpatialConvolutionMM(3, 16, 3, 3, 1, 1, 1, 1))
model:add(nn.Tanh())
model:add(nn.Reshape(16*8*8))
model:add(nn.Linear(16*8*8, 10))
model:add(nn.Tanh())
-- Note that this model could have been pre-trained, and reloaded from disk.

-- Functionalize the model:
local modelf, params = autograd.functionalize(model)

-- The model can now be used as part of a regular autograd function:
local loss = autograd.nn.MSECriterion()
neuralNet = function(params, x, y)
   local h = modelf(params, x)
   return loss(h, y)
end

-- Note: the parameters are always handled as an array, passed as the first
-- argument to the model function (modelf). This API is similar to the other
-- model primitives we provide (see below in "Model Primitives").

-- Note 2: if there are no parameters in the model, then you need to pass the input only, e.g.:
local model = nn.Sigmoid()
-- Functionalize :
local sigmoid = autograd.functionalize(model)

-- The sigmoid can now be used as part of a regular autograd function:
local loss = autograd.nn.MSECriterion()
neuralNet = function(params, x, y)
   local h = sigmoid(x) -- please note the absence of params arg
   return loss(h, y)
end

Creating auto-differentiated nn modules

For those who have a training pipeline that relies heavily on the torch/nn API, torch-autograd defines the autograd.nn.AutoModule and autograd.nn.AutoCriterion functions. Given a name, they create a new class locally under autograd.auto, keyed by that name. This class can be instantiated by providing a function, a weight, and a bias. Instances are also clonable, savable and loadable. Here we show an example of writing a 2-layer fully-connected module and an MSE criterion using AutoModule and AutoCriterion:

The snippet below defines the module and criterion functions, instantiates them, and trains the resulting model with the standard nn forward/backward loop:

-- libraries:
local nn = require 'nn'
local autograd = require 'autograd'

-- Define functions for modules
-- Linear
local linear  = function(input, weight, bias)
   local y = weight * input + bias
   return y
end

-- Linear + ReLU
local linearReLU  = function(input, weight, bias)
   local y = weight * input + bias
   local output = torch.mul( torch.abs( y ) + y, 0.5)
   return output
end

-- Define function for criterion
-- MSE
local mse = function(input, target)
   local buffer = input-target
   return torch.sum( torch.cmul(buffer, buffer) ) / (input:dim() == 2 and input:size(1)*input:size(2) or input:size(1))
end

-- Input size, nb of hiddens
local inputSize, outputSize = 100, 1000

-- Define auto-modules and auto-criteria
-- and instantiate them immediately.
-- (linear1 and linear2 are assumed here to be standard nn.Linear layers
-- whose weights and biases seed the auto-modules:)
local linear1 = nn.Linear(inputSize, outputSize)
local linear2 = nn.Linear(outputSize, inputSize)
local autoModel = nn.Sequential()
local autoLinear1ReLU = autograd.nn.AutoModule('AutoLinearReLU')(linearReLU, linear1.weight:clone(), linear1.bias:clone())
local autoLinear2 = autograd.nn.AutoModule('AutoLinear')(linear, linear2.weight:clone(), linear2.bias:clone())
autoModel:add( autoLinear1ReLU )
autoModel:add( autoLinear2 )
local autoMseCriterion = autograd.nn.AutoCriterion('AutoMSE')(mse)
-- At this point, print(autograd.auto) should yield
-- {
--   AutoLinearReLU : {...}
--   AutoMSE : {...}
--   AutoLinear : {...}
-- }

-- Define number of iterations and learning rate
local n = 100000
local lr = 0.001
local autoParams,autoGradParams = autoModel:parameters()
local uniformMultiplier = torch.Tensor(inputSize):uniform()

-- Train: this should learn how to approximate e^(\alpha * x)
-- with an MLP using both auto-modules and regular nn modules
for i=1,n do
   autoModel:zeroGradParameters()
   local input = torch.Tensor(inputSize):uniform(-5,5):cmul(uniformMultiplier)
   local target = input:clone():exp()
   -- Forward
   local output = autoModel:forward(input)
   local mseOut = autoMseCriterion:forward(output, target)
   -- Backward
   local gradOutput = autoMseCriterion:backward(output, target)
   local gradInput = autoModel:backward(input, gradOutput)
   for j=1,#autoParams do
      autoParams[j]:add(-lr, autoGradParams[j])
   end
end

Gradient checks

For peace of mind (and to write proper tests), a simple gradient checker is provided. See test.lua for complete examples. In short, it can be used like this:

-- Parameters:
local W = t.Tensor(32,100):normal()
local x = t.Tensor(100):normal()

-- Function:
local func = function(inputs)
   return t.sum(inputs.W * inputs.x)
end

-- Check grads wrt all inputs (gradcheck and tester here come from the
-- test harness; see test.lua):
tester:assert(gradcheck(func, {W=W, x=x}), 'incorrect gradients on W and x')

Model Primitives

To ease the construction of new models, we provide primitives to generate standard models.

Each constructor returns 2 things:

  • f: the function, can be passed to grad(f) to get gradients
  • params: the list of trainable parameters

Once instantiated, f and params can be used like this:

input = torch.randn(10)
pred = f(params, input)
grads = autograd(f)(params, input)

The current list of model primitives includes:

autograd.model.NeuralNetwork

API:

f,params = autograd.model.NeuralNetwork({
   -- number of input features:
   inputFeatures = 10,

   -- number of hidden features, per layer, in this case
   -- 2 layers, each with 100 and 10 features respectively:
   hiddenFeatures = {100,10},

   -- activation functions:
   activations = 'ReLU',

   -- if true, then no activation is used on the last layer;
   -- this is useful to feed a loss function (logistic, ...)
   classifier = false,

   -- dropouts:
   dropoutProbs = {.5, .5},
})

autograd.model.SpatialNetwork

API:

f,params = autograd.model.SpatialNetwork({
   -- number of input features (maps):
   inputFeatures = 3,

   -- number of hidden features, per layer:
   hiddenFeatures = {16, 32},

   -- poolings, for each layer:
   poolings = {2, 2},

   -- activation functions:
   activations = 'Sigmoid',

   -- kernel size:
   kernelSize = 3,

   -- dropouts:
   dropoutProbs = {.1, .1},
})

autograd.model.RecurrentNetwork

API:

f,params = autograd.model.RecurrentNetwork({
   -- number of input features (maps):
   inputFeatures = 100,

   -- number of output features:
   hiddenFeatures = 200,

   -- output is either the last h at step t,
   -- or the concatenation of all h states at all steps
   outputType = 'last', -- or 'all'
})

autograd.model.RecurrentLSTMNetwork

API:

f,params = autograd.model.RecurrentLSTMNetwork({
   -- number of input features (maps):
   inputFeatures = 100,

   -- number of output features:
   hiddenFeatures = 200,

   -- output is either the last h at step t,
   -- or the concatenation of all h states at all steps
   outputType = 'last', -- or 'all'
})
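
A minimal usage sketch for this primitive (assuming the returned function takes the sequence as steps x inputFeatures and, with outputType = 'last', returns the final hidden state):

local autograd = require 'autograd'

local lstm, lstmParams = autograd.model.RecurrentLSTMNetwork({
   inputFeatures = 100,
   hiddenFeatures = 200,
   outputType = 'last',
})

local seq = torch.randn(10, 100)       -- 10 time steps, 100 features each
local h = lstm(lstmParams, seq)        -- assumed: a 200-dim final hidden state

-- wrap into a scalar-valued function to get gradients wrt lstmParams:
local dlstm = autograd(function(params, x)
   return torch.sum(lstm(params, x))
end)
local grads, val = dlstm(lstmParams, seq)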

Loss Primitives

Similarly to model primitives, we provide common loss functions in autograd.loss:

-- cross entropy between 2 vectors:
-- (for categorical problems, the target should be encoded as one-hot)
loss = loss.crossEntropy(prediction, target)

-- binary cross entropy - same as above, but labels are treated as independent Bernoulli variables:
loss = loss.binaryEntropy(prediction, target)

-- least squares - mean square error between 2 vectors:
loss = loss.leastSquares(prediction, target)
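
A minimal sketch combining a model primitive with a loss primitive (the shapes and options here are illustrative):

local autograd = require 'autograd'

local model, params = autograd.model.NeuralNetwork({
   inputFeatures = 10,
   hiddenFeatures = {100, 10},
   classifier = true,              -- no activation on the last layer
})

local f = function(params, x, y)
   local pred = model(params, x)
   return autograd.loss.crossEntropy(pred, y)
end
local df = autograd(f)

local x = torch.randn(10)
local y = torch.zeros(10)
y[3] = 1                           -- one-hot target
local grads, loss = df(params, x, y)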

Gradients of gradients

autograd can be called from within an autograd function, and the resulting gradients can be used as part of your outer function:

local d = require 'autograd'
d.optimize(true)
local innerFn = function(params)
   -- compute something...
end
local ddf = d(function(params)
   local grads = d(innerFn)(params)
   -- do something with grads of innerFn...
end)
local gradGrads = ddf(params) -- second order gradient of innerFn

Debugging and fine-grained control

Debugging hooks can be inserted when wrapping the function with autograd. The debugger turns off any optimizations and inserts NaN/Inf checks after every computation. If any of these trips, the debugHook will be called with a message providing as much information as possible about the offending function, call stack and values. The debugHook also provides an interface to save or render a GraphViz dot file of the computation graph. We don't recommend leaving the debugHook installed all the time, as your training speed will be significantly slower.

grad(f, {
   debugHook = function(debugger, msg, gen)
      -- dump a dot representation of the graph:
      debugger.generateDot('result.dot')

      -- or show it (OSX only, uses Safari):
      debugger.showDot()

      -- print the generated source line that caused the inf/nan
      print(string.split(gen.source, "\n")[gen.line])
   end
})

Consider this usage of autograd; it clearly contains a division by zero:

local W = torch.Tensor(32,100):fill(.5)
local x = torch.Tensor(100):fill(.5)
local func = function(inputs)
   return torch.sum(torch.div(inputs.W * inputs.x, 0))  -- DIV ZERO!
end
local dFunc = autograd(func, {
   debugHook = function(debugger, msg)
      debugger.showDot()
      print(msg)
      os.exit(0)
   end
})
dFunc({W=W, x=x})

Will output:

autograd debugger detected a nan or inf value for locals[1]
   1: fn@path/to/code/example.lua:4

And render the computation graph in Safari.

Finer-grained control over execution can also be achieved using these flags:

-- All of these options default to true:
grad(f, {
   withForward = true | false,    -- compute the forward path
   withGradients = true | false,  -- compute the gradients (after forward)
   partialGrad = true | false     -- partial grad means that d(f) expects grads wrt output
})

-- Running this:
pred = grad(f, {withForward=true, withGradients=false})(inputs)
-- is equivalent to:
pred = f(inputs)
-- ... but the function is compiled, and benefits from tensor re-use!

License

Licensed under the Apache License, Version 2.0. See LICENSE file.
