• This repository has been archived on 02/Nov/2021
  • Language
    Lua
  • License
    MIT License
  • Created about 8 years ago
  • Updated over 7 years ago

Repository Details

Torch-twrl is a package that enables reinforcement learning in Torch.

Join the chat at https://gitter.im/torch-twrl/Lobby

torch-twrl: Reinforcement Learning in Torch

torch-twrl is an RL framework built in Lua/Torch by Twitter.

Installation

Install torch

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

Install torch-twrl

git clone --recursive https://github.com/twitter/torch-twrl.git
cd torch-twrl
luarocks make

Want to play in the gym?

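  1. Start a virtual environment. This is optional, but it helps keep your installation clean.

  2. Download and install OpenAI Gym, the gym-http-api requirements, and ffmpeg (the brew command below assumes macOS with Homebrew; on other systems, install ffmpeg with your package manager).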

pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install gym
pip install -r src/gym-http-api/requirements.txt
brew install ffmpeg

Works so far?

You should have everything you need:

  • Start your gym_http_server with
python src/gym-http-api/gym_http_server.py
  • In a new console window (or tab), run the example script (policy gradient agent in environment CartPole-v0)
cd examples
chmod u+x cartpole-pg.sh
./cartpole-pg.sh

This script sets the parameters for the experiment. In detail, it calls:

th run.lua \
    -env 'CartPole-v0' \
    -policy categorical \
    -learningUpdate reinforce \
    -model mlp \
    -optimAlpha 0.9 \
    -timestepsPerBatch 1000 \
    -stepsizeStart 0.3 \
    -gamma 1 \
    -nHiddenLayerSize 10 \
    -gradClip 5 \
    -baselineType padTimeDepAvReturn \
    -beta 0.01 \
    -weightDecay 0 \
    -windowSize 10 \
    -nSteps 1000 \
    -nIterations 1000 \
    -video 100 \
    -optimType rmsprop \
    -verboseUpdate true \
    -uploadResults false \
    -renderAllSteps false

Your results should look something like our results on the OpenAI Gym leaderboard.
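To make the -gamma and -baselineType flags above more concrete, here is a minimal Python sketch of discounted returns-to-go with a per-timestep average baseline. It is one plain reading of what a "time-dependent average return" baseline could mean, not the package's Lua implementation; the function names are hypothetical, and the handling of episodes of different lengths (averaging only over episodes that reached a timestep) is an assumption.

```python
def discounted_returns(rewards, gamma):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for a single episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def time_dependent_baseline(batch_returns):
    """Average return at each timestep over the episodes that reached it."""
    horizon = max(len(r) for r in batch_returns)
    baseline = []
    for t in range(horizon):
        alive = [r[t] for r in batch_returns if len(r) > t]
        baseline.append(sum(alive) / len(alive))
    return baseline


def advantages(batch_rewards, gamma):
    """Returns-to-go minus the per-timestep baseline, one list per episode."""
    batch_returns = [discounted_returns(r, gamma) for r in batch_rewards]
    baseline = time_dependent_baseline(batch_returns)
    return [[g - baseline[t] for t, g in enumerate(ret)]
            for ret in batch_returns]
```

With gamma set to 1, as in the script, the return-to-go at each step is simply the sum of the remaining rewards in the episode. Subtracting a baseline leaves the expected policy gradient unchanged while typically reducing its variance.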

Doesn't work?

  1. Test the gym-http-api
cd src/gym-http-api/
nose2
  2. Start a Gym HTTP server in your virtual environment (from the repository root)
python src/gym-http-api/gym_http_server.py
  3. In a new console window (or tab), run torch-twrl tests
luarocks make; th test/test.lua

Dependencies

Testing RL algorithms is tricky: it requires well-established, unified baselines and a large community of active developers. The OpenAI Gym provides a great set of example environments for this purpose. Link: https://github.com/openai/gym

The OpenAI Gym is written in Python and expects the algorithms that interact with its environments to be written in Python as well. torch-twrl works with the OpenAI Gym through OpenAI's Gym HTTP API; gym-http-api is included as a submodule of torch-twrl.
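As a rough illustration of what travels over that HTTP bridge, the Python sketch below builds the JSON requests a client would send to create an environment, reset it, and take a step. The routes and payload fields follow the gym-http-api README as best understood here; the server address and the helper names are assumptions made for this example, and torch-twrl's own client is written in Lua.

```python
import json
import urllib.request

# Assumed default address of gym_http_server.py; adjust if yours differs.
BASE_URL = "http://127.0.0.1:5000"


def create_env_request(env_id):
    """URL and payload for creating an environment instance."""
    return BASE_URL + "/v1/envs/", {"env_id": env_id}


def reset_request(instance_id):
    """URL and (empty) payload for resetting an instance."""
    return "{}/v1/envs/{}/reset/".format(BASE_URL, instance_id), {}


def step_request(instance_id, action, render=False):
    """URL and payload for taking one action in an instance."""
    url = "{}/v1/envs/{}/step/".format(BASE_URL, instance_id)
    return url, {"action": action, "render": render}


def post_json(url, payload):
    """POST a JSON payload and decode the JSON response (needs a live server)."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_json(*create_env_request("CartPole-v0"))` should return a dictionary containing an instance identifier, which is then passed to the reset and step helpers.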

All Lua dependencies should be installed on your first build.

Note: if you make changes, you will need to recompile with

luarocks make

Agents

torch-twrl implements several agents, located in src/agents. Each agent is defined by a model, a policy, and a learning update.

  • Random
    • model: noModel
    • policy: random
    • learningUpdate: noLearning
  • TD(Lambda)
    • model: qFunction
    • policy: egreedy
    • learningUpdate: tdLambda - implements temporal difference (Q-learning or SARSA) learning with eligibility traces (replacing or accumulating)
  • Policy Gradient (Williams, 1992)
    • model: mlp - multilayer perceptron; final layer: tanh for continuous, softmax for discrete
    • policy: stochasticModelPolicy - normal for continuous actions, categorical for discrete
    • learningUpdate: reinforce
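To show how a policy and a learning update fit together, here is a minimal tabular Python sketch of the TD(Lambda) agent's ingredients: an epsilon-greedy policy and a single TD(lambda) update with either replacing or accumulating eligibility traces. It is an illustration only and not the package's Lua code; the function names, argument names, and default values are hypothetical. The Q-learning branch is simplified: it does not reset traces after exploratory actions, as Watkins's Q(lambda) would.

```python
import random


def egreedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def td_lambda_step(q, traces, s, a, reward, s_next, a_next, done,
                   alpha=0.1, gamma=1.0, lam=0.9,
                   q_learning=False, replacing=True):
    """Apply one tabular TD(lambda) update in place; return the TD error."""
    if done:
        target = reward
    elif q_learning:
        target = reward + gamma * max(q[s_next])     # off-policy target
    else:
        target = reward + gamma * q[s_next][a_next]  # SARSA target
    delta = target - q[s][a]

    # Mark the visited state-action pair as eligible for credit.
    if replacing:
        traces[s][a] = 1.0
    else:
        traces[s][a] += 1.0

    # Move every entry toward the target in proportion to its trace,
    # then decay all traces.
    for state in range(len(q)):
        for action in range(len(q[state])):
            q[state][action] += alpha * delta * traces[state][action]
            traces[state][action] *= gamma * lam
    return delta
```

Here `q` and `traces` are lists of per-state lists, so the sketch only applies to small discrete problems. Continuous observations such as CartPole's would first need a discretization or feature coding step.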

Important note about agent/environment compatibility:

The OpenAI Gym has many environments and is continuously growing. Some agents may be compatible with only a subset of environments. For example, an agent built for continuous action spaces may not work in an environment that expects discrete actions.

Here is a useful table of the environments, with details on the different variables that may help to configure agents appropriately.
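A compatibility check of this kind can be sketched in a few lines of Python. The sketch assumes the action space is described by a dictionary with a "name" field of either "Discrete" or "Box", which is an assumption about the shape of the data; the grouping of policies into discrete and continuous sets is inferred from the agent list above and is not taken from the package's code.

```python
# Policy names come from the agent list; the grouping is an assumption.
DISCRETE_POLICIES = {"categorical", "egreedy"}
CONTINUOUS_POLICIES = {"normal"}


def is_compatible(policy, space_info):
    """Return True if the policy suits the described action space."""
    name = space_info.get("name")
    if name == "Discrete":
        return policy in DISCRETE_POLICIES
    if name == "Box":
        return policy in CONTINUOUS_POLICIES
    return False  # unknown space types are treated as incompatible


def policy_for_action_space(space_info):
    """Suggest a stochastic policy family for the described action space."""
    name = space_info.get("name")
    if name == "Discrete":
        return "categorical"
    if name == "Box":
        return "normal"
    raise ValueError("unsupported action space: {!r}".format(name))
```

For CartPole-v0, which has two discrete actions, `policy_for_action_space({"name": "Discrete", "n": 2})` selects the categorical policy used in the example script.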

Testing details:

Continuous integration is accomplished by building with Travis. Testing is done with LUAJIT21, LUA51 and LUA52 with compilers gcc and clang.

Tests are defined in the /tests directory, with a basic unit test set and a separate Gym integration test set.

Known Issues:

  • libhash does not work with LUA52, so the tilecoding examples fail in LUA52.

Future Work

References

  1. Boyan, J., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. Advances in neural information processing systems, 369-376.
  2. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine learning, 3(1), 9-44.
  3. Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine learning, 22(1-3), 123-158.
  4. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. Systems, Man and Cybernetics, IEEE Transactions on, (5), 834-846.
  5. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction (Vol. 1, No. 1). Cambridge: MIT press.
  6. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4), 229-256.

License

torch-twrl is released under the MIT License. Copyright (c) 2016 Twitter, Inc.
