  • Stars: 164
  • Rank 230,032 (Top 5 %)
  • Language: Python
  • License: Apache License 2.0
  • Created about 7 years ago
  • Updated over 3 years ago

Repository Details

A high-performance Atari A3C agent in 180 lines of PyTorch

Baby A3C: solving Atari environments in 180 lines

Sam Greydanus | October 2017 | MIT License

Results after training on 40M frames:

[Animations: breakout-v4.gif, pong-v4.gif, spaceinvaders-v4.gif]

Usage

For example, to work with OpenAI Gym's Breakout-v4 environment:

  • To train: python baby-a3c.py --env Breakout-v4
  • To test: python baby-a3c.py --env Breakout-v4 --test True
  • To render: python baby-a3c.py --env Breakout-v4 --render True
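
The flags above take an explicit `True` value. As a minimal sketch of how such flags could be parsed (this is not taken from `baby-a3c.py`; `str2bool` and `build_parser` are hypothetical names, and only the flag names `--env`, `--test`, and `--render` come from the usage lines), note that plain `type=bool` in `argparse` treats any non-empty string, including `"False"`, as true, so an explicit converter is safer:

```python
import argparse


def str2bool(text):
    """Convert strings such as 'True' or 'False' to a real boolean."""
    value = text.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % text)


def build_parser():
    # Hypothetical parser: defaults here are assumptions for illustration.
    parser = argparse.ArgumentParser(description="illustrative flag parsing")
    parser.add_argument("--env", default="Breakout-v4", type=str)
    parser.add_argument("--test", default=False, type=str2bool)
    parser.add_argument("--render", default=False, type=str2bool)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["--env", "Breakout-v4", "--test", "True"])
print(args.env, args.test, args.render)
```

With this converter, `--test False` yields `False` rather than silently enabling test mode.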

About

Make things as simple as possible, but not simpler.

Frustrated by the number of deep RL implementations that are clunky and opaque? In this repo, I've stripped a high-performance A3C model down to its bare essentials. Everything you'll need is contained in 180 lines...

  • If you are trying to learn deep RL, the code is compact, readable, and commented
  • If you want quick results, I've included pretrained models
  • If something goes wrong, there's not a mountain of code to debug
  • If you want to try something new, this is a simple and strong baseline
  • Here's a quick intro to A3C that I wrote

                                     Breakout-v4   Pong-v4    SpaceInvaders-v4
*Mean episode rewards @ 40M frames   140 ± 20      18.2 ± 1   470 ± 30
*Mean episode rewards @ 80M frames   190 ± 20      17.9 ± 1   550 ± 30

*same (default) hyperparameters across all environments

Architecture

# four 3x3 conv layers; each stride-2 layer halves height and width (rounding up)
self.conv1 = nn.Conv2d(channels, 32, 3, stride=2, padding=1)
self.conv2 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
self.conv3 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
self.conv4 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
# recurrent memory over the flattened 32 x 5 x 5 conv features
self.gru = nn.GRUCell(32 * 5 * 5, memsize) # *see below
self.critic_linear = nn.Linear(memsize, 1) # state-value head
self.actor_linear = nn.Linear(memsize, num_actions) # action-logit head

*we use a GRU cell because it has fewer params, uses one memory vector instead of two, and attains the same performance as an LSTM cell.
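
To make the `32 * 5 * 5` input size and the "fewer params" claim concrete, here is a small back-of-the-envelope sketch in plain Python. It assumes 80x80 input frames and `memsize = 256`; neither value is stated above, although 80x80 is consistent with a 5x5 feature map after four stride-2 convolutions. The helper names `conv_out` and `gated_cell_params` are made up for this illustration. The parameter count follows the layout of PyTorch's `GRUCell` and `LSTMCell`, where each gate block has input weights, hidden weights, and two bias vectors.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output side length of a Conv2d layer (dilation 1), standard formula."""
    return (size + 2 * padding - kernel) // stride + 1


def gated_cell_params(input_size, hidden_size, gates):
    """Parameter count of a recurrent cell made of `gates` gate blocks.

    Each block holds input weights, hidden weights, and two bias vectors.
    A GRU cell has 3 such blocks; an LSTM cell has 4.
    """
    return gates * hidden_size * (input_size + hidden_size + 2)


side = 80  # ASSUMPTION: 80x80 input frames
sizes = [side]
for _ in range(4):  # conv1 .. conv4
    side = conv_out(side)
    sizes.append(side)

features = 32 * side * side  # 32 channels times the final feature map area

memsize = 256  # ASSUMPTION: hidden size chosen for illustration
gru = gated_cell_params(features, memsize, gates=3)
lstm = gated_cell_params(features, memsize, gates=4)

print(sizes)      # [80, 40, 20, 10, 5]
print(features)   # 800
print(gru, lstm)  # 812544 1083392
```

Under these assumptions the GRU cell carries three quarters of the parameters of an LSTM cell of the same width, and it keeps a single hidden vector where an LSTM keeps both a hidden and a cell state.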

Environments that work

(Use pip freeze to check your environment settings)

  • Mac OS X (test mode only) or Linux (train and test)
  • Python 3.6
  • NumPy 1.13.1+
  • Gym 0.9.4+
  • SciPy 0.19.1 (used on only two lines, so workarounds are possible)
  • PyTorch 0.4.0
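
As an alternative to scanning `pip freeze` output by hand, the versions of the packages listed above can be checked from Python. This is only a convenience sketch and not part of the repository; `installed_versions` is a hypothetical helper, and it relies on each package exposing a `__version__` attribute, which NumPy, Gym, SciPy, and PyTorch do.

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its version string.

    The value is None if the module cannot be imported and 'unknown'
    if it imports but has no __version__ attribute.
    """
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Import names of the packages listed above.
    report = installed_versions(["numpy", "gym", "scipy", "torch"])
    for name, version in report.items():
        print(name, version if version is not None else "not installed")
```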

Known issues

  • I recently ported this code to Python 3.6 / PyTorch 0.4. If you want to run on Python 2.7 / PyTorch 0.2, then look at one of my earlier commits to this repo (there are different pretrained models as well)

More Repositories

1. hamiltonian-nn (Jupyter Notebook, 412 stars): Code for our paper "Hamiltonian Neural Networks"
2. scribe (Jupyter Notebook, 238 stars): Realistic Handwriting with TensorFlow
3. crypto-rnn (Jupyter Notebook, 156 stars): Learning the Enigma with Recurrent Neural Networks
4. visualize_atari (Jupyter Notebook, 115 stars): Code for our paper "Visualizing and Understanding Atari Agents" (https://goo.gl/AMAoSc)
5. mnist1d (Jupyter Notebook, 97 stars): A 1D analogue of the MNIST dataset for measuring spatial biases and answering Science of Deep Learning questions.
6. pythonic_ocr (Python, 68 stars): A convolutional neural network implemented in pure numpy.
7. excitationbp (Jupyter Notebook, 63 stars): Visualizing how deep networks make decisions
8. dnc (Jupyter Notebook, 29 stars): Differentiable Neural Computer in TensorFlow
9. optimize_wing (Python, 26 stars): We simulate a wind tunnel, place a rectangular occlusion in it, and then use gradient descent to turn the occlusion into a wing.
10. psi0nn (Jupyter Notebook, 26 stars): A neural network quantum ground state solver
11. ncf (Jupyter Notebook, 12 stars): Nature's Cost Function (NCF). Finding paths of least action with gradient descent.
12. greydanus.github.io (HTML, 11 stars): My academic blog
13. structural_optimization (Python, 10 stars): Coding structural optimization, from scratch, in 200 lines of Python
14. stereograms (Jupyter Notebook, 9 stars): Code for playing with random dot stereograms.
15. mr_london (Python, 7 stars): An LSTM recurrent neural network implemented in pure numpy
16. mnist-gan (Jupyter Notebook, 6 stars): Generative Adversarial Networks for the MNIST dataset
17. rlzoo (Jupyter Notebook, 5 stars): A central location for my reinforcement learning experiments
18. subspace-nn (Jupyter Notebook, 5 stars): Optimizing neural networks in subspaces
19. np_nets (Jupyter Notebook, 4 stars): Neural network experiments written purely in numpy
20. fractal_tree (Jupyter Notebook, 4 stars): A numerical model of fractal dynamics
21. studying_growth (Python, 4 stars): Studying Cell Growth with Neural Cellular Automata
22. piecewise_node (Python, 3 stars): Temporal abstraction for autoregressive sampling
23. regularization (Jupyter Notebook, 3 stars): I use a one-layer neural network trained on the MNIST dataset to give an intuition for how common regularization techniques affect learning.
24. dlfun (Jupyter Notebook, 2 stars): Forays into the world of deep learning using TensorFlow
25. compton (Jupyter Notebook, 2 stars): Exploring the quantum nature of light with Compton scattering
26. friendly_qlearning (JavaScript, 1 star): Exploring social behavior with qLearning agents
27. deep_thesaurus (HTML, 1 star): Use a pretrained NLP model to rank thesaurus suggestions
28. artiste (Jupyter Notebook, 1 star): The idea here was to teach an RNN to draw, pixel by pixel, over a template image using DDPG
29. baselines (Jupyter Notebook, 1 star): Simple MNIST baselines for 1) numpy backprop 2) dense nns 3) cnns 4) seq2seq
30. billiards (Jupyter Notebook, 1 star): A simple RL environment for studying planning.