greydanus/baby-a3c

Language
Python
License
Apache License 2.0
Created about 7 years ago
Updated over 3 years ago

A high-performance Atari A3C agent in 180 lines of PyTorch

Baby A3C: solving Atari environments in 180 lines

Sam Greydanus | October 2017 | MIT License

Results after training on 40M frames:

Usage

For example, to run on OpenAI Gym's Breakout-v4 environment:

To train: python baby-a3c.py --env Breakout-v4
To test: python baby-a3c.py --env Breakout-v4 --test True
To render: python baby-a3c.py --env Breakout-v4 --render True
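The commands above pass booleans as the strings True / False. Here is a minimal sketch of how flags like these could be parsed with argparse. It is not the repo's actual parser: the str2bool helper, the defaults, and the help strings are assumptions made for illustration. One thing worth knowing is that argparse's type=bool would turn the string "False" into True, so the sketch uses an explicit converter.

```python
import argparse


def str2bool(text):
    """Parse 'True'/'False' style strings (hypothetical helper, not from the repo)."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    # Flag names mirror the commands above; defaults and help text are assumptions.
    parser = argparse.ArgumentParser(description="baby-a3c style command line (sketch)")
    parser.add_argument("--env", default="Breakout-v4", type=str, help="gym environment id")
    parser.add_argument("--test", default=False, type=str2bool, help="evaluate instead of training")
    parser.add_argument("--render", default=False, type=str2bool, help="draw the game window")
    return parser


if __name__ == "__main__":
    # Same arguments as the "To test" command above.
    args = build_parser().parse_args(["--env", "Breakout-v4", "--test", "True"])
    print(args.env, args.test, args.render)
```

With this converter, --test False really does disable test mode, which would not be the case with a plain bool type.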

About

Make things as simple as possible, but not simpler.

Frustrated by the number of deep RL implementations that are clunky and opaque? In this repo, I've stripped a high-performance A3C model down to its bare essentials. Everything you'll need is contained in 180 lines...

If you are trying to learn deep RL, the code is compact, readable, and commented
If you want quick results, I've included pretrained models
If something goes wrong, there's not a mountain of code to debug
If you want to try something new, this is a simple and strong baseline
Here's a quick intro to A3C that I wrote

                                     Breakout-v4   Pong-v4    SpaceInvaders-v4
*Mean episode rewards @ 40M frames   140 ± 20      18.2 ± 1   470 ± 30
*Mean episode rewards @ 80M frames   190 ± 20      17.9 ± 1   550 ± 30

*same (default) hyperparameters across all environments

Architecture

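# four 3x3 convolutions with stride 2; each one roughly halves the height and width
self.conv1 = nn.Conv2d(channels, 32, 3, stride=2, padding=1)
self.conv2 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
self.conv3 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
self.conv4 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
# recurrent memory; 32 * 5 * 5 is the flattened size of the conv4 output
self.gru = nn.GRUCell(32 * 5 * 5, memsize) # *see below
# critic head: one value estimate; actor head: one output per action
self.critic_linear, self.actor_linear = nn.Linear(memsize, 1), nn.Linear(memsize, num_actions)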

*We use a GRU cell because it has fewer parameters than an LSTM cell, uses one memory vector instead of two, and attains the same performance.
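To see where the 32 * 5 * 5 comes from and how much smaller the GRU cell is, here is a small worked calculation in plain Python (no PyTorch needed). Two values are assumptions for illustration and do not come from the text above: an 80x80 input frame and memsize = 256. The function names are made up for this sketch. The parameter formula follows the layout of PyTorch's GRUCell and LSTMCell, which stack 3 and 4 weight blocks respectively, each with an input matrix, a hidden matrix, and two bias vectors.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length along one spatial axis of a Conv2d layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(height, width, num_layers=4, out_channels=32):
    """Size of the flattened feature vector after the four-layer conv stack."""
    for _ in range(num_layers):
        height, width = conv_out(height), conv_out(width)
    return out_channels * height * width


def cell_params(input_size, hidden_size, num_blocks):
    """Parameter count of a recurrent cell with biases.

    num_blocks is 3 for a GRU cell and 4 for an LSTM cell.
    """
    per_block = input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size
    return num_blocks * per_block


if __name__ == "__main__":
    features = flattened_features(80, 80)  # assumed 80x80 input: 80 -> 40 -> 20 -> 10 -> 5
    memsize = 256                          # assumed value, for illustration only
    gru = cell_params(features, memsize, num_blocks=3)
    lstm = cell_params(features, memsize, num_blocks=4)
    print(features)   # 800, i.e. 32 * 5 * 5
    print(gru, lstm)  # 812544 1083392
```

Whatever the sizes, the GRU cell carries exactly three quarters of the LSTM cell's parameters, and it keeps a single hidden vector where the LSTM keeps a hidden vector and a cell vector.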

Environments that work

(Use pip freeze to check your environment settings)

Mac OSX (test mode only) or Linux (train and test)
Python 3.6
NumPy 1.13.1+
Gym 0.9.4+
SciPy 0.19.1 (used on only two lines, so workarounds are possible)
PyTorch 0.4.0
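As an alternative to reading through pip freeze by hand, here is a rough sketch that compares installed packages against the versions listed above. It is not part of the repo. It assumes Python 3.8+ (for importlib.metadata), assumes the pip package names numpy, gym, scipy, and torch, and treats the entries marked with "+" as minimums and the others as exact versions. The version parser is deliberately simple and only looks at the leading integers.

```python
from importlib import metadata

# (pip package name, version listed above, True if the list marks it as a minimum with "+")
REQUIREMENTS = [
    ("numpy", "1.13.1", True),
    ("gym", "0.9.4", True),
    ("scipy", "0.19.1", False),
    ("torch", "0.4.0", False),
]


def parse_version(text):
    """Turn '1.13.1' or '2.1.0+cu118' into a tuple of integers, padded to three parts."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check(requirements=REQUIREMENTS, lookup=installed_version):
    """Return {package: status}; status is 'missing', 'older', 'ok' or 'newer'."""
    report = {}
    for package, listed, is_minimum in requirements:
        found = lookup(package)
        if found is None:
            report[package] = "missing"
            continue
        have, want = parse_version(found), parse_version(listed)
        if have < want:
            report[package] = "older"
        elif have == want or is_minimum:
            report[package] = "ok"
        else:
            report[package] = "newer"  # newer than the exact version listed above
    return report


if __name__ == "__main__":
    for package, status in check().items():
        print(f"{package}: {status}")
```

A "newer" status is not necessarily a problem; it only means the installed version is past the one this code was last ported to.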

Known issues

I recently ported this code to Python 3.6 / PyTorch 0.4. If you want to run it on Python 2.7 / PyTorch 0.2, check out one of my earlier commits to this repo (the pretrained models there are different as well).
