• Stars: 400
• Rank: 107,843 (Top 3%)
• Language: Python
• License: MIT License
• Created: over 8 years ago
• Updated: almost 8 years ago


Repository Details

Replicating "Asynchronous Methods for Deep Reinforcement Learning" (http://arxiv.org/abs/1602.01783)

Async-RL

(2017/02/25) The A3C implementation in this repository has now been ported into ChainerRL, a Chainer-based deep reinforcement learning library, with some enhancements such as support for continuous actions via Gaussian policies and n-step Q-learning, so I recommend using ChainerRL instead of this repository.

[Animated GIFs: A3C FF playing Breakout; A3C LSTM playing Space Invaders]

This is a repository where I attempt to reproduce the results of Asynchronous Methods for Deep Reinforcement Learning. Currently I have only replicated A3C FF/LSTM for Atari.
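For context, the sketch below illustrates the core of an A3C update as described in the paper: a worker rolls out up to t_max steps, bootstraps the return from the value estimate, and accumulates a policy-gradient term, a value-regression term and an entropy bonus. This is only a minimal illustrative sketch in plain Python, not the code used in this repository; every name in it is made up for the example.

def a3c_loss_terms(rewards, values, bootstrap_value, log_action_probs,
                   entropies, gamma=0.99, beta=0.01):
    """Illustrative n-step A3C loss terms (not this repository's code).

    rewards:          rewards r_t collected during the rollout
    values:           value estimates V(s_t) for the same steps
    bootstrap_value:  V(s_T) used to bootstrap the return (0 if terminal)
    log_action_probs: log pi(a_t | s_t) for the actions actually taken
    entropies:        policy entropies H(pi(. | s_t))
    """
    R = bootstrap_value
    policy_loss = 0.0
    value_loss = 0.0
    # Walk the rollout backwards, accumulating n-step returns.
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        advantage = R - values[t]
        # Policy-gradient term (the advantage is treated as a constant).
        policy_loss -= log_action_probs[t] * advantage
        # Entropy bonus encourages exploration.
        policy_loss -= beta * entropies[t]
        # Value function regresses toward the n-step return.
        value_loss += 0.5 * (R - values[t]) ** 2
    return policy_loss, value_loss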

Any feedback is welcome :)

Supported Features

  • A3C FF/LSTM (only for discrete action space)
  • Atari environment
  • ViZDoom environment (experimental)

Current Status

A3C FF

I trained A3C FF on ALE's Breakout with 36 processes (AWS EC2 c4.8xlarge) for 80 million training steps, which took about 17 hours. The mean and median scores of test runs during training are plotted below; ten test runs were performed every 1 million training steps (counted by the global shared counter). The results seem slightly worse than those reported in the paper.

The trained model is uploaded at trained_model/breakout_ff/80000000_finish.h5, so you can make it play Breakout with the following command:

python demo_a3c_ale.py <path-to-rom> trained_model/breakout_ff/80000000_finish.h5

The animated gif above shows an episode I cherry-picked from 10 demo runs using that model.

A3C LSTM

I also trained A3C LSTM on ALE's Space Invaders in the same manner as A3C FF. Training A3C LSTM took about 24 hours for 80 million training steps.

The trained model is uploaded at trained_model/space_invaders_lstm/80000000_finish.h5, so you can make it play Space Invaders with the following command:

python demo_a3c_ale.py <path-to-rom> trained_model/space_invaders_lstm/80000000_finish.h5 --use-lstm

The animated gif above shows an episode I cherry-picked from 10 demo runs using that model.

Implementation details

I received confirmation of their implementation details and some hyperparameters by e-mail from Dr. Mnih and summarized them in the wiki: https://github.com/muupan/async-rl/wiki
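One such detail, reported in the paper, is that the asynchronous workers share RMSProp statistics. The snippet below is only a rough sketch of what a lock-free shared-statistics RMSProp step could look like using multiprocessing shared memory; the function names are invented and the hyperparameter values are placeholders, so refer to the wiki for the actual details used here.

import multiprocessing as mp
import numpy as np

def make_shared_array(size):
    """Allocate a flat float64 buffer that forked worker processes can all see."""
    raw = mp.RawArray('d', size)
    return np.frombuffer(raw, dtype=np.float64)

def shared_rmsprop_step(params, grads, ms, lr=7e-4, alpha=0.99, eps=1e-8):
    """One lock-free RMSProp step; `params` and `ms` live in shared memory."""
    ms *= alpha
    ms += (1.0 - alpha) * grads * grads
    params -= lr * grads / np.sqrt(ms + eps)

# Illustrative usage, before forking the workers:
#   params = make_shared_array(num_params)
#   ms = make_shared_array(num_params)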

Requirements

  • Python 3.5.1
  • chainer 1.8.1
  • cached-property 1.3.0
  • h5py 2.5.0
  • Arcade-Learning-Environment
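Assuming a working Python 3.5 environment, the pure-Python dependencies can usually be installed with pip as below; the Arcade Learning Environment has to be built and installed separately by following its own instructions.

pip install chainer==1.8.1 cached-property==1.3.0 h5py==2.5.0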

Training

python a3c_ale.py <number-of-processes> <path-to-atari-rom> [--use-lstm]

a3c_ale.py will save best-so-far models and test scores into the output directory.
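The bundled trained models have the .h5 extension, which suggests Chainer's HDF5 serializer; assuming that, a saved model can be inspected with h5py without loading the network definition. The short sketch below simply lists the stored parameter arrays (the path is just an example):

import h5py

# Example path; point this at any model saved by a3c_ale.py.
path = 'trained_model/breakout_ff/80000000_finish.h5'

with h5py.File(path, 'r') as f:
    # Print every dataset (parameter array) together with its shape.
    f.visititems(lambda name, obj: print(name, obj.shape)
                 if isinstance(obj, h5py.Dataset) else None)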

Unfortunately, a3c_ale.py currently seems to have a bug; please see issues #5 and #6. I'm trying to fix it.

Evaluation

python demo_a3c_ale.py <path-to-atari-rom> <trained-model> [--use-lstm]

Similar Projects

More Repositories

1. deep-reinforcement-learning-papers (835 stars): A list of papers and resources dedicated to deep reinforcement learning
2. dqn-in-the-caffe (C++, 213 stars): An implementation of Deep Q-Network using Caffe
3. deep-ensemble-uncertainty (Jupyter Notebook, 34 stars): An implementation of "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles" (http://arxiv.org/abs/1612.01474)
4. predictron (Python, 11 stars): WIP implementation of "The Predictron: End-To-End Learning and Planning" (http://arxiv.org/abs/1612.08810) in Chainer
5. chainer-cocob (Python, 6 stars): COCOB-Backprop (https://arxiv.org/abs/1705.07795) implementation for Chainer
6. SexprParser (C++, 3 stars): An S-expression parser in C++11, aimed at use in General Game Playing
7. dobutsushogi (2 stars): Dobutsu Shogi in Game Description Language (GDL v1)
8. ggpe (C++, 2 stars): General Game Playing Engine in C++11 using YAP Prolog
9. chainer-weight-normalization (Python, 1 star): Weight normalization (https://arxiv.org/abs/1602.07868)
10. chainer-eve (Python, 1 star): An Eve optimizer implementation in Chainer
11. mcts (C++, 1 star): General-purpose Monte-Carlo Tree Search implementation
12. chainer-elu (Python, 1 star): Chainer implementation of Exponential Linear Unit (ELU)
13. nonogram (JavaScript, 1 star)
14. chainer-oplu (Python, 1 star): Orthogonal Permutation Linear Unit (OPLU) (https://arxiv.org/abs/1604.02313v3)
15. chainer-entropy-adam (Python, 1 star): Chainer-based implementation of Entropy-Adam (https://arxiv.org/abs/1611.01838)
16. project_euler (Python, 1 star): Project Euler solutions written by muupan; most of the solutions are written in Ruby or Python