• Stars: 161
• Rank: 232,114 (Top 5 %)
• Language: Python
• License: GNU General Publi...
• Created about 8 years ago
• Updated over 7 years ago

Repository Details

Testbed for deep reinforcement learning

Mindpark

Testbed for deep reinforcement learning algorithms.

DQN playing Breakout | DQN playing Doom Health Gathering | DQN trying to play Doom Deathmatch

Introduction

Reinforcement learning is a fundamental problem in artificial intelligence. In this setting, an agent interacts with an environment in order to maximize a reward. For example, we show our bot pixel screens of a game and want it to choose actions that result in a high score.
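The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not Mindpark code: `CoinFlipEnv`, `run_episode`, and the reset/step interface are hypothetical names chosen for the example, loosely modeled on the style of gym environments.

```python
import random


class CoinFlipEnv:
    """Toy environment (hypothetical): guess a hidden bit for a reward of 1."""

    def __init__(self, episode_length=10, seed=0):
        self._rng = random.Random(seed)
        self._episode_length = episode_length
        self._step = 0
        self._target = 0

    def reset(self):
        self._step = 0
        self._target = self._rng.randint(0, 1)
        return self._target  # The observation reveals the bit to guess.

    def step(self, action):
        reward = 1.0 if action == self._target else 0.0
        self._step += 1
        done = self._step >= self._episode_length
        self._target = self._rng.randint(0, 1)
        return self._target, reward, done


def run_episode(env, policy):
    """Run one episode and return the sum of rewards (the score)."""
    observation = env.reset()
    score, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        score += reward
    return score


rng = random.Random(1)
random_score = run_episode(CoinFlipEnv(), lambda obs: rng.randint(0, 1))
informed_score = run_episode(CoinFlipEnv(), lambda obs: obs)
print(random_score, informed_score)
```

The policy that uses the observation collects the maximum reward in every step, while the random policy only does so by chance. Learning algorithms aim to close that gap from experience.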

Mindpark is an environment for prototyping, testing, and comparing reinforcement learning algorithms. The library makes it easy to reuse parts of behavior between algorithms and to monitor all kinds of metrics about them. It integrates well with TensorFlow, Theano, and other deep learning libraries, and with OpenAI's gym environments.

These are the algorithms that I implemented so far (feel free to contribute to this list):

| Algorithm | Publication | Status |
| --- | --- | --- |
| Deep Q-Network (DQN) | Mnih et al. 2015 | Working consistently. |
| Double Deep Q-Network (DDQN) | Hasselt, Guez, Silver. 2015 | Working consistently. |
| Asynchronous Advantage Actor-Critic (A3C) | Mnih et al. 2016 | Partly working. |
| Reinforce | Williams 1992 | Currently being tested. |

Instructions

To get started, clone the repository and install dependencies:

git clone git@github.com:danijar/mindpark.git && cd mindpark
sudo -H pip3 install .

An experiment compares algorithms, hyperparameters, and environments. To start an experiment, run (-O turns on Python's optimizations):

python3 -O -m mindpark run definition/breakout.yaml

Videos and metrics are stored in a result directory, which is ~/experiment/mindpark/<timestamp>-breakout/ by default. You can plot statistics during or after the simulation by fuzzy matching on the folder name:

python3 -m mindpark stats breakout
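How Mindpark matches the query against folder names is not described here. As a rough sketch of the idea, the following uses a case-insensitive substring match as a stand-in. The function name `find_experiments` and the timestamp format of the example folders are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path


def find_experiments(root, query):
    """Return result directories whose name contains the query, sorted by name."""
    root = Path(root).expanduser()
    return sorted(
        path for path in root.iterdir()
        if path.is_dir() and query.lower() in path.name.lower())


# Hypothetical folder names; the real timestamp format may differ.
with tempfile.TemporaryDirectory() as root:
    for name in ['2016-08-01T10-00-00-breakout', '2016-08-02T09-30-00-pong']:
        (Path(root) / name).mkdir()
    matches = [path.name for path in find_experiments(root, 'breakout')]
print(matches)
```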

Statistics

Let's take a look at what the previous command creates.

Experiments consist of interleaved phases of training and evaluation. For example, an algorithm might use a lower exploration rate in favor of exploitation while being evaluated. Therefore, we display the metrics in two rows:

DQN statistics on Breakout

The plots show the metrics after only a few episodes of training, as you can see on the horizontal axes. Such a short run is convenient for explanation, but if you want to take a look, here are the metrics of a longer experiment.

| Metric | Description |
| --- | --- |
| score | During the first 80 episodes of training (the time when I ran mindpark stats), the algorithm manages to get a score of 9, but usually gets scores around 3 and 4. Below is the score during evaluation. It's lower because the algorithm hasn't learned much yet and performs worse than the random exploration done during training. |
| dqn/cost | The training cost of the neural network. It starts at episode 10, which is when training begins. Before that, DQN builds up its replay memory. We don't train the neural network during evaluation, so that plot is empty. |
| epsilon_greedy/values | The Q-values that the dqn behavior sends to epsilon_greedy to act greedily on. You can see that they evolve over time: action 4 seems to be quite good. But that's only for a short run, so we shouldn't conclude too much. |
| epsilon_greedy/random | A histogram of whether the current action was chosen randomly or greedily with respect to the predicted Q-values. During training, epsilon is annealed, so you see a shift in the distribution. During testing, epsilon is always 0.05, so there are not many random actions. |
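To make the two epsilon-greedy metrics concrete, here is a minimal sketch of epsilon-greedy action selection with a linearly annealed epsilon. It is not Mindpark's implementation: the function names, the linear schedule, and the start, stop, and anneal_steps values are assumptions. Only the evaluation epsilon of 0.05 comes from the table above.

```python
import random


def annealed_epsilon(step, start=1.0, stop=0.1, anneal_steps=1000):
    """Linearly interpolate epsilon from start to stop over anneal_steps."""
    fraction = min(max(step / anneal_steps, 0.0), 1.0)
    return start + fraction * (stop - start)


def epsilon_greedy(q_values, epsilon, rng):
    """Return (action, was_random) for a list of Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values)), True
    best = max(range(len(q_values)), key=lambda index: q_values[index])
    return best, False


rng = random.Random(0)
q_values = [0.1, 0.3, 0.2, 0.0, 0.9, 0.4]  # Made-up values; action 4 is best.
test_epsilon = 0.05  # Evaluation value stated in the table above.
choices = [epsilon_greedy(q_values, test_epsilon, rng) for _ in range(1000)]
random_fraction = sum(was_random for _, was_random in choices) / len(choices)
print(annealed_epsilon(0), annealed_epsilon(500), annealed_epsilon(2000))
print(random_fraction)
```

The `was_random` flag corresponds to what the random histogram counts, and the list of Q-values corresponds to what the values plot tracks over time.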

The metric names are prefixed by the classes they come from. That's because algorithms are composed of reusable partial behaviors. See the Algorithms section for details.

Definitions

Definitions are YAML files that contain all you need to run or reproduce an experiment:

epochs: 100
test_steps: 4e4
repeats: 5
envs:
  - Pong-v0
  - Breakout-v0
algorithms:
  -
    name: LSTM-A3C (3 layers)
    type: A3C
    train_steps: 8e5
    config:
      network: lstm_three_layers
      initial_learning_rate: 2e-4
  -
    name: DQN (Mnih et al. 2015)
    type: DQN
    train_steps: 2e5
  -
    name: Random
    type: Random
    train_steps: 0

Each algorithm will be trained on each environment for the specified number of repeats. A simulation is divided into epochs that consist of a training and an evaluation phase.
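As a sketch of what that means for the definition above, the following expands it into individual runs. The dictionary mirrors the YAML file as a plain Python literal to keep the example free of dependencies. `expand_runs` is a hypothetical helper, not part of Mindpark, and the config entries are omitted for brevity.

```python
import itertools

definition = {
    'epochs': 100,
    'test_steps': 4e4,
    'repeats': 5,
    'envs': ['Pong-v0', 'Breakout-v0'],
    'algorithms': [
        {'name': 'LSTM-A3C (3 layers)', 'type': 'A3C', 'train_steps': 8e5},
        {'name': 'DQN (Mnih et al. 2015)', 'type': 'DQN', 'train_steps': 2e5},
        {'name': 'Random', 'type': 'Random', 'train_steps': 0},
    ],
}


def expand_runs(definition):
    """List one run per (environment, algorithm, repeat) combination."""
    combinations = itertools.product(
        definition['envs'],
        definition['algorithms'],
        range(definition['repeats']))
    return [
        {'env': env, 'algorithm': algorithm['name'], 'repeat': repeat}
        for env, algorithm, repeat in combinations]


runs = expand_runs(definition)
print(len(runs))
```

With two environments, three algorithms, and five repeats, this yields 30 runs.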

Algorithms

To implement your own algorithm, subclass mindpark.core.Algorithm. Please refer to the existing algorithms for details, and ask if you have questions. Algorithms are composed of partial behaviors that can do preprocessing, exploration, learning, and more. To create a reusable chunk of behavior, subclass mindpark.core.Partial.

There are quite a few existing behaviors that you can import from mindpark.step and reuse in your algorithms. For more details, please look at the corresponding Python files or open an issue. Current behaviors include: ActionMax, ActionSample, ClampReward, Delta, EpsilonGreedy, Experience, Filter, Grayscale, History, Identity, Maximum, Normalize, Random, RandomStart, Resize, Score, Skip, Subsample.
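The idea of composing an algorithm from small behaviors can be illustrated without Mindpark itself. The sketch below does not use the real mindpark.core.Partial interface, which is not documented here. The `Step` base class, its `process` method, and `Pipeline` are hypothetical, and the two steps only borrow the names ClampReward and Grayscale from the list above, with assumed behavior.

```python
class Step:
    """Hypothetical base class; not the actual mindpark.core.Partial API."""

    def process(self, observation, reward):
        return observation, reward


class ClampReward(Step):
    """Clip rewards into [-1, 1] (assumed behavior)."""

    def process(self, observation, reward):
        return observation, max(-1.0, min(1.0, reward))


class Grayscale(Step):
    """Average the color channels of each (r, g, b) pixel (assumed behavior)."""

    def process(self, observation, reward):
        gray = [[sum(pixel) / len(pixel) for pixel in row] for row in observation]
        return gray, reward


class Pipeline(Step):
    """Apply a sequence of steps in order."""

    def __init__(self, *steps):
        self._steps = steps

    def process(self, observation, reward):
        for step in self._steps:
            observation, reward = step.process(observation, reward)
        return observation, reward


# A tiny 2x2 RGB frame with made-up pixel values.
frame = [[(30, 60, 90), (0, 0, 0)], [(255, 255, 255), (10, 20, 30)]]
preprocess = Pipeline(Grayscale(), ClampReward())
observation, reward = preprocess.process(frame, reward=7.0)
print(observation, reward)
```

Because every step shares one interface, the same preprocessing chain can sit in front of different learning algorithms.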

Dependencies

Mindpark is a Python 3 package, and there are no plans to support Python 2. Please install gym_doom manually.

sudo apt-get install -y gtk2.0-dev libsdl2-dev libfluidsynth-dev libopenal-dev libboost-all-dev
sudo -H python3 -c "import gym_pull; gym_pull.pull('github.com/ppaquette/gym-doom')"

TensorFlow is only needed for the existing algorithms. You are free to use your libraries of choice to implement your own algorithms.

Contributions

Your pull request is very welcome. I will set up a contributors file in that case, and you can choose if and how you want to be listed.

Please follow the existing code style, and run unit tests and the integration test after changes:

python3 setup.py test
python3 -m mindpark run definition/test.yaml -x

Contact

Feel free to reach out at [email protected] or open an issue here on GitHub if you have any questions.
