  • Stars: 770
  • Rank: 58,584 (Top 2%)
  • Language: Python
  • License: MIT License
  • Created: almost 4 years ago
  • Updated: over 1 year ago

Mastering Atari with Discrete World Models

Status: Stable release

Implementation of the DreamerV2 agent in TensorFlow 2. Training curves for all 55 games are included.

If you find this code useful, please reference it in your paper:

@article{hafner2020dreamerv2,
  title={Mastering Atari with Discrete World Models},
  author={Hafner, Danijar and Lillicrap, Timothy and Norouzi, Mohammad and Ba, Jimmy},
  journal={arXiv preprint arXiv:2010.02193},
  year={2020}
}

Method

DreamerV2 is the first world model agent that achieves human-level performance on the Atari benchmark. It also outperforms the final performance of the top model-free agents Rainbow and IQN while using the same amount of experience and computation. The implementation in this repository alternates between training the world model, training the policy, and collecting experience, and it runs on a single GPU.

World Model Learning

DreamerV2 learns a model of the environment directly from high-dimensional input images. For this, it predicts ahead using compact learned states. The states consist of a deterministic part and several categorical variables that are sampled. The prior for these categoricals is learned through a KL loss. The world model is learned end-to-end via straight-through gradients, meaning that the gradient of the density is set to the gradient of the sample.
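
As an illustration of the straight-through trick, here is a minimal sketch of sampling a one-hot categorical latent while routing gradients through the class probabilities. This is only a schematic with a hypothetical function name, not the repository's exact implementation:

import tensorflow as tf

def sample_straight_through(logits):
  # logits: [batch, classes] parameters of one categorical latent variable.
  probs = tf.nn.softmax(logits, axis=-1)
  # Draw a discrete one-hot sample, which by itself has no gradient.
  indices = tf.random.categorical(logits, num_samples=1)
  sample = tf.one_hot(tf.squeeze(indices, axis=-1), depth=tf.shape(logits)[-1])
  # Straight-through estimator: the forward value is the sample, but the
  # backward pass uses the gradient of the probabilities instead.
  return sample + probs - tf.stop_gradient(probs)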

Actor Critic Learning

DreamerV2 learns actor and critic networks from imagined trajectories of latent states. The trajectories start at encoded states of previously encountered sequences. The world model then predicts ahead using the selected actions and its learned state prior. The critic is trained using temporal difference learning and the actor is trained to maximize the value function via reinforce and straight-through gradients.
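
Schematically, the actor objective mixes the two gradient estimators and adds an entropy bonus. The function below is only a sketch with hypothetical inputs (per-step log probabilities, advantages, imagined values, and policy entropy); the mixing and entropy scales are illustrative, not the repository defaults:

import tensorflow as tf

def actor_loss(log_prob, advantage, value, entropy, mix=0.1, ent_scale=3e-3):
  # Reinforce term: score-function gradient, with the advantage held constant.
  reinforce = -log_prob * tf.stop_gradient(advantage)
  # Dynamics term: backpropagate through imagined values via straight-through samples.
  dynamics = -value
  # Mix the two estimators and encourage exploration with an entropy bonus.
  loss = mix * reinforce + (1.0 - mix) * dynamics - ent_scale * entropy
  return tf.reduce_mean(loss)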

For more information:

Using the Package

The easiest way to run DreamerV2 on new environments is to install the package via pip3 install dreamerv2. The code automatically detects whether the environment uses discrete or continuous actions. Here is a usage example that trains DreamerV2 on the MiniGrid environment:

import gym
import gym_minigrid
import dreamerv2.api as dv2

# Start from the default hyperparameters and override a few of them.
config = dv2.defaults.update({
    'logdir': '~/logdir/minigrid',
    'log_every': 1e3,
    'train_every': 10,
    'prefill': 1e5,
    'actor_ent': 3e-3,
    'loss_scales.kl': 1.0,
    'discount': 0.99,
}).parse_flags()

# Wrap the environment to expose image observations, then train the agent.
env = gym.make('MiniGrid-DoorKey-6x6-v0')
env = gym_minigrid.wrappers.RGBImgPartialObsWrapper(env)
dv2.train(env, config)

Manual Instructions

To modify the DreamerV2 agent, clone the repository and follow the instructions below. There is also a Dockerfile available, in case you do not want to install the dependencies on your system.

Get dependencies:

pip3 install tensorflow==2.6.0 tensorflow_probability ruamel.yaml 'gym[atari]' dm_control

Train on Atari:

python3 dreamerv2/train.py --logdir ~/logdir/atari_pong/dreamerv2/1 \
  --configs atari --task atari_pong

Train on DM Control:

python3 dreamerv2/train.py --logdir ~/logdir/dmc_walker_walk/dreamerv2/1 \
  --configs dmc_vision --task dmc_walker_walk

Monitor results:

tensorboard --logdir ~/logdir

Generate plots:

python3 common/plot.py --indir ~/logdir --outdir ~/plots \
  --xaxis step --yaxis eval_return --bins 1e6

Docker Instructions

The Dockerfile lets you run DreamerV2 without installing its dependencies on your system. This requires Docker with GPU access to be set up.

Check your setup:

docker run -it --rm --gpus all tensorflow/tensorflow:2.4.2-gpu nvidia-smi

Train on Atari:

docker build -t dreamerv2 .
docker run -it --rm --gpus all -v ~/logdir:/logdir dreamerv2 \
  python3 dreamerv2/train.py --logdir /logdir/atari_pong/dreamerv2/1 \
    --configs atari --task atari_pong

Train on DM Control:

docker build -t dreamerv2 . --build-arg MUJOCO_KEY="$(cat ~/.mujoco/mjkey.txt)"
docker run -it --rm --gpus all -v ~/logdir:/logdir dreamerv2 \
  python3 dreamerv2/train.py --logdir /logdir/dmc_walker_walk/dreamerv2/1 \
    --configs dmc_vision --task dmc_walker_walk

Tips

  • Efficient debugging. You can use the debug config as in --configs atari debug. This reduces the batch size, increases the evaluation frequency, and disables tf.function graph compilation for easy line-by-line debugging.

  • Infinite gradient norms. This is normal and described under loss scaling in the mixed precision guide. You can disable mixed precision by passing --precision 32 to the training script. Mixed precision is faster but can in principle cause numerical instabilities.

  • Accessing logged metrics. The metrics are stored in both TensorBoard and JSON lines format. You can load them directly using pandas.read_json(), as in the sketch below. The plotting script also stores the binned and aggregated metrics of multiple runs in a single JSON file for easy manual plotting.
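
    For example, a run's metrics could be loaded like this; the file name metrics.jsonl and the column names are assumptions based on the plotting flags above, so adjust them to your setup:

    import pathlib
    import pandas as pd

    # Assumed location of the JSON lines metrics written during training.
    path = pathlib.Path('~/logdir/atari_pong/dreamerv2/1/metrics.jsonl').expanduser()
    df = pd.read_json(path, lines=True)

    # Inspect the most recent evaluation returns.
    print(df[['step', 'eval_return']].dropna().tail())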

More Repositories

  1. handout: Turn Python scripts into handouts with Markdown and figures (Python, 1,994 stars)
  2. dreamerv3: Mastering Diverse Domains through World Models (Python, 603 stars)
  3. dreamer: Dream to Control: Learning Behaviors by Latent Imagination (Python, 456 stars)
  4. crafter: Benchmarking the Spectrum of Agent Capabilities (Python, 279 stars)
  5. layered: Clean implementation of feed forward neural networks (Python, 237 stars)
  6. mindpark: Testbed for deep reinforcement learning (Python, 161 stars)
  7. daydreamer: DayDreamer: World Models for Physical Robot Learning (Jupyter Notebook, 141 stars)
  8. director: Deep Hierarchical Planning from Pixels (Python, 60 stars)
  9. embodied: Fast reinforcement learning research (Python, 50 stars)
  10. ninjax: General Modules for JAX (Python, 45 stars)
  11. computer-game: Data-oriented voxel game engine (C++, 37 stars)
  12. elements: Building blocks for productive research (Python, 36 stars)
  13. crafter-baselines: Docker containers of baseline agents for the Crafter environment (Python, 25 stars)
  14. sets: Read datasets in a standard way (Python, 19 stars)
  15. diamond_env: Standardized Minecraft Diamond Environment for Reinforcement Learning (Python, 18 stars)
  16. voxel-smoothing-2d: Orientation independent bézier smoothing of voxel grids (C++, 17 stars)
  17. course-machine-intelligence-2 (Jupyter Notebook, 13 stars)
  18. npgame: Write simple games in Numpy! (Python, 12 stars)
  19. dotfiles: My Linux and Mac configuration (Perl, 12 stars)
  20. semantic (Python, 10 stars)
  21. training-py: My solutions to programming puzzles (Python, 8 stars)
  22. imptools: Tools for improving Python imports (Python, 8 stars)
  23. bridgewalk: Visual reinforcement learning benchmark for controllability (Python, 6 stars)
  24. cowherd: Partially-observed visual reinforcement learning domain (Python, 6 stars)
  25. definitions: Load and validate YAML definitions against a schema (Python, 5 stars)
  26. map-pdf: Generate printable PDF documents from Leaflet maps (JavaScript, 4 stars)
  27. modurale: Modular real time engine for computer graphics applications (CMake, 4 stars)
  28. seminar-knowledge-mining: Wikimedia image classification and suggestions for article authors (Python, 3 stars)
  29. couse-ml-stanford: Programming assignments for the Stanford Machine Learning course by Andrew Ng (MATLAB, 3 stars)
  30. invoicepad: Freelancer solution covering time tracking, invoice generation, and archiving (JavaScript, 3 stars)
  31. teleport: Efficiently send large arrays across machines (Python, 2 stars)
  32. training-ml (Python, 2 stars)
  33. chunkedfile: Save file writes into multiple chunks (Python, 1 star)
  34. notebook-big-data (Jupyter Notebook, 1 star)
  35. course-ml-fuberlin (Python, 1 star)
  36. bookmarks-switcher: Chrome plugin to select which bookmarks folder to show as the bookmarks bar (JavaScript, 1 star)
  37. training-cpp: My solutions to programming puzzles (C++, 1 star)
  38. scope: Metrics logging and analysis (Python, 1 star)
  39. jumper: Platformer and puzzle solving game written in Python (Python, 1 star)