• Stars
    star
    1,124
  • Rank 41,395 (Top 0.9 %)
  • Language
    Python
  • License
    MIT License
  • Created about 6 years ago
  • Updated about 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.

WARNING: This repository is no longer maintained, please use the RL-Baselines3 Zoo for an up-to-date version, powered by Stable-Baselines3

Build Status

RL Baselines Zoo: a Collection of Pre-Trained Reinforcement Learning Agents

A collection of trained Reinforcement Learning (RL) agents, with tuned hyperparameters, using Stable Baselines.

We are looking for contributors to complete the collection!

Goals of this repository:

  1. Provide a simple interface to train and enjoy RL agents
  2. Benchmark the different Reinforcement Learning algorithms
  3. Provide tuned hyperparameters for each environment and RL algorithm
  4. Have fun with the trained agents!

Enjoy a Trained Agent

If the trained agent exists, then you can see it in action using:

python enjoy.py --algo algo_name --env env_id

For example, enjoy A2C on Breakout during 5000 timesteps:

python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder trained_agents/ -n 5000

If you have trained an agent yourself, you need to do:

# exp-id 0 corresponds to the last experiment, otherwise, you can specify another ID
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 0

To load the best model (when using evaluation environment):

python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 1 --load-best

Train an Agent

The hyperparameters for each environment are defined in hyperparameters/algo_name.yml.

If the environment exists in this file, then you can train an agent using:

python train.py --algo algo_name --env env_id

For example (with tensorboard support):

python train.py --algo ppo2 --env CartPole-v1 --tensorboard-log /tmp/stable-baselines/

Evaluate the agent every 10000 steps using 10 episodes for evaluation:

python train.py --algo sac --env HalfCheetahBulletEnv-v0 --eval-freq 10000 --eval-episodes 10

Save a checkpoint of the agent every 100000 steps:

python train.py --algo td3 --env HalfCheetahBulletEnv-v0 --save-freq 100000

Continue training (here, load pretrained agent for Breakout and continue training for 5000 steps):

python train.py --algo a2c --env BreakoutNoFrameskip-v4 -i trained_agents/a2c/BreakoutNoFrameskip-v4.pkl -n 5000

Note: when training TRPO, you have to use mpirun to enable multiprocessing:

mpirun -n 16 python train.py --algo trpo --env BreakoutNoFrameskip-v4

Hyperparameter Tuning

We use Optuna for optimizing the hyperparameters.

Note: hyperparameters search is not implemented for ACER and DQN for now. when using SuccessiveHalvingPruner ("halving"), you must specify --n-jobs > 1

Budget of 1000 trials with a maximum of 50000 steps:

python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \
  --sampler tpe --pruner median

Env Wrappers

You can specify in the hyperparameter config one or more wrapper to use around the environment:

for one wrapper:

env_wrapper: gym_minigrid.wrappers.FlatObsWrapper

for multiple, specify a list:

env_wrapper:
    - utils.wrappers.DoneOnSuccessWrapper:
        reward_offset: 1.0
    - utils.wrappers.TimeFeatureWrapper

Note that you can easily specify parameters too.

Env keyword arguments

You can specify keyword arguments to pass to the env constructor in the command line, using --env-kwargs:

python enjoy.py --algo ppo2 --env MountainCar-v0 --env-kwargs goal_velocity:10

Overwrite hyperparameters

You can easily overwrite hyperparameters in the command line, using --hyperparams:

python train.py --algo a2c --env MountainCarContinuous-v0 --hyperparams learning_rate:0.001 policy_kwargs:"dict(net_arch=[64, 64])"

Record a Video of a Trained Agent

Record 1000 steps:

python -m utils.record_video --algo ppo2 --env BipedalWalkerHardcore-v2 -n 1000

Current Collection: 120+ Trained Agents!

Scores can be found in benchmark.md. To compute them, simply run python -m utils.benchmark.

Atari Games

7 atari games from OpenAI benchmark (NoFrameskip-v4 versions).

RL Algo BeamRider Breakout Enduro Pong Qbert Seaquest SpaceInvaders
A2C ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
ACER ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
ACKTR ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
PPO2 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
TRPO

Additional Atari Games (to be completed):

RL Algo MsPacman
A2C ✔️
ACER ✔️
ACKTR ✔️
PPO2 ✔️
DQN ✔️

Classic Control Environments

RL Algo CartPole-v1 MountainCar-v0 Acrobot-v1 Pendulum-v0 MountainCarContinuous-v0
A2C ✔️ ✔️ ✔️ ✔️ ✔️
ACER ✔️ ✔️ ✔️ N/A N/A
ACKTR ✔️ ✔️ ✔️ ✔️ ✔️
PPO2 ✔️ ✔️ ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️ N/A N/A
DDPG N/A N/A N/A ✔️ ✔️
SAC N/A N/A N/A ✔️ ✔️
TD3 N/A N/A N/A ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️

Box2D Environments

RL Algo BipedalWalker-v2 LunarLander-v2 LunarLanderContinuous-v2 BipedalWalkerHardcore-v2 CarRacing-v0
A2C ✔️ ✔️ ✔️ ✔️
ACER N/A ✔️ N/A N/A N/A
ACKTR ✔️ ✔️ ✔️ ✔️
PPO2 ✔️ ✔️ ✔️ ✔️
DQN N/A ✔️ N/A N/A N/A
DDPG ✔️ N/A ✔️
SAC ✔️ N/A ✔️ ✔️
TD3 ✔️ N/A ✔️
TRPO ✔️ ✔️ ✔️

PyBullet Environments

See https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_envs. Similar to MuJoCo Envs but with a free simulator: pybullet. We are using BulletEnv-v0 version.

Note: those environments are derived from Roboschool and are much harder than the Mujoco version (see Pybullet issue)

RL Algo Walker2D HalfCheetah Ant Reacher Hopper Humanoid
A2C ✔️ ✔️ ✔️ ✔️
ACKTR ✔️
PPO2 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
DDPG ✔️ ✔️ ✔️
SAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
TD3 ✔️ ✔️ ✔️ ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️

PyBullet Envs (Continued)

RL Algo Minitaur MinitaurDuck InvertedDoublePendulum InvertedPendulumSwingup
A2C
ACKTR
PPO2 ✔️ ✔️ ✔️ ✔️
DDPG
SAC ✔️ ✔️
TD3 ✔️ ✔️
TRPO

MiniGrid Envs

See https://github.com/maximecb/gym-minigrid A simple, lightweight and fast Gym environments implementation of the famous gridworld.

RL Algo Empty FourRooms DoorKey MultiRoom Fetch
A2C
PPO2 ✔️ ✔️
DDPG
SAC
TRPO

There are 19 environment groups (variations for each) in total.

Note that you need to specify --gym-packages gym_minigrid with enjoy.py and train.py as it is not a standard Gym environment, as well as installing the custom Gym package module or putting it in python path.

pip install gym-minigrid
python train.py --algo ppo2 --env MiniGrid-DoorKey-5x5-v0 --gym-packages gym_minigrid

This does the same thing as:

import gym_minigrid

Also, you may need to specify a Gym environment wrapper in hyperparameters, as MiniGrid environments have Dict observation space, which is not supported by StableBaselines for now.

MiniGrid-DoorKey-5x5-v0:
  env_wrapper: gym_minigrid.wrappers.FlatObsWrapper

Colab Notebook: Try it Online!

You can train agents online using colab notebook.

Installation

Stable-Baselines PyPi Package

Min version: stable-baselines[mpi] >= 2.10.0

apt-get install swig cmake libopenmpi-dev zlib1g-dev ffmpeg
pip install -r requirements.txt

Please see Stable Baselines README for alternatives.

Docker Images

Build docker image (CPU):

./scripts/build_docker.sh

GPU:

USE_GPU=True ./scripts/build_docker.sh

Pull built docker image (CPU):

docker pull stablebaselines/rl-baselines-zoo-cpu

GPU image:

docker pull stablebaselines/rl-baselines-zoo

Run script in the docker image:

./scripts/run_docker_cpu.sh python train.py --algo ppo2 --env CartPole-v1

Tests

To run tests, first install pytest, then:

python -m pytest -v tests/

Citing the Project

To cite this repository in publications:

@misc{rl-zoo,
  author = {Raffin, Antonin},
  title = {RL Baselines Zoo},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/araffin/rl-baselines-zoo}},
}

Contributing

If you trained an agent that is not present in the rl zoo, please submit a Pull Request (containing the hyperparameters and the score too).

Contributors

We would like to thanks our contributors: @iandanforth, @tatsubori @Shade5

More Repositories

1

robotics-rl-srl

S-RL Toolbox: Reinforcement Learning (RL) and State Representation Learning (SRL) for Robotics
Python
609
star
2

rl-tutorial-jnrr19

Stable-Baselines tutorial for Journées Nationales de la Recherche en Robotique 2019
Jupyter Notebook
583
star
3

sbx

SBX: Stable Baselines Jax (SB3 + Jax)
Python
303
star
4

srl-zoo

State Representation Learning (SRL) zoo with PyTorch - Part of S-RL Toolbox
Python
162
star
5

arduino-robust-serial

A simple and robust serial communication protocol. It was designed for Arduino but can be used for other purposes (e.g. bluetooth, sockets). Implementation in C Arduino, C++, Python and Rust.
C++
134
star
6

tools-for-robotic-rl-icra2022

Tutorial: Tools for Robotic Reinforcement Learning, Hands-on RL for Robotics with EAGER and Stable-Baselines3
HTML
91
star
7

rl-handson-rlvs21

Stable-Baselines3 (SB3) reinforcement learning tutorial for the Reinforcement Learning Virtual School 2021.
Jupyter Notebook
48
star
8

rlss23-dqn-tutorial

Deep Q-Network (DQN) and Fitted Q-Iteration (FQI) tutorial for RL Summer School 2023
Jupyter Notebook
47
star
9

aae-train-donkeycar

Code used to train an augmented auto-encoder (aka denoising auto-encoder with more augmentations) for the DonkeyCar simulator.
Python
43
star
10

rust-arduino-serial

A simple and robust serial communication protocol. It was designed for Arduino but can be used for other purposes (e.g. bluetooth, sockets). This is the Rust implementation.
Rust
35
star
11

python-arduino-serial

A simple and robust serial communication protocol. It was designed for Arduino but can be used for other purposes (e.g. bluetooth, sockets). This is the Python implementation.
Python
31
star
12

datasaurust

Blazingly fast implementation of the Datasaurus paper. Same Stats, Different Graphs.
Rust
19
star
13

cpp-arduino-serial

A simple and robust serial communication protocol. It was designed for Arduino but can be used for other purposes (e.g. bluetooth, sockets). This is the C++ implementation.
C++
18
star
14

pybullet_envs_gymnasium

Port of pybullet envs to gymnasium
Python
13
star
15

graph-annotation-tool

A graph annotation tool using a flask server and javascript
HTML
12
star
16

advent-of-code-2022

advent-of-code-2022 with Rust, easy to read solutions
Rust
4
star
17

es-with-search-path

Implementation of the (μ/μ,λ)-Evolution Strategy (ES) with Search Path algorithm in C++
C
2
star
18

araffin.github.io

Personal Homepage
JavaScript
2
star
19

eagerx_quadruped

Python
1
star
20

custom-hugo-academic

HTML
1
star
21

her-experiments

Python
1
star