

RL Baselines3 Zoo: A Training Framework for Stable Baselines3 Reinforcement Learning Agents

RL Baselines3 Zoo is a training framework for Reinforcement Learning (RL), using Stable Baselines3.

It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.

We are looking for contributors to complete the collection!

Goals of this repository:

  1. Provide a simple interface to train and enjoy RL agents
  2. Benchmark the different Reinforcement Learning algorithms
  3. Provide tuned hyperparameters for each environment and RL algorithm
  4. Have fun with the trained agents!

This is the SB3 version of the original SB2 rl-zoo.

Documentation

Documentation is available online: https://rl-baselines3-zoo.readthedocs.io/

Installation

Minimal installation

From source:

pip install -e .

As a Python package:

pip install rl_zoo3

Note: installing the package gives you access to the rl_zoo3 command line interface from any folder. For instance, rl_zoo3 train is equivalent to python train.py, and python -m rl_zoo3.train works as well.
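
Assuming the package is installed, the following commands should therefore launch the same training run (the first one only works from the repository root):

python train.py --algo ppo --env CartPole-v1
python -m rl_zoo3.train --algo ppo --env CartPole-v1
rl_zoo3 train --algo ppo --env CartPole-v1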

Full installation (with extra envs and test dependencies)

apt-get install swig cmake ffmpeg
pip install -r requirements.txt

Please see the Stable Baselines3 documentation for alternative ways to install Stable Baselines3.

Train an Agent

The hyperparameters for each environment are defined in hyperparameters/algo_name.yml.
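
For illustration, an entry in such a file might look like the sketch below; the keys are common Stable Baselines3 parameters, but the values here are placeholders rather than the tuned settings shipped with the zoo:

CartPole-v1:
  n_envs: 8                  # number of parallel training environments
  n_timesteps: !!float 1e5   # total training budget
  policy: 'MlpPolicy'
  learning_rate: 0.001
  gamma: 0.98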

If the environment exists in this file, then you can train an agent using:

python train.py --algo algo_name --env env_id

Evaluate the agent every 10000 steps using 10 episodes for evaluation (using only one evaluation env):

python train.py --algo sac --env HalfCheetahBulletEnv-v0 --eval-freq 10000 --eval-episodes 10 --n-eval-envs 1

More examples are available in the documentation.

Integrations

The RL Zoo integrates with other libraries and services, such as Weights & Biases for experiment tracking and Hugging Face for storing and sharing trained models. You can find out more in the dedicated section of the documentation.
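
As a rough sketch (flag and script names are assumptions based on the version available at the time of writing; see the documentation for the authoritative interface), tracking a run with Weights & Biases and downloading a pretrained agent from the Hugging Face hub look like this:

python train.py --algo ppo --env CartPole-v1 --track --wandb-project-name sb3
python -m rl_zoo3.load_from_hub --algo a2c --env LunarLander-v2 -orga sb3 -f logs/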

Plot Scripts

Please see the dedicated section of the documentation.
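
For orientation, the plotting utilities live under scripts/; a representative call (script name and flags assumed from the version at the time of writing) that plots training curves from a log folder is:

python scripts/plot_train.py -a ppo -e CartPole-v1 -f logs/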

Enjoy a Trained Agent

Note: to download the repo with the trained agents, you must use git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo in order to clone the submodule too.

If the trained agent exists, then you can see it in action using:

python enjoy.py --algo algo_name --env env_id

For example, enjoy A2C on Breakout during 5000 timesteps:

python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder rl-trained-agents/ -n 5000
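
Saved agents are ordinary Stable Baselines3 zip archives, so they can also be loaded directly in Python. A minimal sketch, assuming the rl-trained-agents submodule layout and experiment id 1:

from stable_baselines3 import A2C

# Load a zoo-trained agent from the submodule (path assumed, adjust to your checkout)
model = A2C.load("rl-trained-agents/a2c/BreakoutNoFrameskip-v4_1/BreakoutNoFrameskip-v4.zip")
print(model.policy)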

Hyperparameters Tuning

Please see the dedicated section of the documentation.
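
For orientation, tuning is powered by Optuna and launched through the same train.py entry point; a representative command (flags may vary between versions, check the documentation) optimizes PPO on MountainCar over 100 trials with a TPE sampler and median pruner:

python train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 100 --n-jobs 2 --sampler tpe --pruner median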

Custom Configuration

Please see the dedicated section of the documentation.
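
As a sketch, a custom hyperparameter file (same YAML format as above) can be passed on the command line; the --conf-file flag name is assumed from the version at the time of writing:

python train.py --algo ppo --env CartPole-v1 --conf-file my_hyperparams.yml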

Current Collection: 200+ Trained Agents!

Final performance of the trained agents can be found in benchmark.md. To recompute the results, run python -m rl_zoo3.benchmark.

The list and videos of trained agents can be found on our Hugging Face page: https://huggingface.co/sb3

NOTE: this is not a quantitative benchmark, as it corresponds to only one run (cf. issue #38). This benchmark is meant to check algorithm (maximal) performance, find potential bugs and also allow users to have access to pretrained agents.

Atari Games

7 Atari games from the OpenAI benchmark (NoFrameskip-v4 versions).

RL Algo BeamRider Breakout Enduro Pong Qbert Seaquest SpaceInvaders
A2C ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
QR-DQN ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️

Additional Atari Games (to be completed):

RL Algo MsPacman Asteroids RoadRunner
A2C ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️
QR-DQN ✔️ ✔️ ✔️

Classic Control Environments

RL Algo CartPole-v1 MountainCar-v0 Acrobot-v1 Pendulum-v1 MountainCarContinuous-v0
ARS ✔️ ✔️ ✔️ ✔️ ✔️
A2C ✔️ ✔️ ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️ N/A N/A
QR-DQN ✔️ ✔️ ✔️ N/A N/A
DDPG N/A N/A N/A ✔️ ✔️
SAC N/A N/A N/A ✔️ ✔️
TD3 N/A N/A N/A ✔️ ✔️
TQC N/A N/A N/A ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️ ✔️

Box2D Environments

RL Algo BipedalWalker-v3 LunarLander-v2 LunarLanderContinuous-v2 BipedalWalkerHardcore-v3 CarRacing-v0
ARS ✔️ ✔️
A2C ✔️ ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️ ✔️
DQN N/A ✔️ N/A N/A N/A
QR-DQN N/A ✔️ N/A N/A N/A
DDPG ✔️ N/A ✔️
SAC ✔️ N/A ✔️ ✔️
TD3 ✔️ N/A ✔️ ✔️
TQC ✔️ N/A ✔️ ✔️
TRPO ✔️ ✔️

PyBullet Environments

See https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_envs. Similar to the MuJoCo environments, but with a free, easy-to-install simulator: pybullet (MuJoCo itself is free as of version 2.1.0). We are using the BulletEnv-v0 versions.

Note: those environments are derived from Roboschool and are harder than the MuJoCo versions (see the Pybullet issue).

RL Algo Walker2D HalfCheetah Ant Reacher Hopper Humanoid
ARS
A2C ✔️ ✔️ ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️ ✔️ ✔️
DDPG ✔️ ✔️ ✔️ ✔️ ✔️
SAC ✔️ ✔️ ✔️ ✔️ ✔️
TD3 ✔️ ✔️ ✔️ ✔️ ✔️
TQC ✔️ ✔️ ✔️ ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️ ✔️

PyBullet Envs (Continued)

RL Algo Minitaur MinitaurDuck InvertedDoublePendulum InvertedPendulumSwingup
A2C
PPO
DDPG
SAC
TD3
TQC

MuJoCo Environments

RL Algo Walker2d HalfCheetah Ant Swimmer Hopper Humanoid
ARS ✔️ ✔️ ✔️ ✔️ ✔️
A2C ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
PPO ✔️ ✔️ ✔️ ✔️ ✔️
DDPG
SAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
TD3 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
TQC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️ ✔️

Robotics Environments

See https://gym.openai.com/envs/#robotics and #71

MuJoCo version: 1.50.1.0, Gym version: 0.18.0

We used the v1 environments.

RL Algo FetchReach FetchPickAndPlace FetchPush FetchSlide
HER+TQC ✔️ ✔️ ✔️ ✔️

Panda Robot Environments

See https://github.com/qgallouedec/panda-gym/.

Similar to the MuJoCo robotics environments, but with a free, easy-to-install simulator: pybullet.

We used the v1 environments.

RL Algo PandaReach PandaPickAndPlace PandaPush PandaSlide PandaStack
HER+TQC ✔️ ✔️ ✔️ ✔️ ✔️

MiniGrid Envs

See https://github.com/Farama-Foundation/Minigrid. Simple, lightweight and fast Gym environments implementing the classic gridworld.

RL Algo Empty-Random-5x5 FourRooms DoorKey-5x5 MultiRoom-N4-S5 Fetch-5x5-N2 GoToDoor-5x5 PutNear-6x6-N2 RedBlueDoors-6x6 LockedRoom KeyCorridorS3R1 Unlock ObstructedMaze-2Dlh
A2C
PPO ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
DQN
QR-DQN
TRPO

There are 22 environment groups (each with several variations) in total.

Colab Notebook: Try it Online!

You can train agents online using the Colab notebook.

Passing arguments in an interactive session

The zoo is not meant to be executed from an interactive session (e.g. Jupyter notebooks, IPython); however, it can be done by modifying sys.argv and adding the desired arguments.

Example

import sys

from rl_zoo3.train import train

# Mimic a command line call: the first entry is a dummy program name,
# the remaining entries are the usual train.py arguments
sys.argv = ["python", "--algo", "ppo", "--env", "MountainCar-v0"]

train()

Tests

To run tests, first install pytest, then:

make pytest

Same for type checking with pytype:

make type

Citing the Project

To cite this repository in publications:

@misc{rl-zoo3,
  author = {Raffin, Antonin},
  title = {RL Baselines3 Zoo},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/DLR-RM/rl-baselines3-zoo}},
}

Contributing

If you trained an agent that is not present in the RL Zoo, please submit a Pull Request (including the hyperparameters and the score).

Contributors

We would like to thank our contributors: @iandanforth, @tatsubori, @Shade5, @mcres, @ernestum, @qgallouedec
