• Stars: 5,285
• Rank: 7,744 (Top 0.2%)
• Language: Python
• License: Other
• Created: over 5 years ago
• Updated: about 1 month ago


Repository Details

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

CleanRL (Clean Implementation of RL Algorithms)


CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features. The implementations are clean and simple, yet they scale to thousands of experiments using AWS Batch. The highlight features of CleanRL are:

  • 📜 Single-file implementation
    • Every detail about an algorithm variant is put into a single standalone file.
    • For example, our ppo_atari.py only has 340 lines of code but contains all implementation details on how PPO works with Atari games, so it is a great reference implementation to read for folks who do not wish to read an entire modular library. (A condensed sketch of this single-file layout follows the list.)
  • 📊 Benchmarked Implementation (7+ algorithms and 34+ games at https://benchmark.cleanrl.dev)
  • 📈 Tensorboard Logging
  • 🪛 Local Reproducibility via Seeding
  • 🎮 Videos of Gameplay Capturing
  • 🧫 Experiment Management with Weights and Biases
  • 💸 Cloud Integration with docker and AWS
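
To make the single-file idea concrete, below is a heavily condensed sketch of the layout these scripts share: argument parsing, seeding, a TensorBoard SummaryWriter, and one flat training loop. It is a sketch, not an excerpt from any CleanRL file; a random action stands in for the learned policy, and it is written against the Gymnasium API that the project is migrating to (see the note further down).

# Condensed sketch of the layout CleanRL's single-file scripts share.
# Illustrative only: a random policy stands in for the learned agent.
import argparse
import random

import gymnasium as gym
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--env-id", type=str, default="CartPole-v1")
    parser.add_argument("--total-timesteps", type=int, default=50000)
    args = parser.parse_args()

    # local reproducibility via seeding
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)

    # TensorBoard logging under runs/
    writer = SummaryWriter(f"runs/{args.env_id}__{args.seed}")

    env = gym.make(args.env_id)
    obs, _ = env.reset(seed=args.seed)
    for step in range(args.total_timesteps):
        action = env.action_space.sample()  # real scripts act with the policy network
        obs, reward, terminated, truncated, _ = env.step(action)
        writer.add_scalar("charts/reward", reward, step)
        if terminated or truncated:
            obs, _ = env.reset()
    writer.close()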

You can read more about CleanRL in our JMLR paper and documentation.

CleanRL only contains implementations of online deep reinforcement learning algorithms. If you are looking for offline algorithms, please check out tinkoff-ai/CORL, which shares a design philosophy similar to CleanRL's.

ℹ️ Support for Gymnasium: Farama-Foundation/Gymnasium is the next generation of openai/gym that will continue to be maintained and introduce new features. Please see their announcement for further details. We are migrating to gymnasium, and the progress can be tracked in vwxyzjn/cleanrl#277.

⚠️ NOTE: CleanRL is not a modular library and therefore is not meant to be imported. At the cost of duplicated code, we make all implementation details of a DRL algorithm variant easy to understand, so CleanRL comes with its own pros and cons. You should consider using CleanRL if you want to 1) understand all implementation details of an algorithm's variant or 2) prototype advanced features that other modular DRL libraries do not support (CleanRL has minimal lines of code, so it gives you a great debugging experience, and you don't have to do a lot of subclassing as you sometimes would in modular DRL libraries).

Get started

Prerequisites: Python and Poetry.

To run experiments locally, give the following a try:

git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
poetry install

# alternatively, you could use `poetry shell` and do
# `python cleanrl/ppo.py`
poetry run python cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000

# open another terminal and enter `cd cleanrl/cleanrl`
tensorboard --logdir runs
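
Besides the TensorBoard UI, the event files under runs/ can be read programmatically. Here is a minimal sketch using tensorboard's EventAccumulator; the charts/episodic_return tag is an assumption based on CleanRL's usual metric names, so check ea.Tags() for what your run actually logged:

import glob

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

run_dir = sorted(glob.glob("runs/*"))[0]  # pick one run directory
ea = EventAccumulator(run_dir)
ea.Reload()  # parse the event file(s)

print(ea.Tags()["scalars"])  # list the scalar tags that were logged
for event in ea.Scalars("charts/episodic_return"):  # assumed tag name
    print(event.step, event.value)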

To use experiment tracking with wandb, run

wandb login # only required for the first time
poetry run python cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000 \
    --track \
    --wandb-project-name cleanrltest
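
Tracked runs can also be pulled back down with wandb's public API, which is handy for plotting outside the web UI. A small sketch; your-entity is a placeholder for your W&B username, and the project name matches the command above:

import wandb

api = wandb.Api()
runs = api.runs("your-entity/cleanrltest")  # placeholder entity
for run in runs:
    print(run.name, run.state)

history = runs[0].history()  # logged metrics as a pandas DataFrame
print(history.columns.tolist())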

If you are not using poetry, you can install CleanRL with requirements.txt:

# core dependencies
pip install -r requirements/requirements.txt

# optional dependencies
pip install -r requirements/requirements-atari.txt
pip install -r requirements/requirements-mujoco.txt
pip install -r requirements/requirements-mujoco_py.txt
pip install -r requirements/requirements-procgen.txt
pip install -r requirements/requirements-envpool.txt
pip install -r requirements/requirements-pettingzoo.txt
pip install -r requirements/requirements-jax.txt
pip install -r requirements/requirements-docs.txt
pip install -r requirements/requirements-cloud.txt

To run training scripts in other games:

poetry shell

# classic control
python cleanrl/dqn.py --env-id CartPole-v1
python cleanrl/ppo.py --env-id CartPole-v1
python cleanrl/c51.py --env-id CartPole-v1

# atari
poetry install -E atari
python cleanrl/dqn_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/c51_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/ppo_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/sac_atari.py --env-id BreakoutNoFrameskip-v4

# NEW: 3-4x side-effect-free speed-up with envpool's Atari (Linux only)
poetry install -E envpool
python cleanrl/ppo_atari_envpool.py --env-id BreakoutNoFrameskip-v4
# Learn Pong-v5 in ~5-10 mins
# Side effects such as lower sample efficiency might occur
poetry run python cleanrl/ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3

# procgen
poetry install -E procgen
python cleanrl/ppo_procgen.py --env-id starpilot
python cleanrl/ppg_procgen.py --env-id starpilot

# ppo + lstm
poetry install -E atari
python cleanrl/ppo_atari_lstm.py --env-id BreakoutNoFrameskip-v4
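
For the envpool variants above, the speed-up comes from stepping a whole batch of environments in one C++ call instead of one Python process per env. A minimal sketch of envpool's gym-style interface (illustrative, not CleanRL code; exact return signatures vary across envpool versions):

import numpy as np
import envpool

envs = envpool.make("Pong-v5", env_type="gym", num_envs=16)  # 16 envs, one object
obs = envs.reset()  # batched observations, e.g. shape (16, 4, 84, 84) for Atari
for _ in range(10):
    actions = np.random.randint(0, envs.action_space.n, size=16)
    obs, rewards, dones, infos = envs.step(actions)  # all batched
print(obs.shape, rewards.shape)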

You may also use a prebuilt development environment hosted in Gitpod:

Open in Gitpod

Algorithms Implemented

• ✅ Proximal Policy Gradient (PPO) (its core update is sketched after this list)
    ppo.py, docs
    ppo_atari.py, docs
    ppo_continuous_action.py, docs
    ppo_atari_lstm.py, docs
    ppo_atari_envpool.py, docs
    ppo_atari_envpool_xla_jax.py, docs
    ppo_atari_envpool_xla_jax_scan.py, docs
    ppo_procgen.py, docs
    ppo_atari_multigpu.py, docs
    ppo_pettingzoo_ma_atari.py, docs
    ppo_continuous_action_isaacgym.py, docs
• ✅ Deep Q-Learning (DQN)
    dqn.py, docs
    dqn_atari.py, docs
    dqn_jax.py, docs
    dqn_atari_jax.py, docs
• ✅ Categorical DQN (C51)
    c51.py, docs
    c51_atari.py, docs
    c51_jax.py, docs
    c51_atari_jax.py, docs
• ✅ Soft Actor-Critic (SAC)
    sac_continuous_action.py, docs
    sac_atari.py, docs
• ✅ Deep Deterministic Policy Gradient (DDPG)
    ddpg_continuous_action.py, docs
    ddpg_continuous_action_jax.py, docs
• ✅ Twin Delayed Deep Deterministic Policy Gradient (TD3)
    td3_continuous_action.py, docs
    td3_continuous_action_jax.py, docs
• ✅ Phasic Policy Gradient (PPG)
    ppg_procgen.py, docs
• ✅ Random Network Distillation (RND)
    ppo_rnd_envpool.py, docs
• ✅ Qdagger
    qdagger_dqn_atari_impalacnn.py, docs
    qdagger_dqn_atari_jax_impalacnn.py, docs
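
As promised above, here is the core update shared by all the PPO files: the clipped surrogate objective. This is a minimal PyTorch sketch with illustrative variable names; the real scripts add advantage normalization, value and entropy losses, and minibatch epochs:

import torch

def ppo_policy_loss(new_logprob, old_logprob, advantages, clip_coef=0.2):
    """Clipped surrogate policy loss from the PPO paper."""
    ratio = (new_logprob - old_logprob).exp()  # pi_new(a|s) / pi_old(a|s)
    pg_loss1 = -advantages * ratio  # unclipped objective
    pg_loss2 = -advantages * torch.clamp(ratio, 1 - clip_coef, 1 + clip_coef)
    return torch.max(pg_loss1, pg_loss2).mean()  # pessimistic (clipped) bound

# toy usage with random tensors
advantages = torch.randn(64)
old_logprob = torch.randn(64)
new_logprob = old_logprob + 0.1 * torch.randn(64)
print(ppo_policy_loss(new_logprob, old_logprob, advantages))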

Open RL Benchmark

To make our experimental data transparent, CleanRL participates in a related project called Open RL Benchmark, which contains tracked experiments from popular DRL libraries such as ours, Stable-baselines3, openai/baselines, jaxrl, and others.

Check out https://benchmark.cleanrl.dev/ for a collection of Weights and Biases reports showcasing tracked DRL experiments. The reports are interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay, which are normally hard to acquire in other RL benchmarks. In the future, Open RL Benchmark will likely provide a dataset API for researchers to easily access the data (see repo).

Support and get involved

We have a Discord Community for support. Feel free to ask questions. Posting in GitHub Issues and PRs is also welcome. Our past video recordings are available on YouTube.

Citing CleanRL

If you use CleanRL in your work, please cite our technical paper:

@article{huang2022cleanrl,
  author  = {Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga and Dipam Chakraborty and Kinal Mehta and João G.M. Araújo},
  title   = {CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {274},
  pages   = {1--18},
  url     = {http://jmlr.org/papers/v23/21-1342.html}
}

More Repositories

1. portwarden: Create Encrypted Backups of Your Bitwarden Vault with Attachments (Go, 568 stars)
2. ppo-implementation-details: The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization (Python, 262 stars)
3. lm-human-preference-details: RLHF implementation details of OAI's 2019 codebase (Python, 143 stars)
4. cleanba: CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL (Python, 88 stars)
5. invalid-action-masking: Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms (Python, 88 stars)
6. summarize_from_feedback_details (Python, 81 stars)
7. PPO-Implementation-Deep-Dive: DEPRECATED, please visit https://github.com/vwxyzjn/ppo-implementation-details (Python, 37 stars)
8. gym-microrts-paper: The source code for the gym-microrts paper. (Python, 36 stars)
9. a2c_is_a_special_case_of_ppo: A2C is a special case of PPO! (Python, 15 stars)
10. SC2AI: Integrated Tensorforce and OpenAI Gym to train SC II game agents. (Jupyter Notebook, 13 stars)
11. jupyter_disqus: Add Disqus to your Jupyter notebook. (Python, 13 stars)
12. gym-pysc2: Gym wrapper for pysc2 (Python, 8 stars)
13. envpool-cleanrl (Python, 6 stars)
14. action-guidance (Python, 6 stars)
15. ppo-atari-metrics (Python, 4 stars)
16. vectorized-value-methods: [WIP] Vectorized architecture for value-based methods such as DQN and DDPG (Python, 3 stars)
17. entity-ppo-demo (Python, 2 stars)
18. CS583FinalProject (Python, 1 star)
19. Resume-master (TeX, 1 star)
20. minimal-adam-layer-norm-bug-repro (Python, 1 star)
21. embedding_projector (Python, 1 star)
22. RLControlSkipFrames (Python, 1 star)
23. launcha: Launcha is a simple Docker-based cloud job launcher. (Python, 1 star)
24. gym_minigrid (Python, 1 star)
25. CS618 (Jupyter Notebook, 1 star)
26. validate-new-gym-mujoco-envs (Python, 1 star)
27. vuetify-parallax-starter2 (JavaScript, 1 star)
28. envpool-xla-cleanrl (Python, 1 star)
29. cleanba-test (Python, 1 star)
30. envpool_bug (Python, 1 star)
31. Sentiment-Analysis-LSTM: Used neural network to classify movie reviews based on sentiment (Jupyter Notebook, 1 star)
32. aws-sagemaker-example (Jupyter Notebook, 1 star)
33. LP_optimization_python: Linear Programming for Optimal Scheduling by Using Gurobipy (TeX, 1 star)
34. CS583 (Python, 1 star)
35. lm-human-preferences: Code for the paper Fine-Tuning Language Models from Human Preferences (Python, 1 star)