


The 37 Implementation Details of Proximal Policy Optimization

This repo contains the source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization" (https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/).

If you like this repo, consider checking out CleanRL (https://github.com/vwxyzjn/cleanrl), the RL library that we used to build this repo.

Get started

Prerequisites:

Install dependencies:

poetry install

Train agents:

poetry run python ppo.py

Train agents with experiment tracking:

poetry run python ppo.py --track --capture-video
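For orientation, the heart of every PPO update is the clipped surrogate policy loss. The NumPy sketch below shows that loss in isolation. It illustrates the standard PPO objective using our own names (`clipped_policy_loss`, `clip_coef`). It is not code copied from ppo.py, which operates on PyTorch tensors.

```python
import numpy as np


def clipped_policy_loss(new_logprob, old_logprob, advantages, clip_coef=0.2):
    """PPO clipped surrogate policy loss, written as a quantity to minimize.

    Illustrative sketch: argument names are our own, not the script's API.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(new_logprob - old_logprob)
    # Negated unclipped and clipped surrogate objectives.
    pg_loss1 = -advantages * ratio
    pg_loss2 = -advantages * np.clip(ratio, 1 - clip_coef, 1 + clip_coef)
    # max of the negated terms == min of the objectives (the pessimistic bound).
    return np.maximum(pg_loss1, pg_loss2).mean()
```

Taking the elementwise maximum of the two negated terms has a specific effect. Once the ratio leaves [1 - clip_coef, 1 + clip_coef] in the direction the advantage favors, the clipped term wins, and that sample stops contributing gradient.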

Atari

Install dependencies:

poetry install -E atari

Train agents:

poetry run python ppo_atari.py

Train agents with experiment tracking:

poetry run python ppo_atari.py --track --capture-video

Pybullet

Install dependencies:

poetry install -E pybullet

Train agents:

poetry run python ppo_continuous_action.py

Train agents with experiment tracking:

poetry run python ppo_continuous_action.py --track --capture-video
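Continuous-action PPO commonly uses a diagonal Gaussian policy. In such a policy, each action dimension is an independent normal distribution, and log-probabilities (and entropies) are summed over dimensions. The sketch below assumes that design for ppo_continuous_action.py. It is a NumPy illustration with hypothetical helper names, so check the script itself for its exact parameterization.

```python
import numpy as np


def diag_gaussian_logprob(action, mean, log_std):
    """Log-density of `action` under N(mean, diag(exp(log_std)**2)),
    summed over the last (action) dimension. Hypothetical helper."""
    std = np.exp(log_std)
    per_dim = -0.5 * ((action - mean) / std) ** 2 - log_std - 0.5 * np.log(2 * np.pi)
    return per_dim.sum(axis=-1)


def diag_gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian, summed over action dimensions."""
    return (0.5 + 0.5 * np.log(2 * np.pi) + log_std).sum(axis=-1)
```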

Gym-microrts (MultiDiscrete)

Install dependencies:

poetry install -E gym-microrts

Train agents:

poetry run python ppo_multidiscrete.py

Train agents with experiment tracking:

poetry run python ppo_multidiscrete.py --track --capture-video
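In a MultiDiscrete action space, each action is a vector of independent categorical choices, where component i has nvec[i] options. A common way to build the policy is to have the network output one flat logit vector of length sum(nvec), split it into one categorical distribution per component, and sum the per-component log-probabilities. The sketch below assumes that approach. It is a NumPy illustration with hypothetical helper names, not a copy of ppo_multidiscrete.py.

```python
import numpy as np


def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def multidiscrete_logprob(logits, actions, nvec):
    """logits: (batch, sum(nvec)); actions: (batch, len(nvec)) integer choices.

    Returns the joint log-probability of each action vector. Hypothetical helper.
    """
    split_points = np.cumsum(nvec)[:-1]  # e.g. nvec=[2, 3] -> split at column 2
    batch_idx = np.arange(logits.shape[0])
    total = np.zeros(logits.shape[0])
    for i, comp_logits in enumerate(np.split(logits, split_points, axis=-1)):
        # Components are independent, so their log-probabilities add up.
        total += log_softmax(comp_logits)[batch_idx, actions[:, i]]
    return total
```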

Train agents with invalid action masking:

poetry run python ppo_multidiscrete_mask.py

Train agents with invalid action masking and experiment tracking:

poetry run python ppo_multidiscrete_mask.py --track --capture-video
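Invalid action masking works by overwriting the logits of actions that are invalid in the current state with a large negative number before the softmax. Those actions then get numerically zero probability and are never sampled. The minimal sketch below shows a single categorical distribution. The fill value of -1e8 and the helper name are our assumptions. In a MultiDiscrete setting, the same masking would be applied to each component's logits.

```python
import numpy as np


def masked_log_softmax(logits, mask, fill_value=-1e8):
    """Log-probabilities with invalid actions (mask == False) masked out.

    Hypothetical helper; fill_value is an assumed large negative constant.
    """
    masked = np.where(mask, logits, fill_value)
    # Stable log-softmax over the last axis.
    masked = masked - masked.max(axis=-1, keepdims=True)
    return masked - np.log(np.exp(masked).sum(axis=-1, keepdims=True))
```

For example, with logits [1.0, 2.0, 3.0] and mask [True, False, True], action 1 gets essentially zero probability. The remaining probability mass is shared between actions 0 and 2 in the ratio e^-2 : 1.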

Atari with Envpool

Install dependencies:

poetry install -E envpool

Train agents:

poetry run python ppo_atari_envpool.py

Train agents with experiment tracking:

poetry run python ppo_atari_envpool.py --track

Solve Pong-v5 in 5 mins:

poetry run python ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3
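To see what the flags above imply for each update, assume the script follows CleanRL's convention: the rollout batch size is num_envs * num_steps, and the minibatch size is that batch size divided by num_minibatches. Under that assumption, the arithmetic is (the helper name is hypothetical):

```python
def ppo_batch_sizes(num_envs, num_steps, num_minibatches, update_epochs):
    batch_size = num_envs * num_steps               # transitions collected per rollout
    minibatch_size = batch_size // num_minibatches  # transitions per gradient step
    grad_steps = update_epochs * num_minibatches    # gradient steps per rollout
    return batch_size, minibatch_size, grad_steps


print(ppo_batch_sizes(num_envs=16, num_steps=128, num_minibatches=8, update_epochs=3))
# (2048, 256, 24)
```

In other words, each rollout of 2048 transitions is reused for 24 gradient steps of 256 samples each.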

Reach a game score of 400 in Breakout-v5 with PPO in about 1 hour (a side-effect-free 3-4x speedup over ppo_atari.py with SyncVectorEnv):

poetry run python ppo_atari_envpool.py --gym-id Breakout-v5

Procgen

Install dependencies:

poetry install -E procgen

Train agents:

poetry run python ppo_procgen.py

Train agents with experiment tracking:

poetry run python ppo_procgen.py --track

Reproduction of all of our results

To reproduce the results obtained with openai/baselines, install our fork at https://github.com/vwxyzjn/baselines, then follow the scripts in scripts/baselines. To reproduce our results, follow the scripts in scripts/ours.

Citation

@inproceedings{shengyi2022the37implementation,
  author = {Huang, Shengyi and Dossa, Rousslan Fernand Julien and Raffin, Antonin and Kanervisto, Anssi and Wang, Weixun},
  title = {The 37 Implementation Details of Proximal Policy Optimization},
  booktitle = {ICLR Blog Track},
  year = {2022},
  note = {https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/},
  url  = {https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/}
}
