Keeping track of RL experiments

RLlib Reference Results

Benchmarks of RLlib algorithms against published results. These benchmarks are a work in progress. For other results to compare against, see yarlp and more plots from OpenAI.

Ape-X Distributed Prioritized Experience Replay

rllib train -f atari-apex/atari-apex.yaml

Comparison of RLlib Ape-X to Async DQN after 10M time-steps (40M frames). Results compared to learning curves from Mnih et al, 2016 extracted at 10M time-steps from Figure 3.

| env | RLlib Ape-X 8-workers | Mnih et al Async DQN 16-workers | Mnih et al DQN 1-worker |
|---|---|---|---|
| BeamRider | 6134 | ~6000 | ~3000 |
| Breakout | 123 | ~50 | ~10 |
| QBert | 15302 | ~1200 | ~500 |
| SpaceInvaders | 686 | ~600 | ~500 |
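
The time-step and frame counts quoted in this document differ by a factor of four (10M time-steps correspond to 40M frames). That is consistent with the common Atari frame skip of 4, although the skip value itself is an assumption here and is not stated in the text. A minimal sketch of the conversion, using a hypothetical helper name:

```python
# Assumed frame skip: each agent time-step covers 4 emulator frames.
# The value 4 is inferred from "10M time-steps (40M frames)", not read from the configs.
FRAME_SKIP = 4


def timesteps_to_frames(timesteps: int, frame_skip: int = FRAME_SKIP) -> int:
    """Convert agent time-steps to emulator frames."""
    return timesteps * frame_skip


if __name__ == "__main__":
    print(timesteps_to_frames(10_000_000))  # 40000000
```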

Here we use only eight workers per environment in order to run all experiments concurrently on a single g3.16xl machine. Further speedups may be obtained by using more workers. Comparing wall-time performance after 1 hour of training:

| env | RLlib Ape-X 8-workers | Mnih et al Async DQN 16-workers | Mnih et al DQN 1-worker |
|---|---|---|---|
| BeamRider | 4873 | ~1000 | ~300 |
| Breakout | 77 | ~10 | ~1 |
| QBert | 4083 | ~500 | ~150 |
| SpaceInvaders | 646 | ~300 | ~160 |

Ape-X plots: apex

IMPALA and A2C

rllib train -f atari-impala/atari-impala.yaml

rllib train -f atari-a2c/atari-a2c.yaml

RLlib IMPALA and A2C on 10M time-steps (40M frames). Results compared to learning curves from Mnih et al, 2016 extracted at 10M time-steps from Figure 3.

| env | RLlib IMPALA 32-workers | RLlib A2C 5-workers | Mnih et al A3C 16-workers |
|---|---|---|---|
| BeamRider | 2071 | 1401 | ~3000 |
| Breakout | 385 | 374 | ~150 |
| QBert | 4068 | 3620 | ~1000 |
| SpaceInvaders | 719 | 692 | ~600 |

IMPALA and A2C vs A3C after 1 hour of training:

| env | RLlib IMPALA 32-workers | RLlib A2C 5-workers | Mnih et al A3C 16-workers |
|---|---|---|---|
| BeamRider | 3181 | 874 | ~1000 |
| Breakout | 538 | 268 | ~10 |
| QBert | 10850 | 1212 | ~500 |
| SpaceInvaders | 843 | 518 | ~300 |

IMPALA plots: tensorboard

A2C plots: tensorboard

Pong in 3 minutes

With a bit of tuning, RLlib IMPALA can solve Pong in ~3 minutes:

rllib train -f pong-speedrun/pong-impala-fast.yaml

tensorboard

DQN / Rainbow

rllib train -f atari-dqn/basic-dqn.yaml

rllib train -f atari-dqn/duel-ddqn.yaml

rllib train -f atari-dqn/dist-dqn.yaml

RLlib DQN after 10M time-steps (40M frames). Note that RLlib evaluation scores include the 1% random actions of epsilon-greedy exploration. You can expect slightly higher rewards when rolling out the policies without any exploration at all.

| env | RLlib Basic DQN | RLlib Dueling DDQN | RLlib Distributional DQN | Hessel et al. DQN | Hessel et al. Rainbow |
|---|---|---|---|---|---|
| BeamRider | 2869 | 1910 | 4447 | ~2000 | ~13000 |
| Breakout | 287 | 312 | 410 | ~150 | ~300 |
| QBert | 3921 | 7968 | 15780 | ~4000 | ~20000 |
| SpaceInvaders | 650 | 1001 | 1025 | ~500 | ~2000 |
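
The DQN scores include the 1% random actions of epsilon-greedy exploration. The following is a minimal sketch of epsilon-greedy action selection in plain Python. It is not RLlib's implementation; the function name and the Q-value list are hypothetical and only illustrate why a purely greedy rollout (epsilon of 0) can score slightly higher.

```python
import random


def epsilon_greedy(q_values, epsilon=0.01, rng=random):
    """Return a random action with probability epsilon, else the greedy action.

    q_values: list of estimated action values (hypothetical inputs).
    epsilon:  exploration rate; 0.01 matches the 1% mentioned in the text.
    """
    if rng.random() < epsilon:
        # Exploration: uniform random action index.
        return rng.randrange(len(q_values))
    # Exploitation: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # With epsilon=0.0 the policy is fully greedy and always picks index 1 here.
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```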

Basic DQN plots: tensorboard

Dueling DDQN plots: tensorboard

Distributional DQN plots: tensorboard

Proximal Policy Optimization

rllib train -f atari-ppo/atari-ppo.yaml

rllib train -f halfcheetah-ppo/halfcheetah-ppo.yaml

2018-09:

RLlib PPO with 10 workers (5 envs per worker) after 10M and 25M time-steps (40M/100M frames). Note that RLlib does not use clip parameter annealing.

| env | RLlib PPO @10M | RLlib PPO @25M | Baselines PPO @10M |
|---|---|---|---|
| BeamRider | 2807 | 4480 | ~1800 |
| Breakout | 104 | 201 | ~250 |
| QBert | 11085 | 14247 | ~14000 |
| SpaceInvaders | 671 | 944 | ~800 |
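
The PPO results were obtained without clip parameter annealing. As a rough illustration of what that means, the sketch below shows the standard PPO clipped surrogate term for a single sample, next to a hypothetical linear annealing schedule of the kind that is not being used. The clip value 0.1 and both function names are assumptions for illustration and are not taken from the RLlib configs.

```python
def clipped_surrogate(ratio: float, advantage: float, clip_param: float = 0.1) -> float:
    """PPO clipped surrogate objective for one sample.

    ratio:      new-policy probability divided by old-policy probability.
    advantage:  advantage estimate for the sampled action.
    clip_param: clipping range (0.1 is an illustrative value).
    """
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    return min(ratio * advantage, clipped_ratio * advantage)


def annealed_clip(initial_clip: float, progress: float) -> float:
    """Hypothetical linear annealing: shrink the clip range as training progresses.

    progress runs from 0.0 (start) to 1.0 (end). Without annealing, the clip
    range simply stays at initial_clip for the whole run.
    """
    return initial_clip * (1.0 - progress)


if __name__ == "__main__":
    print(clipped_surrogate(1.5, 2.0))   # the ratio is clipped to 1.1 before multiplying
    print(annealed_clip(0.1, 0.5))       # half-way through training
```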

tensorboard

RLlib PPO wall-time performance vs other implementations using a single Titan XP and the same number of CPUs. Results compared to learning curves from Fan et al, 2018 extracted at 1 hour of training from Figure 7. Here we get optimal results with a vectorization of 32 environment instances per worker:

| env | RLlib PPO 16-workers | Fan et al PPO 16-workers | TF BatchPPO 16-workers |
|---|---|---|---|
| HalfCheetah | 9664 | ~7700 | ~3200 |

tensorboard

2020-01:

Same as 2018-09, comparing only RLlib PPO-tf vs PPO-torch.

| env | RLlib PPO @20M (tf) | RLlib PPO @20M (torch) | plot |
|---|---|---|---|
| BeamRider | 4142 | 3850 | tensorboard |
| Breakout | 132 | 166 | tensorboard |
| QBert | 7987 | 14294 | tensorboard |
| SpaceInvaders | 956 | 1016 | tensorboard |

Soft Actor Critic

rllib train -f halfcheetah-sac/halfcheetah-sac.yaml

RLlib SAC after 3M time-steps.

RLlib SAC versus the SoftLearning implementation (Haarnoja et al, 2018), benchmarked at 500k and 3M timesteps respectively.

| env | RLlib SAC @500K | Haarnoja et al SAC @500K | RLlib SAC @3M | Haarnoja et al SAC @3M |
|---|---|---|---|---|
| HalfCheetah | 9000 | ~9000 | 13000 | ~15000 |

tensorboard

MAML

MAML uses additional metrics to measure performance; episode_reward_mean measures the agent's returns before adaptation, episode_reward_mean_adapt_N measures the agent's returns after N gradient steps of inner adaptation, and adaptation_delta measures the difference in performance before and after adaptation.
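
As a sketch of how these metrics relate, the snippet below computes an adaptation delta from a dictionary of results. It assumes the delta is "after minus before" and that N in episode_reward_mean_adapt_N is replaced by the number of inner steps; neither detail is confirmed by the text, and the helper name and reward values are made up for illustration.

```python
def adaptation_delta(metrics: dict, n_steps: int) -> float:
    """Difference between post-adaptation and pre-adaptation mean reward.

    Assumes the sign convention (after - before); the source only says the
    metric measures the difference before and after adaptation.
    """
    before = metrics["episode_reward_mean"]
    after = metrics[f"episode_reward_mean_adapt_{n_steps}"]
    return after - before


if __name__ == "__main__":
    # Hypothetical values, not benchmark results.
    example = {"episode_reward_mean": 100.0, "episode_reward_mean_adapt_1": 250.0}
    print(adaptation_delta(example, n_steps=1))  # 150.0
```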

rllib train -f maml/halfcheetah-rand-direc-maml.yaml

tensorboard

rllib train -f maml/ant-rand-goal-maml.yaml

tensorboard

rllib train -f maml/pendulum-mass-maml.yaml

tensorboard

MB-MPO

rllib train -f mbmpo/halfcheetah-mbmpo.yaml

rllib train -f mbmpo/hopper-mbmpo.yaml

MBMPO uses additional metrics to measure performance. For each MBMPO iteration, MBMPO samples fake data from the transition dynamics workers and steps through MAML for N iterations. MAMLIter$i$_DynaTrajInner_$j$_episode_reward_mean corresponds to the agent's performance across the dynamics models at the ith iteration of MAML and the jth step of inner adaptation.
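
To make the metric naming concrete, here is a small sketch that builds these keys for given values of i and j. The helper names are hypothetical, and whether the indices start at 0 or 1 is an assumption; only the name pattern itself comes from the description above.

```python
def mbmpo_metric_key(maml_iter: int, inner_step: int) -> str:
    """Build the metric name for MAML iteration i and inner adaptation step j."""
    return f"MAMLIter{maml_iter}_DynaTrajInner_{inner_step}_episode_reward_mean"


def all_metric_keys(num_maml_iters: int, num_inner_steps: int) -> list:
    """List every key for the given number of MAML iterations and inner steps.

    Zero-based indexing is assumed here for illustration.
    """
    return [
        mbmpo_metric_key(i, j)
        for i in range(num_maml_iters)
        for j in range(num_inner_steps)
    ]


if __name__ == "__main__":
    for key in all_metric_keys(num_maml_iters=2, num_inner_steps=2):
        print(key)
```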

RLlib MBMPO versus Clavera et al, 2018, benchmarked at 100k timesteps. Results reported below were run on RLlib and the master branch of the original codebase respectively.

| env | RLlib MBMPO @100K | Clavera et al MBMPO @100K |
|---|---|---|
| HalfCheetah | 520 | ~550 |
| Hopper | 620 | ~650 |

tensorboard

Dreamer

rllib train -f dreamer/dreamer-deepmind-control.yaml

RLlib Dreamer at 1M time-steps.

RLlib Dreamer versus the Google implementation (Hafner et al, 2020), benchmarked at 100k and 1M timesteps respectively.

| env | RLlib Dreamer @100K | Hafner et al Dreamer @100K | RLlib Dreamer @1M | Hafner et al Dreamer @1M |
|---|---|---|---|---|
| Walker | 320 | ~250 | 920 | ~930 |
| Cheetah | 300 | ~250 | 640 | ~800 |

tensorboard

RLlib Dreamer also logs gifs of Dreamer's imagined trajectories (Top: Ground truth, Middle: Model prediction, Bottom: Delta).

CQL

rllib train -f halfcheetah-cql/halfcheetah-cql.yaml

rllib train -f halfcheetah-cql/halfcheetah-bc.yaml

Since CQL is an offline RL algorithm, CQL's returns are evaluated only during the evaluation loop (once every 1000 gradient steps for MuJoCo-based envs).

RLlib CQL versus Behavior Cloning (BC), benchmarked at 1M gradient steps over the dataset derived from the D4RL benchmark (Fu et al, 2020). Results reported below were run on RLlib. The only difference between BC and CQL is the bc_iters parameter in CQL (how many iterations to run BC loss).
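
One way to picture the role of bc_iters is as a switch between two losses. The sketch below assumes the BC loss is used for the first bc_iters iterations and the CQL loss afterwards; this ordering is an assumption about the parameter's semantics, and the function is a hypothetical illustration rather than RLlib code.

```python
def select_loss(iteration: int, bc_iters: int) -> str:
    """Return which loss would drive the policy update at this iteration.

    Assumption: iterations 0 .. bc_iters-1 use the behavior cloning loss,
    and every later iteration uses the CQL loss.
    """
    return "bc" if iteration < bc_iters else "cql"


if __name__ == "__main__":
    # A pure-BC run sets bc_iters at or above the total number of iterations;
    # a pure-CQL run sets bc_iters to 0.
    print([select_loss(i, bc_iters=2) for i in range(4)])  # ['bc', 'bc', 'cql', 'cql']
```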

RLlib's CQL is evaluated on four different environments: HalfCheetah-Random-v0 and Hopper-Random-v0 contain datasets collected by a random policy, while HalfCheetah-Medium-v0 and Hopper-Medium-v0 contain datasets collected by a policy trained 1/3 of the way through. In all envs, CQL does better than BC by a significant margin (especially HalfCheetah-Random-v0).

| env | RLlib BC @1M | RLlib CQL @1M |
|---|---|---|
| HalfCheetah-Random-v0 | -320 | 3000 |
| Hopper-Random-v0 | 290 | 320 |
| HalfCheetah-Medium-v0 | 3450 | 3850 |
| Hopper-Medium-v0 | 1000 | 2000 |

rllib train -f cql/halfcheetah-cql.yaml & rllib train -f cql/halfcheetah-bc.yaml

tensorboard

tensorboard

rllib train -f cql/hopper-cql.yaml & rllib train -f cql/hopper-bc.yaml

tensorboard

tensorboard

Transformers

rllib train -f vizdoom-attention/vizdoom-attention.yaml

RLlib's model catalog implements a variety of models for the policy and value networks, one of which supports using attention in RL. In particular, RLlib implements a Gated Transformer (Parisotto et al, 2019), abbreviated as GTrXL.

GTrXL is benchmarked in the Vizdoom environment, where the goal is to shoot a monster as quickly as possible. With PPO as the algorithm and GTrXL as the model, RLlib can successfully solve the Vizdoom environment and reach human-level performance.

| env | RLlib Transformer @2M |
|---|---|
| VizdoomBasic-v0 | ~75 |
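
The "gated" part of GTrXL refers to replacing the residual connections of a transformer block with a learned gate. The sketch below is a scalar version of the GRU-style gate described by Parisotto et al, written in plain Python for readability. It is not RLlib's implementation: real layers use weight matrices and vectors, and all weight values and names here are hypothetical.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gru_gate(x, y, w_r=0.5, u_r=0.5, w_z=0.5, u_z=0.5, w_g=0.5, u_g=0.5, b_g=2.0):
    """Scalar GRU-style gate combining a skip input x with a sublayer output y.

    x:   value carried by the skip connection.
    y:   output of the attention or feed-forward sublayer.
    b_g: gate bias; a larger value keeps the output closer to x.
    """
    r = sigmoid(w_r * y + u_r * x)              # reset gate
    z = sigmoid(w_z * y + u_z * x - b_g)        # update gate, biased towards 0
    h = math.tanh(w_g * y + u_g * (r * x))      # candidate value
    return (1.0 - z) * x + z * h


if __name__ == "__main__":
    # With a large bias the gate passes x through almost unchanged.
    print(gru_gate(0.3, 0.7, b_g=30.0))
```

The positive bias means the block starts out close to an identity mapping over the skip input, which the paper reports as helpful for stable training.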

tensorboard
