  • Stars: 175
  • Rank: 210,958 (Top 5 %)
  • Language: Python
  • License: MIT License
  • Created over 1 year ago
  • Updated about 1 year ago

Repository Details

Reinforcement Learning with Prior Data (RLPD)

This is code to accompany the paper "Efficient Online Reinforcement Learning with Offline Data", available here. This code can be readily adapted to work on any offline dataset.

Installation

conda create -n rlpd python=3.9 # If you use conda.
conda activate rlpd
conda install patchelf  # If you use conda.
pip install -r requirements.txt
conda deactivate
conda activate rlpd

Experiments

D4RL Locomotion

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning.py --env_name=halfcheetah-expert-v0 \
                --utd_ratio=20 \
                --start_training 5000 \
                --max_steps 250000 \
                --config=configs/rlpd_config.py \
                --project_name=rlpd_locomotion
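
The same invocation can also be scripted from Python, which may be convenient when looping over several environments. The following is only a sketch: `build_command`, `launch`, and `LOCOMOTION_FLAGS` are hypothetical names that are not part of this repository, and it assumes the script accepts the `--name=value` form for every flag. Setting `XLA_PYTHON_CLIENT_PREALLOCATE=false` asks JAX not to preallocate most of the GPU memory at startup.

```python
import os
import subprocess
import sys

# Flags copied from the locomotion command above.
LOCOMOTION_FLAGS = {
    "env_name": "halfcheetah-expert-v0",
    "utd_ratio": 20,
    "start_training": 5000,
    "max_steps": 250000,
    "config": "configs/rlpd_config.py",
    "project_name": "rlpd_locomotion",
}


def build_command(script, flags):
    """Turn a dict of flags into an argv list (hypothetical helper)."""
    return [sys.executable, script] + [f"--{k}={v}" for k, v in flags.items()]


def launch(script, flags, dry_run=True):
    """Print the command, and run it unless dry_run is set."""
    # Copy the current environment and disable JAX GPU memory preallocation.
    env = dict(os.environ, XLA_PYTHON_CLIENT_PREALLOCATE="false")
    cmd = build_command(script, flags)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run by default so nothing is started by accident.
    launch("train_finetuning.py", LOCOMOTION_FLAGS, dry_run=True)
```

Pass `dry_run=False` from the repository root to actually start training.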

D4RL Antmaze

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning.py --env_name=antmaze-umaze-v2 \
                --utd_ratio=20 \
                --start_training 5000 \
                --max_steps 300000 \
                --config=configs/rlpd_config.py \
                --config.backup_entropy=False \
                --config.hidden_dims="(256, 256, 256)" \
                --config.num_min_qs=1 \
                --project_name=rlpd_antmaze

Adroit Binary

First, download and unzip .npy files into ~/.datasets/awac-data/ from here.

Make sure you have mjrl installed:

git clone https://github.com/aravindr93/mjrl
cd mjrl
pip install -e .

Then, recursively clone mj_envs from this fork:

git clone --recursive https://github.com/philipjball/mj_envs.git

Then sync the submodules (add the --init flag if you didn't recursively clone):

cd mj_envs
git submodule update --remote

Finally:

pip install -e .

Now you can run the following from the root of this repository (not from mj_envs):

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning.py --env_name=pen-binary-v0 \
                --utd_ratio=20 \
                --start_training 5000 \
                --max_steps 1000000 \
                --config=configs/rlpd_config.py \
                --config.backup_entropy=False \
                --config.hidden_dims="(256, 256, 256)" \
                --project_name=rlpd_adroit

V-D4RL

These are pixel-based datasets for offline RL (paper here).

Download the 64px Main V-D4RL datasets into ~/.vd4rl from here or here.

For instance, the Medium Cheetah Run .npz files should be in ~/.vd4rl/main/cheetah_run/medium/64px.
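
Before launching a run, it can help to check that the files landed in the expected place. The following is a minimal sketch that follows the layout described above (`~/.vd4rl/main/<task>/<level>/64px`); `vd4rl_dir` and `list_episodes` are hypothetical helpers and not part of this repository.

```python
from pathlib import Path


def vd4rl_dir(task, level, resolution="64px", root=None):
    """Build the expected dataset directory for one task and data level."""
    base = Path(root) if root is not None else Path.home() / ".vd4rl"
    return base / "main" / task / level / resolution


def list_episodes(directory):
    """Return the sorted .npz files in a directory, or [] if it is missing."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.npz"))


if __name__ == "__main__":
    target = vd4rl_dir("cheetah_run", "medium")
    episodes = list_episodes(target)
    print(f"{target}: {len(episodes)} .npz file(s) found")
```

If the count is zero, the download most likely went to a different directory than the one the training script reads from.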

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning_pixels.py --env_name=cheetah-run-v0 \
                --start_training 5000 \
                --max_steps 300000 \
                --config=configs/rlpd_pixels_config.py \
                --project_name=rlpd_vd4rl

More Repositories

1. pytorch-a2c-ppo-acktr-gail (Python, 3,465 stars): PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
2. pytorch-a3c (Python, 1,186 stars): PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
3. TensorFlow-VAE-GAN-DRAW (Python, 595 stars): A collection of generative methods implemented with TensorFlow (Deep Convolutional Generative Adversarial Networks (DCGAN), Variational Autoencoder (VAE) and DRAW: A Recurrent Neural Network For Image Generation).
4. jaxrl (Jupyter Notebook, 575 stars): JAX (Flax) implementation of algorithms for Deep Reinforcement Learning with continuous action spaces.
5. pytorch-flows (Python, 567 stars): PyTorch implementations of algorithms for density estimation.
6. pytorch-trpo (Python, 410 stars): PyTorch implementation of Trust Region Policy Optimization.
7. pytorch-meta-optimizer (Python, 308 stars): A PyTorch implementation of "Learning to learn by gradient descent by gradient descent".
8. pytorch-ddpg-naf (Python, 303 stars): Implementation of algorithms for continuous control (DDPG and NAF).
9. walk_in_the_park (Python, 233 stars)
10. implicit_q_learning (Python, 205 stars)
11. TensorFlow-Pointer-Networks (Python, 205 stars): TensorFlow implementation of Pointer Networks.
12. pytorch-rl (58 stars)
13. jaxrl2 (Jupyter Notebook, 39 stars)
14. dmcgym (Python, 24 stars)
15. linenplus (Python, 5 stars): Flax extensions.
16. gail-experts (4 stars)
17. cql-results (Python, 3 stars)
18. gym_dmc (Python, 2 stars)