  • Stars: 205
  • Rank: 185,008 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created over 2 years ago
  • Updated over 2 years ago

Repository Details

Offline Reinforcement Learning with Implicit Q-Learning

This repository contains the official implementation of Offline Reinforcement Learning with Implicit Q-Learning by Ilya Kostrikov, Ashvin Nair, and Sergey Levine.
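As background, the paper fits the state-value function with expectile regression, an asymmetric squared loss on the difference Q(s, a) - V(s). The following is a minimal dependency-free sketch of that loss, not the repository's code: the function names and the default expectile of 0.7 are chosen here for illustration only.

```python
def expectile_loss(diff, expectile=0.7):
    """Asymmetric squared loss: |expectile - 1(diff < 0)| * diff**2.

    `diff` stands for Q(s, a) - V(s). Positive differences are weighted by
    `expectile`, negative ones by `1 - expectile`.
    """
    weight = expectile if diff > 0 else 1.0 - expectile
    return weight * diff ** 2


def value_loss(q_values, v_values, expectile=0.7):
    """Mean expectile loss over a batch of (Q, V) pairs (illustrative helper)."""
    losses = [expectile_loss(q - v, expectile) for q, v in zip(q_values, v_values)]
    return sum(losses) / len(losses)
```

With an expectile of 0.5 the loss reduces to half the ordinary squared error. Values above 0.5 penalize underestimating Q more heavily, which pushes V toward the upper range of Q-values seen in the dataset.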

If you use this code for your research, please consider citing the paper:

@article{kostrikov2021iql,
    title={Offline Reinforcement Learning with Implicit Q-Learning},
    author={Ilya Kostrikov and Ashvin Nair and Sergey Levine},
    year={2021},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

For a PyTorch reimplementation, see https://github.com/rail-berkeley/rlkit/tree/master/examples/iql

How to run the code

Install dependencies

pip install --upgrade pip

pip install -r requirements.txt

# Installs the wheel compatible with CUDA 11 and cuDNN 8.
pip install --upgrade "jax[cuda]>=0.2.27" -f https://storage.googleapis.com/jax-releases/jax_releases.html

For other CUDA configurations, see the JAX installation instructions.
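After installing, it can help to confirm which backend JAX actually picked up before starting a long training run. The snippet below is a small sketch for that purpose. The helper name `describe_jax_backend` is hypothetical and not part of this repository; it relies only on `jax.default_backend()` and `jax.devices()`.

```python
def describe_jax_backend():
    """Return a small summary of the local JAX installation, if there is one."""
    try:
        import jax
    except ImportError:
        # JAX is not installed in this environment.
        return {"installed": False, "backend": None, "devices": []}
    return {
        "installed": True,
        "backend": jax.default_backend(),  # e.g. "cpu" or "gpu"
        "devices": [str(d) for d in jax.devices()],
    }


if __name__ == "__main__":
    print(describe_jax_backend())
```

If the reported backend is "cpu" on a machine with a GPU, the CUDA-enabled wheel was most likely not installed or does not match the local CUDA version.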

Run training

Locomotion

python train_offline.py --env_name=halfcheetah-medium-expert-v2 --config=configs/mujoco_config.py

AntMaze

python train_offline.py --env_name=antmaze-large-play-v0 --config=configs/antmaze_config.py --eval_episodes=100 --eval_interval=100000

Kitchen and Adroit

python train_offline.py --env_name=pen-human-v0 --config=configs/kitchen_config.py
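The three commands above differ only in the config file and, for AntMaze, in the evaluation flags. The sketch below captures that mapping in a small helper. The helper `build_train_command` and the prefix-based matching are assumptions made for illustration, not part of this repository. The prefixes follow the D4RL task families, and only flags that appear in the commands above are used.

```python
import shlex

# Environment-name prefixes per task family (an assumption based on D4RL naming).
LOCOMOTION = ("halfcheetah", "hopper", "walker2d")
KITCHEN_ADROIT = ("kitchen", "pen", "hammer", "door", "relocate")


def build_train_command(env_name):
    """Return the argv list for train_offline.py for a given environment name."""
    args = ["python", "train_offline.py", f"--env_name={env_name}"]
    if env_name.startswith("antmaze"):
        # AntMaze uses more evaluation episodes and a longer evaluation interval.
        args += ["--config=configs/antmaze_config.py",
                 "--eval_episodes=100", "--eval_interval=100000"]
    elif env_name.startswith(KITCHEN_ADROIT):
        args.append("--config=configs/kitchen_config.py")
    elif env_name.startswith(LOCOMOTION):
        args.append("--config=configs/mujoco_config.py")
    else:
        raise ValueError(f"no documented config for {env_name!r}")
    return args


print(shlex.join(build_train_command("halfcheetah-medium-expert-v2")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues when launching several environments from a script.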

Finetuning on AntMaze tasks

python train_finetune.py --env_name=antmaze-large-play-v0 --config=configs/antmaze_finetune_config.py --eval_episodes=100 --eval_interval=100000 --replay_buffer_size 2000000

Misc

The implementation is based on JAXRL.
