• Stars: 182
• Rank: 209,896 (Top 5%)
• Language: Python
• License: MIT License
• Created: about 4 years ago
• Updated: almost 4 years ago

Repository Details

PyTorch implementation of GAIL and AIRL based on PPO.

GAIL and AIRL in PyTorch

This is a PyTorch implementation of Generative Adversarial Imitation Learning (GAIL) [1] and Adversarial Inverse Reinforcement Learning (AIRL) [2] based on PPO [3]. I have tried to make the implementation easy for readers to understand. Please let me know if you have any questions.
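
To illustrate the core idea, below is a minimal, self-contained sketch (not the code in this repository) of a GAIL-style discriminator together with the surrogate reward it hands to PPO. The class and function names are placeholders for illustration only.

import torch
from torch import nn
import torch.nn.functional as F

class GAILDiscriminator(nn.Module):
    """Classifies (state, action) pairs as expert (label 1) or policy (label 0)."""

    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, states, actions):
        # Returns logits; D(s, a) = sigmoid(logits).
        return self.net(torch.cat([states, actions], dim=-1))

    def reward(self, states, actions):
        # Surrogate reward fed to PPO: -log(1 - D(s, a)), computed from logits for stability.
        with torch.no_grad():
            return -F.logsigmoid(-self.forward(states, actions))

def update_discriminator(disc, optimizer, policy_states, policy_actions, expert_states, expert_actions):
    # Binary cross-entropy: policy pairs are labeled 0, expert pairs are labeled 1.
    logits_pi = disc(policy_states, policy_actions)
    logits_exp = disc(expert_states, expert_actions)
    loss = (F.binary_cross_entropy_with_logits(logits_pi, torch.zeros_like(logits_pi))
            + F.binary_cross_entropy_with_logits(logits_exp, torch.ones_like(logits_exp)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()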

Setup

You can install the required Python libraries with pip install -r requirements.txt. Note that you need a MuJoCo license; please follow the instructions in mujoco-py for help.
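
As a quick sanity check (assuming gym and mujoco-py are installed and your MuJoCo license is configured), you can verify that a MuJoCo environment can be created before moving on:

import gym

# Create one of the environments used below and inspect its spaces.
env = gym.make('InvertedPendulum-v2')
state = env.reset()
print('observation space:', env.observation_space.shape)
print('action space:', env.action_space.shape)
env.close()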

Example

Train expert

You can train experts using Soft Actor-Critic (SAC) [4, 5]. We set num_steps to 100000 for InvertedPendulum-v2 and 1000000 for Hopper-v3. I've also prepared the experts' weights here; please use them if you're only interested in the imitation learning experiments below.

python train_expert.py --cuda --env_id InvertedPendulum-v2 --num_steps 100000 --seed 0
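
For reference, the entropy-regularized target that SAC's critics regress toward looks roughly like the sketch below. This is an illustrative snippet with placeholder objects (actor, critic_target, alpha), not this repository's implementation.

import torch

def soft_q_target(reward, done, next_state, actor, critic_target, alpha, gamma=0.99):
    with torch.no_grad():
        next_action, log_pi = actor.sample(next_state)           # reparameterized sample and its log-prob
        next_q1, next_q2 = critic_target(next_state, next_action)
        next_q = torch.min(next_q1, next_q2) - alpha * log_pi    # soft (entropy-regularized) value
        return reward + (1.0 - done) * gamma * next_q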

Collect demonstrations

You need to collect demonstrations using the trained expert's weights. Note that --std specifies the standard deviation of the Gaussian noise added to the action, and --p_rand specifies the probability that the expert acts randomly. We set std to 0.01 to avoid collecting overly similar trajectories.

python collect_demo.py \
    --cuda --env_id InvertedPendulum-v2 \
    --weight weights/InvertedPendulum-v2.pth \
    --buffer_size 1000000 --std 0.01 --p_rand 0.0 --seed 0
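
Conceptually, --std and --p_rand modify the expert's action before it is executed, roughly as in the sketch below (the function and variable names are hypothetical):

import numpy as np

def noisy_expert_action(expert_action, action_space, std=0.01, p_rand=0.0):
    if np.random.rand() < p_rand:
        # With probability p_rand, act uniformly at random instead of following the expert.
        return action_space.sample()
    # Otherwise, add zero-mean Gaussian noise with standard deviation std and clip to valid bounds.
    action = expert_action + np.random.randn(*expert_action.shape) * std
    return np.clip(action, action_space.low, action_space.high)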

The mean returns of the experts used in the experiments are listed below.

Weight (Env)             std    p_rand  Mean Return (without noise)
InvertedPendulum-v2.pth  0.01   0.0     1000 (1000)
Hopper-v3.pth            0.01   0.0     2534 (2791)

Train Imitation Learning

You can train imitation learning (IL) agents using the collected demonstrations. We set rollout_length to 2000 for InvertedPendulum-v2 and 50000 for Hopper-v3.

python train_imitation.py \
    --algo gail --cuda --env_id InvertedPendulum-v2 \
    --buffer buffers/InvertedPendulum-v2/size1000000_std0.01_prand0.0.pth \
    --num_steps 100000 --eval_interval 5000 --rollout_length 2000 --seed 0
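
At a high level, each iteration collects rollout_length on-policy transitions, updates the discriminator against the expert demonstrations, relabels the rollout with the learned reward, and then runs a PPO update. The sketch below uses hypothetical callables passed in as arguments, so it only outlines the control flow rather than this repository's API:

def train_imitation(collect_rollout, update_discriminator, ppo_update, disc_reward,
                    sample_expert_batch, num_steps, rollout_length, disc_epochs=10):
    steps = 0
    while steps < num_steps:
        rollout = collect_rollout(rollout_length)         # gather on-policy transitions
        steps += rollout_length
        for _ in range(disc_epochs):                      # discriminator: policy data vs. expert data
            update_discriminator(rollout, sample_expert_batch())
        rewards = disc_reward(rollout)                     # replace environment rewards with learned ones
        ppo_update(rollout, rewards)                       # policy/value update on the learned reward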

References

[1] Ho, Jonathan, and Stefano Ermon. "Generative adversarial imitation learning." Advances in neural information processing systems. 2016.

[2] Fu, Justin, Katie Luo, and Sergey Levine. "Learning robust rewards with adversarial inverse reinforcement learning." arXiv preprint arXiv:1710.11248 (2017).

[3] Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).

[4] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." arXiv preprint arXiv:1801.01290 (2018).

[5] Haarnoja, Tuomas, et al. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018).

More Repositories

1. sac-discrete.pytorch (Python, 273 stars): PyTorch implementation of SAC-Discrete.
2. fqf-iqn-qrdqn.pytorch (Python, 158 stars): PyTorch implementation of FQF, IQN and QR-DQN.
3. soft-actor-critic.pytorch (Python, 94 stars): PyTorch implementation of Soft Actor-Critic (SAC).
4. rljax (Python, 92 stars): A collection of RL algorithms written in JAX.
5. slac.pytorch (Python, 87 stars): PyTorch implementation of Stochastic Latent Actor-Critic (SLAC).
6. discor.pytorch (Python, 38 stars): PyTorch implementation of Distribution Correction (DisCor) based on Soft Actor-Critic.
7. alfred-aws-icons (Go, 27 stars): Alfred Workflow for quickly pasting AWS architecture icons onto PowerPoint.
8. rltorch (Python, 16 stars): A simple framework for distributed reinforcement learning in PyTorch.
9. vae.pytorch (Python, 12 stars): PyTorch implementation of Deep Feature Consistent Variational Autoencoder.
10. simple-rl.pytorch (Python, 9 stars): Simple implementation of model-free RL algorithms written in PyTorch.
11. wappo.pytorch (Python, 6 stars): PyTorch implementation of Wasserstein Adversarial Proximal Policy Optimization (WAPPO).
12. slac-discrete.pytorch (Python, 2 stars): PyTorch implementation of Stochastic Latent Actor-Critic (SLAC) extended to discrete action settings.
13. gec-app (Python, 2 stars): Frontend/backend application code and infrastructure for grammatical error correction.
14. dmm-schedule-checker (Go, 1 star): Continuously monitors the schedules of your favorite teachers and sends a LINE notification whenever new slots become available.
15. sagemaker-tutorial (Jupyter Notebook, 1 star): Amazon SageMaker tutorial.
16. ssm-enforcement-tool (Go, 1 star): Terraform infrastructure to monitor instances not managed by SSM across all regions.