  • Language
    Python
  • License
    MIT License

Clean PyTorch implementations of imitation and reward learning algorithms

Imitation Learning Baseline Implementations

This project aims to provide clean implementations of imitation and reward learning algorithms. Currently, we have implementations of the algorithms below. 'Discrete' and 'Continuous' indicate whether the algorithm supports discrete or continuous action/state spaces, respectively.

Algorithm (+ link to paper)                             API Docs                           Discrete  Continuous
Behavioral Cloning                                      algorithms.bc                      ✅        ✅
DAgger                                                  algorithms.dagger                  ✅        ✅
Density-Based Reward Modeling                           algorithms.density                 ✅        ✅
Maximum Causal Entropy Inverse Reinforcement Learning   algorithms.mce_irl                 ✅        ❌
Adversarial Inverse Reinforcement Learning              algorithms.airl                    ✅        ✅
Generative Adversarial Imitation Learning               algorithms.gail                    ✅        ✅
Deep RL from Human Preferences                          algorithms.preference_comparisons  ✅        ✅

You can find the documentation here.

Installation

Prerequisites

  • Python 3.8+
  • (Optional) OpenGL (to render Gym environments)
  • (Optional) FFmpeg (to encode videos of renders)
  • (Optional) MuJoCo (follow instructions to install mujoco_py v1.5 here)

Installing PyPI release

Installing the PyPI release is the standard way to use imitation, and the recommended way for most users.

pip install imitation
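
One way to confirm that the install succeeded is to query the installed distribution's version from Python. The sketch below uses only the standard library's importlib.metadata (available from Python 3.8, the minimum version listed above). The helper name installed_version is illustrative and is not part of imitation.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("imitation")
    if version is None:
        print("imitation is not installed; try: pip install imitation")
    else:
        print(f"imitation {version} is installed")
```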

Install from source

If you like, you can install imitation from source to contribute to the project or to access the latest features before a stable release. To do so, clone the GitHub repository and run the installer directly. First run: git clone http://github.com/HumanCompatibleAI/imitation && cd imitation.

Then, for development mode, run:

pip install -e ".[dev]"

This will run setup.py in development mode and install the additional dependencies required for development. For regular use, run instead:

pip install .

Additional extras are available depending on your needs: tests for running the test suite, docs for building the documentation, parallel for parallelizing training, and atari for including Atari environments. The dev extra already installs the tests, docs, and atari dependencies, and tests installs the atari dependencies.
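
Because some extras already include others, a request such as "dev plus tests" is redundant. The sketch below encodes only the relationships stated in the paragraph above (the real setup.py may define more) and trims a requested set of extras down to the ones that need to be named. IMPLIED_EXTRAS, minimal_extras, and pip_command are hypothetical names used for illustration; they are not part of imitation or pip.

```python
from typing import Dict, FrozenSet, Set

# Extras named in the README, mapped to the extras they already pull in.
# Assumption: only the relationships stated in the text are listed here.
IMPLIED_EXTRAS: Dict[str, FrozenSet[str]] = {
    "dev": frozenset({"tests", "docs", "atari"}),
    "tests": frozenset({"atari"}),
    "docs": frozenset(),
    "parallel": frozenset(),
    "atari": frozenset(),
}


def minimal_extras(wanted: Set[str]) -> Set[str]:
    """Drop extras that another requested extra already installs."""
    unknown = wanted - set(IMPLIED_EXTRAS)
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    covered: Set[str] = set()
    for extra in wanted:
        covered |= IMPLIED_EXTRAS[extra]
    return wanted - covered


def pip_command(wanted: Set[str], editable: bool = True) -> str:
    """Build the pip invocation for a source checkout with the given extras."""
    extras = ",".join(sorted(minimal_extras(wanted)))
    target = f'".[{extras}]"' if extras else "."
    flag = "-e " if editable else ""
    return f"pip install {flag}{target}"


if __name__ == "__main__":
    print(pip_command({"dev", "tests"}))
    print(pip_command({"tests", "parallel"}))
    print(pip_command(set(), editable=False))
```

With these assumptions, requesting dev together with tests reduces to the development command shown earlier, and an empty request without the editable flag reduces to the regular-use command.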

For macOS users, some packages are required to run experiments (see ./experiments/README.md for details). First, install Homebrew if not available (see Homebrew). Then, run:

brew install coreutils gnu-getopt parallel

CLI Quickstart

We provide several CLI scripts as a front-end to the algorithms implemented in imitation. These use Sacred for configuration and replicability.

From examples/quickstart.sh:

# Train PPO agent on pendulum and collect expert demonstrations. Tensorboard logs saved in quickstart/rl/
python -m imitation.scripts.train_rl with pendulum environment.fast policy_evaluation.fast rl.fast fast logging.log_dir=quickstart/rl/

# Train GAIL from demonstrations. Tensorboard logs saved in output/ (default log directory).
python -m imitation.scripts.train_adversarial gail with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.rollout_path=quickstart/rl/rollouts/final.npz

# Train AIRL from demonstrations. Tensorboard logs saved in output/ (default log directory).
python -m imitation.scripts.train_adversarial airl with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.rollout_path=quickstart/rl/rollouts/final.npz

Tips:

  • Remove the "fast" options from the commands above to allow training to run to completion.
  • python -m imitation.scripts.train_rl print_config will list Sacred script options. These configuration options are documented in each script's docstrings.

For more information on how to configure Sacred CLI options, see the Sacred docs.
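
The quickstart commands above all share one shape: the script module, an optional command (such as gail), the keyword with, a list of named configs, and key=value overrides. The sketch below assembles such a command line as a string so that the pieces are easier to see. sacred_command is a hypothetical helper written for this illustration; it is not provided by imitation or Sacred, and it only formats text rather than running anything.

```python
import shlex
from typing import Mapping, Optional, Sequence


def sacred_command(
    module: str,
    named_configs: Sequence[str] = (),
    config_updates: Optional[Mapping[str, str]] = None,
    command: Optional[str] = None,
) -> str:
    """Format `python -m <module> [command] with <named configs> <key=value>...`."""
    argv = ["python", "-m", module]
    if command is not None:
        argv.append(command)
    updates = [f"{key}={value}" for key, value in (config_updates or {}).items()]
    if named_configs or updates:
        # Sacred expects named configs and overrides after the keyword `with`.
        argv.append("with")
        argv.extend(named_configs)
        argv.extend(updates)
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    fast = ["pendulum", "environment.fast", "policy_evaluation.fast", "rl.fast", "fast"]
    print(
        sacred_command(
            "imitation.scripts.train_rl",
            named_configs=fast,
            config_updates={"logging.log_dir": "quickstart/rl/"},
        )
    )
    # Per the tip above: drop the "fast" named configs for a full training run.
    full = [name for name in fast if name != "fast" and not name.endswith(".fast")]
    print(sacred_command("imitation.scripts.train_rl", named_configs=full))
```

The first call reproduces the train_rl line from examples/quickstart.sh; the second shows the same invocation with the "fast" named configs filtered out, leaving only pendulum.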

Python Interface Quickstart

See examples/quickstart.py for an example script that loads CartPole-v1 demonstrations and trains BC, GAIL, and AIRL models on that data.

Density reward baseline

We also implement a density-based reward baseline. You can find an example notebook here.

Citations (BibTeX)

@misc{gleave2022imitation,
  author = {Gleave, Adam and Taufeeque, Mohammad and Rocamonde, Juan and Jenner, Erik and Wang, Steven H. and Toyer, Sam and Ernestus, Maximilian and Belrose, Nora and Emmons, Scott and Russell, Stuart},
  title = {imitation: Clean Imitation Learning Implementations},
  year = {2022},
  howPublished = {arXiv:2211.11972v1 [cs.LG]},
  archivePrefix = {arXiv},
  eprint = {2211.11972},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2211.11972},
}

Contributing

See Contributing to imitation for more information.
