A Pragmatic Look at Deep Imitation Learning

Imitation learning algorithms (with PPO [1]):

AIRL [2]
BC [3]
DRIL [4] (without BC)
FAIRL [5]
GAIL [6]
GMMIL [7] (including an optional self-similarity term [8])
nn-PUGAIL [9]
RED [10]

Options include:

State-only imitation learning: state-only: true/false
R1 gradient regularisation [11]: r1-reg-coeff: 0.5

Requirements

Requirements can be installed with:

pip install -r requirements.txt

Notable required packages are PyTorch, OpenAI Gym, D4RL-PyBullet and Hydra. Ax and the Hydra Ax sweeper plugin are required for hyperparameter optimisation; if unneeded they can be removed from requirements.txt.

Run

The training of each imitation learning algorithm can be started with:

python main.py algorithm=ALG/ENV

python main.py algorithm=AIRL/hopper

Hyperparameters can be found in conf/config.yaml and conf/algorithm/ALG/ENV.yaml, with the latter containing algorithm- and environment-specific hyperparameters that were tuned with Ax.

Results will be saved in outputs/ENV_ALGO/m-d_H-M-S with the last subfolder indicating the current datetime.

Hyperparameter optimisation

Hyperparameter optimisation can be run by adding -m hydra/sweeper=ax hyperparam_opt=ALG, for example:

python main.py -m algorithm=AIRL/hopper hydra/sweeper=ax hyperparam_opt=AIRL

hyperparam_opt specifies the hyperparameter search space.

Seed sweep

A seed sweep can be performed as follows:

python main.py -m algorithm=AIRL/hopper seed=1,2,3,4,5

or via the existing bash script:

./scripts/run_seed_experiments.sh ALG ENV

The results will be available in ./output/seed_sweeper_ENV_ALG folder (note that running this code twice will overwrite the previous results).

Results

Acknowledgements

@ikostrikov for https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail

Citation

If you find this work useful and would like to cite it, the following would be appropriate:

@article{arulkumaran2021pragmatic,
  author = {Arulkumaran, Kai and Ogawa Lillrank, Dan},
  title = {A Pragmatic Look at Deep Imitation Learning},
  journal={arXiv preprint arXiv:2108.01867},
  year = {2021}
}

References

[1] Proximal Policy Optimization Algorithms
[2] Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
[3] Efficient Training of Artificial Neural Networks for Autonomous Navigation
[4] Disagreement-Regularized Imitation Learning
[5] A Divergence Minimization Perspective on Imitation Learning Methods
[6] Generative Adversarial Imitation Learning
[7] Imitation Learning via Kernel Mean Embedding
[8] A Pragmatic Look at Deep Imitation Learning
[9] Positive-Unlabeled Reward Learning
[10] Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
[11] Which Training Methods for GANs do actually Converge?

Kaixhin/imitation-learning

Kaixhin

Reviews

Repository Details