A Pragmatic Look at Deep Imitation Learning
Imitation learning algorithms (with PPO [1]):
- AIRL [2]
- BC [3]
- DRIL [4] (without BC)
- FAIRL [5]
- GAIL [6]
- GMMIL [7] (including an optional self-similarity term [8])
- nn-PUGAIL [9]
- RED [10]
Options include (an example override command is shown after this list):
- State-only imitation learning: state-only: true/false
- R1 gradient regularisation [11]: r1-reg-coeff: 0.5
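Both options are ordinary config values, so they can be set from the command line using Hydra's key=value override syntax. A minimal sketch, assuming state-only and r1-reg-coeff are top-level keys in conf/config.yaml (check the config files for the exact paths):
python main.py algorithm=GAIL/hopper state-only=true r1-reg-coeff=0.5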
Requirements
Requirements can be installed with:
pip install -r requirements.txt
Notable required packages are PyTorch, OpenAI Gym, D4RL-PyBullet and Hydra. Ax and the Hydra Ax sweeper plugin are required for hyperparameter optimisation; if unneeded they can be removed from requirements.txt.
Run
The training of each imitation learning algorithm can be started with:
python main.py algorithm=ALG/ENV
where ALG is one of [AIRL|BC|DRIL|FAIRL|GAIL|GMMIL|PUGAIL|RED] and ENV is one of [ant|halfcheetah|hopper|walker2d]. For example:
python main.py algorithm=AIRL/hopper
Hyperparameters can be found in conf/config.yaml and conf/algorithm/ALG/ENV.yaml, with the latter containing algorithm- and environment-specific hyperparameters that were tuned with Ax.
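Since configuration is composed with Hydra, any of these values can also be overridden from the command line with key=value syntax; for instance, to change the random seed of a single run (seed is the same key used by the seed sweep below):
python main.py algorithm=AIRL/hopper seed=10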
Results will be saved in outputs/ENV_ALGO/m-d_H-M-S, with the last subfolder indicating the current datetime.
Hyperparameter optimisation
Hyperparameter optimisation can be run by adding -m hydra/sweeper=ax hyperparam_opt=ALG, for example:
python main.py -m algorithm=AIRL/hopper hydra/sweeper=ax hyperparam_opt=AIRL
hyperparam_opt specifies the hyperparameter search space.
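The contents of the repo's hyperparam_opt configs are specific to each algorithm, but with the Hydra Ax sweeper a search space is typically declared by listing parameters with a type and bounds. A rough, purely illustrative sketch (the parameter name ppo.learning_rate is hypothetical, and max_trials and bounds are placeholder values):
hydra:
  sweeper:
    ax_config:
      max_trials: 64
      params:
        ppo.learning_rate:
          type: range
          bounds: [0.00001, 0.001]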
Seed sweep
A seed sweep can be performed as follows:
python main.py -m algorithm=AIRL/hopper seed=1,2,3,4,5
or via the existing bash script:
./scripts/run_seed_experiments.sh ALG ENV
The results will be available in the ./output/seed_sweeper_ENV_ALG folder (note that running this code twice will overwrite the previous results).
Results
Acknowledgements
Citation
If you find this work useful and would like to cite it, the following would be appropriate:
@article{arulkumaran2021pragmatic,
  author = {Arulkumaran, Kai and Ogawa Lillrank, Dan},
  title = {A Pragmatic Look at Deep Imitation Learning},
  journal = {arXiv preprint arXiv:2108.01867},
  year = {2021}
}
References
[1] Proximal Policy Optimization Algorithms
[2] Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
[3] Efficient Training of Artificial Neural Networks for Autonomous Navigation
[4] Disagreement-Regularized Imitation Learning
[5] A Divergence Minimization Perspective on Imitation Learning Methods
[6] Generative Adversarial Imitation Learning
[7] Imitation Learning via Kernel Mean Embedding
[8] A Pragmatic Look at Deep Imitation Learning
[9] Positive-Unlabeled Reward Learning
[10] Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
[11] Which Training Methods for GANs do actually Converge?