PyTorch implementation of reinforcement learning algorithms
This repository contains:
- policy gradient methods (TRPO, PPO, A2C)
- Generative Adversarial Imitation Learning (GAIL)
Important notes
- The code now works for PyTorch 0.4. For PyTorch 0.3, please check out the 0.3 branch.
- To run MuJoCo environments, first install mujoco-py and gym.
- If you have a GPU, I recommend setting OMP_NUM_THREADS to 1. PyTorch creates additional threads when performing computations, which can hurt the performance of multiprocessing. The problem is most serious on Linux, where multiprocessing can end up even slower than a single thread:
export OMP_NUM_THREADS=1
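The same setting can also be applied from inside a Python launcher script. This is a minimal sketch, assuming you would rather not depend on the shell environment; the assignment has to run before PyTorch is imported, because OpenMP typically reads the variable once, when the library is first loaded.

```python
import os

# Must run before `import torch` (or NumPy): OpenMP typically reads the
# variable once, when the library is first loaded.
os.environ["OMP_NUM_THREADS"] = "1"

# import torch  # import PyTorch only after the variable has been set

print(os.environ["OMP_NUM_THREADS"])
```

Exporting the variable in the shell, as shown above, remains the simplest option because it cannot be defeated by import order.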
Features
- Support discrete and continuous action spaces.
- Support multiprocessing so that the agent can collect samples in multiple environments simultaneously (8x faster than a single thread).
- Fast Fisher vector product calculation. Ankur kindly wrote a blog post explaining the implementation details.
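The multi-environment sample collection listed above can be pictured with the sketch below. It is not the code from this repository: `rollout`, `collect_samples`, and the toy random-walk environment are hypothetical stand-ins for a gym environment and a policy, using only the standard library. Each worker gets its own seed and an equal share of the step budget, and the parent merges the batches.

```python
import multiprocessing as mp
import random


def rollout(args):
    """Collect `num_steps` transitions from a toy environment (stand-in for gym)."""
    seed, num_steps = args
    rng = random.Random(seed)  # per-worker seed so workers do not duplicate samples
    state, batch = 0.0, []
    for _ in range(num_steps):
        action = rng.choice([-1.0, 1.0])  # placeholder for sampling from a policy
        next_state = state + action
        reward = -abs(next_state)  # toy reward: stay close to the origin
        batch.append((state, action, reward, next_state))
        state = next_state
    return batch


def collect_samples(total_steps, num_workers):
    """Split `total_steps` evenly across workers and merge their batches."""
    steps_per_worker = total_steps // num_workers
    jobs = [(seed, steps_per_worker) for seed in range(num_workers)]
    with mp.Pool(num_workers) as pool:
        batches = pool.map(rollout, jobs)
    return [transition for batch in batches for transition in batch]


if __name__ == "__main__":
    samples = collect_samples(total_steps=2048, num_workers=4)
    print(len(samples))
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes re-import the main module. The actual speedup depends on how expensive each environment step is relative to the cost of passing batches between processes.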
Policy gradient methods
- Trust Region Policy Optimization (TRPO) -> examples/trpo_gym.py
- Proximal Policy Optimization (PPO) -> examples/ppo_gym.py
- Synchronous A3C (A2C) -> examples/a2c_gym.py
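TRPO computes its update direction with conjugate gradient, which only needs products of the Fisher information matrix with a vector, never the matrix itself. The sketch below illustrates that idea on the simplest possible case and is not the implementation used in this repository (see the blog post mentioned under Features for that). It assumes a categorical policy whose parameters are the logits themselves, where the Fisher matrix has the closed form F = diag(p) - p p^T; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector_product(logits, v):
    """Return F @ v for a categorical policy, with F taken w.r.t. the logits.

    F = diag(p) - p p^T, so F @ v = p * (v - p . v): O(n), no matrix built.
    """
    p = softmax(logits)
    p_dot_v = sum(pi * vi for pi, vi in zip(p, v))
    return [pi * (vi - p_dot_v) for pi, vi in zip(p, v)]


def fisher_matrix(logits):
    """Explicit O(n^2) Fisher matrix, used only to check the product above."""
    p = softmax(logits)
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]


logits = [0.5, -1.0, 2.0]
v = [1.0, 2.0, 3.0]
fast = fisher_vector_product(logits, v)
F = fisher_matrix(logits)
slow = [sum(F[i][j] * v[j] for j in range(3)) for i in range(3)]
# The matrix-free product should agree with the explicit one.
print(all(abs(a - b) < 1e-12 for a, b in zip(fast, slow)))
```

For a neural-network policy the Fisher matrix has no such closed form, and the product is usually obtained through automatic differentiation instead.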
Example
- python examples/ppo_gym.py --env-name Hopper-v2
Generative Adversarial Imitation Learning (GAIL)
To save expert trajectories
- python gail/save_expert_traj.py --model-path assets/learned_models/Hopper-v2_ppo.p
To run imitation learning
- python gail/gail_gym.py --env-name Hopper-v2 --expert-traj-path assets/expert_traj/Hopper-v2_expert_traj.p