PyTorch Reinforcement Learning
This repo contains tutorials covering reinforcement learning using PyTorch 1.3 and Gym 0.15.4, with Python 3.7.
If you find any mistakes or disagree with any of the explanations, please do not hesitate to submit an issue. I welcome any feedback, positive or negative!
Getting Started
To install PyTorch, see installation instructions on the PyTorch website.
To install Gym, see installation instructions on the Gym GitHub repo.
Tutorials
All tutorials use Monte Carlo methods to train an agent on the CartPole-v1 environment, with the goal of reaching a total episode reward of 475 averaged over the last 25 episodes. There are also alternate versions of some algorithms to show how to use them with other environments.
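The "solved" criterion above can be sketched as a small helper. This is an illustrative snippet, not code from the tutorials; the function name `solved` and its signature are my own.

```python
from collections import deque

def solved(episode_rewards, threshold=475.0, window=25):
    """CartPole-v1 counts as solved here when the mean total reward
    over the last `window` episodes reaches `threshold`."""
    if len(episode_rewards) < window:
        return False
    recent = list(episode_rewards)[-window:]
    return sum(recent) / window >= threshold

# In a training loop you would append each episode's total reward to a
# deque and stop once solved(...) returns True.
rewards = deque(maxlen=100)
```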
1 - Vanilla Policy Gradient (REINFORCE)
This tutorial covers the workflow of a reinforcement learning project. We'll learn how to: create an environment, initialize a model to act as our policy, run a state/action/reward loop, and update our policy. We update our policy with the vanilla policy gradient algorithm, also known as REINFORCE.
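The two core computations in REINFORCE can be sketched as follows: computing Monte Carlo returns from a finished episode, and forming the policy-gradient loss from the log-probabilities of the actions taken. This is a minimal sketch under my own naming; the return normalization is a common variance-reduction trick, not something the tutorial necessarily uses in this exact form.

```python
import torch

def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo returns: G_t = r_t + gamma * G_{t+1},
    computed by walking the episode's rewards backwards."""
    returns = []
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.insert(0, R)
    return torch.tensor(returns)

def reinforce_loss(log_probs, returns):
    """REINFORCE loss: -sum_t log pi(a_t|s_t) * G_t.
    Returns are normalized to reduce gradient variance."""
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return -(torch.stack(log_probs) * returns).sum()
```

Calling `loss.backward()` on this loss and stepping an optimizer performs one policy-gradient update per episode.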
2 - Actor Critic
This tutorial introduces the family of actor-critic algorithms, which we will use for the next few tutorials.
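A common way to structure an actor-critic model is a single network with a shared body and two heads: the actor head outputs action logits and the critic head outputs a scalar state-value estimate. The sketch below assumes CartPole's observation and action dimensions (4 and 2); the class name and layer sizes are illustrative, not the tutorial's exact architecture.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared-body actor-critic: one trunk, two heads."""
    def __init__(self, obs_dim=4, n_actions=2, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)   # action logits
        self.critic = nn.Linear(hidden, 1)          # state value V(s)

    def forward(self, obs):
        h = self.body(obs)
        return self.actor(h), self.critic(h)
```

Sharing the body lets the policy and value function reuse features, at the cost of their gradients interacting; separate networks are the other common choice.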
3 - Advantage Actor Critic (A2C)
We cover an improvement to the actor-critic framework, the A2C (advantage actor-critic) algorithm.
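The key idea in A2C is to replace the raw return with the advantage, A_t = G_t - V(s_t), which lowers the variance of the policy gradient. A minimal sketch of the two losses, with names of my own choosing:

```python
import torch

def a2c_losses(log_probs, values, returns):
    """A2C losses. The actor maximizes log pi(a|s) * A with the
    advantage detached (the critic shouldn't be trained through the
    actor loss); the critic minimizes the squared advantage."""
    advantages = returns - values
    actor_loss = -(log_probs * advantages.detach()).sum()
    critic_loss = advantages.pow(2).sum()
    return actor_loss, critic_loss
```

In practice the two losses are summed (often with a weighting coefficient on the critic term) and backpropagated together.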
4 - Generalized Advantage Estimation (GAE)
We improve on A2C by adding GAE (generalized advantage estimation).
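GAE interpolates between the high-variance Monte Carlo advantage and the high-bias one-step TD advantage via a parameter lambda: A_t = delta_t + gamma * lambda * A_{t+1}, where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t). A sketch, assuming `values` carries one extra entry for the bootstrap value of the final state:

```python
import torch

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation.
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    `values` must have len(rewards) + 1 entries (bootstrap at the end)."""
    advantages = []
    A = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        A = delta + gamma * lam * A
        advantages.insert(0, A)
    return torch.tensor(advantages)
```

With lam=0 this reduces to the one-step TD error; with lam=1 it recovers the Monte Carlo advantage.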
5 - Proximal Policy Optimization (PPO)
We cover another improvement on A2C, PPO (proximal policy optimization).
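PPO's central piece is the clipped surrogate objective: it limits how far a single update can move the policy by clipping the probability ratio between the new and old policies. A sketch of that loss, with illustrative naming:

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
    """PPO clipped surrogate. r_t = pi_new(a|s) / pi_old(a|s);
    taking the min of the unclipped and clipped terms means updates
    that push the ratio outside [1 - eps, 1 + eps] gain no extra
    objective, discouraging destructively large policy steps."""
    ratio = (new_log_probs - old_log_probs).exp()
    unclipped = ratio * advantages
    clipped = ratio.clamp(1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

Because the objective is bounded, PPO can safely take several gradient steps on the same batch of collected experience, unlike vanilla A2C.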
Potential algorithms covered in future tutorials: DQN, ACER, ACKTR.
References
- 'Reinforcement Learning: An Introduction' - http://incompleteideas.net/sutton/book/the-book-2nd.html
- 'Algorithms for Reinforcement Learning' - https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf
- List of key papers in deep reinforcement learning - https://spinningup.openai.com/en/latest/spinningup/keypapers.html