Clean, Robust, and Unified Implementations of Classical Deep Reinforcement Learning Algorithms
Links to my code:
- Q-learning:
- DQN/DDQN on Classic Control:
- DQN/DDQN on Atari Game:
- Prioritized DQN/DDQN on Classic Control:
- Proximal Policy Optimization(PPO) for Discrete Action Space:
- Proximal Policy Optimization(PPO) for Continuous Action Space:
- Deep Deterministic Policy Gradient(DDPG):
- Twin Delayed Deep Deterministic Policy Gradient(TD3):
- Soft Actor Critic(SAC) for Discrete Action Space:
- Soft Actor Critic(SAC) for Continuous Action Space:
- Actor-Sharer-Learner(ASL):
Recommended Resources for DRL
Books:
- 《Reinforcement Learning: An Introduction》--Richard S. Sutton and Andrew G. Barto
- 《Deep Learning from Scratch: Theory and Implementation with Python》(Chinese edition)--Koki Saitoh
Online Courses:
- RL Courses(bilibili)--Hung-yi Lee (李宏毅)
- RL Courses(Youtube)--Hung-yi Lee (李宏毅)
- UCL Course on RL--David Silver
- Hands-on Reinforcement Learning--Shanghai Jiao Tong University
Blogs:
- OpenAI Spinning Up
- Policy Gradient Theorem --Cangxi
- Policy Gradient Algorithms --Lilian
- Theorem of PPO
- The 37 Implementation Details of Proximal Policy Optimization
- Prioritized Experience Replay
- Soft Actor Critic
- A (Long) Peek into Reinforcement Learning --Lilian
- Introduction to TD3
Simulation Environments:
Important Papers
Training Curves of my Code:
Q-learning:
DQN/DDQN on Classic Control:
DQN/DDQN on Atari Game:
Pong | Enduro |
---|---|
Prioritized DQN/DDQN on Classic Control:
CartPole | LunarLander |
---|---|
PPO Discrete:
PPO Continuous:
DDPG:
Pendulum | LunarLanderContinuous |
---|---|