XinJingHao/DRL-Pytorch

Stars: 1,151
Rank: 40,523 (Top 0.8%)
Language: Python
Created: almost 3 years ago
Updated: 4 months ago

XinJingHao

Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)

Clean, Robust, and Unified implementation of classical Deep Reinforcement Learning Algorithms

Link of my code:

Recommended Resources for DRL

Books:

Reinforcement Learning: An Introduction--Richard S. Sutton
Introduction to Deep Learning: Theory and Implementation with Python (深度学习入门：基于Python的理论与实现)--Koki Saitoh (斋藤康毅)

Online Courses:

RL Courses(bilibili)--李宏毅(Hongyi Li)
RL Courses(Youtube)--李宏毅(Hongyi Li)
UCL Course on RL--David Silver
Hands-on Reinforcement Learning (动手强化学习)--Shanghai Jiao Tong University

Blogs:

Simulation Environments:

Important Papers

DQN: Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.

Double DQN: Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double q-learning[C]//Proceedings of the AAAI conference on artificial intelligence. 2016, 30(1).

PER: Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[J]. arXiv preprint arXiv:1511.05952, 2015.

PPO: Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.

DDPG: Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.

TD3: Fujimoto S, Hoof H, Meger D. Addressing function approximation error in actor-critic methods[C]//International conference on machine learning. PMLR, 2018: 1587-1596.

SAC: Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//International conference on machine learning. PMLR, 2018: 1861-1870.

ASL: Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity
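
As a rough illustration of how two of the papers above differ, the sketch below contrasts the bootstrap target used in DQN with the one used in Double DQN. It is not taken from this repository: the function names, the plain-list Q-values, and the example numbers are all assumptions chosen to keep the example dependency-free.

```python
def dqn_target(reward, done, gamma, q_target_next):
    """DQN target: bootstrap from the largest Q-value of the target network."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for the next state under each network (3 actions).
q_online_next = [1.0, 2.0, 0.5]
q_target_next = [3.0, 1.5, 0.0]

y_dqn = dqn_target(1.0, False, 0.9, q_target_next)
y_ddqn = double_dqn_target(1.0, False, 0.9, q_online_next, q_target_next)
print(y_dqn, y_ddqn)
```

In this made-up example the DQN target bootstraps from action 0 (the target network's maximum), while the Double DQN target bootstraps from action 1 (the online network's choice), so the second target is smaller. Decoupling selection from evaluation in this way is what the Double DQN paper proposes to reduce overestimation.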

Training Curves of my Code:

Q-learning:

DQN/DDQN on Classic Control:

DQN/DDQN on Atari Game:

Pong	Enduro

Prioritized DQN/DDQN on Classic Control:

CartPole	LunarLander

PPO Discrete:

PPO Continuous:

DDPG:

Pendulum	LunarLanderContinuous

TD3:

SAC Continuous:

SAC Discrete:

Actor-Sharer-Learner (ASL):

PPO-Continuous-Pytorch

A clean and robust Pytorch implementation of PPO on continuous action space.

TD3-BipedalWalkerHardcore-v2

Solve BipedalWalkerHardcore-v2 with TD3

PPO-Discrete-Pytorch

A clean and robust Pytorch implementation of PPO on Discrete action space

SAC-Continuous-Pytorch

A clean and robust Pytorch implementation of SAC on continuous action space

SAC-Discrete-Pytorch

A clean and robust Pytorch implementation of SAC on discrete action space

Duel-Double-DQN-Pytorch

A clean and robust implementation of Duel Double DQN

OkayPlan

OkayPlan: A real-time global path planning algorithm for dynamic environments

TD3-Pytorch

A clean and robust Pytorch implementation of TD3 on continuous action space

Actor-Sharer-Learner

Actor-Sharer-Learner training framework for off-policy DRL algorithms

Prioritized-Experience-Replay-DDQN-Pytorch

A clean and robust implementation of Prioritized DQN and Prioritized Double DQN

Sparrow-V0

A Reinforcement Learning Friendly Simulator for Mobile Robot

okayplan_ros

Real-time global path planning algorithm for dynamic environments

DDPG-Pytorch

A clean Pytorch implementation of DDPG on continuous action space.

Real-time-Path-planning-with-SEPSO

Efficient Real-time Path Planning with SEPSO in Dynamic Scenarios

Color

Color: Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity

Noisy-Duel-DDQN-Atari-Pytorch

A clean and robust implementation of Noisy-Duel-DDQN on Atari games

Sparrow-V1

A Reinforcement Learning Friendly Simulator for Mobile Robot

Q-learning

An implementation of Q-learning

C51-Categorical-DQN-Pytorch

A clean and robust Pytorch implementation of Categorical DQN (C51)