awesome-reinforcement-learning
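Learning resources and links related to reinforcement learning.
This repository contains the most basic implementations of reinforcement learning algorithms, along with some classic reinforcement learning books.
- Compared with other implementations available online, the main advantage of the code here is that the framework is kept as simple as possible: implementations of different algorithms differ only in each algorithm's core distinguishing idea. This helps beginners quickly see how the algorithms differ from one another.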
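As an illustration of that design idea, here is a minimal sketch. It is not code from this repository, and every function name in it is hypothetical. It shows a shared TD-target routine in which DQN and Double DQN differ only in how the bootstrap value for the next state is chosen. It assumes Q-values are plain Python lists indexed by action and that the online and target networks are passed in as callables.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins for the online and target Q-networks:
# each maps a state to a list of Q-values, one per action.
QFunction = Callable[[Sequence[float]], List[float]]
Transition = Tuple[float, Sequence[float], bool]  # (reward, next_state, done)


def dqn_bootstrap(next_state, q_online: QFunction, q_target: QFunction) -> float:
    """DQN (Nature 2015 form): the target network both selects and evaluates the action."""
    return max(q_target(next_state))


def double_dqn_bootstrap(next_state, q_online: QFunction, q_target: QFunction) -> float:
    """Double DQN: the online network selects the action, the target network evaluates it."""
    online_values = q_online(next_state)
    best_action = max(range(len(online_values)), key=lambda a: online_values[a])
    return q_target(next_state)[best_action]


def td_targets(batch: List[Transition], gamma: float,
               q_online: QFunction, q_target: QFunction, bootstrap) -> List[float]:
    """Shared part of the framework: identical for every value-based variant."""
    targets = []
    for reward, next_state, done in batch:
        # Terminal transitions do not bootstrap from the next state.
        future = 0.0 if done else bootstrap(next_state, q_online, q_target)
        targets.append(reward + gamma * future)
    return targets


if __name__ == "__main__":
    q_online = lambda s: [1.0, 2.0]  # online net prefers action 1
    q_target = lambda s: [3.0, 0.5]  # target net rates action 0 highest
    batch = [(1.0, (0.0,), False), (0.5, (1.0,), True)]
    print(td_targets(batch, 0.5, q_online, q_target, dqn_bootstrap))         # [2.5, 0.5]
    print(td_targets(batch, 0.5, q_online, q_target, double_dqn_bootstrap))  # [1.25, 0.5]
```

Under this assumed structure, adding another value-based variant means writing one more bootstrap function; the batching and discounting code stays untouched.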
Online Tutorial Resources
Tutorial Links
Book
- Hands-On Reinforcement Learning With Python
- Reinforcement Learning: Theory and Python Implementation
- An Introduction to Deep Reinforcement Learning
- Foundations and Trends® in Machine Learning
- Reinforcement Learning and Optimal Control
Video Course
- CS 294: Deep Reinforcement Learning
- David Silver's course
- John Schulman's lectures
- Deep RL Bootcamp
- CS 287: Advanced Robotics, Fall 2015
- CS234: Reinforcement Learning Winter 2019
- Deep Learning (DLSS) and Reinforcement Learning (RLSS) Summer School, Montreal 2017
- Advanced Deep Learning and Reinforcement Learning
- Reinforcement learning tutorials (莫烦 / Morvan)
Blog Links
- Playing Pong with deep reinforcement learning from pixels
- Deep Learning in a Nutshell: Reinforcement Learning
- AlphaGo
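Experts in the Field
- Sergey Levine, robotics expert at the University of California, Berkeley
- Andrew Ng, former Chief Scientist at Baidu
- Professor Richard S. Sutton, renowned reinforcement learning pioneer at the University of Alberta, Canada
- Dr. David Silver, lead researcher on Google DeepMind's AlphaGo project
- Professor Tuomas Sandholm, expert in computer game playing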
Awesome
- Reinforcement learning resources curated
- Awesome Reinforcement Learning (RL) for Natural Language Processing (NLP)
- Paper list of multi-agent reinforcement learning (MARL)
- A list of recent papers regarding deep reinforcement learning
- TensorFlow implementation of Deep Reinforcement Learning papers
- Deep Reinforcement Learning Papers
- This project is for learning and research on deep RL, maintained by University AI researchers
- Reinforcement learning materials "from getting started to giving up"
- Reinforcement Learning Notebooks
- Deep Reinforcement Learning
Algorithm Repos
- rllab
- Baselines
- Stable Baselines
- keras-rl
- BURLAP
- PyBrain
- RLPy
- A Matlab Toolbox for Approximate RL and DP
Hands-On Reinforcement Learning Resources
Implementation of Algorithms
- PyTorch implementation of DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL
- PyTorch implementations of various DRL algorithms for both single agent and multi-agent
- Deep Reinforcement Learning for Keras
- PyTorch implementations of DQN, AC, A2C, A3C, Policy Gradient, DDPG, TRPO, PPO, ACER
- Deep reinforcement learning framework
- Code for understanding reinforcement learning (updating...)
- Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
- Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course
- Repo for the Deep Reinforcement Learning Nanodegree program
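- Tutorial | How to train a Donkey Car with reinforcement learning in a Unity environment
- An accessible walkthrough of the Dopamine paper, environment setup, and example analysis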
Project
- DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills
- StarCraft II - pysc2 Deep Reinforcement Learning Examples
- An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
- Using reinforcement learning to teach a car to avoid obstacles
- A reinforcement learning algorithm for the 2048 game
Papers
- DQN-arxiv (Deep Q-Networks): Mnih et al., 2013
- DQN-nature (Deep Q-Network): Mnih et al., 2015
- Double DQN (Double Q-Network): van Hasselt et al., 2015
- Dueling DQN (Dueling Q-Network): Wang et al., 2015
- QR-DQN (Quantile Regression DQN): Dabney et al., 2017
- AlphaGo (Mastering the game of Go with deep neural networks and tree search)
- AlphaZero-arxiv (Mastering Chess and Shogi by Self-Play): Silver et al., 2017
- AlphaGo Zero-nature (Mastering the game of Go without human knowledge): Silver et al., 2017
- SAC (Off-Policy Maximum Entropy): Haarnoja et al., 2018
- SAC (Algorithms and Applications): Haarnoja et al., 2018
- A2C / A3C (Asynchronous Advantage Actor-Critic): Mnih et al., 2016
- PPO (Proximal Policy Optimization): Schulman et al., 2017
- TRPO (Trust Region Policy Optimization): Schulman et al., 2015
- DPG (Deterministic Policy Gradient): Silver et al., 2014
- DDPG (Deep Deterministic Policy Gradient): Lillicrap et al., 2015
- TD3 (Twin Delayed DDPG): Fujimoto et al., 2018
- NAF (Normalized Advantage Functions): Gu et al., 2016
- C51 (Categorical 51-Atom DQN): Bellemare et al., 2017
- HER (Hindsight Experience Replay): Andrychowicz et al., 2017
- World Models: Ha and Schmidhuber, 2018
- I2A (Imagination-Augmented Agents): Weber et al., 2017
- MBMF (Model-Based RL with Model-Free Fine-Tuning): Nagabandi et al., 2017
- MBVE (Model-Based Value Expansion): Feinberg et al., 2018
- PathNet (Evolution Channels Gradient Descent): Fernando et al., 2017
- PlaNet (Learning Latent Dynamics): Hafner et al., 2018
- TCN (Time-Contrastive Networks): Sermanet et al., 2017
- Reinforcement and Imitation Learning: Zhu et al., 2018
- Prioritized Experience Replay: Schaul et al., 2015
- Policy Distillation: Rusu et al., 2015
- Unifying Count-Based Exploration and Intrinsic Motivation: Bellemare et al., 2016
- Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models: Stadie et al., 2015
- Action-Conditional Video Prediction using Deep Networks in Atari Games: Oh et al., 2015
- Control of Memory, Active Perception, and Action in Minecraft: Oh et al., 2016