Awesome Deep Reinforcement Learning
July 2022 update: EDDICT added
Mar 2022 update: a few papers released in early 2022
Dec 2021 update: Unsupervised RL
Introduction to awesome drl
Reinforcement learning is the fundamental framework for building AGI, so this awesome drl project collects and shares important contributions to the field.
Landscape of Deep RL
Content
- Awesome Deep Reinforcement Learning
- Introduction to awesome drl
- Landscape of Deep RL
- Content
- General guidances
- 2022
- Foundations and theory
- General benchmark frameworks
- Unsupervised
- Offline
- Value based
- Policy gradient
- Explorations
- Actor-Critic
- Model-based
- Model-free + Model-based
- Hierarchical
- Option
- Connection with other methods
- Connecting value and policy methods
- Reward design
- Unifying
- Faster DRL
- Multi-agent
- New design
- Multitask
- Observational Learning
- Meta Learning
- Distributional
- Planning
- Safety
- Inverse RL
- No reward RL
- Time
- Adversarial learning
- Use Natural Language
- Generative and contrastive representation learning
- Belief
- PAC
- Applications
Recommendations and suggestions are welcome.
General guidances
- Awesome Offline RL
- Reinforcement Learning Today
- Multiagent Reinforcement Learning by Marc Lanctot RLSS @ Lille 11 July 2019
- RLDM 2019 Notes by David Abel 11 July 2019
- A Survey of Reinforcement Learning Informed by Natural Language 10 Jun 2019 arxiv
- Challenges of Real-World Reinforcement Learning 29 Apr 2019 arxiv
- Ray Interference: a Source of Plateaus in Deep Reinforcement Learning 25 Apr 2019 arxiv
- Principles of Deep RL by David Silver
- University AI's general introduction to deep RL (in Chinese)
- OpenAI's spinningup
- The Promise of Hierarchical Reinforcement Learning 9 Mar 2019
- Deep Reinforcement Learning that Matters 30 Jan 2019 arxiv
2022
Foundations and theory
- General non-linear Bellman equations 9 July 2019 arxiv
- Monte Carlo Gradient Estimation in Machine Learning 25 Jun 2019 arxiv
General benchmark frameworks
- Android-Env
- MuJoCo | MuJoCo Chinese version
- Unsupervised RL Benchmark
- Dataset for Offline RL
- Spriteworld: a flexible, configurable python-based reinforcement learning environment
- Chainerrl Visualizer
- Behaviour Suite for Reinforcement Learning 13 Aug 2019 arxiv | code
- Quantifying Generalization in Reinforcement Learning 20 Dec 2018 arxiv
- S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning 25 Sept 2018
- dopamine
- StarCraft II
- tfrl
- chainerrl
- PARL
- DI-engine: a generalized decision intelligence engine that supports various deep RL algorithms
- PPO x Family: Course in Chinese for Deep RL
Unsupervised
- URLB: Unsupervised Reinforcement Learning Benchmark 28 Oct 2021
- APS: Active Pretraining with Successor Feature 31 Aug 2021
- Behavior From the Void: Unsupervised Active Pre-Training 8 Mar 2021
- Reinforcement Learning with Prototypical Representations 22 Feb 2021
- Efficient Exploration via State Marginal Matching 12 Jun 2019
- Self-Supervised Exploration via Disagreement 10 Jun 2019
- Exploration by Random Network Distillation 30 Oct 2018
- Diversity is All You Need: Learning Skills without a Reward Function 16 Feb 2018
- Curiosity-driven Exploration by Self-supervised Prediction 15 May 2017
Offline
- PerSim: Data-efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators 10 Nov 2021
- A General Offline Reinforcement Learning Framework for Interactive Recommendation AAAI 2021
Value based
- Harnessing Structures for Value-Based Planning and Reinforcement Learning 5 Feb 2020 arxiv | code
- Recurrent Value Functions 23 May 2019 arxiv
- Stochastic Lipschitz Q-Learning 24 Apr 2019 arxiv
- TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning 8 Mar 2018
- Distributed Prioritized Experience Replay 2 Mar 2018
- Rainbow: Combining Improvements in Deep Reinforcement Learning 6 Oct 2017
- Learning from Demonstrations for Real World Reinforcement Learning 12 Apr 2017
- Dueling Network Architecture
- Double DQN
- Prioritized Experience Replay
- Deep Q-Networks
Policy gradient
- Phasic Policy Gradient 9 Sep 2020 arxiv | code
- An operator view of policy gradient methods 22 Jun 2020 arxiv
- Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces 14 Jun 2019 arxiv
- Policy Gradient Search: Online Planning and Expert Iteration without Search Trees 7 Apr 2019 arxiv
- Supervised Policy Update for Deep Reinforcement Learning 24 Dec 2018 arxiv
- PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation 5 Oct 2018 arxiv
- Clipped Action Policy Gradient 22 June 2018
- Expected Policy Gradients for Reinforcement Learning 10 Jan 2018
- Proximal Policy Optimization Algorithms 20 July 2017
- Emergence of Locomotion Behaviours in Rich Environments 7 July 2017
- Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning 1 Jun 2017
- Equivalence Between Policy Gradients and Soft Q-Learning
- Trust Region Policy Optimization
- Reinforcement Learning with Deep Energy-Based Policies
- Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic
Explorations
- Entropic Desired Dynamics for Intrinsic Control 2021 openreview
- Self-Supervised Exploration via Disagreement 10 Jun 2019 arxiv
- Approximate Exploration through State Abstraction 24 Jan 2019
- The Uncertainty Bellman Equation and Exploration 15 Sep 2017
- Noisy Networks for Exploration 30 Jun 2017 implementation
- Count-Based Exploration in Feature Space for Reinforcement Learning 25 Jun 2017
- Count-Based Exploration with Neural Density Models 14 Jun 2017
- UCB and InfoGain Exploration via Q-Ensembles 11 Jun 2017
- Minimax Regret Bounds for Reinforcement Learning 16 Mar 2017
- Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
- EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
Actor-Critic
- Generalized Off-Policy Actor-Critic 27 Mar 2019
- Soft Actor-Critic Algorithms and Applications 29 Jan 2019
- The Reactor: A Sample-Efficient Actor-Critic Architecture 15 Apr 2017
- Sample Efficient Actor-Critic with Experience Replay
- Reinforcement Learning with Unsupervised Auxiliary Tasks
- Continuous control with deep reinforcement learning
Model-based
- Self-Consistent Models and Values 25 Oct 2021 arxiv
- When to use parametric models in reinforcement learning? 12 Jun 2019 arxiv
- Model-Based Reinforcement Learning for Atari 5 Mar 2019
- Model-Based Stabilisation of Deep Reinforcement Learning 6 Sep 2018
- Learning model-based planning from scratch 19 July 2017
Model-free + Model-based
Hierarchical
- Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? 23 Sep 2019 arxiv
- Language as an Abstraction for Hierarchical Deep Reinforcement Learning 18 Jun 2019 arxiv
Option
- Variational Option Discovery Algorithms 26 July 2018
- A Laplacian Framework for Option Discovery in Reinforcement Learning 16 Jun 2017
Connection with other methods
- Robust Imitation of Diverse Behaviors
- Learning human behaviors from motion capture by adversarial imitation
- Connecting Generative Adversarial Networks and Actor-Critic Methods
Connecting value and policy methods
- Bridging the Gap Between Value and Policy Based Reinforcement Learning
- Policy gradient and Q-learning
Reward design
- End-to-End Robotic Reinforcement Learning without Reward Engineering 16 Apr 2019 arxiv
- Reinforcement Learning with Corrupted Reward Channel 23 May 2017
Unifying
Faster DRL
Multi-agent
- No Press Diplomacy: Modeling Multi-Agent Gameplay 4 Sep 2019 arxiv
- Options as responses: Grounding behavioural hierarchies in multi-agent RL 6 Jun 2019 arxiv
- Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination 18 Jun 2019 arxiv
- A Regularized Opponent Model with Maximum Entropy Objective 17 May 2019 arxiv
- Deep Q-Learning for Nash Equilibria: Nash-DQN 23 Apr 2019 arxiv
- Malthusian Reinforcement Learning 3 Mar 2019 arxiv
- Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning 4 Nov 2018
- Intrinsic Social Motivation via Causal Influence in Multi-Agent RL 19 Oct 2018
- QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning 30 Mar 2018
- Modeling Others using Oneself in Multi-Agent Reinforcement Learning 26 Feb 2018
- The Mechanics of n-Player Differentiable Games 15 Feb 2018
- Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments 10 Oct 2017
- Learning with Opponent-Learning Awareness 13 Sep 2017
- Counterfactual Multi-Agent Policy Gradients
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments 7 Jun 2017
- Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games 29 Mar 2017
New design
- IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures 9 Feb 2018
- Reverse Curriculum Generation for Reinforcement Learning
- Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
- Learning to Design Games: Strategic Environments in Deep Reinforcement Learning 5 July 2017
Multitask
- Kickstarting Deep Reinforcement Learning 10 Mar 2018
- Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning 7 Nov 2017
- Distral: Robust Multitask Reinforcement Learning 13 July 2017
Observational Learning
- Observational Learning by Reinforcement Learning 20 Jun 2017
Meta Learning
- Discovery of Useful Questions as Auxiliary Tasks 10 Sep 2019 arxiv
- Meta-learning of Sequential Strategies 8 May 2019 arxiv
- Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables 19 Mar 2019 arxiv
- Some Considerations on Learning to Explore via Meta-Reinforcement Learning 11 Jan 2019 arxiv
- Meta-Gradient Reinforcement Learning 24 May 2018 arxiv
- ProMP: Proximal Meta-Policy Search 16 Oct 2018 arxiv
- Unsupervised Meta-Learning for Reinforcement Learning 12 Jun 2018
Distributional
- GAN Q-learning 20 July 2018
- Implicit Quantile Networks for Distributional Reinforcement Learning 14 Jun 2018
- Nonlinear Distributional Gradient Temporal-Difference Learning 20 May 2018
- Distributed Distributional Deterministic Policy Gradients 23 Apr 2018
- An Analysis of Categorical Distributional Reinforcement Learning 22 Feb 2018
- Distributional Reinforcement Learning with Quantile Regression 27 Oct 2017
- A Distributional Perspective on Reinforcement Learning 21 July 2017
Planning
Safety
- Robust Reinforcement Learning for Continuous Control with Model Misspecification 18 Jun 2019 arxiv
- Verifiable Reinforcement Learning via Policy Extraction 22 May 2018 arxiv
Inverse RL
No reward RL
- Fast Task Inference with Variational Intrinsic Successor Features 2 Jun 2019 arxiv
- Curiosity-driven Exploration by Self-supervised Prediction 15 May 2017
Time
- Interval timing in deep reinforcement learning agents 31 May 2019 arxiv
- Time Limits in Reinforcement Learning