Check out the completely revised and updated second edition of this book, which covers basic to advanced deep RL algorithms with extensive math. Check out the new repo.
Hands-On Reinforcement Learning With Python
Master reinforcement and deep reinforcement learning using OpenAI Gym and TensorFlow
About the book
Hands-On Reinforcement Learning with Python will help you master reinforcement learning algorithms, from the basics through advanced deep reinforcement learning.
The book starts with an introduction to reinforcement learning, followed by OpenAI Gym and TensorFlow. You will then explore various RL algorithms and concepts, such as Markov decision processes, Monte Carlo methods, and dynamic programming, including value and policy iteration. This example-rich guide will introduce you to deep learning, covering various deep learning algorithms. You will then explore deep reinforcement learning, which combines deep learning and reinforcement learning, in depth. You will master various deep reinforcement learning algorithms such as DQN, Double DQN, Dueling DQN, DRQN, A3C, DDPG, TRPO, and PPO. You will also learn about recent advancements in reinforcement learning, such as imagination-augmented agents, learning from human preferences, DQfD, HER, and more.
Get the book
Get the Chinese Version
The book has also been translated into Chinese, and you can get it here: https://item.jd.com/12506442.html
Table of Contents
1. Introduction to Reinforcement Learning
- 1.1. What is Reinforcement Learning?
- 1.2. Reinforcement Learning Cycle
- 1.3. How Does RL Differ from Other ML Paradigms?
- 1.4. Elements of Reinforcement Learning
- 1.5. Agent Environment Interface
- 1.6. Types of RL Environments
- 1.7. Reinforcement Learning Platforms
- 1.8. Applications of Reinforcement Learning
2. Getting Started with OpenAI and TensorFlow
- 2.1. Setting Up Your Machine
- 2.2. Installing Anaconda
- 2.3. Installing Docker
- 2.4. Installing OpenAI Gym and Universe
- 2.5. Common Error Fixes
- 2.6. OpenAI Gym
- 2.7. Basic Simulations
- 2.8. Training a Robot to Walk
- 2.9. Building a Video Game Bot
- 2.10. TensorFlow Fundamentals
- 2.11. TensorBoard
3. Markov Decision Process and Dynamic Programming
- 3.1. Markov Chain and Markov Process
- 3.2. Markov Decision Process
- 3.3. Rewards and Returns
- 3.4. Episodic and Continuous Tasks
- 3.5. Policy Function
- 3.6. State Value Function
- 3.7. State-Action Value Function (Q Function)
- 3.8. Bellman Equation and Optimality
- 3.9. Deriving Bellman Equation for Value and Q functions
- 3.10. Solving the Bellman Equation
- 3.11. Dynamic Programming
- 3.12. Solving the Frozen Lake Problem using Value Iteration
- 3.13. Solving the Frozen Lake Problem using Policy Iteration
4. Gaming with Monte Carlo Methods
- 4.1. Monte Carlo Methods
- 4.2. Estimating Value of Pi Using Monte Carlo
- 4.3. Monte Carlo Prediction
- 4.4. First-Visit Monte Carlo
- 4.5. Every-Visit Monte Carlo
- 4.6. Blackjack with Monte Carlo
- 4.7. Monte Carlo Control
- 4.8. Monte Carlo Exploration Starts
- 4.9. On-Policy Monte Carlo Control
- 4.10. Off-Policy Monte Carlo Control
5. Temporal Difference Learning
- 5.1. Temporal Difference Learning
- 5.2. TD Prediction
- 5.3. TD Control
- 5.4. Q-Learning
- 5.5. Solving the Taxi Problem using Q-Learning
- 5.6. SARSA
- 5.7. Solving the Taxi Problem using SARSA
- 5.8. Difference Between Q-Learning and SARSA
6. Multi-Armed Bandit Problem
- 6.1. Multi-armed Bandit Problem
- 6.2. Epsilon-Greedy Algorithm
- 6.3. Softmax Exploration Algorithm
- 6.4. Upper Confidence Bound Algorithm
- 6.5. Thompson Sampling Algorithm
- 6.6. Applications of MAB
- 6.7. Identifying the Right Advertisement Banner Using MAB
- 6.8. Contextual Bandits
7. Deep Learning Fundamentals
- 7.1. Artificial Neurons
- 7.2. Artificial Neural Network
- 7.3. Activation Functions
- 7.4. Deep Dive into ANN
- 7.5. Gradient Descent
- 7.6. Neural Networks in TensorFlow
- 7.7. Recurrent Neural Network
- 7.8. Backpropagation Through Time
- 7.9. Long Short-Term Memory RNN
- 7.10. Generating Song Lyrics using LSTM RNN
- 7.11. Convolutional Neural Networks
- 7.12. CNN Architecture
- 7.13. Classifying Fashion Products Using CNN
8. Atari Games With Deep Q Network
- 8.1. What Is a Deep Q Network?
- 8.2. Architecture of DQN
- 8.3. Convolutional Network
- 8.4. Experience Replay
- 8.5. Target Network
- 8.6. Clipping Rewards
- 8.7. DQN Algorithm
- 8.8. Building an Agent to Play Atari Games
- 8.9. Double DQN
- 8.10. Dueling Architecture
9. Playing Doom With Deep Recurrent Q Network
- 9.1. Deep Recurrent Q Network
- 9.2. Partially Observable MDP
- 9.3. Architecture of DRQN
- 9.4. Basic Doom Game
- 9.5. Building an Agent to Play Doom using DRQN
- 9.6. Deep Attention Recurrent Q Network
10. Asynchronous Advantage Actor Critic Network
- 10.1. Asynchronous Advantage Actor Critic Algorithm
- 10.2. The Three A's
- 10.3. Architecture of A3C
- 10.4. Working of A3C
- 10.5. Drive up the Mountain with A3C
- 10.6. Visualization in TensorBoard
11. Policy Gradients and Optimization
- 11.1. Policy Gradient
- 11.2. Lunar Lander Using Policy Gradient
- 11.3. Deep Deterministic Policy Gradient
- 11.4. Swinging up the Pendulum using DDPG
- 11.5. Trust Region Policy Optimization
- 11.6. Proximal Policy Optimization
12. Capstone Project: Car Racing using DQN
- 12.1. Environment Wrapper Functions
- 12.2. Dueling Network
- 12.3. Replay Buffer
- 12.4. Training the Network
- 12.5. Car Racing
13. Recent Advancements and Next Steps
- 13.1. Imagination Augmented Agents
- 13.2. Learning From Human Preference
- 13.3. Deep Q Learning From Demonstrations
- 13.4. Hindsight Experience Replay
- 13.5. Hierarchical Reinforcement Learning
- 13.6. Inverse Reinforcement Learning