Deep Q-Learning

Overview

Our implementation of the deep Q-learning algorithm from the DQN paper. The agent reads the screen and the integer score of the Atari 2600 game Space Invaders, and outputs the same control commands a human would issue with a controller (albeit without the physical controller).
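As a rough illustration of the two core steps of Q-learning (not the code in this repository), the sketch below shows epsilon-greedy action selection and the one-step learning target. The function names, the `gamma` default, and the idea that the reward is the change in score between steps are assumptions made for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one.

    q_values: one estimated value per controller action (hypothetical input).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_target(reward, next_q_values, terminal, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    When the game is over there is no next state, so the target is just r.
    The reward is assumed here to be the change in score since the last step.
    """
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)
```

With `epsilon=0.0` the agent always exploits its current estimates; larger values trade that for exploration.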

Installation Dependencies:

Python 2.7
Theano
Lasagne
pygame
Arcade Learning Environment (ALE) 0.5.1
Atari 2600 ROM of space_invaders.bin

Amazon Instance Installation

See /provision/aws_installation.sh for a concise shell history of the commands used to install the environment.

External References

The DQN paper

Human-level control through deep reinforcement learning

Deep Reinforcement Learning with Double Q-learning - more stable learning through double Q-learning

Action-Conditional Video Prediction using Deep Networks in Atari Games - predicting future frames

Dueling Network Architectures for Deep Q-learning

Arcade Learning Environment

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Recurrent Model of Visual Attention - applying Q-learning to figure out what part of the image to look at.

Prioritized Experience Replay - a memory should be more likely to be drawn from the replay buffer the more surprising it is

Deep Recurrent Q-Learning For Partially Observable MDPs - by using an LSTM you can get rid of the preprocessing done in the DQN paper. "The recurrent net can better adapt at evaluation time if the quality of observations changes"

A fast learning algorithm for deep belief nets - Training one layer at a time

Reinforcement Learning and Automated Planning: A Survey

Autoregressive Neural Networks - Neural Networks applied to Time Series.

Deep Autoregressive Neural Networks - predicting future frames of an Atari Game.

Reinforcement Learning: An introduction - very thorough introduction to Reinforcement Learning.

A survey of robot learning by demonstration - Learning by|from demonstration = Learning by watching = Learning from observation = Programming by demonstration = Behaviour cloning|imitation|mimicry

DynaQ

Deep Reinforcement Learning - nice summary of recent advances in deep Q-learning.

Concurrent Q-learning for Autonomous Mapping and Navigation - one-trial learning?

Using Reinforcement Learning to Adapt an Imitation Task - overcoming new obstacles?

On the importance of initialization and momentum in deep learning - Nesterov Momentum vs Nesterov Accelerated Gradient

CNN Features off-the-shelf: an Astounding Baseline for Recognition - NN-generated features are better than manually made ones

Prioritized Experience Replay - on Atari games

Network in Network - MaxPooling loses information, let's keep some more information.

Concurrent Reinforcement Learning - RL in time dependent environments
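
The note on Prioritized Experience Replay in the list above (surprising memories should be drawn more often) can be sketched roughly as follows. This is a simplified proportional variant written for illustration, not the code in this repository; the function names and the `alpha` and `eps` defaults are assumptions, and "surprise" is taken to be the absolute temporal-difference error of a stored transition.

```python
import random


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Turn per-memory TD errors into sampling probabilities.

    Priority is (|error| + eps) ** alpha; eps keeps zero-error memories
    reachable, and alpha=0 would fall back to uniform sampling.
    """
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_index(probabilities, rng=random):
    """Draw one memory index according to the given probabilities."""
    threshold = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probabilities):
        cumulative += p
        if threshold < cumulative:
            return i
    # Guard against floating-point round-off in the cumulative sum.
    return len(probabilities) - 1
```

A linear scan like this costs O(n) per draw; it is only meant to show the idea, and a large replay memory would need a faster data structure.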