Deep Reinforcement Learning in TensorFlow
TensorFlow implementation of Deep Reinforcement Learning papers. This implementation contains:
[1] Playing Atari with Deep Reinforcement Learning
[2] Human-Level Control through Deep Reinforcement Learning
[3] Deep Reinforcement Learning with Double Q-learning
[4] Dueling Network Architectures for Deep Reinforcement Learning
[5] Prioritized Experience Replay (in progress)
[6] Deep Exploration via Bootstrapped DQN (in progress)
[7] Asynchronous Methods for Deep Reinforcement Learning (in progress)
[8] Continuous Deep Q-Learning with Model-based Acceleration (in progress)
Requirements
- Python 2.7
- gym
- tqdm
- OpenCV2 or SciPy
- TensorFlow 0.12.0
Usage
First, install prerequisites with:
$ pip install -U 'gym[all]' tqdm scipy
Don't forget to also install the latest TensorFlow. Also note that you need to install the dependencies of doom-py, which is required by gym[all].
Train with the DQN model described in [1] without a GPU:
$ python main.py --network_header_type=nips --env_name=Breakout-v0 --use_gpu=False
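The DQN in [1] trains the Q-network on minibatches drawn uniformly from a replay memory of past transitions, not on consecutive frames. Below is a minimal, framework-free sketch of such a memory. The `ReplayMemory` class and its methods are hypothetical names chosen for illustration and are not this repository's code.

```python
import random
from collections import deque


class ReplayMemory(object):
    """Hypothetical fixed-capacity buffer of (s, a, r, s', terminal) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently evicts the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, terminal):
        self.buffer.append((state, action, reward, next_state, terminal))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates the minibatch
        # from the order in which transitions were collected
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=3)
    for t in range(5):
        memory.add(t, 0, 1.0, t + 1, False)
    print(len(memory))  # 3
    print([transition[0] for transition in memory.buffer])  # [2, 3, 4]
```

The bounded deque keeps memory usage fixed. A real Atari memory would store frames in preallocated arrays instead of Python tuples.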
Train with the DQN model described in [2]:
$ python main.py --network_header_type=nature --env_name=Breakout-v0
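The model in [2] adds a separate target network. Its weights are copied from the online network only every C steps, and the online Q-values are regressed toward r + γ max_a' Q_target(s', a'), with no bootstrap term at terminal states. The sketch below computes those targets from precomputed target-network Q-values. `dqn_targets` and `sync_target` are hypothetical helpers, and network parameters are modelled as plain dicts purely for illustration.

```python
def dqn_targets(rewards, terminals, next_q_target, gamma=0.99):
    """Return y_j = r_j + gamma * max_a' Q_target(s'_j, a'), or r_j if s'_j is terminal.

    next_q_target[j] holds the target network's Q-values for every action in s'_j.
    """
    targets = []
    for reward, terminal, q_values in zip(rewards, terminals, next_q_target):
        bootstrap = 0.0 if terminal else gamma * max(q_values)
        targets.append(reward + bootstrap)
    return targets


def sync_target(step, online_params, target_params, update_every):
    """Copy the online weights into the target network every `update_every` steps."""
    if step % update_every == 0:
        target_params.clear()
        target_params.update(online_params)
    return target_params


if __name__ == "__main__":
    print(dqn_targets([1.0, 0.0], [False, True], [[0.5, 2.0], [3.0, 1.0]], gamma=0.5))
    # [2.0, 0.0]
```

Holding the target network fixed between syncs keeps the regression target from shifting on every update, which is the stabilizing change [2] makes over [1].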
Train with the Double DQN model described in [3]:
$ python main.py --double_q=True --env_name=Breakout-v0
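Double Q-learning [3] reduces the overestimation caused by the max in the standard DQN target r + γ max_a' Q_target(s', a'). It does this by decoupling action selection from evaluation: the online network picks argmax_a' Q_online(s', a'), and the target network scores that action. Below is a framework-free sketch. `double_dqn_targets` is a hypothetical helper that takes precomputed Q-values from both networks.

```python
def double_dqn_targets(rewards, terminals, next_q_online, next_q_target, gamma=0.99):
    """Return y_j = r_j + gamma * Q_target(s'_j, argmax_a Q_online(s'_j, a)), or r_j if terminal."""
    targets = []
    for reward, terminal, q_online, q_target in zip(
            rewards, terminals, next_q_online, next_q_target):
        if terminal:
            targets.append(reward)
            continue
        # Select the greedy action with the online network...
        best_action = max(range(len(q_online)), key=lambda a: q_online[a])
        # ...but evaluate it with the target network
        targets.append(reward + gamma * q_target[best_action])
    return targets


if __name__ == "__main__":
    # The online net prefers action 1, so the target net's value 2.0 is used,
    # not its maximum of 4.0 (plain DQN would give 1.0 + 0.5 * 4.0 = 3.0)
    print(double_dqn_targets([1.0], [False], [[1.0, 3.0]], [[4.0, 2.0]], gamma=0.5))
    # [2.0]
```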
Train with the Dueling network with Double Q-learning described in [4]:
$ python main.py --double_q=True --network_output_type=dueling --env_name=Breakout-v0
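The dueling architecture in [4] splits the head into a state-value stream V(s) and an advantage stream A(s, a). It recombines them as Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a')), and subtracting the mean makes the split between V and A identifiable. The sketch below shows only that aggregation step, framework-free. The helper name is hypothetical, and this README does not state which aggregation `--network_output_type=dueling` actually uses.

```python
def dueling_q_values(value, advantages):
    """Combine V(s) and A(s, .) into Q(s, .) using the mean-subtracted aggregation."""
    mean_advantage = sum(advantages) / float(len(advantages))
    return [value + a - mean_advantage for a in advantages]


if __name__ == "__main__":
    q = dueling_q_values(1.0, [1.0, 2.0, 3.0])
    print(q)  # [0.0, 1.0, 2.0]
```

With this aggregation, the mean of the returned Q-values equals V(s), while their ordering (and so the greedy action) is the same as the ordering of the advantages.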
Train with the MLP model described in [4] on the corridor environment (useful for debugging):
$ python main.py --network_header_type=mlp --network_output_type=normal --observation_dims='[16]' --env_name=CorridorSmall-v5 --t_learn_start=0.1 --learning_rate_decay_step=0.1 --history_length=1 --n_action_repeat=1 --t_ep_end=10 --display=True --learning_rate=0.025 --learning_rate_minimum=0.0025
$ python main.py --network_header_type=mlp --network_output_type=normal --double_q=True --observation_dims='[16]' --env_name=CorridorSmall-v5 --t_learn_start=0.1 --learning_rate_decay_step=0.1 --history_length=1 --n_action_repeat=1 --t_ep_end=10 --display=True --learning_rate=0.025 --learning_rate_minimum=0.0025
$ python main.py --network_header_type=mlp --network_output_type=dueling --observation_dims='[16]' --env_name=CorridorSmall-v5 --t_learn_start=0.1 --learning_rate_decay_step=0.1 --history_length=1 --n_action_repeat=1 --t_ep_end=10 --display=True --learning_rate=0.025 --learning_rate_minimum=0.0025
$ python main.py --network_header_type=mlp --network_output_type=dueling --double_q=True --observation_dims='[16]' --env_name=CorridorSmall-v5 --t_learn_start=0.1 --learning_rate_decay_step=0.1 --history_length=1 --n_action_repeat=1 --t_ep_end=10 --display=True --learning_rate=0.025 --learning_rate_minimum=0.0025
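The corridor commands set `--history_length=1` and `--n_action_repeat=1`. The sketch below assumes these flags follow the usual DQN meaning: the number of most recent observations stacked as network input, and the number of times each chosen action is repeated (frame-skip). The `History` class, `repeat_action`, and the `step_fn` callable returning `(observation, reward, terminal)` are all hypothetical illustrations, not this repository's API.

```python
from collections import deque


class History(object):
    """Hypothetical buffer holding the last `history_length` observations."""

    def __init__(self, history_length, initial_observation):
        # Pad with the first observation so the input shape is fixed from step 0
        self.frames = deque([initial_observation] * history_length,
                            maxlen=history_length)

    def add(self, observation):
        self.frames.append(observation)

    def get(self):
        return list(self.frames)


def repeat_action(step_fn, action, n_action_repeat):
    """Apply `action` up to n_action_repeat times, summing rewards.

    step_fn is a hypothetical callable returning (observation, reward, terminal).
    Stops early if the episode ends between repeats.
    """
    observation, total_reward, terminal = None, 0.0, False
    for _ in range(n_action_repeat):
        observation, reward, terminal = step_fn(action)
        total_reward += reward
        if terminal:
            break
    return observation, total_reward, terminal


if __name__ == "__main__":
    clock = {"t": 0}

    def toy_step(action):
        # Toy environment: observation is the time step, reward 1 per step, ends at t = 5
        clock["t"] += 1
        return clock["t"], 1.0, clock["t"] >= 5

    history = History(history_length=4, initial_observation=0)
    observation, reward, terminal = repeat_action(toy_step, action=0, n_action_repeat=4)
    history.add(observation)
    print(history.get())  # [0, 0, 0, 4]
    print(reward)  # 4.0
```

Under that assumption, setting both flags to 1, as the corridor commands do, means the network sees only the current observation and every action is applied exactly once.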
Results
Result of `Corridor-v5` in [4] for DQN (purple), DDQN (red), Dueling DQN (green), Dueling DDQN (blue).
Result of `Breakout-v0` for DQN without frame-skip (white-blue), DQN with frame-skip (light purple), and Dueling DDQN (dark blue).
The hyperparameters and gradient clipping are not implemented exactly as in [4].
References
Author
Taehoon Kim / @carpedm20