Introduction
This repository implements the Value Prediction Network (Oh et al., NIPS 2017) in TensorFlow.
@inproceedings{Oh2017VPN,
title={Value Prediction Network},
author={Junhyuk Oh and Satinder Singh and Honglak Lee},
booktitle={NIPS},
year={2017}
}
Our code is based on OpenAI's A3C implementation.
Dependencies
- TensorFlow
- Beautiful Soup
- Golang
- six (for py2/3 compatibility)
- tmux (the start script opens up a tmux session with multiple windows)
- htop (shown in one of the tmux windows)
- gym
- gym[atari]
- universe
- opencv-python
- numpy
- scipy
Training
The following command trains a value prediction network (VPN) with a plan depth of 3 on the deterministic Collect domain:
python train.py --config config/collect_deterministic.xml --branch 4,4,4 --alg VPN
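The `--branch 4,4,4` argument pairs with a plan depth of 3. The sketch below illustrates that relationship. It assumes each comma-separated entry is the number of actions expanded at one level of the planning tree, so the number of entries is the plan depth. The helper names `parse_branch` and `plan_stats` are hypothetical and are not part of train.py.

```python
import math


def parse_branch(arg):
    """Parse a comma-separated branching spec such as '4,4,4' (hypothetical helper)."""
    factors = [int(x) for x in arg.split(",") if x.strip()]
    if not factors or any(b < 1 for b in factors):
        raise ValueError("branch spec must be positive integers, e.g. '4,4,4'")
    return factors


def plan_stats(factors):
    """Return (plan depth, number of leaf nodes, total expanded nodes).

    Assumes a node at level i expands factors[i] children, so leaves are the
    product of all factors and expanded nodes are the sum of prefix products.
    """
    depth = len(factors)
    leaves = math.prod(factors)
    expanded = sum(math.prod(factors[: i + 1]) for i in range(depth))
    return depth, leaves, expanded


if __name__ == "__main__":
    print(plan_stats(parse_branch("4,4,4")))  # (3, 64, 84)
```

Under this assumption, the planning cost grows multiplicatively with each added level. Increasing the depth therefore costs more than widening a single level.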
The train_vpn script contains the commands for reproducing the main results of the paper.
Notes
- TensorBoard shows the performance of the epsilon-greedy policy. This is NOT the learning curve in the paper, because epsilon decreases from 1.0 to 0.05 over the first 1e6 steps. Instead, [logdir]/eval.csv shows the performance of the agent using a greedy policy.
- Our code supports multi-GPU training. You can specify GPU IDs with the --gpu option (e.g., --gpu 0,1,2,3).
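The gap between the TensorBoard curve and eval.csv comes from exploration. The sketch below shows an epsilon schedule that goes from 1.0 to 0.05 over the first 1e6 steps, together with epsilon-greedy action selection. It assumes the annealing is linear, which the note above does not state. The function names are hypothetical and are not taken from this repository. Setting epsilon to 0 gives the greedy policy that eval.csv reports.

```python
import random


def epsilon_at(step, eps_start=1.0, eps_end=0.05, anneal_steps=1_000_000):
    """Anneal epsilon from eps_start to eps_end over anneal_steps, then hold it.

    Linear annealing is an assumption made for illustration.
    """
    if step >= anneal_steps:
        return eps_end
    frac = step / anneal_steps
    return eps_start + frac * (eps_end - eps_start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the argmax action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    for s in (0, 500_000, 1_000_000, 2_000_000):
        print(s, round(epsilon_at(s), 4))
    # epsilon = 0 reduces to the greedy policy used for evaluation.
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

Early in training, most actions taken by the epsilon-greedy policy are random. Its returns therefore lag behind those of the greedy policy trained on the same network.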