NIPS 2017 Value Prediction Network

Introduction

This repository implements the NIPS 2017 paper Value Prediction Network (Oh et al.) in TensorFlow.

@inproceedings{Oh2017VPN,
  title={Value Prediction Network},
  author={Junhyuk Oh and Satinder Singh and Honglak Lee},
  booktitle={NIPS},
  year={2017}
}

Our code is based on OpenAI's A3C implementation.

Dependencies

Training

The following command trains a value prediction network (VPN) with a plan depth of 3 on the Collect domain, using the deterministic configuration file:

python train.py --config config/collect_deterministic.xml --branch 4,4,4 --alg VPN

The train_vpn script contains the commands for reproducing the main results of the paper.
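
The --branch 4,4,4 argument in the training command above lists three values for a plan depth of 3. The sketch below shows one way to read such a string, assuming each comma-separated entry is the branching factor for one planning step. The helper names parse_branch and plan_summary are hypothetical and are not taken from train.py.

```python
def parse_branch(spec):
    """Parse a comma-separated spec such as "4,4,4" into a list of ints.

    Assumption: each entry is the branching factor for one planning step.
    """
    factors = [int(tok) for tok in spec.split(",") if tok.strip()]
    if not factors or any(f <= 0 for f in factors):
        raise ValueError("expected positive integers, got %r" % (spec,))
    return factors


def plan_summary(spec):
    """Return (plan depth, number of leaf nodes) for a full expansion."""
    factors = parse_branch(spec)
    depth = len(factors)      # one entry per planning step
    leaves = 1
    for f in factors:         # leaves = product of the branching factors
        leaves *= f
    return depth, leaves


if __name__ == "__main__":
    depth, leaves = plan_summary("4,4,4")
    print("depth=%d leaves=%d" % (depth, leaves))  # depth=3 leaves=64
```

Under this reading, a full expansion of 4,4,4 evaluates 64 leaf nodes per planning call, which is why deeper or wider settings increase the cost of each step.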

Notes

  • TensorBoard shows the performance of the epsilon-greedy policy. This is NOT the learning curve reported in the paper, because epsilon decreases from 1.0 to 0.05 over the first 1e6 steps. Instead, [logdir]/eval.csv records the performance of the agent under the greedy policy.
  • Our code supports multi-GPU training. You can specify GPU IDs with the --gpu option (e.g., --gpu 0,1,2,3).
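
To illustrate why the TensorBoard curve differs from the greedy evaluation early in training, the sketch below computes epsilon at a given step. It assumes a linear decay from 1.0 to 0.05 over the first 1e6 steps and a constant value afterwards. The notes above give only the endpoints, so the linear shape and the function name epsilon_at are assumptions, not the repository's code.

```python
def epsilon_at(step, start=1.0, end=0.05, anneal_steps=1000000):
    """Exploration rate at a training step (assumed linear annealing)."""
    # Clamp progress to [0, 1] so epsilon stays at `end` after annealing.
    frac = min(max(step, 0) / float(anneal_steps), 1.0)
    return start + frac * (end - start)


if __name__ == "__main__":
    for step in (0, 500000, 1000000, 2000000):
        print("%8d  %.3f" % (step, epsilon_at(step)))
    # Under the linear assumption this prints 1.000, 0.525, 0.050, 0.050.
```

Halfway through annealing, roughly half of the agent's actions would still be random under this assumption, so the epsilon-greedy return understates what the greedy policy in eval.csv achieves.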