neural-combinatorial-rl-pytorch

PyTorch implementation of Neural Combinatorial Optimization with Reinforcement Learning (https://arxiv.org/abs/1611.09940).

I have implemented the basic RL pretraining model with greedy decoding from the paper. An implementation of the supervised learning baseline model is available here. Instead of a critic network, the TSP results below were obtained with an exponential moving average (EMA) critic; the critic network is simply commented out in the code for now. From correspondence with a few others, it was determined that the EMA critic significantly improves results.
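
For reference, here is a minimal sketch of how an EMA baseline can stand in for a learned critic in the REINFORCE update. The decay value, names, and loss expression are illustrative assumptions, not taken from this repo:

    # Hypothetical EMA critic: tracks a running estimate of the expected
    # reward and serves as the REINFORCE baseline in place of a learned critic.
    class EMACritic:
        def __init__(self, decay=0.9):
            self.decay = decay
            self.value = None  # running estimate of E[reward]

        def update(self, batch_reward_mean):
            if self.value is None:
                self.value = batch_reward_mean
            else:
                self.value = (self.decay * self.value
                              + (1.0 - self.decay) * batch_reward_mean)
            return self.value

    # In the training loop (illustrative):
    #   baseline = critic.update(reward.mean().item())
    #   advantage = reward - baseline
    #   loss = (advantage * log_probs.sum(dim=1)).mean()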

My implementation uses a stochastic decoding policy in the pointer network during training, realized via PyTorch's torch.multinomial(), and beam search for decoding at test time (beam search is not yet finished and currently supports only 1 beam, i.e., greedy decoding).
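
A minimal sketch of the two decoding modes for a single pointer step (shapes and variable names here are hypothetical):

    import torch
    import torch.nn.functional as F

    batch_size, seq_len = 4, 10
    # Stand-in for the pointer network's logits over input positions
    # at one decoding step.
    logits = torch.randn(batch_size, seq_len)
    probs = F.softmax(logits, dim=1)

    training = True
    if training:
        # Stochastic decoding: sample the next index from the pointer distribution.
        idx = torch.multinomial(probs, num_samples=1)
    else:
        # Greedy (1-beam) decoding at test time: take the most probable index.
        idx = probs.argmax(dim=1, keepdim=True)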

Currently, there is support for a sorting task and the planar symmetric Euclidean TSP.

See main.sh for an example of how to run the code.

Use the --load_path $LOAD_PATH and --is_train False flags to load a saved model.

To load a saved model and view the pointer network's attention layer, also use the --plot_attention True flag.
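
For example (the entry-point script name below is an assumption; see main.sh for the actual invocation):

    python main.py --load_path $LOAD_PATH --is_train False --plot_attention True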

Please feel free to notify me if you encounter any errors, or to submit a pull request to improve this implementation.

Adding other tasks

This implementation can be extended to support other combinatorial optimization problems. See sorting_task.py and tsp_task.py for examples of how to add a task. The key requirement is to provide a dataset class and a reward function that takes a sample solution, selected by the pointer network from the input, and returns a scalar reward. For the sorting task, the agent receives a reward proportional to the length of the longest strictly increasing subsequence in the decoded output (e.g., [1, 3, 5, 2, 4] -> 3/5 = 0.6); a sketch of such a reward function is shown below.
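
As an illustration of that reward interface, here is a self-contained sketch of the sorting reward described above (the function name and signature are hypothetical; see sorting_task.py for the real implementation):

    import torch

    def sort_reward(solution):
        """Scalar reward for the sorting task: length of the longest strictly
        increasing subsequence of the decoded output, normalized by the
        sequence length.

        solution: 1-D tensor of values in decoded order.
        """
        seq = solution.tolist()
        n = len(seq)
        # O(n^2) dynamic program: lis[i] = length of the longest strictly
        # increasing subsequence ending at position i.
        lis = [1] * n
        for i in range(1, n):
            for j in range(i):
                if seq[j] < seq[i]:
                    lis[i] = max(lis[i], lis[j] + 1)
        return max(lis) / n

    # sort_reward(torch.tensor([1, 3, 5, 2, 4]))  # -> 3/5 = 0.6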

Dependencies

  • Python 3.6 (should be OK with any version >= 3.4)
  • PyTorch 0.2 or 0.3
  • tqdm
  • matplotlib
  • tensorboard_logger

PyTorch 0.4 compatibility is available on branch pytorch-0.4.
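
To switch to it (standard git workflow):

    git checkout pytorch-0.4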

TSP Results

Results for 1 random seed over 50 epochs (each epoch is 10,000 batches of size 128). After each epoch, I validated performance on 1,000 held-out graphs. I used the same hyperparameters as the paper, as can be seen in main.sh. The dashed line shows the value reported in Table 2 of Bello et al. for comparison. The log-scale x-axis for the training reward highlights how the tour length drops early in training.

[Figures: training and validation reward curves for TSP 20 and TSP 50]

Sort Results

I trained a model on sort10 for 4 epochs of 1,000,000 randomly generated samples and tested it on a dataset of size 10,000. I then tested the same model on sort15 and sort20 to assess its generalization capabilities.

Test results on 10,000 samples (a reward of 1.0 means the network perfectly sorted the input):

task     average reward   variance
sort10   0.9966           0.0005
sort15   0.7484           0.0177
sort20   0.5586           0.0060

Example prediction on sort10:

input: [4, 7, 5, 0, 3, 2, 6, 8, 9, 1]
output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Attention visualization

Plot the pointer network's attention layer with the argument --plot_attention True.

TODO

  • Add RL pretraining-Sampling
  • Add RL pretraining-Active Search
  • Add Active Search
  • Asynchronous training à la A3C
  • Refactor USE_CUDA variable
  • Finish implementing beam search decoding to support > 1 beam
  • Add support for variable length inputs

Acknowledgements

Special thanks to the repos devsisters/neural-combinatorial-rl-tensorflow and MaximumEntropy/Seq2Seq-PyTorch for getting me started, and to @ricgama for figuring out that weird bug with clone().
