  • Stars: 237
  • Rank: 169,885 (Top 4%)
  • Language: Lua
  • License: MIT License
  • Created: almost 8 years ago
  • Updated: over 7 years ago

Repository Details

Code for Structured Attention Networks https://arxiv.org/abs/1702.00887

Structured Attention Networks

Code for the paper:

Structured Attention Networks
Yoon Kim, Carl Denton, Luong Hoang, and Alexander M. Rush
ICLR 2017

Dependencies

  • Python: h5py, numpy
  • Lua: nn, nngraph, cutorch, cunn

We additionally require the cuda-mod package, which implements custom CUDA functions for the linear-chain CRF. It can be installed via

git clone https://github.com/harvardnlp/cuda-mod
cd cuda-mod && luarocks install rocks/cuda-mod-1.0-0.rockspec
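If the build succeeds, the rock should appear in luarocks' listing; a quick sanity check (not part of the original instructions) is:

luarocks list cuda-mod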

Models

The structured attention layers described in the paper can be found under the folder models/. Specifically:

  • CRF.lua: Segmentation attention layer (i.e. linear-chain CRF)
  • EisnerCRF.lua: Syntactic attention layer (i.e. first-order graph-based dependency parser)

These layers are modular and can be plugged into other deep models. We use them in place of standard simple (softmax) attention layers for neural machine translation, natural language inference, and question answering (see below).
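As a rough illustration of what plugging in such a layer looks like (this sketch is not taken from the repository; the constructor and the input/output shapes assumed for models/CRF.lua should be checked against the source), the structured layer simply takes the place of the softmax normalization inside an nngraph attention module:

require 'nn'
require 'nngraph'

-- toy attention module: 'scores' are unnormalized attention scores,
-- 'context' holds the encoder states to be averaged
local scores  = nn.Identity()()        -- batch x srcLen
local context = nn.Identity()()        -- batch x srcLen x dim

-- baseline: simple (softmax) attention
local probs = nn.SoftMax()(scores)
-- structured variant: swap in the CRF layer instead, e.g.
-- local probs = CRF(...)(scores)      -- hypothetical call; see models/CRF.lua for the real interface

local weighted = nn.MM()({nn.Replicate(1, 2)(probs), context})   -- batch x 1 x dim
local attnLayer = nn.gModule({scores, context}, {weighted})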

Neural Machine Translation

Data

The Japanese-English data used for the paper can be downloaded by following the instructions at http://lotus.kuee.kyoto-u.ac.jp/ASPEC

Preprocessing

To preprocess the data, run

python preprocess-nmt.py --srcfile path-to-source-train --targetfile path-to-target-train
--srcvalfile path-to-source-val --targetvalfile path-to-target-val --outputfile data/nmt

See the preprocess-nmt.py file for other arguments like maximum sequence length, vocabulary size, batch size, etc.
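For example, with tokenized ASPEC files placed under data/ (the input file names below are illustrative, not part of the original instructions):

python preprocess-nmt.py --srcfile data/src-train.txt --targetfile data/targ-train.txt --srcvalfile data/src-val.txt --targetvalfile data/targ-val.txt --outputfile data/nmt

This should write hdf5 train/validation files and *.dict vocabulary files under the data/nmt prefix (exact names may differ; check the script's output), which are what the training and prediction commands below expect.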

Training

Baseline simple (i.e. softmax) attention model

th train-nmt.lua -data_file path-to-train -val_data_file path-to-val -attn softmax -savefile nmt-simple

Sigmoid attention

th train-nmt.lua -data_file path-to-train -val_data_file path-to-val -attn sigmoid -savefile nmt-sigmoid

Structured attention (i.e. segmentation attention)

th train-nmt.lua -data_file path-to-train -val_data_file path-to-val -attn crf -savefile nmt-struct

Here path-to-train and path-to-val are the *.hdf5 files from running preprocess-nmt.py. You can add -gpuid 1 to use the (first) GPU, and change the argument to -savefile if you wish to save to a different path.

Note: structured attention only works with the GPU.
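Putting this together, a typical GPU run of the structured model might look like the following (the hdf5 file names depend on the --outputfile prefix used during preprocessing, so treat them as placeholders):

th train-nmt.lua -data_file data/nmt-train.hdf5 -val_data_file data/nmt-val.hdf5 -attn crf -savefile nmt-struct -gpuid 1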

Evaluating

th predict-nmt.lua -src_file path-to-source-test -targ_file path-to-target-test
-src_dict path-to-source-dict -targ_dict path-to-target-dict -output_file pred.txt

-src_dict and -targ_dict are the *.dict files created from running preprocess-nmt.py. The argument to -targ_file is optional. The code will write predictions to pred.txt, and you can again add -gpuid 1 to use the GPU.

Evaluation is done with the multi-bleu.perl script from Moses.
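For reference, once pred.txt has been written, BLEU can be computed along these lines (assuming a local copy of Moses' multi-bleu.perl; the reference is the tokenized target-side test file):

perl multi-bleu.perl path-to-target-test < pred.txt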

Natural Language Inference

Data

The Stanford Natural Language Inference (SNLI) dataset can be downloaded from http://nlp.stanford.edu/projects/snli/

Pre-trained GloVe embeddings can be downloaded from http://nlp.stanford.edu/projects/glove/

Preprocessing

First we need to process the SNLI data:

python process-snli.py --data_folder path-to-snli-folder --out_folder path-to-output-folder

Then run:

python preprocess-entail.py --srcfile path-to-sent1-train --targetfile path-to-sent2-train
--labelfile path-to-label-train --srcvalfile path-to-sent1-val --targetvalfile path-to-sent2-val
--labelvalfile path-to-label-val --srctestfile path-to-sent1-test --targettestfile path-to-sent2-test
--labeltestfile path-to-label-test --outputfile data/entail --glove path-to-glove

Here path-to-sent1-train is the path to the src-train.txt file created from running process-snli.py (and path-to-sent2-train = targ-train.txt, path-to-label-train = label-train.txt, etc.)

preprocess-entail.py will create the data hdf5 files. The vocabulary is based on the pre-trained GloVe embeddings, with path-to-glove being the path to the pre-trained GloVe word vectors (i.e. the glove.840B.300d.txt file). sent1 is the premise and sent2 is the hypothesis.

Now run:

python get_pretrain_vecs.py --glove path-to-glove --outputfile data/glove.hdf5
--dictionary path-to-dict

Here path-to-dict is the *.word.dict file created from running preprocess-entail.py.
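A full preprocessing pass therefore chains the three scripts, for example (all paths below are illustrative: snli_1.0/ is the unzipped SNLI download, glove.840B.300d.txt the GloVe file, and the *.txt/*.dict names assume the defaults described above):

python process-snli.py --data_folder snli_1.0 --out_folder data
python preprocess-entail.py --srcfile data/src-train.txt --targetfile data/targ-train.txt \
  --labelfile data/label-train.txt --srcvalfile data/src-val.txt --targetvalfile data/targ-val.txt \
  --labelvalfile data/label-val.txt --srctestfile data/src-test.txt --targettestfile data/targ-test.txt \
  --labeltestfile data/label-test.txt --outputfile data/entail --glove glove.840B.300d.txt
python get_pretrain_vecs.py --glove glove.840B.300d.txt --outputfile data/glove.hdf5 --dictionary data/entail.word.dict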

Training

Baseline model (i.e. no intra-sentence attention)

th train-entail.lua -attn none -data_file path-to-train -val_data_file path-to-val
-test_data_file path-to-test -pre_word_vecs path-to-word-vecs -savefile entail-baseline

Simple attention (i.e. softmax attention)

th train-entail.lua -attn simple -data_file path-to-train -val_data_file path-to-val
-test_data_file path-to-test -pre_word_vecs path-to-word-vecs -savefile entail-simple

Structured attention (i.e. syntactic attention)

th train-entail.lua -attn struct -data_file path-to-train -val_data_file path-to-val
-test_data_file path-to-test -pre_word_vecs path-to-word-vecs -savefile entail-struct

Here path-to-word-vecs is the hdf5 file created from running get_pretrain_vecs.py, and path-to-train, path-to-val, and path-to-test are the *.hdf5 files created from running preprocess-entail.py. You can add -gpuid 1 to use the (first) GPU, and change the argument to -savefile if you wish to save to a different path.
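Concretely, with illustrative file names following the data/entail and data/glove.hdf5 prefixes from the preprocessing step (check the actual preprocess-entail.py output for the exact names):

th train-entail.lua -attn struct -data_file data/entail-train.hdf5 -val_data_file data/entail-val.hdf5 -test_data_file data/entail-test.hdf5 -pre_word_vecs data/glove.hdf5 -savefile entail-struct -gpuid 1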

The baseline model essentially replicates A Decomposable Attention Model for Natural Language Inference (Parikh et al., EMNLP 2016). The differences are that we use a hidden layer size of 300 (they use 200), a batch size of 32 (they use 4), and train for 100 epochs (they train for 400 epochs with asynchronous SGD).

See train-entail.lua (or the paper) for hyperparameters and more training options.

Question Answering

Data

The bAbI dataset can be downloaded in all versions from https://research.fb.com/projects/babi/. This code was tested on the copy of v1.0 at https://github.com/harvardnlp/MemN2N/tree/master/babi_data/en, which is the 1k setting where each task includes 1,000 questions.

Preprocessing

First run:

python preprocess-qa.py -dir input-data-path -vocabsize max-vocabulary-size

This will create the data hdf5 files. The vocabulary is based on the input data and will be written to word_to_idx.csv.
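For instance, pointing the script at the harvardnlp/MemN2N copy of the 1k English set (the directory path and vocabulary cap below are illustrative):

python preprocess-qa.py -dir babi_data/en/ -vocabsize 200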

Training

For the baseline model, see our MemN2N implementation.

To train a structured attention (CRF) model, run:

th train-qa.lua -datafile data-file.hdf5 -classifier classifier-type

Here data-file.hdf5 is the hdf5 file created by preprocess-qa.py, and the classifier is either binarycrf or unarycrf. You can add -cuda to use the (first) GPU, and add -save -saveminacc number to save the model (it is only saved if test-set accuracy is at least the specified value). To train with Position Encoding or Temporal Encoding (as described in End-To-End Memory Networks, Sukhbaatar et al. NIPS 2015), use -pe and -te respectively. Note that some default parameters (such as embedding size, maximum history, etc.) differ from those used in the MemN2N paper. In addition, this code implements a 2-step CRF, which has only been tested on bAbI tasks with 2 supporting facts (though it should in principle work for all tasks).
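As a concrete, illustrative example, a GPU run of the binary-potential CRF with Position and Temporal Encoding that saves the model only above a chosen test accuracy would look like this (the scale expected by -saveminacc is an assumption; check train-qa.lua):

th train-qa.lua -datafile data-file.hdf5 -classifier binarycrf -cuda -pe -te -save -saveminacc 0.95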

See train-qa.lua (or the paper) for hyperparameters and more training options.

License

MIT

More Repositories

1. annotated-transformer: An annotated implementation of the Transformer paper. (Jupyter Notebook, 5,683 stars)
2. seq2seq-attn: Sequence-to-sequence model with LSTM encoder/decoders and attention (Lua, 1,257 stars)
3. im2markup: Neural model for converting Image-to-Markup (by Yuntian Deng, yuntiandeng.com) (Lua, 1,203 stars)
4. pytorch-struct: Fast, general, and tested differentiable structured prediction in PyTorch (Jupyter Notebook, 1,107 stars)
5. sent-conv-torch: Text classification using a convolutional neural network. (Lua, 448 stars)
6. namedtensor: Named Tensor implementation for Torch (Jupyter Notebook, 443 stars)
7. var-attn: Latent Alignment and Variational Attention (Python, 326 stars)
8. sent-summary (300 stars)
9. neural-template-gen (Python, 262 stars)
10. NeuralSteganography: STEGASURAS: STEGanography via Arithmetic coding and Strong neURAl modelS (Python, 183 stars)
11. urnng (Python, 176 stars)
12. botnet-detection: Topological botnet detection datasets and graph neural network applications (Python, 169 stars)
13. data2text (Lua, 158 stars)
14. sa-vae (Python, 154 stars)
15. compound-pcfg (Python, 127 stars)
16. cascaded-generation: Cascaded Text Generation with Markov Transformers (Python, 127 stars)
17. TextFlow (Python, 116 stars)
18. boxscore-data (HTML, 111 stars)
19. decomp-attn: Decomposable Attention Model for Sentence Pair Classification (from https://arxiv.org/abs/1606.01933) (Lua, 95 stars)
20. encoder-agnostic-adaptation: Encoder-Agnostic Adaptation for Conditional Language Generation (Python, 79 stars)
21. genbmm: CUDA kernels for generalized matrix-multiplication in PyTorch (Jupyter Notebook, 79 stars)
22. DeepLatentNLP (61 stars)
23. nmt-android: Neural Machine Translation on Android (Lua, 59 stars)
24. BSO (Lua, 54 stars)
25. hmm-lm (Python, 42 stars)
26. seq2seq-talk (TeX, 39 stars)
27. Talk-Latent (TeX, 31 stars)
28. regulatory-prediction: Code and data to accompany "Dilated Convolutions for Modeling Long-Distance Genomic Dependencies", presented at the ICML 2017 Workshop on Computational Biology (Python, 28 stars)
29. harvardnlp.github.io (JavaScript, 26 stars)
30. strux (Python, 18 stars)
31. lie-access-memory (Lua, 17 stars)
32. annotated-attention (Jupyter Notebook, 15 stars)
33. DataModules: A state-less module system for torch-like languages (Python, 8 stars)
34. rush-nlp (JavaScript, 8 stars)
35. seq2seq-attn-web (CSS, 8 stars)
36. tutorial-deep-latent (TeX, 7 stars)
37. MemN2N: Torch implementation of End-to-End Memory Networks (https://arxiv.org/abs/1503.08895) (Lua, 6 stars)
38. image-extraction: Extract images from PDFs (Jupyter Notebook, 4 stars)
39. paper-explorer (JavaScript, 3 stars)
40. readcomp: Entity Tracking Improves Cloze-style Reading Comprehension (Python, 3 stars)
41. banded: Sparse banded diagonal matrices for pytorch (Cuda, 2 stars)
42. torax (Python, 2 stars)
43. cs6741 (HTML, 2 stars)
44. simple-recs (Python, 1 star)
45. poser (Python, 1 star)
46. iclr (1 star)
47. cs6741-materials (1 star)