  • Language
    Lua
  • License
    MIT License


Repository Details

Text classification using a convolutional neural network.

Sentence Convolution Code in Torch

This code implements the sentence convolution model of Kim (2014) in Torch with GPU support. It replicates the results on the existing datasets and allows models to be trained on arbitrary other text datasets.

Quickstart

To make data in hdf5 format, run the following (with word2vec .bin path and choice of dataset):

python preprocess.py MR /path/to/word2vec.bin

To run training with GPUs:

th main.lua -data MR.hdf5 -cudnn 1 -gpuid 1

Results are timestamped and saved to the results/ directory.

Dependencies

The training pipeline requires the Python h5py module (for HDF5 support) and the following Lua packages:

  • hdf5
  • cudnn

Training the models that use word2vec embeddings requires downloading and unzipping the pretrained word2vec vectors. To do so, run the script:

./get_word2vec.sh

Creating datasets

We provide the following datasets: MR, SST1, SST2, SUBJ, TREC, CR, MPQA. All raw training data is located in the data/ directory. The SST1 and SST2 datasets have both test and dev sets, and TREC has a test set.

The preprocessing script takes the word2vec embeddings, processes the vocabulary, and outputs a data matrix of vocabulary indices for each sentence.

To create the hdf5 file, run the following with DATASET as one of the described datasets:

python preprocess.py DATASET /path/to/word2vec.bin

The script outputs:

  • the DATASET.hdf5 file with the data matrix and word2vec embeddings
  • a DATASET.txt file with a word-index dictionary for the word embeddings
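As a rough illustration of what "a data matrix of vocabulary indices" means, here is a minimal sketch. It is not the logic of preprocess.py: the helper names, the padding index `PAD`, and the choice to start word indices at 2 are all assumptions made for the example.

```python
from collections import Counter

PAD = 1  # hypothetical padding index; the real script may use a different value


def build_vocab(sentences, first_index=2):
    """Map each distinct word to an integer index, most frequent first."""
    counts = Counter(word for sent in sentences for word in sent.split())
    # Sort by descending count, then alphabetically, so the result is deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=first_index)}


def to_index_matrix(sentences, vocab):
    """Return one row of vocabulary indices per sentence, padded to equal length."""
    rows = [[vocab[w] for w in sent.split()] for sent in sentences]
    width = max(len(r) for r in rows)
    return [r + [PAD] * (width - len(r)) for r in rows]


sentences = ["no movement , no yuks", "not much of anything"]
vocab = build_vocab(sentences)
matrix = to_index_matrix(sentences, vocab)
for row in matrix:
    print(row)
```

Each row has the same length, so the rows can be stacked into a single matrix and stored alongside the embedding table.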

Training on custom datasets

We allow training on arbitrary text datasets. They should be formatted in the same way as the sample data: one sentence per line, with the first token on each line being the class label (0-indexed). Our code handles most parsing of punctuation, possessives, capitalization, etc.

Example line:

1 no movement , no yuks , not much of anything .

Then run:

python preprocess.py custom /path/to/word2vec.bin --train /path/to/train/data --test /path/to/test/data --dev /path/to/dev/data

The output file's name can be set with the flag --custom_name (the default name is custom).
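Before running the preprocessing step on a custom dataset, it can help to check that every line follows the expected "label, then sentence" layout. The sketch below is a hypothetical helper for that check, not part of the repository, and it does none of the punctuation or capitalization cleanup that the real script performs.

```python
def parse_line(line):
    """Split a 'LABEL sentence ...' line into (label, tokens)."""
    label, _, sentence = line.strip().partition(" ")
    # int() raises ValueError if the first token is not a numeric label.
    return int(label), sentence.split()


label, tokens = parse_line("1 no movement , no yuks , not much of anything .")
print(label, tokens)
```

A line whose first token is not an integer raises a ValueError, which makes malformed rows easy to spot early.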

Running torch

Training is typically done with 10-fold cross-validation and 25 epochs. If the dataset comes with a test set, we don't do cross-validation (but we split the training data 90/10 to obtain a dev set). If the dataset also comes with a dev set, we skip the train/dev split.
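The rules above can be summarized as a small decision function. This is only a sketch of the described behavior: the function names are hypothetical, and whether the actual code shuffles before the 90/10 split is an assumption.

```python
import random


def evaluation_plan(has_test, has_dev):
    """Describe which evaluation setup applies, following the rules above."""
    if not has_test:
        return "10-fold cross-validation"
    if has_dev:
        return "use train/dev/test as provided"
    return "hold out 10% of train as dev, use test as provided"


def split_train_dev(examples, dev_fraction=0.1, seed=0):
    """Shuffle (assumed) and split examples into train and dev portions."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


print(evaluation_plan(has_test=False, has_dev=False))  # e.g. MR
print(evaluation_plan(has_test=True, has_dev=False))   # e.g. TREC
print(evaluation_plan(has_test=True, has_dev=True))    # e.g. SST1
train, dev = split_train_dev(range(100))
print(len(train), len(dev))
```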

There are four main model architectures we implemented, as described in Kim (2014): rand, static, nonstatic, multichannel.

  • rand initializes the word embeddings randomly and learns them.
  • static initializes the word embeddings to word2vec and keeps the weight static.
  • nonstatic also initializes to word2vec, but allows them to be learned.
  • multichannel has two word2vec embedding layers, one static and one nonstatic. The outputs of the two layers are summed.
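The four architectures differ only in how each embedding layer is initialized and whether it is updated during training. The table below restates that as data; the names `EMBEDDING_CHANNELS` and `combine_channels` are hypothetical and do not appear in the Torch code.

```python
# (initialization, trainable) for each embedding channel of each model type.
EMBEDDING_CHANNELS = {
    "rand": [("random", True)],
    "static": [("word2vec", False)],
    "nonstatic": [("word2vec", True)],
    "multichannel": [("word2vec", False), ("word2vec", True)],
}


def combine_channels(channel_outputs):
    """Element-wise sum of per-channel outputs (lists of equal length)."""
    return [sum(values) for values in zip(*channel_outputs)]


for name, channels in EMBEDDING_CHANNELS.items():
    print(name, channels)

# Toy multichannel case: the static and nonstatic outputs are summed.
print(combine_channels([[1.0, 2.0], [0.5, 0.5]]))
```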

It is highly recommended that GPUs are used during training if possible (see Results section for timing benchmarks).

Separating out training and testing is easy; use the parameters -train_only and -test_only. Also, pretrained models at any stage can be loaded from a .t7 file with -warm_start_model (see more parameters below).

Output

The code outputs a checkpoint .t7 file for every fold, with the name given by -savefile. The default name is TIMESTAMP_results.

The following are saved as a table:

  • dev_scores with dev scores,
  • test_scores with test scores,
  • opt with model parameters,
  • model with best model (as determined by dev score),
  • embeddings with the updated word embeddings

Model augmentations

A few modifications were made to the model architecture as experiments.

  • we include an option to include highway layers at the final MLP step (which increases depth of the model),
  • we also include highway layers at the convolutional step (which performs multiple convolutions on the resulting feature maps) as an option,
  • we experimented with skip kernels of size 5 (added in parallel with the other kernel sizes)

Results from these experiments are described below in the Results section.
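For readers unfamiliar with highway layers, the sketch below shows the gating idea from Srivastava et al. (2015): a transform gate t decides, per dimension, how much of the transformed features to use and how much of the input to carry through unchanged. It is a plain-Python illustration that takes the transformed features and gate pre-activations as given; the function names are hypothetical and the exact variant used in this code base (nonlinearity, bias initialization) is not specified here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def highway(x, transformed, gate_logits):
    """Compute t * H(x) + (1 - t) * x element-wise.

    x: input vector
    transformed: H(x), the output of the nonlinear transform
    gate_logits: pre-activations of the transform gate
    """
    out = []
    for x_i, h_i, g_i in zip(x, transformed, gate_logits):
        t = sigmoid(g_i)
        out.append(t * h_i + (1.0 - t) * x_i)
    return out


# With gate logits of 0 the gate is 0.5, so the output averages input and transform.
print(highway([1.0, 3.0], [3.0, 1.0], [0.0, 0.0]))
```

When the gate is close to 0 the layer passes its input through almost unchanged, which is what makes stacking several such layers practical.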

Parameters

The following is the complete list of parameters accepted by the Torch code.

  • model_type: Model architecture, as described above. Options: rand, static, nonstatic, multichannel
  • data: Training dataset to use, including word2vec data. This should be a .hdf5 file made with preprocess.py.
  • cudnn: Use GPUs if set to 1, otherwise set to 0
  • seed: Random seed, set to -1 for actual randomness
  • folds: Number of folds for cross-validation.
  • debug: Print debugging info including timing and confusions
  • savefile: Name of output .t7 file, which will hold the trained model. Default is TIMESTAMP_results
  • zero_indexing: Set to 1 if data is zero indexed
  • warm_start_model: Load a .t7 file with pretrained model. Should contain a table with key 'model'
  • train_only: Set to 1 to only train (no testing)
  • test_only: Given a .t7 file with model, test on testing data
  • dump_feature_maps_file: Filename for dumping feature maps of convolution at test time. This will be a .hdf5 file with fields feature_maps for the features at each time step and word_idxs for the word indexes (aligned with the last word of the filter). This currently only works for models with a single filter size. This is saved for the best model on fold 1.
  • preds_file: Filename for writing predictions (with test_only set to 1). Output is zero indexed.
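When running many experiments, it can be convenient to assemble the command line from a dictionary of options. The helper below is a hypothetical sketch, not part of the repository: it only uses flag names listed above and assumes every option is passed as `-name value`.

```python
def build_command(options, script="main.lua"):
    """Turn an options dict into an argument list for the `th` launcher."""
    command = ["th", script]
    for name, value in options.items():
        command += ["-" + name, str(value)]
    return command


cmd = build_command({
    "data": "MR.hdf5",
    "model_type": "nonstatic",
    "cudnn": 1,
    "folds": 10,
})
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` to launch training from a Python script.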

Training hyperparameters:

  • num_epochs: Number of training epochs.
  • optim_method: Gradient descent method. Options: adadelta, adam
  • L2s: Set L2 norm of final linear layer weights to this.
  • batch_size: Batch size for training.
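The L2s option refers to a norm constraint on the final layer's weights. A common form of this constraint, and the one described in Kim (2014), rescales a weight vector only when its L2 norm exceeds the threshold. The sketch below assumes that form; whether the Torch code applies it per row or to the whole matrix is not stated here, and the function name is hypothetical.

```python
import math


def rescale_to_max_norm(weights, max_norm):
    """Scale a weight vector down so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_norm:
        return list(weights)  # already within the constraint
    scale = max_norm / norm
    return [w * scale for w in weights]


clipped = rescale_to_max_norm([3.0, 4.0], max_norm=3.0)  # original norm is 5
print(clipped, math.sqrt(sum(w * w for w in clipped)))
```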

Model hyperparameters:

  • num_feat_maps: Number of convolution feature maps.
  • kernels: Kernel sizes of different convolutions.
  • dropout_p: Dropout probability.
  • highway_mlp: Number of highway MLP layers (0 for none)
  • highway_conv_layers: Number of highway convolutional layers (0 for none)
  • skip_kernel: Set 1 to use skip kernels
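To make the roles of kernels and num_feat_maps concrete, here is a toy version of the convolution followed by max-over-time pooling described in Kim (2014). It is a simplified sketch: the bias and nonlinearity are omitted, sentences are assumed to have at least as many tokens as the kernel size, and the function names and example values (kernel sizes 3, 4, 5 with 100 feature maps) are illustrative rather than taken from this code base.

```python
def conv_max_over_time(embeddings, filt):
    """Slide one filter over the sentence and keep its largest response.

    embeddings: list of word vectors, one per token
    filt: list of k weight vectors, one per position in the window
    """
    k = len(filt)
    responses = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        responses.append(sum(
            w * x
            for w_vec, x_vec in zip(filt, window)
            for w, x in zip(w_vec, x_vec)
        ))
    return max(responses)


def feature_vector_size(kernels, num_feat_maps):
    """Each kernel size contributes num_feat_maps pooled features."""
    return len(kernels) * num_feat_maps


sentence = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # 3 tokens, 2-dim embeddings
filt = [[1.0, 1.0], [1.0, 1.0]]                  # kernel size 2
print(conv_max_over_time(sentence, filt))
print(feature_vector_size([3, 4, 5], 100))
```

Because pooling keeps one value per filter regardless of sentence length, the final feature vector has a fixed size that depends only on the two hyperparameters.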

Results

The following results were collected with the same training setup as in Kim (2014) (same parameters, 10-fold cross validation if data has no test set, 25 epochs).

Scores

Dataset rand static nonstatic multichannel
MR 75.9 80.5 81.3 80.8
SST1 42.2 44.8 46.7 44.6
SST2 83.5 85.6 87.0 87.1
SUBJ 89.2 93.0 93.4 93.2
TREC 88.2 91.8 92.8 91.8
CR 78.3 83.3 84.4 83.7
MPQA 84.6 89.6 89.7 89.6

With 5 trials on SST1, we have a mean nonstatic score of 46.7 with standard deviation 1.69.

With 1 highway layer, SST1 achieves a mean score of 47.8 (stddev 0.857) over 5 trials. With 2 highway layers, the mean is 47.1 (stddev 1.47) over 10 trials.

Timing

We ran timing benchmarks on SST1, which has train/dev/test data sizes of 156817/1101/2210. We used a batch size of 50.

| | non-GPU | GPU |
| --- | --- | --- |
| per epoch | 2475 s | 54.0 s |
| per batch | 787 ms | 15.6 ms |

From these results, we see that using GPUs achieves almost a 50x speedup on training (about 46x per epoch and about 50x per batch). This allows much faster tuning of parameters and model experimentation.
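The speedup figures follow directly from the timing table and the SST1 training size. The short calculation below reproduces them; it assumes the last, partial batch counts as one batch.

```python
import math

train_size, batch_size = 156817, 50
cpu_epoch_s, gpu_epoch_s = 2475.0, 54.0
cpu_batch_ms, gpu_batch_ms = 787.0, 15.6

batches_per_epoch = math.ceil(train_size / batch_size)
epoch_speedup = cpu_epoch_s / gpu_epoch_s
batch_speedup = cpu_batch_ms / gpu_batch_ms

print("batches per epoch:", batches_per_epoch)
print("per-epoch speedup: %.1fx" % epoch_speedup)
print("per-batch speedup: %.1fx" % batch_speedup)
# Cross-check: per-batch time times batch count should be close to the epoch time.
print("estimated non-GPU epoch: %.0f s" % (batches_per_epoch * cpu_batch_ms / 1000))
```

The estimated non-GPU epoch time comes out within a few seconds of the measured 2475 s, so the per-batch and per-epoch rows of the table are consistent with each other.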

Relevant publications

This code is based on Kim (2014) and the corresponding Theano code.

Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, Doha, Qatar. Association for Computational Linguistics.

Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Training very deep networks. In Advances in Neural Information Processing Systems (pp. 2368-2376).
