SPINN (Stack-augmented Parser-Interpreter Neural Network): fast, batchable, context-aware TreeRNNs

NOTE: This codebase is under active development. To exactly reproduce the experiments published in ACL 2016, use this release. For the most recent version, see the NYU fork.

Stack-augmented Parser-Interpreter Neural Network

This repository contains the source code described in our paper A Fast Unified Model for Sentence Parsing and Understanding. For a more informal introduction to the ideas behind the model, see this Stanford NLP blog post.

There are three separate implementations available:

  • A Python/Theano implementation of SPINN using a naïve stack representation (named fat-stack)
  • A Python/Theano implementation of SPINN using the thin-stack representation described in our paper
  • A C++/CUDA implementation of the SPINN feedforward, used for performance testing
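To make the difference between the two stack representations concrete, here is a minimal standalone Python 3 sketch based on our reading of the paper's description, not on the Theano code in this repository. It assumes transitions are the strings "SHIFT" and "REDUCE" and uses a hypothetical `compose` function (an elementwise sum) in place of the real TreeRNN composition. The fat stack stores a full copy of the stack after every step, while the thin stack writes exactly one vector per step into a single array and tracks the live stack with a list of row indices.

```python
from typing import List, Sequence

Vector = List[float]


def compose(left: Vector, right: Vector) -> Vector:
    # Hypothetical stand-in for the TreeRNN composition function.
    return [a + b for a, b in zip(left, right)]


def fat_stack(buffer: Sequence[Vector], transitions: Sequence[str]) -> Vector:
    """Naive stack: keeps a complete copy of the stack after every transition."""
    history: List[List[Vector]] = [[]]
    queue = list(buffer)
    for op in transitions:
        stack = list(history[-1])  # copy the previous stack, one entry per element
        if op == "SHIFT":
            stack.append(queue.pop(0))
        else:  # REDUCE: merge the top two entries
            right = stack.pop()
            left = stack.pop()
            stack.append(compose(left, right))
        history.append(stack)
    return history[-1][-1]


def thin_stack(buffer: Sequence[Vector], transitions: Sequence[str]) -> Vector:
    """Thin stack: one row written per transition plus pointers to live rows."""
    rows: List[Vector] = []  # rows[t] is the vector written at step t
    pointers: List[int] = []  # indices of rows currently on the stack, top last
    next_word = 0
    for t, op in enumerate(transitions):
        if op == "SHIFT":
            rows.append(buffer[next_word])
            next_word += 1
        else:  # REDUCE: read the top two rows via their pointers, no copying
            right = rows[pointers.pop()]
            left = rows[pointers.pop()]
            rows.append(compose(left, right))
        pointers.append(t)  # the row just written becomes the new stack top
    return rows[pointers[-1]]


words = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
ops = ["SHIFT", "SHIFT", "REDUCE", "SHIFT", "REDUCE"]
print(fat_stack(words, ops), thin_stack(words, ops))  # [3.0, 3.0] [3.0, 3.0]
```

Both functions return the same root vector; the thin stack's memory grows by one row per transition, whereas the fat stack stores the whole stack again at every step.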

Python code

The Python code lives, quite intuitively, in the python folder. We used this code to train and test the SPINN models before publication.

There is one enormous difference between the fat- and thin-stack implementations: fat-stack uses Theano's automatically generated symbolic backpropagation graphs, while thin-stack generates its own optimal backpropagation graph. This makes thin-stack oodles faster than its brother, but not all SPINN variants support this custom backpropagation yet.

Installation

Requirements:

  • Python 2.7
  • CUDA >= 7.0
  • cuDNN == v4 (v5 is not compatible with our Theano fork)

Install all required Python dependencies using the command below. (WARNING: This installs our custom Theano fork. We recommend installing in a virtual environment in order to avoid overwriting any stock Theano install you already have.)

pip install -r python/requirements.txt

We use a modified version of Theano in order to support fast forward- and backward-prop in thin-stack. While it isn't absolutely necessary to use this hacked Theano, it greatly improves thin-stack performance.

Alternatively, you can use a custom Docker image that we've prepared, as discussed in this CodaLab worksheet.

Running the code

The easiest way to launch a train/test run is to use one of the scripts in the checkpoints directory. The Bash scripts in this directory will download the necessary data and launch train/test runs of all models reported in our paper. You can run any of the following scripts:

./checkpoints/spinn.sh
./checkpoints/spinn_pi.sh
./checkpoints/spinn_pi_nt.sh
./checkpoints/rnn.sh

All of the above scripts will by default launch a training run beginning with the recorded parameters of our best models. You can change their behavior using the arguments below:

$ ./checkpoints/spinn.sh -h
spinn.sh [-h] [-e] [-t] [-s] -- run a train or test run of a SPINN model

where:
    -h    show this help text
    -e    run in eval-only mode (evaluates on dev set by default)
    -t    evaluate on test set
    -s    skip the checkpoint loading; run with a randomly initialized model
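If you want to script several of these runs, a small Python 3 helper can assemble the command lines. This is a hypothetical convenience wrapper, not part of the repository: it only knows the four script names and the -e, -t and -s flags listed above, and it assumes you run it from the same directory as the ./checkpoints/... commands. With `dry_run=True` (the default) it only returns the commands without executing them.

```python
import subprocess
from typing import List

SCRIPTS = ["spinn", "spinn_pi", "spinn_pi_nt", "rnn"]


def checkpoint_command(model: str, eval_only: bool = False,
                       test_set: bool = False, from_scratch: bool = False) -> List[str]:
    """Build the argv for one of the ./checkpoints/*.sh scripts."""
    if model not in SCRIPTS:
        raise ValueError("unknown model: %s" % model)
    argv = ["./checkpoints/%s.sh" % model]
    if eval_only:
        argv.append("-e")  # eval-only mode (dev set by default)
    if test_set:
        argv.append("-t")  # evaluate on the test set
    if from_scratch:
        argv.append("-s")  # skip checkpoint loading, start from random weights
    return argv


def run_all(eval_only: bool = True, test_set: bool = True,
            dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) one command per model script."""
    commands = [checkpoint_command(name, eval_only, test_set) for name in SCRIPTS]
    for argv in commands:
        if not dry_run:
            subprocess.check_call(argv)  # raises if a script exits non-zero
    return commands
```

Calling `run_all(dry_run=False)` would evaluate each released checkpoint on the test set, one after another.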

To evaluate our best SPINN-PI-NT model on the test set, for example, run:

$ ./checkpoints/spinn_pi_nt.sh -e -t
Running command:
python -m spinn.models.fat_classifier --data_type snli --embedding_data_path ../glove/glove.840B.300d.txt --log_path ../logs --training_data_path ../snli_1.0/snli_1.0_train.jsonl --experiment_name spinn_pi_nt --expanded_eval_only --eval_data_path ../snli_1.0/snli_1.0_test.jsonl --ckpt_path spinn_pi_nt.ckpt_best   --batch_size 32 --embedding_keep_rate 0.828528124124 --eval_seq_length 50 --init_range 0.005 --l2_lambda 3.45058959758e-06 --learning_rate 0.000297682444894 --model_dim 600 --model_type Model0 --noconnect_tracking_comp  --num_sentence_pair_combination_layers 2 --semantic_classifier_keep_rate 0.9437038157 --seq_length 50 --tracking_lstm_hidden_dim 57 --use_tracking_lstm  --word_embedding_dim 300
...
[1] Checkpointed model was trained for 156500 steps.
[1] Building forward pass.
[1] Writing eval output for ../snli_1.0/snli_1.0_test.jsonl.
[1] Written gold parses in spinn_pi_nt-snli_1.0_test.jsonl-parse.gld
[1] Written predicted parses in spinn_pi_nt-snli_1.0_test.jsonl-parse.tst
[1] Step: 156500    Eval acc: 0.808734   0.000000   ../snli_1.0/snli_1.0_test.jsonl
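To collect results from many runs, you can scrape the eval lines from the logs. The following Python 3 sketch assumes the line format shown above (step, accuracy, a second number, then the eval file path); that second number is not documented here, so the sketch skips it. The function name `parse_eval_lines` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches eval lines such as:
# [1] Step: 156500    Eval acc: 0.808734   0.000000   ../snli_1.0/snli_1.0_test.jsonl
EVAL_LINE = re.compile(r"Step:\s*(\d+)\s+Eval acc:\s*([0-9.]+)\s+\S+\s+(\S+)")


def parse_eval_lines(lines: Iterable[str]) -> List[Tuple[int, float, str]]:
    """Return (step, accuracy, eval_path) for each eval line; other lines are ignored."""
    results = []
    for line in lines:
        match = EVAL_LINE.search(line)
        if match:
            step, acc, path = match.groups()
            results.append((int(step), float(acc), path))
    return results
```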

Custom model configurations

The main executable for the SNLI experiments in the paper is fat_classifier.py, whose flags specify the hyperparameters of the model. You may also need to set Theano flags through the THEANO_FLAGS environment variable: optimizer, which controls the compilation mode (set it to fast_compile during development, and remove it to use the default for longer runs); device, which can be set to cpu or gpu# (for example, gpu0); and cuda.root, which specifies the location of CUDA when running on GPU. floatX should always be set to float32.
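If you launch runs from Python, a helper along these lines can assemble THEANO_FLAGS from the options described above. The function name `theano_env` is hypothetical, the flag order is our own choice, and `/usr/local/cuda` in the usage note is only an example location.

```python
import os
from typing import Dict, Mapping, Optional


def theano_env(device: str = "cpu", development: bool = True,
               cuda_root: Optional[str] = None,
               base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: os.environ) and set THEANO_FLAGS on the copy."""
    flags = []
    if development:
        # Fast compilation for development; leave it out to get Theano's default.
        flags.append("optimizer=fast_compile")
    flags.append("device=%s" % device)  # "cpu", or "gpu0", "gpu1", ...
    if cuda_root is not None:
        flags.append("cuda.root=%s" % cuda_root)  # only needed on GPU
    flags.append("floatX=float32")  # should always be float32
    env = dict(os.environ if base is None else base)
    env["THEANO_FLAGS"] = ",".join(flags)
    return env
```

Pass the result as `env=` to a `subprocess` call that runs `python2.7 -m spinn.models.fat_classifier`; for a long GPU run, something like `theano_env(device="gpu0", development=False, cuda_root="/usr/local/cuda")` drops the fast_compile setting.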

Here's a sample command for a fast, low-dimensional CPU training run that both trains and evaluates on the dev set. It assumes that you have a copy of SNLI available locally and that you run it from the directory containing your spinn checkout.

PYTHONPATH=spinn/python \
    THEANO_FLAGS=optimizer=fast_compile,device=cpu,floatX=float32 \
    python2.7 -m spinn.models.fat_classifier --data_type snli \
    --training_data_path snli_1.0/snli_1.0_dev.jsonl \
    --eval_data_path snli_1.0/snli_1.0_dev.jsonl \
    --embedding_data_path spinn/python/spinn/tests/test_embedding_matrix.5d.txt \
    --word_embedding_dim 5 --model_dim 10

For full runs, you'll also need a copy of the 840B-token, 300-dimensional GloVe word vectors (glove.840B.300d.txt).
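GloVe's text files store one token per line followed by its vector components, all separated by spaces. A minimal Python 3 loader for that format might look like the sketch below; it is not the repository's own loading code, and the name `load_glove` is hypothetical. Taking the last `dim` fields as the vector (rather than splitting at the first space) is our choice, meant to tolerate tokens that themselves contain spaces.

```python
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str], dim: int) -> Dict[str, List[float]]:
    """Parse GloVe text format: a token followed by `dim` space-separated floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # The last `dim` fields are the vector; everything before is the token.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors
```

For the full vectors you would call it as `with open(path, encoding="utf-8") as f: vectors = load_glove(f, 300)`; expect a plain Python dict of the 840B vectors to need a lot of memory.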

C++ code

The C++ code lives in the cpp folder. This code implements a basic SPINN feedforward. (This implementation corresponds to the bare SPINN-PI-NT, "parsed input / no tracking" model, described in the paper.) It has been verified to produce the exact same output as a recursive neural network with the same weights and inputs. (We used a simplified version of Ozan Irsoy's deep-recursive project as a comparison.)
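The equivalence check described above rests on a simple fact: applying a composition function bottom-up over a binary tree gives the same result as running that tree's shift-reduce transition sequence on a stack. The Python 3 sketch below illustrates this with nested tuples as trees and a hypothetical `compose` function (a ReLU of the elementwise sum); it is neither the C++ code nor the deep-recursive comparison itself.

```python
from typing import List, Tuple, Union

Vector = List[float]
Tree = Union[int, Tuple["Tree", "Tree"]]  # leaves are word indices


def compose(left: Vector, right: Vector) -> Vector:
    # Hypothetical composition: ReLU of the elementwise sum.
    return [max(0.0, a + b) for a, b in zip(left, right)]


def recursive_eval(tree: Tree, embeddings: List[Vector]) -> Vector:
    """Classic recursive TreeRNN evaluation."""
    if isinstance(tree, int):
        return embeddings[tree]
    left, right = tree
    return compose(recursive_eval(left, embeddings), recursive_eval(right, embeddings))


def to_transitions(tree: Tree) -> Tuple[List[int], List[str]]:
    """Post-order traversal: leaves become SHIFTs, internal nodes become REDUCEs."""
    if isinstance(tree, int):
        return [tree], ["SHIFT"]
    left_words, left_ops = to_transitions(tree[0])
    right_words, right_ops = to_transitions(tree[1])
    return left_words + right_words, left_ops + right_ops + ["REDUCE"]


def stack_eval(words: List[int], ops: List[str], embeddings: List[Vector]) -> Vector:
    """Shift-reduce evaluation with an explicit stack and no tracking."""
    stack: List[Vector] = []
    buffer = iter(words)
    for op in ops:
        if op == "SHIFT":
            stack.append(embeddings[next(buffer)])
        else:  # REDUCE
            right = stack.pop()
            left = stack.pop()
            stack.append(compose(left, right))
    return stack[-1]


embeddings = [[1.0, -2.0], [0.5, 1.0], [-1.0, 3.0]]
tree = ((0, 1), 2)
words, ops = to_transitions(tree)
print(recursive_eval(tree, embeddings), stack_eval(words, ops, embeddings))  # [0.5, 3.0] [0.5, 3.0]
```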

The main binary, stacktest, simply generates random input data and runs a feedforward. It outputs the total feedforward time elapsed and the numerical result of the feedforward.

Dependencies

The only external dependency of the C++ code is CUDA >=7.0. The tests depend on the googletest library, included in this repository as a Git submodule.

Installation

First install CUDA >=7.0 and ensure that nvcc is on your PATH. Then:

# From project root
cd cpp

# Pull down Git submodules (libraries)
git submodule update --init

# Compile C++ code
make stacktest
make rnntest

This should generate the binaries cpp/bin/stacktest and cpp/bin/rnntest.

Running

The binary cpp/bin/stacktest runs on random input data. You can time the feedforward yourself by running the following commands:

# From project root
cd cpp

BATCH_SIZE=512 ./bin/stacktest

You can of course set BATCH_SIZE to whatever integer you like. The other model architecture parameters are fixed in the code, but you can easily change them on this line as well.
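To time several batch sizes in one go, you could drive the binary from a short Python 3 script. This is a hypothetical sketch, not part of the repository: it sets BATCH_SIZE exactly as in the shell command above, runs the binary from the cpp directory, and keeps the raw output because its exact format isn't specified here. The same functions should also work for the rnntest binary.

```python
import os
import subprocess
from typing import Dict, List, Sequence, Tuple


def batch_size_runs(binary: str,
                    batch_sizes: Sequence[int]) -> List[Tuple[List[str], Dict[str, str]]]:
    """Pair the binary's argv with an environment that sets BATCH_SIZE."""
    runs = []
    for size in batch_sizes:
        env = dict(os.environ)  # inherit PATH, CUDA settings, etc.
        env["BATCH_SIZE"] = str(size)
        runs.append(([binary], env))
    return runs


def run_sweep(binary: str, batch_sizes: Sequence[int], cwd: str = "cpp") -> Dict[int, str]:
    """Run the binary once per batch size from `cwd`; return raw stdout per size."""
    outputs = {}
    for size, (argv, env) in zip(batch_sizes, batch_size_runs(binary, batch_sizes)):
        outputs[size] = subprocess.check_output(argv, env=env, cwd=cwd,
                                                universal_newlines=True)
    return outputs


# Example, from the project root: run_sweep("./bin/stacktest", [64, 128, 256, 512])
```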

Baseline RNN

The binary cpp/bin/rnntest runs a vanilla RNN (ReLU activations) on random input data. You can run this performance test as follows:

# From project root
cd cpp

BATCH_SIZE=512 ./bin/rnntest
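For reference, a vanilla RNN with ReLU activations updates its state as h_t = max(0, W h_{t-1} + U x_t + b). The plain Python 3 sketch below implements that recurrence under our own assumptions (the weight names W, U and b, a bias term, and a zero initial state); the actual dimensions and weights used by rnntest are set in the C++ code.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu_rnn(inputs: Sequence[Vector], W: Matrix, U: Matrix, b: Vector) -> Vector:
    """Run h_t = relu(W h_{t-1} + U x_t + b) from h_0 = 0; return the final state."""
    h = [0.0] * len(b)
    for x in inputs:
        pre = [wh + ux + bi for wh, ux, bi in zip(matvec(W, h), matvec(U, x), b)]
        h = [max(0.0, p) for p in pre]
    return h
```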

License

Copyright 2018, Stanford University

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
