# SeqMatchSeq
Implementations of three models described in three papers on sequence matching:

- Learning Natural Language Inference with LSTM by Shuohang Wang, Jing Jiang
- Machine Comprehension Using Match-LSTM and Answer Pointer by Shuohang Wang, Jing Jiang
- A Compare-Aggregate Model for Matching Text Sequences by Shuohang Wang, Jing Jiang
## Learning Natural Language Inference with LSTM
### Requirements
### Datasets

- The Stanford Natural Language Inference (SNLI) Corpus
- GloVe: Global Vectors for Word Representation
### Usage
```
sh preprocess.sh snli
cd main
th main.lua -task snli -model mLSTM -dropoutP 0.3 -num_classes 3
```
`sh preprocess.sh snli` will download the datasets and preprocess the SNLI corpus into the files (train.txt, dev.txt, test.txt) under the path `data/snli/sequence`, with one example per line in the format:

```
sequence1(premise) \t sequence2(hypothesis) \t label(from 1 to num_classes) \n
```
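For illustration, a single line of train.txt would then look like the following (a hypothetical example, not taken from the corpus; the three fields are tab-separated):

```
A man is playing a guitar.\tA man is performing music.\t1
```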
`main.lua` will first convert the preprocessed data and word embeddings into a Torch format and then run the algorithm. "dropoutP" is the main parameter we tuned.
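As a minimal, hedged sketch of how a value such as `-dropoutP 0.3` is typically consumed in a Torch model (illustrative standard Torch usage, not the repository's actual code):

```lua
-- Sketch: a dropout probability feeding a standard nn.Dropout layer.
require 'nn'

local dropoutP = 0.3                -- value passed as -dropoutP
local drop = nn.Dropout(dropoutP)   -- zeroes each unit with prob. dropoutP

local x = torch.randn(2, 5)
print(drop:forward(x))              -- training mode: some entries are zeroed
drop:evaluate()
print(drop:forward(x))              -- evaluation mode: input passes through
```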
#### Docker

You may try to use Docker for running the code.
- Docker Install
- Image: `docker pull shuohang/seqmatchseq:1.0`

After installation, just run the following commands (/PATH/SeqMatchSeq needs to be changed to your local path):
```
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh snli"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th main.lua"
```
## Machine Comprehension Using Match-LSTM and Answer Pointer
### Requirements
- Torch7
- nn
- nngraph
- optim
- parallel
- Python 2.7
- Python Packages: NLTK, collections, json, argparse
- NLTK Data: punkt
- Multi-core CPU
### Datasets

- Stanford Question Answering Dataset (SQuAD)
- GloVe: Global Vectors for Word Representation
### Usage
```
sh preprocess.sh squad
cd main
th mainDt.lua
```
`sh preprocess.sh squad` will download the datasets and preprocess the SQuAD corpus into the files (train.txt, dev.txt) under the path `data/squad/sequence`, with one example per line in the format:

```
sequence1(Document) \t sequence2(Question) \t sequence of the positions where the answer appears in the Document (e.g. 3 4 5 6) \n
```
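For illustration, a single line would then look like the following (a hypothetical example; 1-based token indexing is an assumption here, the last field listing the token positions of the answer span in the document):

```
The quick brown fox jumps over the lazy dog .\tWhat jumps over the lazy dog ?\t1 2 3 4
```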
`mainDt.lua` will first convert the preprocessed data and word embeddings into a Torch format and then run the algorithm. As this code runs on multiple CPU cores, the initial parameters are written in the file "main/init.lua" (a hypothetical excerpt is sketched after the list below):
- `opt.num_processes`: 5. The number of threads used.
- `opt.batch_size`: 6. Batch size for each thread (so the effective mini-batch size is 5 * 6 = 30).
- `opt.model`: boundaryMPtr / sequenceMPtr
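A minimal sketch of what these settings could look like, assuming a plain `opt` table with the field names documented above (illustrative only, not the repository's actual file):

```lua
-- Hypothetical excerpt in the spirit of main/init.lua (not verbatim).
local opt = {}
opt.num_processes = 5        -- number of worker threads
opt.batch_size    = 6        -- examples processed per thread
-- effective mini-batch per update: 5 * 6 = 30 examples
opt.model = 'boundaryMPtr'   -- alternative: 'sequenceMPtr'
return opt
```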
#### Docker

You may try to use Docker for running the code.
- Docker Install
- Image: `docker pull shuohang/seqmatchseq:1.0`

After installation, just run the following commands (/PATH/SeqMatchSeq needs to be changed to your local path):
```
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh squad"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th mainDt.lua"
```
## A Compare-Aggregate Model for Matching Text Sequences
### Requirements

### Datasets
- The Stanford Natural Language Inference (SNLI) Corpus
- MovieQA: Story Understanding Benchmark
- InsuranceQA Corpus V1: Answer Selection Task
- WikiQA: A Challenge Dataset for Open-Domain Question Answering
- GloVe: Global Vectors for Word Representation
For now, this code only supports the SNLI and WikiQA data sets.
### Usage

SNLI task (the preprocessed format follows the description above):

```
sh preprocess.sh snli
cd main
th main.lua -task snli -model compAggSNLI -comp_type submul -learning_rate 0.002 -mem_dim 150 -dropoutP 0.3
```
WikiQA task (please first download the file "WikiQACorpus.zip" to the path SeqMatchSeq/data/wikiqa/ from https://www.microsoft.com/en-us/download/details.aspx?id=52419):

```
sh preprocess.sh wikiqa
cd main
th main.lua -task wikiqa -model compAggWikiqa -comp_type mul -learning_rate 0.004 -dropoutP 0.04 -batch_size 10 -mem_dim 150
```
- `model` (model name): compAggSNLI / compAggWikiqa
- `comp_type` (8 different types of word comparison; a sketch of the submul variant follows below): submul / sub / mul / weightsub / weightmul / bilinear / concate / cos
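As a hedged illustration of one comparison type, a submul-style comparison combines element-wise subtraction and multiplication of two word-vector sequences (in the spirit of the SUBMULT comparison in the paper). The function below is a simplified sketch, not the repository's implementation:

```lua
-- Sketch of a "submul"-style word comparison (assumed, simplified):
-- compares sequence vectors H against attention-weighted vectors A.
require 'torch'

local function submul(H, A)
  local diff = H - A
  local sub  = torch.cmul(diff, diff)   -- element-wise (H - A)^2
  local mul  = torch.cmul(H, A)         -- element-wise H * A
  return torch.cat(sub, mul, 2)         -- concatenate along the feature dim
end

-- Example: 4 words, 150-dim vectors (matching -mem_dim 150)
local H = torch.randn(4, 150)
local A = torch.randn(4, 150)
print(submul(H, A):size())              -- 4 x 300
```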
#### Docker

You may try to use Docker for running the code.
- Docker Install
- Image: `docker pull shuohang/seqmatchseq:1.0`

After installation, just run the following commands (/PATH/SeqMatchSeq needs to be changed to your local path):
For SNLI:

```
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh snli"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th main.lua -task snli -model compAggSNLI -comp_type submul -learning_rate 0.002 -mem_dim 150 -dropoutP 0.3"
```
For WikiQA:

```
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh wikiqa"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th main.lua -task wikiqa -model compAggWikiqa -comp_type mul -learning_rate 0.004 -dropoutP 0.04 -batch_size 10 -mem_dim 150"
```
## Copyright
Copyright 2015 Singapore Management University (SMU). All Rights Reserved.