  • Stars: 264
  • Rank: 154,232 (Top 4%)
  • Language: Python
  • License: MIT License
  • Created: about 8 years ago
  • Updated: over 7 years ago

Repository Details

Simple Keras model that tackles the Stanford Natural Language Inference (SNLI) corpus using summation and/or recurrent neural networks

Keras SNLI baseline example

This repository contains a simple Keras baseline to train a variety of neural networks to tackle the Stanford Natural Language Inference (SNLI) corpus.

The aim is to determine whether a premise sentence entails, is neutral with respect to, or contradicts a hypothesis sentence. For example, "A soccer game with multiple males playing" entails "Some men are playing a sport", while "A black race car starts up in front of a crowd of people" contradicts "A man is driving down a lonely road".

The model architecture is as follows (a minimal Keras sketch appears after the list):

  • Extract a 300D word vector from the fixed GloVe vocabulary
  • Pass the 300D word vector through a ReLU "translation" layer
  • Encode the premise and hypothesis sentences using the same encoder (summation, GRU, LSTM, ...)
  • Concatenate the two 300D resulting sentence embeddings
  • Three 600D ReLU layers
  • 3-way softmax
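For concreteness, here is a minimal sketch of that architecture using the Keras functional API. The layer sizes follow the list above; `VOCAB`, `SENT_LEN`, and the zeroed `glove_matrix` are illustrative stand-ins for the real vocabulary and GloVe weights, not values taken from this repository.

```python
import numpy as np
from keras.layers import Input, Embedding, TimeDistributed, Dense, GRU, concatenate
from keras.models import Model

VOCAB, SENT_LEN, GLOVE_DIM = 20000, 42, 300   # illustrative sizes, not the repo's settings
glove_matrix = np.zeros((VOCAB, GLOVE_DIM))   # stand-in for the real GloVe weight matrix

premise = Input(shape=(SENT_LEN,), dtype='int32')
hypothesis = Input(shape=(SENT_LEN,), dtype='int32')

# Fixed GloVe lookup: the embedding weights are frozen during training
embed = Embedding(VOCAB, GLOVE_DIM, weights=[glove_matrix], trainable=False)
# ReLU "translation" layer applied independently to each 300D word vector
translate = TimeDistributed(Dense(300, activation='relu'))
# Shared sentence encoder -- swap in an LSTM or a summation here as desired
encode = GRU(300)

prem = encode(translate(embed(premise)))
hypo = encode(translate(embed(hypothesis)))

joint = concatenate([prem, hypo])             # two 300D sentence embeddings -> 600D
for _ in range(3):                            # three 600D ReLU layers
    joint = Dense(600, activation='relu')(joint)
pred = Dense(3, activation='softmax')(joint)  # 3-way softmax: entail / neutral / contradict

model = Model(inputs=[premise, hypothesis], outputs=pred)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
```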

[Figure: visual depiction of the model architecture]

Training uses RMSProp and stops after N epochs have passed with no improvement to the validation loss. Following Liu et al. 2016, the GloVe embeddings are not updated during training. Following Munkhdalai & Yu 2016, the out of vocabulary embeddings remain zeroed out.
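That stopping rule maps directly onto Keras' EarlyStopping callback. A minimal sketch, where the patience value (the N above) and the data names in the commented fit call are illustrative rather than the repository's settings:

```python
from keras.callbacks import EarlyStopping

# Halt training once validation loss has gone `patience` epochs without improving
early_stop = EarlyStopping(monitor='val_loss', patience=4)

# Hypothetical usage with the model sketched above:
# model.fit([premise_data, hypothesis_data], labels,
#           validation_data=(val_inputs, val_labels), callbacks=[early_stop])
```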

One of the most important aspects when using fixed GloVe embeddings with summation is the "translation" layer. Bowman et al. 2016 use such a layer when moving from the 300D embeddings to their lower dimensional 100D hidden state. This is likely highly important for the summation method, as it allows the GloVe space to be shifted before summation. Technically, once training is complete, the "translated" GloVe embeddings could be precomputed and this layer removed, decreasing the number of parameters, but ¯\_(ツ)_/¯
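A rough sketch of that folding step, reusing the `translate` layer and `glove_matrix` names from the model sketch above (both illustrative, not from this repository):

```python
import numpy as np

# Apply the learned ReLU "translation" to every GloVe row once, offline,
# so the TimeDistributed(Dense) layer can be dropped at inference time.
W, b = translate.layer.get_weights()                         # kernel and bias of the wrapped Dense
translated_glove = np.maximum(glove_matrix.dot(W) + b, 0.0)  # precomputed ReLU(xW + b) per word
```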

The model is relatively simple yet achieves substantially higher accuracy than other comparable baselines (specifically summation, GRU, and LSTM models) listed on the SNLI page. The summary: don't dismiss well-tuned GloVe bag-of-words models - they can still be competitive and are far faster to train!

| Model | Parameters | Train | Validation | Test |
|-------|------------|-------|------------|------|
| 300D sum(word vectors) + 3 x 600D ReLU (this code) | 1.2m | 0.831 | 0.823 | 0.825 |
| 300D GRU + 3 x 600D ReLU (this code) | 1.7m | 0.843 | 0.830 | 0.823 |
| 300D LSTM + 3 x 600D ReLU (this code) | 1.9m | 0.855 | 0.829 | 0.823 |
| 300D GRU (recurrent dropout) + 3 x 600D ReLU (this code) | 1.7m | 0.844 | 0.832 | 0.832 |
| 300D LSTM (recurrent dropout) + 3 x 600D ReLU (this code) | 1.9m | 0.852 | 0.836 | 0.827 |
| 300D LSTM encoders (Bowman et al. 2016) | 3.0m | 0.839 | - | 0.806 |
| 1024D GRU w/ unsupervised 'skip-thoughts' pre-training (Vendrov et al. 2015) | 15m | 0.988 | - | 0.814 |
| 300D Tree-based CNN encoders (Mou et al. 2015) | 3.5m | 0.833 | - | 0.821 |
| 300D SPINN-PI encoders (Bowman et al. 2016) | 3.7m | 0.892 | - | 0.832 |
| 600D (300+300) BiLSTM encoders (Liu et al. 2016) | 3.5m | 0.833 | - | 0.834 |

Only the numbers for pure sentential embedding models are shown here. The SNLI homepage lists the full set of models, where attentional models perform better. If I've missed any comparable models, submit a pull request.

All models could benefit from a more thorough evaluation and/or grid search as the existing parameters are guesstimates inspired by various papers (Bowman et al. 2015, Bowman et al. 2016, Liu et al. 2016). Only when the GRUs and LSTMs feature recurrent dropout (dropout_U) do they consistently beat the summation of word embeddings. Further work should be done exploring the hyperparameters of the GRU and LSTM.
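For reference, `dropout_U` is the Keras 1.x argument name; Keras 2 renamed it to `recurrent_dropout`. A sketch with an illustrative, untuned rate:

```python
from keras.layers import GRU

encoder = GRU(300, recurrent_dropout=0.2)  # Keras 2 spelling
# encoder = GRU(300, dropout_U=0.2)        # equivalent Keras 1.x spelling
```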

More Repositories

1. sha-rnn: Single Headed Attention RNN - "Stop thinking with your head" (Python, 1,166 stars)
2. trending_arxiv: Track trending arXiv papers on Twitter from within your circle (HTML, 169 stars)
3. bitflipped: Your computer is a cosmic ray detector. Literally. (C, 58 stars)
4. cc-warc-examples: CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop (Java, 54 stars)
5. tf-ham: A partial TensorFlow implementation of "Learning Efficient Algorithms with Hierarchical Attentive Memory" (Python, 52 stars)
6. right_whale_hunt: Annotated faces for NOAA Right Whale Recognition Kaggle competition (Python, 35 stars)
7. keras_qa: Keras solution to the bAbI tasks using recurrent neural networks - merged as an example into Keras mainline (Python, 34 stars)
8. search_iclr_2019 (HTML, 32 stars)
9. bifurcate-rs: Zero dependency images (of chaos) in Rust (Rust, 32 stars)
10. govarint: A variable length integer compression library for Golang (Go, 24 stars)
11. montelight-cpp: Faster raytracing through importance sampling, rejection sampling, and variance reduction (C++, 21 stars)
12. texting_robots: Texting Robots: A Rust native `robots.txt` parser with thorough unit testing (Rust, 20 stars)
13. Snippets: Useful code snippets that I'd rather not lose (Python, 19 stars)
14. cs205_ga: How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce (Python, 17 stars)
15. gzipstream: gzipstream allows Python to process multi-part gzip files from a streaming source (Python, 17 stars)
16. pubcrawl: *Deprecated* A short and sweet Python web crawler using Redis as the process queue, seen set and Memcache style rate limiter for robots.txt (Python, 16 stars)
17. cc-mrjob: Demonstration of using Python to process the Common Crawl dataset with the mrjob framework (Python, 8 stars)
18. smerity_flask: Smerity.com website generated using (naive) custom Python code, Flask & Frozen-Flask (Less, 7 stars)
19. yolo-cpp: YOLO C++: A crash course for those needing to learn street fighting C++ (C++, 6 stars)
20. cc-quick-scripts: Useful scripts for attacking the CommonCrawl dataset and WARC/WET/WAT files (Python, 6 stars)
21. gopagerank: PageRank implemented in Go for large graphs (billions of edges) (Go, 5 stars)
22. Hip-Flask: *Deprecated* (JavaScript, 4 stars)
23. glove-guante: Exploration of Global Vectors for Word Representation (GloVe) (Go, 3 stars)
24. graphx-prank: GraphX P[age]Rank -- PageRank runner for large graphs (Scala, 3 stars)
25. comp3109_assignment1: Nick and Smerity's assignment (Common Lisp, 2 stars)
26. BoxOfPrimes: Fast pseudo-random prime number generator for n bits using the OpenSSL library (C, 2 stars)
27. grimrepo: Automatically create and set remote private Git repositories at BitBucket (Python, 2 stars)
28. texting_robots_cc_test: Texting Robots: Common Crawl `robots.txt` Test (Rust, 2 stars)
29. tableau: Group 2's Tableau app from NCSS 2014 (Python, 2 stars)
30. FacebookFriends: Plugin for Vanilla Forums: Shows the real name of any of your Facebook friends on the Vanilla Forum (PHP, 1 star)
31. kaggle_connectomics: Connectomics: Predicting the directed connections between 1,000 neurons using neural activity time series data (Python, 1 star)
32. stat183_madness: March Madness (R, 1 star)
33. vimfiles (Vim Script, 1 star)
34. smerity.github.com (HTML, 1 star)
35. rosettafight: Rosetta Fight: Quick Lookup and Comparison on Rosetta Code Languages (Python, 1 star)
36. real_world_algorithms: Notes from the Real World Algorithms course at Sydney University (1 star)
37. fknn: Fast KNN for large scale multiclass problems (C++, 1 star)
38. lockoutbot: Example for NCSS - making lockoutbot <3 (Python, 1 star)
39. cs281_edge_estimator: CS281 Final Project: Estimate edge weights given multiple page view samples (Python, 1 star)
40. gogorobot: Exploratory robots.txt crawler written in Go (Go, 1 star)