• Stars: 193
• Rank: 199,943 (Top 4%)
• Language: Python
• Created: over 8 years ago
• Updated: over 8 years ago

Repository Details

Python code for training all models in the ICLR paper, "Towards Universal Paraphrastic Sentence Embeddings". These models achieve strong performance on semantic similarity tasks without any training or tuning on the training data for those tasks. They also produce features that are at least as discriminative as skip-thought vectors on semantic similarity tasks. Moreover, the code can achieve state-of-the-art results on entailment and sentiment tasks.

iclr2016

Code to train models from "Towards Universal Paraphrastic Sentence Embeddings".

The code is written in Python and requires NumPy, SciPy, Theano, and the Lasagne library.

To get started, run setup.sh to download the initial word embeddings and the PPDB training data. A demo script takes the model you would like to train as a command-line argument (check the script for the available choices). See main/ppdb_train.py and main/train.py for command-line options.
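As a small illustration of how such embeddings can be consumed (this sketch is not code from the repository), the snippet below reads word vectors from a plain-text file in the common "one word followed by its values per line" layout; the file name and the exact on-disk format here are assumptions.

    # Illustrative sketch only: load word vectors from a whitespace-separated
    # text file (one word followed by its embedding values per line). The file
    # name and exact format are assumptions, not guaranteed by this repository.
    import numpy as np

    def load_word_vectors(path):
        """Return a dict mapping each word to its embedding as a float32 array."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split()
                if len(parts) < 2:
                    continue  # skip malformed or empty lines
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        return vectors

    # Hypothetical usage:
    # word_vecs = load_word_vectors("paragram_vectors.txt")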

The code is separated into three parts:

  • similarity: code for training models on the SICK similarity and entailment tasks
  • main: code for training models on PPDB data, as well as various utilities
  • sentiment: code for training sentiment models
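The models considered in the paper range from word averaging to LSTMs, and one of its central findings is that simply averaging word embeddings gives a remarkably strong sentence representation for semantic similarity. Below is a minimal numpy sketch of that averaging model together with a cosine-similarity score, for illustration only; it is not the repository's Theano/Lasagne training code, and word_vecs stands for any word-to-vector mapping such as the one loaded above.

    # Minimal sketch (not the repository's code): sentence embedding by
    # averaging word vectors, scored with cosine similarity.
    import numpy as np

    def embed_sentence(sentence, word_vecs, dim):
        """Average the vectors of in-vocabulary tokens; zeros if none match."""
        tokens = [t for t in sentence.lower().split() if t in word_vecs]
        if not tokens:
            return np.zeros(dim, dtype=np.float32)
        return np.mean([word_vecs[t] for t in tokens], axis=0)

    def cosine_similarity(a, b):
        """Cosine similarity, with a guard against zero vectors."""
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(np.dot(a, b) / denom) if denom else 0.0

    # Toy usage with 3-dimensional vectors:
    word_vecs = {
        "a": np.array([0.1, 0.3, 0.5], dtype=np.float32),
        "dog": np.array([0.2, 0.1, 0.7], dtype=np.float32),
        "barks": np.array([0.9, 0.4, 0.0], dtype=np.float32),
    }
    s1 = embed_sentence("a dog barks", word_vecs, dim=3)
    s2 = embed_sentence("a dog", word_vecs, dim=3)
    print(cosine_similarity(s1, s2))  # close to 1.0 for similar sentences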

If you use our code in your work, please cite:

@article{wieting2016iclr,
  author  = {John Wieting and Mohit Bansal and Kevin Gimpel and Karen Livescu},
  title   = {Towards Universal Paraphrastic Sentence Embeddings},
  journal = {CoRR},
  volume  = {abs/1511.08198},
  year    = {2015}
}

More Repositories

1. charagram (Python, 125 stars)
   Code to train and use models from "Charagram: Embedding Words and Sentences via Character n-grams".

2. para-nmt-50m (Python, 101 stars)
   Pre-trained models, plus code and data to train and use models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations".

3. paraphrastic-representations-at-scale (Python, 73 stars)

4. beyond-bleu (Python, 52 stars)
   Python code for training models in the ACL paper, "Beyond BLEU: Training Neural Machine Translation with Semantic Similarity".

5. acl2017 (Python, 33 stars)
   Code to train and use models from "Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings".

6. paragram-word (Python, 30 stars)
   Python code for training Paragram word embeddings, which achieve human-level performance on some word similarity tasks, including SimLex-999. This code was used to obtain the results in the appendix of our 2015 TACL paper, "From Paraphrase Database to Compositional Paraphrase Model and Back".

7. simple-and-effective-paraphrastic-similarity (Python, 22 stars)
   Python code for training models in the ACL paper, "Simple and Effective Paraphrastic Similarity from Parallel Translations".

8. bilingual-generative-transformer (Python, 10 stars)
   Code for "A Bilingual Generative Transformer for Semantic Sentence Embedding", published at EMNLP 2020.

9. emnlp2017 (Python, 4 stars)
   Code and data to train and use models from "Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext".

10. tacl2015 (MATLAB, 4 stars)
    MATLAB code for training a recursive neural network to learn a paraphrase model from PPDB. Also includes MATLAB code for training Paragram word embeddings. This code was used to obtain the results in our 2015 TACL paper, "From Paraphrase Database to Compositional Paraphrase Model and Back".