• Stars: 193
• Rank: 199,943 (Top 4%)
• Language: Python
• Created: over 8 years ago
• Updated: over 8 years ago

Repository Details

Python code for training all models in the ICLR paper, "Towards Universal Paraphrastic Sentence Embeddings". These models achieve strong performance on semantic similarity tasks without any training or tuning on the training data for those tasks. They also produce features that are at least as discriminative as skip-thought vectors on semantic similarity tasks. Moreover, the code can achieve state-of-the-art results on entailment and sentiment tasks.

iclr2016

Code to train models from "Towards Universal Paraphrastic Sentence Embeddings".

The code is written in Python and requires NumPy, SciPy, Theano, and the Lasagne library.

To get started, run setup.sh to download the initial word embeddings and the PPDB training data. A demo script takes the model you would like to train as a command-line argument (check the script for the available choices). See main/ppdb_train.py and main/train.py for command-line options.
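As a small illustration of how such embeddings can be consumed (this sketch is not code from the repository), the snippet below reads word vectors from a plain-text file in the common "one word followed by its values per line" layout; the file name and the exact on-disk format here are assumptions.

    # Illustrative sketch only: load word vectors from a whitespace-separated
    # text file (one word followed by its embedding values per line). The file
    # name and exact format are assumptions, not guaranteed by this repository.
    import numpy as np

    def load_word_vectors(path):
        """Return a dict mapping each word to its embedding as a float32 array."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split()
                if len(parts) < 2:
                    continue  # skip malformed or empty lines
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        return vectors

    # Hypothetical usage:
    # word_vecs = load_word_vectors("paragram_vectors.txt")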

The code is separated into three parts:

  • similarity: code for training models on the SICK similarity and entailment tasks
  • main: code for training models on PPDB data, as well as various utilities
  • sentiment: code for training sentiment models
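The models considered in the paper range from word averaging to LSTMs, and one of its central findings is that simply averaging word embeddings gives a remarkably strong sentence representation for semantic similarity. Below is a minimal numpy sketch of that averaging model together with a cosine-similarity score, for illustration only; it is not the repository's Theano/Lasagne training code, and word_vecs stands for any word-to-vector mapping such as the one loaded above.

    # Minimal sketch (not the repository's code): sentence embedding by
    # averaging word vectors, scored with cosine similarity.
    import numpy as np

    def embed_sentence(sentence, word_vecs, dim):
        """Average the vectors of in-vocabulary tokens; zeros if none match."""
        tokens = [t for t in sentence.lower().split() if t in word_vecs]
        if not tokens:
            return np.zeros(dim, dtype=np.float32)
        return np.mean([word_vecs[t] for t in tokens], axis=0)

    def cosine_similarity(a, b):
        """Cosine similarity, with a guard against zero vectors."""
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(np.dot(a, b) / denom) if denom else 0.0

    # Toy usage with 3-dimensional vectors:
    word_vecs = {
        "a": np.array([0.1, 0.3, 0.5], dtype=np.float32),
        "dog": np.array([0.2, 0.1, 0.7], dtype=np.float32),
        "barks": np.array([0.9, 0.4, 0.0], dtype=np.float32),
    }
    s1 = embed_sentence("a dog barks", word_vecs, dim=3)
    s2 = embed_sentence("a dog", word_vecs, dim=3)
    print(cosine_similarity(s1, s2))  # close to 1.0 for similar sentences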

If you use our code in your work, please cite:

@article{wieting2016iclr,
  author  = {John Wieting and Mohit Bansal and Kevin Gimpel and Karen Livescu},
  title   = {Towards Universal Paraphrastic Sentence Embeddings},
  journal = {CoRR},
  volume  = {abs/1511.08198},
  year    = {2015}
}

More Repositories

1. charagram (Python, 125 stars)
   Code to train and use models from "Charagram: Embedding Words and Sentences via Character n-grams".

2. para-nmt-50m (Python, 101 stars)
   Pre-trained models, plus code and data to train and use models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations".

3. paraphrastic-representations-at-scale (Python, 73 stars)

4. beyond-bleu (Python, 52 stars)
   Python code for training models in the ACL paper, "Beyond BLEU: Training Neural Machine Translation with Semantic Similarity".

5. acl2017 (Python, 33 stars)
   Code to train and use models from "Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings".

6. paragram-word (Python, 30 stars)
   Python code for training Paragram word embeddings, which achieve human-level performance on some word similarity tasks, including SimLex-999. This code was used to obtain the results in the appendix of our 2015 TACL paper, "From Paraphrase Database to Compositional Paraphrase Model and Back".

7. simple-and-effective-paraphrastic-similarity (Python, 22 stars)
   Python code for training models in the ACL paper, "Simple and Effective Paraphrastic Similarity from Parallel Translations".

8. bilingual-generative-transformer (Python, 10 stars)
   Code for "A Bilingual Generative Transformer for Semantic Sentence Embedding", published at EMNLP 2020.

9. emnlp2017 (Python, 4 stars)
   Code and data to train and use models from "Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext".

10. tacl2015 (MATLAB, 4 stars)
    MATLAB code for training a recursive neural network to learn a paraphrase model from PPDB. Also includes MATLAB code for training Paragram word embeddings. This code was used to obtain the results in our 2015 TACL paper, "From Paraphrase Database to Compositional Paraphrase Model and Back".