[DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations

GLUE Baselines

This repo contains the code for baselines for the Generalized Language Understanding Evaluation (GLUE) benchmark. See our paper for more details about GLUE or the baselines.

Deprecation Warning

Use this code to reproduce our baselines. If you want code to use as a starting point for new development, though, we strongly recommend using jiant instead; it's a much more extensive and much better-documented toolkit built around the same goals.

Dependencies

Make sure you have installed the packages listed in environment.yml. Where a package version is listed, that specific version is required. If you use conda, you can create an environment from this file with the following command:

conda env create -f environment.yml

Note: The version of AllenNLP available on pip may not be compatible with PyTorch 0.4, in which case we recommend installing AllenNLP from source.

Downloading GLUE

We provide a convenience Python script for downloading all GLUE data and standard splits.

python download_glue_data.py --data_dir glue_data --tasks all

After downloading GLUE, point PATH_PREFIX in src/preprocess.py to the directory containing the data.
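Before editing PATH_PREFIX, it can help to confirm what the download step actually produced. The helper below is a minimal sketch and not part of this repo: list_task_dirs is a hypothetical name, and it assumes only that each downloaded task ends up in its own subdirectory of the --data_dir you passed.

```python
from pathlib import Path


def list_task_dirs(data_dir):
    """Return the sorted names of the subdirectories found under data_dir.

    Hypothetical helper (not part of this repo). It assumes each downloaded
    task is stored in its own subdirectory of data_dir.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} not found; run download_glue_data.py first")
    # Keep directories only, so stray archives or logs are ignored.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling list_task_dirs("glue_data") after the download command above should list one entry per task; the directory you passed is the one PATH_PREFIX should point to.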

If you are blocked from s3.amazonaws.com (as may be the case in China), downloading MRPC will fail. Instead, run the commands below:

git clone https://github.com/wasiahmad/paraphrase_identification.git
python download_glue_data.py --data_dir glue_data --tasks all --path_to_mrpc=paraphrase_identification/dataset/msr-paraphrase-corpus

Running

To run our baselines, use src/main.py. Because preprocessing is expensive (particularly for ELMo) and we often want to run multiple experiments using the same preprocessing, we use an argument --exp_dir for sharing preprocessing between experiments. We use argument --run_dir to save information specific to a particular run, with run_dir usually nested within exp_dir.

python main.py --exp_dir EXP_DIR --run_dir RUN_DIR --train_tasks all --word_embs_file PATH_TO_GLOVE
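To illustrate how several runs can share one exp_dir, here is a minimal sketch that builds the command line for each run with run_dir nested inside exp_dir. build_command and the run names are hypothetical; only the four flags shown in the command above are used, and EXP_DIR and PATH_TO_GLOVE remain placeholders.

```python
import os
import shlex


def build_command(exp_dir, run_name, glove_path, train_tasks="all"):
    """Build the main.py argument list for one run.

    Hypothetical helper: run_dir is placed inside exp_dir so that every run
    launched with the same exp_dir can reuse its preprocessing.
    """
    run_dir = os.path.join(exp_dir, run_name)
    return [
        "python", "main.py",
        "--exp_dir", exp_dir,
        "--run_dir", run_dir,
        "--train_tasks", train_tasks,
        "--word_embs_file", glove_path,
    ]


# Two runs sharing one experiment directory (placeholder paths).
for name in ("run_a", "run_b"):
    cmd = build_command("EXP_DIR", name, "PATH_TO_GLOVE")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to subprocess.run if you prefer launching runs from Python rather than copying the printed command.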

NB: The version of AllenNLP used has issues with tensorboard. You may need to replace the import from tensorboard import SummaryWriter with from tensorboardX import SummaryWriter in your AllenNLP source files.
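If you would rather script that substitution than edit files by hand, the following is a minimal sketch of the text replacement. patch_tensorboard_import is a hypothetical helper; which AllenNLP files contain the import depends on your installed version and is not specified here, so locating and rewriting those files is left to you.

```python
OLD_IMPORT = "from tensorboard import SummaryWriter"
NEW_IMPORT = "from tensorboardX import SummaryWriter"


def patch_tensorboard_import(source):
    """Return source text with the tensorboard import swapped for tensorboardX.

    Hypothetical helper: it operates on a string, so reading and writing the
    affected files is up to the caller. Applying it twice is harmless because
    the new import line does not contain the old one.
    """
    return source.replace(OLD_IMPORT, NEW_IMPORT)
```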

GloVe, CoVe, and ELMo

Many of our models make use of GloVe pretrained word embeddings, in particular the 300-dimensional, 840B version. To use GloVe vectors, download and extract the relevant files and set --word_embs_file to the path of the GloVe file. To learn embeddings from scratch, set --glove to 0.
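As a quick sanity check on the extracted file, the sketch below parses a single line of a GloVe text file, where each line holds a token followed by its space-separated vector components. parse_glove_line is a hypothetical helper and is not how this repo loads embeddings. It splits from the right, on the assumption that a token may itself contain spaces.

```python
def parse_glove_line(line, dim=300):
    """Split one line of a GloVe text file into (token, vector).

    Hypothetical helper for inspecting the file. Splitting from the right
    keeps the last dim fields as the vector, so a token that contains spaces
    stays in one piece.
    """
    parts = line.rstrip("\n").rsplit(" ", dim)
    if len(parts) != dim + 1:
        raise ValueError(f"expected a token plus {dim} values, got {len(parts)} fields")
    token = parts[0]
    vector = [float(x) for x in parts[1:]]
    return token, vector
```

Running this over the first few lines of the file and checking that no ValueError is raised is a cheap way to confirm you downloaded the 300-dimensional vectors before starting an expensive preprocessing run.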

We use the CoVe implementation provided here. To use CoVe, clone the repo, fill in PATH_TO_COVE in src/models.py, and set --cove to 1.

We use the ELMo implementation provided by AllenNLP. To use ELMo, set --elmo to 1. To use ELMo without GloVe, additionally set --elmo_no_glove to 1.
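The embedding options described in this section can be summarized as a small mapping from choices to command-line flags. This is a sketch under stated assumptions: embedding_flags and its parameter names are hypothetical, each branch emits only a flag named in this section, and how main.py resolves combinations of these flags is not documented here.

```python
def embedding_flags(glove_file=None, scratch=False, cove=False,
                    elmo=False, elmo_no_glove=False):
    """Translate embedding choices into the flags described in this section.

    Hypothetical helper: each option maps to exactly one documented flag.
    """
    if scratch and glove_file is not None:
        raise ValueError("choose either a GloVe file or embeddings from scratch")
    if elmo_no_glove and not elmo:
        raise ValueError("--elmo_no_glove is only described together with --elmo")

    flags = []
    if glove_file is not None:
        flags += ["--word_embs_file", glove_file]  # use GloVe vectors
    if scratch:
        flags += ["--glove", "0"]                  # learn embeddings from scratch
    if cove:
        flags += ["--cove", "1"]                   # PATH_TO_COVE must also be set
    if elmo:
        flags += ["--elmo", "1"]
    if elmo_no_glove:
        flags += ["--elmo_no_glove", "1"]          # ELMo without GloVe
    return flags
```

For example, embedding_flags(elmo=True, elmo_no_glove=True) yields the two flags given above for running ELMo without GloVe; the result can be appended to a main.py argument list.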

Reference

If you use this code or GLUE, please consider citing us.

 @unpublished{wang2018glue,
     title={{GLUE}: A Multi-Task Benchmark and Analysis Platform for
             Natural Language Understanding},
     author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill,
             Felix and Levy, Omer and Bowman, Samuel R.},
     note={arXiv preprint 1804.07461},
     year={2018}
 }

Feel free to contact alexwang at nyu.edu with any questions or comments.
