  • Stars: 1,639
  • Rank: 28,521 (Top 0.6%)
  • Language: Python
  • License: MIT License
  • Created: over 6 years ago
  • Updated: over 1 year ago

Repository Details

🚨Update🚨: As of 2021/10/17, the jiant project is no longer being actively maintained. This means there are no plans to add new models, tasks, or features, or to update support for new libraries.

jiant is an NLP toolkit

The multitask and transfer learning toolkit for natural language processing research

Why should I use jiant?

A few things you might want to know about jiant:

  • jiant is configuration file driven
  • jiant is built with PyTorch
  • jiant integrates with datasets to manage task data
  • jiant integrates with transformers to manage models and tokenizers (see the sketch below)
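
For orientation, here is a minimal sketch (not jiant code) of the underlying Hugging Face datasets and transformers calls that jiant wraps; the model name and task below are placeholders taken from the quick-start example, not fixed requirements:

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Task data that jiant manages for you (MRPC from GLUE as an example)
mrpc = load_dataset("glue", "mrpc", split="train")

# Model and tokenizer that jiant manages during fine-tuning
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

print(mrpc[0])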

Getting Started

Installation

To import jiant from source (recommended for researchers):

git clone https://github.com/nyu-mll/jiant.git
cd jiant
pip install -r requirements.txt

# Add the following to your .bashrc or .bash_profile
export PYTHONPATH=/path/to/jiant:$PYTHONPATH

If you plan to contribute to jiant, install additional dependencies with pip install -r requirements-dev.txt.

To install jiant from source (alternative for researchers):

git clone https://github.com/nyu-mll/jiant.git
cd jiant
pip install -e .

To install jiant from pip (recommended if you just want to train/use a model):

pip install jiant

We recommend that you install jiant in a virtual environment or a conda environment.
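
For example, a minimal conda setup might look like the following; the environment name and Python version are illustrative choices, not requirements pinned by jiant:

# Create and activate an isolated environment, then install jiant into it
conda create -n jiant_env python=3.8 -y
conda activate jiant_env
pip install jiant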

To check that jiant was installed correctly, run a simple example.
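
As an even quicker sanity check (this only verifies that the package is importable, not that training works):

python -c "import jiant; print('jiant imported successfully')"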

Quick Introduction

The following example fine-tunes a RoBERTa model on the MRPC dataset.

Python version:

from jiant.proj.simple import runscript as run
import jiant.scripts.download_data.runscript as downloader

EXP_DIR = "/path/to/exp"

# Download the Data
downloader.download_data(["mrpc"], f"{EXP_DIR}/tasks")

# Set up the arguments for the Simple API
args = run.RunConfiguration(
   run_name="simple",
   exp_dir=EXP_DIR,
   data_dir=f"{EXP_DIR}/tasks",
   hf_pretrained_model_name_or_path="roberta-base",
   tasks="mrpc",
   train_batch_size=16,
   num_train_epochs=3
)

# Run!
run.run_simple(args)

Bash version:

EXP_DIR=/path/to/exp

python jiant/scripts/download_data/runscript.py \
    download \
    --tasks mrpc \
    --output_path ${EXP_DIR}/tasks
python jiant/proj/simple/runscript.py \
    run \
    --run_name simple \
    --exp_dir ${EXP_DIR}/ \
    --data_dir ${EXP_DIR}/tasks \
    --hf_pretrained_model_name_or_path roberta-base \
    --tasks mrpc \
    --train_batch_size 16 \
    --num_train_epochs 3
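
Because jiant is a multitask toolkit, the same Simple API can be pointed at several tasks at once. The sketch below is a hedged variant of the Python example above, assuming the Simple API accepts a comma-separated task string; check the linked examples for the exact multitask configuration:

# Download data for both tasks
downloader.download_data(["mrpc", "rte"], f"{EXP_DIR}/tasks")

# Same Simple API arguments, now listing two tasks
args = run.RunConfiguration(
   run_name="simple_multitask",
   exp_dir=EXP_DIR,
   data_dir=f"{EXP_DIR}/tasks",
   hf_pretrained_model_name_or_path="roberta-base",
   tasks="mrpc,rte",
   train_batch_size=16,
   num_train_epochs=3
)
run.run_simple(args)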

Examples of more complex training workflows are found here.

Contributing

The jiant project's contributing guidelines can be found here.

Looking for jiant v1.3.2?

jiant v1.3.2 has been moved to jiant-v1-legacy to support ongoing research with the library. jiant v2.x.x is more modular and scalable than jiant v1.3.2 and has been designed to reflect the needs of the current NLP research community. We strongly recommend that any new projects use jiant v2.x.x.

jiant 1.x has been used in several papers. For instructions on how to reproduce papers by jiant authors that refer readers to this site for documentation (including Tenney et al., Wang et al., Bowman et al., Kim et al., and Warstadt et al.), see the jiant-v1-legacy README.

Citation

If you use jiant ≥ v2.0.0 in academic work, please cite it directly:

@misc{phang2020jiant,
    author = {Jason Phang and Phil Yeres and Jesse Swanson and Haokun Liu and Ian F. Tenney and Phu Mon Htut and Clara Vania and Alex Wang and Samuel R. Bowman},
    title = {\texttt{jiant} 2.0: A software toolkit for research on general-purpose text understanding models},
    howpublished = {\url{http://jiant.info/}},
    year = {2020}
}

If you use jiant ≤ v1.3.2 in academic work, please use the citation found here.

Acknowledgments

  • This work was made possible in part by a donation to NYU from Eric and Wendy Schmidt made by recommendation of the Schmidt Futures program, and by support from Intuit Inc.
  • We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan V GPU used at NYU in this work.
  • Developer Jesse Swanson is supported by the Moore-Sloan Data Science Environment as part of the NYU Data Science Services initiative.

License

jiant is released under the MIT License.

More Repositories

 1. GLUE-baselines (Python, 762 stars) - [DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations
 2. multiNLI (Python, 209 stars)
 3. quality (Python, 119 stars)
 4. crows-pairs (HTML, 99 stars) - Data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models" (EMNLP 2020)
 5. BBQ (Python, 85 stars) - Repository for the Bias Benchmark for QA dataset
 6. DS-GA-1011-Fall2017 (Jupyter Notebook, 81 stars) - DS-GA-1011 Natural Language Processing with Representation Learning
 7. ILF-for-code-generation (Python, 68 stars)
 8. CoLA-baselines (Python, 55 stars) - Baselines and corpus accompanying the paper "Neural Network Acceptability Judgments"
 9. PRPN-Analysis (Python, 47 stars) - Analysis results reported in the paper "Grammar Induction with Neural Language Models: An Unusual Replication"
10. SQuALITY (Python, 40 stars) - Query-focused summarization data
11. jiant-v1-legacy (Jupyter Notebook, 21 stars) - The jiant toolkit for general-purpose text understanding models
12. pretraining-learning-curves (20 stars) - Repository for the paper "When Do You Need Billions of Words of Pretraining Data?"
13. msgs (Python, 19 stars) - Repository for the paper on testing inductive bias with scaled-down RoBERTa models
14. nlu-test-sets (Jupyter Notebook, 10 stars) - Analysis of NLU test sets with IRT
15. CoLA (JavaScript, 7 stars) - Demo for the Grammaticality Judgement (Acceptability) task
16. nope (TeX, 7 stars) - Data and code for "NOPE: A Corpus of Naturally-Occurring Presuppositions in English"
17. semi-automatic-nli (Python, 6 stars) - Data and code accompanying the paper "Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options" (AACL 2020)
18. online-code-for-edge-probing (Jupyter Notebook, 5 stars)
19. wsc-formalizations (Jupyter Notebook, 4 stars)
20. crowdsourcing-protocol-comparison (HTML, 3 stars)
21. CNLI-generalization (Python, 2 stars)
22. GLUE-human-performance (HTML, 1 star)
23. nyu-ai-school-2023 (HTML, 1 star)