timoschick/fewglue

This repository contains the FewGLUE dataset for few-shot natural language understanding.

FewGLUE

This repository contains the FewGLUE dataset. For each SuperGLUE task, it provides a random selection of 32 training examples drawn from that task's training set, plus up to 20,000 unlabeled examples.

🗂️ Structure

For each task t in SuperGLUE, the directory FewGLUE/t contains two files: train.jsonl, which contains the 32 training examples, and unlabeled.jsonl, which contains all unlabeled examples. The official development and test sets are not included, as they are available from the official SuperGLUE benchmark.
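
The layout described above can be read with a few lines of Python. This is a minimal sketch rather than code from this repository: the helper names `read_jsonl` and `load_task` are hypothetical, and it assumes only that each file holds one JSON object per line, as the `.jsonl` extension suggests.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                examples.append(json.loads(line))
    return examples


def load_task(root, task):
    """Load both files from one task directory, i.e. <root>/<task>/.

    `root` is the path to the FewGLUE directory and `task` is the name
    of a task subdirectory; both are supplied by the caller.
    """
    task_dir = Path(root) / task
    return {
        "train": read_jsonl(task_dir / "train.jsonl"),
        "unlabeled": read_jsonl(task_dir / "unlabeled.jsonl"),
    }
```

Calling `load_task("FewGLUE", t)` for a task directory `t` returns a dictionary with a `train` list and an `unlabeled` list. The fields inside each example depend on the task and are not interpreted here.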

📑 Format

All files follow the exact same format as the original SuperGLUE training files.

📕 Citation

If you make use of FewGLUE, please cite the following paper:

@article{schick2020small,
  title={It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners},
  author={Timo Schick and Hinrich Schütze},
  journal={Computing Research Repository},
  volume={arXiv:2009.07118},
  url={http://arxiv.org/abs/2009.07118},
  year={2020}
}

pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference".

dino

This repository contains the code for "Generating Datasets with Pretrained Language Models".

self-debiasing

This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP".

bertram

This repository contains the code for "BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Representations".

form-context-model

This repository contains the code for the Form-Context Model and its Attentive Mimicking variant.

am-for-bert

This repository contains the WordNet Language Model Probing (WNLaMPro) dataset introduced in "Rare Words: A Major Problem for Contextualized Embeddings and How to Fix it by Attentive Mimicking".

one-token-approximation

This repository contains the code for applying One-Token Approximation to a pretrained language model using subword-level tokenization.

amr-gen

This is a Java-based implementation of the AMR-to-text generator introduced in "Transition-Based Generation from Abstract Meaning Representations".