tldr-transformers
The "tl;dr" on a few notable papers on Transformers and modern NLP.
This is a living repo to keep tabs on different research threads.
Last Updated: September 20th, 2021.
Models: GPT-\*, \*BERT\*, Adapter-\*, \*T5, Megatron, DALL-E, Codex, etc.
Topics: Transformer architectures + training; adversarial attacks; scaling laws; alignment; memorization; few labels; causality.
Each set of notes includes links to the paper, the original code implementation (if available), and the Huggingface implementation.
Here are some examples: T5, ByT5, and deduplicating transformer training sets.
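Since many of the notes point at Huggingface implementations, here is a minimal sketch (not part of the notes themselves) of loading one of the covered models, assuming the `transformers` library and a PyTorch backend are installed; the `t5-small` checkpoint is just an illustrative choice.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load an illustrative checkpoint; other T5/ByT5 checkpoints on the
# Huggingface hub can be swapped in here.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# T5 frames every task as text-to-text, so the task is selected via a prefix.
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```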
This repo also quantifies the differences across the transformer papers, all in one table.
The transformer papers are presented somewhat chronologically below. Use the "Contents" list to jump to a particular topic.
Contents
- Quick Note
- Motivation
- Papers::Transformer Papers
- Papers::1 Table To Rule Them All
- Papers::Adversarial Attack Papers
- Papers::Fine-tuning Papers
- Papers::Alignment Papers
- Papers::Causality Papers
- Papers::Scaling Law Papers
- Papers::LM Memorization Papers
- Papers::Limited Label Learning Papers
- How To Contribute
- How To Point Out Errors
- Citation
- License
Quick Note
This is not an intro to deep learning in NLP. If you are looking for that, I recommend one of the following: Fast AI's course, one of the Coursera courses, or maybe this old thing. Come here after that.
Motivation
With the explosion in papers on all things Transformers the past few years, it seems useful to catalog the salient features/results/insights of each paper in a digestible format. Hence this repo.
Models
BigTable
All of the table summaries found ^ collapsed into one really big table here.
Attacks
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
Gradient-based Adversarial Attacks against Text Transformers | 2021 | Facebook | Gradient-based attack notes | None |
FineTune
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning | 2021 | Facebook | SCL notes | None |
Alignment
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
Fine-Tuning Language Models from Human Preferences | 2019 | OpenAI | Human pref notes | None |
Scaling
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
Scaling Laws for Neural Language Models | 2020 | OpenAI | Scaling laws notes | None |
Memorization
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
Extracting Training Data from Large Language Models | 2021 | Google et al. | To-Do | None |
Deduplicating Training Data Makes Language Models Better | 2021 | Google et al. | Dedup notes | None |
FewLabels
Paper | Year | Institute | 👉 Notes 👈 | Codes |
---|---|---|---|---|
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP | 2021 | Georgia Tech/UNC | To-Do | None |
Learning with fewer labeled examples | 2021 | Kevin Murphy & Colin Raffel (Preprint: "Probabilistic Machine Learning", Chapter 19) | Worth a read, won't summarize here. | None |
Contribute
If you are interested in contributing to this repo, feel free to do the following:
- Fork the repo.
- Create a Draft PR with the paper of interest (to prevent "in-flight" issues).
- Use the suggested template to write your "tl;dr". If it's an architecture paper, you may also want to add to the larger table here.
- Submit your PR.
Errata
Undoubtedly there is information that is incorrect here. Please open an Issue and point it out.
Citation
@misc{cliff-notes-transformers,
    author = {Thompson, Will},
    url = {https://github.com/will-thompson-k/cliff-notes-transformers},
    year = {2021}
}
For the notes above, I've linked the original papers.
License
MIT