Seminar on Large Language Models (COMP790-101 at UNC Chapel Hill, Fall 2022)

COMP790-101: Large Language Models

Instructor: Colin Raffel

Meeting time: Mondays and Wednesdays, 1:25-2:40pm

Classroom: SN 011

Office hours: By appointment

Language models, which are trained to predict text given other text, underlie many recent successes in natural language processing and artificial intelligence. Whether used for transfer learning (using language modeling as a pre-training objective before subsequent fine-tuning on a downstream task) or prompting (formulating an input sequence that induces a model to perform a desired task without any training), language modeling has proven to be an effective way of imbuing models with useful capabilities. These capabilities have been observed to consistently improve as the size of the language model increases, which has led to a focus on developing ever-larger language models. In this course, we will survey the history of language model scaling, as well as recent advances in building, analyzing, and using large LMs. The course will use a role-playing seminar format, described in more detail below.
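
To make the objective concrete, here is a minimal sketch (not part of the course materials) of next-token prediction: a toy count-based bigram model stands in for a neural network, and the corpus and helper names are invented for illustration. Large LMs minimize exactly this average negative log-likelihood, just at vastly greater scale.

```python
# Toy illustration (not course material) of the language modeling objective:
# predict each token from the text that precedes it. A count-based bigram
# model stands in for the large neural networks discussed in this course.
import collections
import math

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training": count how often each token follows each preceding token.
counts = collections.defaultdict(collections.Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_prob(prev, nxt):
    # Probability of `nxt` given `prev` under the bigram model.
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total

# The language modeling loss is the average negative log-likelihood
# assigned to each next token; scaling up the model drives this down.
pairs = list(zip(corpus, corpus[1:]))
nll = -sum(math.log(next_token_prob(p, n)) for p, n in pairs) / len(pairs)
print(f"per-token NLL: {nll:.3f}")
```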

Prerequisites

Students must have experience with machine learning (preferably deep learning) and the basics of modern natural language processing. Before taking the class, you should be able to read a recent machine learning or natural language processing conference paper and come away with a decent understanding of the basic concepts and ideas proposed in the paper (but not necessarily a deep, perfect understanding of every last detail).

Course Structure

This class will use a role-playing seminar format where students take on different roles and present papers to one another. All grading will be based on these presentations and course participation; there will be no final project or other coursework.

Readings

Each class will involve the presentation and discussion of two papers. The pair of papers covered in each class session is meant to complement each other, e.g. because one paper might be the historical precedent of the other, or the papers were contemporaneous, or they present different viewpoints on the same topic. Before each class, everyone is required to have read both papers. Students will be divided into four groups. Two groups will present on Mondays and the other two groups will present on Wednesdays. In a given class session, students in the presenting groups will each be given a rotating role (described below). This role defines the lens through which they read the paper and determines what they prepare for the in-class discussion. Students in the non-presenting groups are also required to read the papers, complete a quick exercise (described below), and come to class ready to discuss. All students will obtain a thorough understanding of the chosen papers and will develop their paper reading, literature review, and prototyping skills.

Presentation roles

This seminar is organized around the different "roles" students play each week: Reviewer, Archaeologist, Hacker, Diagrammer, and (possibly) Blogger.

  • Reviewer: Complete a full critical, but not necessarily negative, review of the paper. Follow the guidelines for NeurIPS reviewers (under "Review Form"). Please complete the "Strengths and Weaknesses" and "Questions" sections and assign an overall score; you can skip the rest of the review (including writing a summary since all students should have read the paper).
  • Archaeologist: Determine where this paper sits in the context of previous and subsequent work. Find and report on one prior paper that we are not reading in this class that substantially influenced the current paper or one newer paper that we are not reading in this class that was heavily influenced by the current paper.
  • Hacker: Implement a small part of the paper on a small dataset or toy problem. Prepare to share the core code of the algorithm with the class (for a sense of scale, see the sketch after this list). Do not simply download and run an existing implementation - you should implement at least a (toy version of a) method from the paper, though you are welcome to use (and give credit to) an existing implementation for "backbone" code (e.g. model building, data loading, training loop, etc.).
  • Diagrammer: Create a diagram of one of the concepts or ideas from the paper or re-make one of the plots in the paper to make it clearer. Please pick something that hasn't already been diagrammed from a previous paper.
  • Blogger: Write a paragraph each about the two papers and an additional paragraph comparing and contrasting them. The summary of each paper should cover the motivation behind the paper, a description of the proposed methods, and an overview of the key findings; the comparison should discuss how the papers differ and/or build on one another. The blogger will not present during the class session.
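
For a rough sense of what a Hacker deliverable might look like, here is a hedged sketch: a from-scratch, single-head scaled dot-product self-attention in numpy (the core operation of "Attention Is All You Need", which we read on 8/17). The function names and shapes are our own illustrative choices, not a prescribed template.

```python
# Illustrative Hacker-style toy implementation (not a prescribed template):
# single-head scaled dot-product self-attention from "Attention Is All You
# Need", written from scratch with numpy on random "token" vectors.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """q, k: (seq_len, d_k) arrays; v: (seq_len, d_v) array."""
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise similarities
    weights = softmax(scores, axis=-1)       # each row is a distribution
    return weights @ v                       # weighted average of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))   # 5 tokens with d_model = 8
out = attention(x, x, x)      # self-attention: queries = keys = values
print(out.shape)              # -> (5, 8)
```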

Non-presenter assignment

If you aren't in the presenting group during a given class period, please submit before class:

  1. A new title for either one of the papers and/or a new name for an algorithm proposed in either paper
  2. At least one question about either paper - could be something you're confused about or something you'd like to hear discussed more.

Schedule

The schedule below includes a preliminary list of the papers we will be reading. These papers are subject to change, though I will try to make changes only to papers that are at least two weeks away. If you have any suggested changes, feel free to tell me.

Date | Group A Paper | Group B Paper
Mon, 8/15 | Class introduction, background, logistics | N/A
Wed, 8/17 | Attention Is All You Need (Colin presents) | N/A
Mon, 8/22 | A Neural Probabilistic Language Model | The Unreasonable Effectiveness of Recurrent Neural Networks
Wed, 8/24 | Generating Sequences With Recurrent Neural Networks | Exploring the Limits of Language Modeling
Mon, 8/29 | Semi-supervised Sequence Learning | Learning to Generate Reviews and Discovering Sentiment
Wed, 8/31 | Universal Language Model Fine-tuning for Text Classification | Improving Language Understanding by Generative Pre-Training
Mon, 9/5 | No class (Labor Day) | N/A
Wed, 9/7 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | RoBERTa: A Robustly Optimized BERT Pretraining Approach
Mon, 9/12 | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Wed, 9/14 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Unified Language Model Pre-training for Natural Language Understanding and Generation
Mon, 9/19 | Cross-lingual Language Model Pretraining | ByT5: Towards a token-free future with pre-trained byte-to-byte models
Wed, 9/21 | Language Models are Unsupervised Multitask Learners | Language Models are Few-Shot Learners
Mon, 9/26 | No class (well-being day) | N/A
Wed, 9/28 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | PaLM: Scaling Language Modeling with Pathways
Mon, 10/3 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Wed, 10/5 | Scaling Laws for Neural Language Models | Training Compute-Optimal Large Language Models
Mon, 10/10 | On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 | Release Strategies and the Social Impacts of Language Models
Wed, 10/12 | RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models | TruthfulQA: Measuring How Models Mimic Human Falsehoods
Mon, 10/17 | The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Wed, 10/19 | Extracting Training Data from Large Language Models | Deduplicating Training Data Makes Language Models Better
Mon, 10/24 | Language Models as Knowledge Bases? | REALM: Retrieval-Augmented Language Model Pre-Training
Wed, 10/26 | Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference | The Power of Scale for Parameter-Efficient Prompt Tuning
Mon, 10/31 | Multitask Prompted Training Enables Zero-Shot Task Generalization | Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
Wed, 11/2 | Exploring and Predicting Transferability across NLP Tasks | Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Mon, 11/7 | Training language models to follow instructions with human feedback | Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Wed, 11/9 | How Many Data Points is a Prompt Worth? | Do Prompt-Based Models Really Understand the Meaning of their Prompts?
Mon, 11/14 | Calibrate Before Use: Improving Few-Shot Performance of Language Models | Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Wed, 11/16 | Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Mon, 11/21 | Learning Transferable Visual Models From Natural Language Supervision | 🦩 Flamingo: a Visual Language Model for Few-Shot Learning
Wed, 11/23 | No class (Thanksgiving break) | N/A
Mon, 11/28 | Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | DeepSpeed: Extreme-scale model training for everyone
Wed, 11/30 | Holistic Evaluation of Language Models | Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Sat, 12/3 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Grading

  1. Presentations (84 points): For each class session where you are presenting, you will be graded out of 6 points (14 presenting sessions at 6 points each). You will receive full credit if you do a thorough job of undertaking your role and present it in a clear and compelling way.
  2. Participation (14 points): For each class session where you aren't presenting, you'll be given up to 1 point for completing the non-presenter assignment and attending and participating in class.
  3. Free points (2 points): All students get 2 free points!

Attendance, late work, and the honor code

If you miss a class without completing the corresponding assignment, you'll get a zero for that session. If you miss a class where you are in a "presenting" role for that session, you must still create the presentation for that role before the class and you must find someone else to present it for you. If you miss a class where you'd be in a "non-presenting" role, to get credit for that session you need to complete the non-presenting assignment and send it to me before the start of class. There's really no way to accept late work for the readings since it's vital that we're all reading the same papers at the same time.

All students are expected to follow the guidelines of the UNC honor code. In the context of this class, it is particularly important that you cite the source of different ideas, facts, or methods and do not claim someone else's work as your own. If you are unsure about which actions violate the honor code, consult studentconduct.unc.edu.

Conduct

I ask that we all follow the NeurIPS Code of Conduct and the Recurse Center Social Rules. Since this is a discussion class, it's especially important that we respect everyone's perspective and input. In particular, I value the perspectives of individuals from all backgrounds reflecting the diversity of our students. I broadly define diversity to include race, gender identity, national origin, ethnicity, religion, social class, age, sexual orientation, political background, and physical and learning ability. I will strive to make this classroom an inclusive space for all students. Please let me know if there is anything I can do to improve.

Acts of discrimination, harassment, interpersonal (relationship) violence, sexual violence, sexual exploitation, stalking, and related retaliation are prohibited at UNC-Chapel Hill. If you have experienced these types of conduct, you are encouraged to report the incident and seek resources on campus or in the community. Please contact the Director of Title IX Compliance/Title IX Coordinator (Adrienne Allison, [email protected]), Report and Response Coordinators (Ew Quimbaya-Winship, [email protected]; Rebecca Gibson, [email protected]; Kathryn Winn, [email protected]), Counseling and Psychological Services (CAPS) (confidential) in Campus Health Services at (919) 966-3658, or the Gender Violence Services Coordinators (confidential) (Cassidy Johnson, [email protected]; Holly Lovern, [email protected]) to discuss your specific needs. Additional resources are available at http://safe.unc.edu.

Resources

The University of North Carolina at Chapel Hill facilitates the implementation of reasonable accommodations, including resources and services, for students with disabilities, chronic medical conditions, a temporary disability or pregnancy complications resulting in barriers to fully accessing University courses, programs and activities.

Accommodations are determined through the Office of Accessibility Resources and Service (ARS) for individuals with documented qualifying disabilities in accordance with applicable state and federal laws. See the ARS Website for contact information: https://ars.unc.edu or email [email protected].

UNC-Chapel Hill is strongly committed to addressing the mental health needs of a diverse student body. The Heels Care Network website (https://care.unc.edu) is a place to access the many mental health resources at Carolina. CAPS is the primary mental health provider for students, offering timely access to consultation and connection to clinically appropriate services. To learn more, go to their website (https://caps.unc.edu/) or visit their facilities on the third floor of the Campus Health building for an initial evaluation.

Any student who is impacted by discrimination, harassment, interpersonal (relationship) violence, sexual violence, sexual exploitation, or stalking is encouraged to seek resources on campus or in the community. Reports can be made online to the EOC at https://eoc.unc.edu/report-an-incident/. Please contact the University’s Title IX Coordinator (Elizabeth Hall, interim – [email protected]), Report and Response Coordinators in the Equal Opportunity and Compliance Office ([email protected]), Counseling and Psychological Services (confidential), or the Gender Violence Services Coordinators ([email protected]; confidential) to discuss your specific needs. Additional resources are available at safe.unc.edu.

Changes

The professor reserves the right to make changes to the syllabus including project due dates and test dates. These changes will be announced as early as possible.
