Go from no deep learning knowledge to implementing GPT.

Zero to GPT

This course will take you from no knowledge of deep learning to training your own GPT model. As AI moves out of the research lab, the world needs more people who can understand and apply it. If you want to be one of them, this course is for you.

This course balances theory and application. We’ll solve real problems, like predicting the weather and translating languages. As we do so, we'll extensively cover theoretical building blocks like gradient descent and backpropagation. This will prepare you to successfully train and use models in the real world.

We’ll start with the fundamentals: neural network architectures and training methods. Later in the course, we'll move to complex topics like transformers, GPU programming, and distributed training.

You'll need to understand Python to take this course, including for loops, functions, and classes. The first part of this Dataquest path will teach you what you need.

To use this course, go through each chapter sequentially. Read the lessons or watch the optional videos; they cover the same information. Look through the implementations to solidify your understanding, and recreate them on your own.

Course Outline

0. Introduction

An overview of the course and topics we'll cover.

1. Math and NumPy fundamentals

This is an optional lesson with a basic refresher on linear algebra and calculus for deep learning. We'll use NumPy to apply the concepts. If you're already familiar with these topics, you can skip this lesson.

2. Gradient descent

Gradient descent is how neural networks train their parameters to match the data. It's the "learning" part of deep learning.
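As a rough illustration of the idea, here is a minimal sketch of gradient descent fitting a line to a handful of points. It is not taken from the course notebooks; the function name `gradient_descent`, the toy data, and the learning rate are all assumptions chosen for this example.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step in the direction that lowers the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(w, b)
```

With these settings the parameters should settle close to the slope and intercept used to generate the data. A learning rate that is too large would make the updates diverge instead.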

3. Dense networks

Dense networks are the basic form of a neural network, where every input is connected to every output. These can also be called fully connected networks.
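To make "fully connected" concrete, here is a small sketch of a single dense layer in plain Python. The names `dense_forward` and `relu` and the example weights are hypothetical and chosen only for illustration.

```python
def dense_forward(inputs, weights, biases):
    """Compute outputs[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    n_out = len(biases)
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(n_out)
    ]


def relu(values):
    """Apply the ReLU nonlinearity elementwise."""
    return [max(0.0, v) for v in values]


# 3 inputs, 2 outputs: weights has one row per input, one column per output.
inputs = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0], [0.25, 0.0], [-0.5, 1.0]]
biases = [0.1, -0.2]

outputs = dense_forward(inputs, weights, biases)
activated = relu(outputs)
print(outputs, activated)
```

Each output depends on every input through its own column of weights, which is what separates a dense layer from sparser layouts such as convolutions.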

4. Classification with neural networks

Classification is how we get neural networks to categorize data for us. Classification is used by language models like GPT to predict the next word in a sequence.
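A common way to turn raw network outputs into class probabilities is softmax followed by a cross-entropy loss. The sketch below assumes a three-word vocabulary and made-up scores; it only illustrates the mechanics, not the course's own implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])


# Hypothetical vocabulary and scores for "the next word".
vocab = ["cat", "dog", "fish"]
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)
predicted = vocab[probs.index(max(probs))]
loss = cross_entropy(probs, vocab.index("cat"))
print(predicted, loss)
```

A language model does the same thing over a vocabulary of many thousands of tokens: the class with the highest probability is the predicted next token.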

5. Recurrent networks

Recurrent neural networks (RNNs) are optimized to process sequences of data. They're used for tasks like translation and text classification.
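The core of an RNN is a hidden state that is updated once per sequence element. Here is a deliberately tiny sketch with a scalar hidden state; the function `rnn_forward` and its weights are assumptions for illustration, and real RNNs use weight matrices rather than single numbers.

```python
import math


def rnn_forward(sequence, w_in, w_hidden, bias, h0=0.0):
    """Run a scalar RNN over a sequence and return every hidden state."""
    h = h0
    states = []
    for x in sequence:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_in * x + w_hidden * h + bias)
        states.append(h)
    return states


early = rnn_forward([1.0, 0.0, 0.0], w_in=0.5, w_hidden=0.8, bias=0.0)
late = rnn_forward([0.0, 0.0, 1.0], w_in=0.5, w_hidden=0.8, bias=0.0)
print(early[-1], late[-1])
```

The two sequences contain the same values in a different order, yet the final states differ, because the hidden state carries information about position in the sequence.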

6. Backpropagation in depth

So far, we've taken a loose look at backpropagation to let us focus on understanding neural network architecture. We'll build a miniature version of PyTorch, and use it to understand backpropagation better.
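To give a sense of what a miniature autodiff engine can look like, here is a sketch of a scalar `Value` class that supports addition and multiplication. This is not the course's implementation; the class name and structure are assumptions, loosely in the style of small educational autograd libraries.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        """Topologically sort the graph, then apply the chain rule in reverse."""
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, w, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
y.backward()
print(y.data, w.grad, x.grad, b.grad)
```

Each operation stores a small closure that knows its local derivative, and `backward` chains those closures together from the output back to the inputs.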

7. Optimizers

We've used SGD to update model parameters so far. We'll learn about other optimizers that have better convergence properties.
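As one example of an alternative update rule, the sketch below compares plain SGD with SGD plus momentum on a one-dimensional quadratic. The loss, learning rate, and momentum coefficient are assumptions picked for this illustration; the lesson may cover different optimizers.

```python
def sgd_step(param, grad, lr):
    """Plain SGD: step directly against the current gradient."""
    return param - lr * grad


def momentum_step(param, grad, velocity, lr, beta=0.9):
    """SGD with momentum: step against a running sum of past gradients."""
    velocity = beta * velocity + grad
    return param - lr * velocity, velocity


def grad_fn(x):
    """Derivative of the toy loss (x - 3)^2, which is minimized at x = 3."""
    return 2 * (x - 3.0)


x_sgd = 0.0
x_mom, velocity = 0.0, 0.0
for _ in range(50):
    x_sgd = sgd_step(x_sgd, grad_fn(x_sgd), lr=0.01)
    x_mom, velocity = momentum_step(x_mom, grad_fn(x_mom), velocity, lr=0.01)

print(x_sgd, x_mom)
```

On this toy problem and with these settings, the momentum run should end closer to the minimum after the same number of steps. That is not guaranteed in general, since momentum can overshoot when the coefficients are poorly tuned.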

8. Regularization

Regularization prevents overfitting to the training set. This means that the network can generalize well to new data.

  • Lesson coming soon
  • Video coming soon
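Since the lesson is not yet published, the following is only a generic sketch of one widely used technique, L2 regularization (weight decay), and not the course's material. The function names and the strength `lam` are hypothetical.

```python
def l2_penalty(weights, lam):
    """Extra loss term that grows with the squared size of the weights."""
    return lam * sum(w * w for w in weights)


def regularized_grads(grads, weights, lam):
    """Add the gradient of the L2 penalty (2 * lam * w) to each data gradient."""
    return [g + 2 * lam * w for g, w in zip(grads, weights)]


weights = [1.0, -2.0]
data_grads = [0.0, 0.0]  # pretend the data loss is already flat here

penalty = l2_penalty(weights, lam=0.1)
grads = regularized_grads(data_grads, weights, lam=0.1)
print(penalty, grads)
```

Even when the data gradient is zero, the penalty keeps pulling each weight toward zero, which discourages the large weights that often accompany overfitting.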

9. PyTorch

PyTorch is a framework for deep learning that automatically differentiates functions. It's widely used to create cutting-edge models.

10. Working with Text

GPT models are trained on text. We'll learn how to process text data for use in deep learning.

  • Lesson coming soon
  • Video coming soon
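Since the lesson is not yet published, here is only a generic sketch of the simplest way to turn text into numbers: a character-level tokenizer. GPT models typically use subword tokenizers instead, so treat `build_vocab`, `encode`, and `decode` as hypothetical helpers for illustration.

```python
def build_vocab(text):
    """Map each distinct character to an integer id."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}


def encode(text, vocab):
    """Turn a string into a list of integer ids."""
    return [vocab[ch] for ch in text]


def decode(ids, vocab):
    """Turn a list of integer ids back into a string."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


text = "hello world"
vocab = build_vocab(text)
ids = encode(text, vocab)
print(ids)
print(decode(ids, vocab))
```

The integer ids are what a model actually consumes, usually after being mapped to learned embedding vectors.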

11. Transformers

Transformers fix the problem of vanishing/exploding gradients in RNNs by using attention. Attention allows the network to process the whole sequence at once, instead of iteratively.
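The "whole sequence at once" idea can be sketched with scaled dot-product attention in plain Python. This is a minimal illustration with made-up vectors: it omits the learned projections, multiple heads, and causal masking that a GPT-style model would use.

```python
import math


def softmax(scores):
    """Convert scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query, return a weighted average of all value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity between this query and every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Blend the value vectors using the attention weights.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = [[1.0, 0.0], [0.0, 1.0]]
print(attention(queries, keys, values))
```

Every position looks at every other position in a single step, so no information has to be passed through a long chain of recurrent updates.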

12. Cleaning Text Data

If you want to train a deep learning model, you need data. Gigabytes of it. We'll discuss how you can get this data and process it.

  • Lesson coming soon

13. Distributed Training

To train large models, we need to use multiple GPUs.

  • Lesson coming soon

14. GPT-2

We'll train a version of the popular GPT-2 model.

  • Lesson coming soon

15. GPU kernels

PyTorch can automatically use GPUs for training, but not all operators are fused and optimized. For example, flash attention can speed up transformers by 2x or more. We'll use OpenAI Triton to implement GPU kernels.

  • Lesson coming soon
  • Implementation coming soon

16. Efficient Transformers

GPT models take a long time to train. We can reduce that time by using more GPUs, but we don't all have access to GPU clusters. To reduce training time, we'll incorporate some recent advances to make the transformer model more efficient.

17. Training GPT-X

We'll train GPT-X, a version of a GPT model with some optimizations and improvements.

  • Lesson coming soon
  • Implementation coming soon

More Chapters Coming Soon

Optional Chapters

Convolutional networks

Convolutional neural networks are used for working with images and time series.
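The basic operation behind these networks is sliding a small kernel across the input. Here is a minimal one-dimensional sketch, as you might apply to a time series. The function `conv1d` and the example kernel are assumptions for illustration, and like most deep learning libraries it does not flip the kernel.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal and take a dot product at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]  # responds to the difference across a 3-step window
features = conv1d(signal, kernel)
print(features)
```

The same few kernel weights are reused at every position, which is why convolutional layers need far fewer parameters than dense layers on images and long series.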

Gated recurrent networks

Gated recurrent networks help RNNs process long sequences by helping networks forget irrelevant information. LSTM and GRU are two popular types of gated networks.
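The gating idea can be sketched with a single update gate on a scalar hidden state. This is a simplified illustration, not a full LSTM or GRU (it has no reset gate), and the function `gated_step` and its parameter names are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_step(x, h_prev, w_z, u_z, b_z, w_h, u_h, b_h):
    """One step of a simplified gated recurrent unit with a scalar state."""
    # Update gate: how much of the old state to replace.
    z = sigmoid(w_z * x + u_z * h_prev + b_z)
    # Candidate state: what the new state would be if fully replaced.
    candidate = math.tanh(w_h * x + u_h * h_prev + b_h)
    # Blend old state and candidate according to the gate.
    return (1.0 - z) * h_prev + z * candidate


# A strongly negative gate bias keeps the old state almost unchanged.
kept = gated_step(1.0, 0.7, w_z=0.0, u_z=0.0, b_z=-20.0, w_h=1.0, u_h=0.0, b_h=0.0)
# A strongly positive gate bias overwrites it with the candidate.
replaced = gated_step(1.0, 0.7, w_z=0.0, u_z=0.0, b_z=20.0, w_h=1.0, u_h=0.0, b_h=0.0)
print(kept, replaced)
```

Because the gate can stay near zero for many steps, the network can carry a value across a long sequence without it being overwritten by irrelevant inputs.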

Encoders and decoders

Encoder/decoder networks are used for NLP tasks when the output isn't the same length as the input. For example, if you want to use questions/answers as training data, the answers may be a different length than the questions.

Installation

If you want to run these notebooks locally, you'll need to install some Python packages.

  • Make sure you have Python 3.8 or higher installed.
  • Clone this repository.
  • Run pip install -r requirements.txt

License

You can use and adapt this material for your own courses, but not commercially. You must provide attribution to Vik Paruchuri, Dataquest if you use this material.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
