srush/LLM-Training-Puzzles

Stars: 797
Rank: 57,151 (Top 2%)
Language: Jupyter Notebook
License: MIT License
Created over 1 year ago
Updated 10 months ago


What would you do with 1000 H100s...

LLM Training Puzzles

by Sasha Rush - srush_nlp

This is a collection of 8 challenging puzzles about training large language models (or really any NN) on many, many GPUs. Very few people actually get a chance to train on thousands of computers, but it is an interesting challenge and one that is critically important for modern AI. The goal of these puzzles is to get hands-on experience with the key primitives and to understand the goals of memory efficiency and compute pipelining.

I recommend running in Colab. Click here and copy the notebook to get started.

If you are into this kind of thing, this is the 6th in a series of these puzzles.

GPU-Puzzles

Solve puzzles. Learn CUDA.

Jupyter Notebook

Tensor-Puzzles

Solve puzzles. Improve your pytorch.

Jupyter Notebook

MiniChain

A tiny library for coding with large language models.

llama2.rs

A fast llama2 decoder in pure Rust.

Triton-Puzzles

Puzzles for learning Triton

Jupyter Notebook

annotated-s4

Implementation of https://srush.github.io/annotated-s4

annotated-mamba

Annotated version of the Mamba paper

Jupyter Notebook

Autodiff-Puzzles

Jupyter Notebook

Transformer-Puzzles

Puzzles for exploring transformers

Jupyter Notebook

streambook

Live Python Notebooks with any Editor

Jupyter Notebook

raspy

An interactive exploration of Transformer programming.

Jupyter Notebook

do-we-need-attention

parallax

awesome-o1

GPTWorld

A puzzle to learn about prompting

Jupyter Notebook

awesome-ml-tracking

triton-autodiff

Experiment of using Tangent to autodiff triton

torch-queue

LLM-Talk

torch-golf

Silly twitter torch implementations.

PyDecode

A dynamic programming toolkit.

VirtualTeaching

DIY setup for virtual teaching on ubuntu

mamba-primer

learns-dex

text2table

jax-lda

ProbTalk

Hierarchical-Bayes-Compiler

Hal Daume's hbc

g9py

drop7

Jupyter Notebook

Tensor-Puzzles-Penzai

mamba-scans

anynp

Proof-of-concept of global switching between numpy/jax/pytorch in a library.

transformers-bet

relax-decode

aima-arguments

torch-mechanics

Amateur experiments with autodiff mechanics simulators

minitorch-rust

cs5781

Machine Learning Engineering

postgres-provanence

PowerEdit

A super-minimal Python-based video editor ⚡

SemiRings

Holder for a bunch of semirings used in ChartParsing

DiffRast

MRF-LM

TextBook

Command-line Facebook

hsNLP-

Combined repo for nlp libs

provenance

icfp2009

when I was 4 years old I was maimed by a giant pig

configure

some configuration file

clustering

annotated-transformer.github.io

Annotated Transformer Blog Post

BT-AI

Jupyter Notebook

srush-blog

Eisner-Parser

An implementation of the Eisner Parser (described in "Bilexical Grammars and a Cubic-time parsing algorithm") in Haskell

FSM

Finite State Machine lib for haskell

hplay

opennmt-gen

PhraseDep

triton

tf-fork

srush-wiki

icfp2003

icfp2008

hypergraph

Hypergraph specification

learns-triton

Training

bipartite-sampler

Implementation of Huber-Law rejection sampling for bipartite graphs

ezTVM

test_grade

Chart-Parsing-

haskell library for basic chart parsers

blog-twitter

sigmoidfit

Jupyter Notebook

evernote

Command line bindings for evernote

prof8

Experimental paper writing linter.

transforest

decoding-methods

blog

Jupyter Notebook

nlp-course

beamer-animation

Create animations for LaTeX Beamer presentations.

Duel

nlp

twitter-simmons-sports

monadnack-project

Art project for monadnack

peoplesounds

CutParse

Lattice

lattice protobuffer

Penn-Treebank

Haskell library for the penn treebank management

icfp2020

twittersports

ProbDist

Tools for probability distributions focusing on estimation, conditioning, and smoothing

osgai