GPT-NeoX

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries.

This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training.

For those looking for a TPU-centric codebase, we recommend Mesh Transformer JAX.

If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend you use the Hugging Face transformers library instead which supports GPT-NeoX models.

GPT-NeoX 2.0

Prior to 3/9/2023, GPT-NeoX relied on DeeperSpeed, which was based on an old version of DeepSpeed (0.3.15). In order to migrate to the latest upstream DeepSpeed version while still allowing users to access the old versions of GPT-NeoX and DeeperSpeed, we have introduced two versioned releases for both libraries: version 1.0 preserves the old stable stack, while version 2.0 builds on the latest upstream DeepSpeed and is maintained going forward.

Quick Start

Environment and Dependencies

Host Setup

First make sure you are in an environment with Python 3.8 and an appropriate version of PyTorch 1.8 or later installed. Note: Some of the libraries that GPT-NeoX depends on have not been updated to be compatible with Python 3.10+. Python 3.9 appears to work, but this codebase has been developed and tested for Python 3.8.

To install the remaining basic dependencies, run:

pip install -r requirements/requirements.txt
pip install -r requirements/requirements-wandb.txt
pip install -r requirements/requirements-tensorboard.txt
python ./megatron/fused_kernels/setup.py install # optional: only needed if using fused kernels

from the repository root.

Warning: Our codebase relies on DeeperSpeed, our fork of the DeepSpeed library with some added changes. We strongly recommend using Anaconda, a virtual machine, or some other form of environment isolation before continuing. Failure to do so may cause other repositories that rely on DeepSpeed to break.
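
For example, a minimal sketch of setting up an isolated environment with conda before installing the requirements (the environment name and the choice of conda are illustrative; any isolation tool works):

conda create -n gpt-neox python=3.8   # dedicated environment with the supported Python version
conda activate gpt-neox               # activate it before installing any of the requirements files
pip install -r requirements/requirements.txt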

Flash Attention

To use Flash-Attention, install the additional dependencies in ./requirements/requirements-flashattention.txt and set the attention type in your configuration accordingly (see configs). This can provide significant speed-ups over regular attention on certain GPU architectures, including Ampere GPUs (such as A100s); see the repository for more details.
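
For instance, installing the extra dependencies is a single command from the repository root; the configuration change itself depends on your model config (see the configuration README):

pip install -r ./requirements/requirements-flashattention.txt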

Containerized Setup

We also provide a Dockerfile if you prefer to run NeoX in a container. To use this option, first build an image named gpt-neox from the repository root directory with docker build -t gpt-neox -f Dockerfile .. We also host pre-built images on Docker Hub at leogao2/gpt-neox.

You can then run a container based on this image. For instance, the below snippet mounts the cloned repository (gpt-neox) directory to /gpt-neox in the container and uses nvidia-docker to make four GPUs (numbers 0-3) accessible to the container. As noted by the NCCL documentation, both --shm-size=1g and --ulimit memlock=-1 are important to prevent Docker from allocating too little shared memory.

nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES=0,1,2,3 --shm-size=1g --ulimit memlock=-1 --mount type=bind,src=$PWD,dst=/gpt-neox gpt-neox

Usage

All functionality (inference included) should be launched using deepy.py, a wrapper around the deepspeed launcher.

We currently offer three main functions:

  1. train.py is used for training and finetuning models.
  2. evaluate.py is used to evaluate a trained model using the language model evaluation harness.
  3. generate.py is used to sample text from a trained model.

which can be launched with:

./deepy.py [script.py] [./path/to/config_1.yml] [./path/to/config_2.yml] ... [./path/to/config_n.yml]

For example, to generate text unconditionally with the GPT-NeoX-20B model, you can use the following:

./deepy.py generate.py ./configs/20B.yml

Optionally, you can pass in a text file (e.g. prompt.txt) to use as the prompt. This should be a plain .txt file with each prompt separated by newline characters; you can also pass in the path to an output file:

./deepy.py generate.py ./configs/20B.yml -i prompt.txt -o sample_outputs.txt
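
A hypothetical prompt.txt simply contains one prompt per line, for example:

Once upon a time,
The capital of France is
GPT-NeoX is a library for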

To reproduce our evaluation numbers on, for example, TriviaQA and PIQA, use:

./deepy.py evaluate.py ./configs/20B.yml --eval_tasks triviaqa piqa

You can add an arbitrary list of evaluation tasks here; for details of all available tasks, see lm-evaluation-harness.

For more details on each entry point, see the Training and Finetuning, Inference, and Evaluation sections respectively.

Configuration

GPT-NeoX parameters are defined in a YAML configuration file which is passed to the deepy.py launcher. We have provided some example .yaml files in configs, including one for GPT-NeoX-20B, and example configuration files for other model sizes.

These files are generally complete, but non-optimal. For example, depending on your specific GPU configuration, you may need to change some settings such as pipe-parallel-size and model-parallel-size to increase or decrease the degree of parallelisation, train_micro_batch_size_per_gpu or gradient-accumulation-steps to modify batch size related settings, or the zero_optimization dict to modify how optimizer states are parallelised across workers.
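
As an illustrative sketch only, such overrides might look like the following in one of your .yml files; the key names are the ones mentioned above, while the values are placeholders you would tune to your hardware:

  "pipe-parallel-size": 1,
  "model-parallel-size": 2,
  "train_micro_batch_size_per_gpu": 4,
  "gradient-accumulation-steps": 8,
  "zero_optimization": {
    "stage": 1
  },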

For a more detailed guide to all the features available and how to configure them, see the configuration README, and for documentation of every possible argument, see configs/neox_arguments.md.

Datasets

Preconfigured Datasets

Several preconfigured datasets are available, including most components from the Pile, as well as the Pile train set itself, for straightforward tokenization using the prepare_data.py entry point.

For example, to download and tokenize the enwik8 dataset with the GPT2 tokenizer, saving it to ./data, you can run:

python prepare_data.py -d ./data

or a single shard of the Pile (pile_subset) with the GPT-NeoX-20B tokenizer (assuming you have it saved at ./20B_checkpoints/20B_tokenizer.json):

python prepare_data.py -d ./data -t HFTokenizer --vocab-file ./20B_checkpoints/20B_tokenizer.json pile_subset

The tokenized data will be saved out to two files: [data-dir]/[dataset-name]/[dataset-name]_text_document.bin and [data-dir]/[dataset-name]/[dataset-name]_text_document.idx. You will need to add the prefix that both these files share to your training configuration file under the data-path field. E.g.:

  "data-path": "./data/enwik8/enwik8_text_document",

Using Custom Data

To prepare your own dataset for training with custom data, format it as one large jsonl-formatted file, with each line containing a JSON dictionary for a single document. The document text should be stored under one JSON key, i.e. "text". Any auxiliary data stored in other fields will not be used.
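
For example, a hypothetical mydataset.jsonl contains one JSON object per line, with the document text under the "text" key and any other fields ignored:

{"text": "First document. Its full contents go here as a single string.", "source": "ignored"}
{"text": "Second document. Each line is a separate document."}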

Next make sure to download the GPT2 tokenizer vocab and merge files from the following links:

Or use the 20B tokenizer (for which only a single vocab file is needed):

(Alternatively, you can provide any tokenizer file that can be loaded by Hugging Face's tokenizers library with the Tokenizer.from_pretrained() command.)
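
As a quick sanity check that a tokenizer file is usable, you can load it with the tokenizers library; a minimal sketch, using Tokenizer.from_file to read a local JSON tokenizer (the path is the 20B tokenizer location used elsewhere in this README):

from tokenizers import Tokenizer

# Load the tokenizer JSON file from disk and encode a test string
tok = Tokenizer.from_file("./20B_checkpoints/20B_tokenizer.json")
print(tok.encode("Hello GPT-NeoX").tokens)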

You can now pretokenize your data using tools/preprocess_data.py, the arguments for which are detailed below:

usage: preprocess_data.py [-h] --input INPUT [--jsonl-keys JSONL_KEYS [JSONL_KEYS ...]] [--num-docs NUM_DOCS] --tokenizer-type {HFGPT2Tokenizer,HFTokenizer,GPT2BPETokenizer,CharLevelTokenizer} [--vocab-file VOCAB_FILE] [--merge-file MERGE_FILE] [--append-eod] [--ftfy] --output-prefix OUTPUT_PREFIX
                          [--dataset-impl {lazy,cached,mmap}] [--workers WORKERS] [--log-interval LOG_INTERVAL]

optional arguments:
  -h, --help            show this help message and exit

input data:
  --input INPUT         Path to input jsonl files or lmd archive(s) - if using multiple archives, put them in a comma separated list
  --jsonl-keys JSONL_KEYS [JSONL_KEYS ...]
                        space separated list of keys to extract from jsonl. Default: text
  --num-docs NUM_DOCS   Optional: Number of documents in the input data (if known) for an accurate progress bar.

tokenizer:
  --tokenizer-type {HFGPT2Tokenizer,HFTokenizer,GPT2BPETokenizer,CharLevelTokenizer}
                        What type of tokenizer to use.
  --vocab-file VOCAB_FILE
                        Path to the vocab file
  --merge-file MERGE_FILE
                        Path to the BPE merge file (if necessary).
  --append-eod          Append an <eod> token to the end of a document.
  --ftfy                Use ftfy to clean text

output data:
  --output-prefix OUTPUT_PREFIX
                        Path to binary output file without suffix
  --dataset-impl {lazy,cached,mmap}
                        Dataset implementation to use. Default: mmap

runtime:
  --workers WORKERS     Number of worker processes to launch
  --log-interval LOG_INTERVAL
                        Interval between progress updates

For example:

python tools/preprocess_data.py \
            --input ./data/mydataset.jsonl.zst \
            --output-prefix ./data/mydataset \
            --vocab-file ./data/gpt2-vocab.json \
            --merge-file gpt2-merges.txt \
            --dataset-impl mmap \
            --tokenizer-type GPT2BPETokenizer \
            --append-eod

You would then run training with the following settings added to your configuration file:

  "data-path": "data/mydataset/mydataset",

Training and Finetuning

Training is launched using deepy.py, a wrapper around DeepSpeed's launcher, which launches the same script in parallel across many GPUs / nodes.

The general usage pattern is:

python ./deepy.py train.py [path/to/config1.yml] [path/to/config2.yml] ...

You can pass in an arbitrary number of configs which will all be merged at runtime.

You can also optionally pass in a config prefix, which will assume all your configs are in the same folder and append that prefix to their path.

For example:

python ./deepy.py train.py -d configs 125M.yml local_setup.yml

This will deploy the train.py script on all nodes with one process per GPU. The worker nodes and number of GPUs are specified in the /job/hostfile file (see parameter documentation), or can simply be passed in as the num_gpus arg if running on a single node setup.

Although this is not strictly necessary, we find it useful to define the model parameters in one config file (e.g. configs/125M.yml) and the data path parameters in another (e.g. configs/local_setup.yml).

Pretrained Models

GPT-NeoX-20B

GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile. Technical details about GPT-NeoX-20B can be found in the associated paper. The configuration file for this model is both available at ./configs/20B.yml and included in the download links below.

Slim weights - (No optimizer states, for inference or finetuning, 39GB)

To download from the command line to a folder named 20B_checkpoints, use the following command:

wget --cut-dirs=5 -nH -r --no-parent --reject "index.html*" https://the-eye.eu/public/AI/models/GPT-NeoX-20B/slim_weights/ -P 20B_checkpoints

Full weights - (Including optimizer states, 268GB)

To download from the command line to a folder named 20B_checkpoints, use the following command:

wget --cut-dirs=5 -nH -r --no-parent --reject "index.html*" https://the-eye.eu/public/AI/models/GPT-NeoX-20B/full_weights/ -P 20B_checkpoints

Weights can alternatively be downloaded using a BitTorrent client. Torrent files can be downloaded here: slim weights, full weights.

We additionally have 150 checkpoints saved throughout training, one every 1,000 steps. We are working on figuring out how to best serve these at scale, but in the meantime people interested in working with the partially trained checkpoints can email us at [email protected] to arrange access.

Pythia

The Pythia Scaling Suite is a suite of models ranging from 70M to 12B parameters trained on the Pile, intended to promote research on the interpretability and training dynamics of large language models. Further details about the project and links to the models can be found in the paper and on the project's GitHub.

Polyglot

The Polyglot Project is an effort to train powerful non-English pretrained language models to promote the accessibility of this technology to researchers outside the dominant powerhouses of machine learning. EleutherAI has trained and released 1.3B, 3.8B, and 5.8B parameter Korean language models, the largest of which outperforms all other publicly available language models on Korean language tasks. Further details about the project and links to the models can be found here.

Inference

For most uses we recommend deploying models trained using the GPT-NeoX library via the Hugging Face Transformers library which is better optimized for inference.

We support three types of generation from a pretrained model:

  1. Unconditional generation
  2. Conditional generation based on an input read from a file
  3. Interactive generation, which allows for multiple rounds of back-and-forth between a user and the language model via a command line interface

All three types of text generation can be launched via python ./deepy.py generate.py -d configs 125M.yml local_setup.yml text_generation.yml with the appropriate values set in configs/text_generation.yml.
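
The snippet below is a hedged sketch of the kind of fields configs/text_generation.yml controls, one per generation mode described above; the key names here are assumptions based on the sample config and should be checked against that file:

  # generation mode: "unconditional", "input-file", or "interactive" (assumed key name)
  "text-gen-type": "unconditional",
  "maximum_tokens": 64,
  "temperature": 1.0,
  # only used for input-file generation (assumed key names)
  "sample-input-file": "prompt.txt",
  "sample-output-file": "sample_outputs.txt",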

Evaluation

GPT-NeoX supports evaluation on downstream tasks through the language model evaluation harness.

To evaluate a trained model on the evaluation harness, simply run:

python ./deepy.py evaluate.py -d configs your_configs.yml --eval_tasks task1 task2 ... taskn

where --eval_tasks is a space-separated list of evaluation tasks, e.g. --eval_tasks lambada hellaswag piqa sciq. For details of all available tasks, refer to the lm-evaluation-harness repo.

Exporting to Hugging Face

GPT-NeoX is optimized heavily for training only, and GPT-NeoX model checkpoints are not compatible out of the box with other deep learning libraries. To make models easily loadable and shareable with end users, and for further exporting to various other frameworks, GPT-NeoX supports checkpoint conversion to the Hugging Face Transformers GPTNeoXModel format.

To convert a NeoX checkpoint (with pipeline-parallel-size>=1) to Hugging Face-loadable format, run:

python ./tools/convert_module_to_hf.py --input_dir /path/to/model/global_stepXXX --config_file your_config.yml --output_dir hf_model/save/location

To convert a sequential model to Hugging Face format, run:

python  ./tools/convert_sequential_to_hf.py --input_dir /path/to/model/global_stepXXX --config_file your_config.yml --output_dir hf_model/save/location

(Note: this script should be used for v2.0 checkpoints saved on a v2.0 commit prior to #866 and which used pipe-parallel-size=1. Using pipe-parallel-size=0 will also save models in this format.)
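
Once converted, the checkpoint directory can be loaded like any other GPT-NeoX model in the Transformers library; a minimal sketch using the output location from the commands above (if the tokenizer was not exported alongside the weights, point AutoTokenizer at your tokenizer instead):

from transformers import AutoTokenizer, GPTNeoXForCausalLM

# hf_model/save/location is the --output_dir passed to the conversion script above
model = GPTNeoXForCausalLM.from_pretrained("hf_model/save/location")
tokenizer = AutoTokenizer.from_pretrained("hf_model/save/location")

inputs = tokenizer("GPT-NeoX is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))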

Then to upload a model to the Hugging Face Hub, run:

huggingface-cli login
python ./tools/upload.py

and input the requested information, including your Hugging Face Hub user token.

Note, however, that this compatibility is not one-to-one, and only certain configurations from GPT-NeoX are supported in the Hugging Face GPTNeoXModel class. Advanced features such as alternative positional embeddings may require new Transformers modeling code and new conversion script tweaks.

Monitoring

In addition to storing logs locally, we provide built-in support for two popular experiment monitoring frameworks: Weights & Biases and TensorBoard.

Weights & Biases

EleutherAI is currently using Weights & Biases to record our experiments. If you are logged into Weights & Biases on your machine—you can do this by executing wandb login—your runs will automatically be recorded. There are two optional fields associated with Weights & Biases: wandb_group allows you to name the run group and wandb_team allows you to assign your runs to an organization or team account.
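
A hedged example of the two optional fields in a configuration file (the values are placeholders):

  "wandb_group": "my-experiment-group",
  "wandb_team": "my-team",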

TensorBoard

We also support using TensorBoard via the tensorboard-dir field. Dependencies required for TensorBoard monitoring can be found in and installed from ./requirements/requirements-tensorboard.txt.
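
For example (the directory value is illustrative):

  "tensorboard-dir": "./logs/tensorboard",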

Running on multi-node

If you need to supply a hostfile for use with the MPI-based DeepSpeed launcher, you can set the environment variable DLTS_HOSTFILE to point to the hostfile.
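
A minimal sketch, assuming the standard DeepSpeed hostfile format of one hostname and slot (GPU) count per line; the hostnames and path here are illustrative:

node001 slots=8
node002 slots=8

Then point DLTS_HOSTFILE at that file before launching:

export DLTS_HOSTFILE=/path/to/hostfile
python ./deepy.py train.py -d configs 125M.yml local_setup.yml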

Administrative Notes

Citing GPT-NeoX

If you have found the GPT-NeoX library helpful in your work, you can cite this repository as

@software{gpt-neox-library,
  title = {{GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch}},
  author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
  url = {https://www.github.com/eleutherai/gpt-neox},
  doi = {10.5281/zenodo.5879544},
  month = {8},
  year = {2021},
  version = {0.0.1},
}

To cite our 20 billion parameter model, please use

@inproceedings{gpt-neox-20b,
  title={{GPT-NeoX-20B}: An Open-Source Autoregressive Language Model},
  author={Black, Sid and Biderman, Stella and Hallahan, Eric and Anthony, Quentin and Gao, Leo and Golding, Laurence and He, Horace and Leahy, Connor and McDonell, Kyle and Phang, Jason and Pieler, Michael and Prashanth, USVSN Sai and Purohit, Shivanshu and Reynolds, Laria and Tow, Jonathan and Wang, Ben and Weinbach, Samuel},
  booktitle={Proceedings of the ACL Workshop on Challenges \& Perspectives in Creating Large Language Models},
  url={https://arxiv.org/abs/2204.06745},
  year={2022}
}

Citation instructions for other pretrained models can be found in the appropriate repository.

Licensing

This repository hosts code that is part of EleutherAI's GPT-NeoX project. Copyright (c) 2021, EleutherAI. Licensed under the Apache License:

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

This repository is based off code written by NVIDIA that is licensed under the Apache License, Version 2.0. In accordance with the Apache License, all files that are modifications of code originally written by NVIDIA maintain an NVIDIA copyright header. All files that do not contain such a header are the exclusive copyright of EleutherAI. When the NVIDIA code has been modified from its original version, that fact is noted in the copyright header. All derivative works of this repository must preserve these headers under the terms of the Apache License.

This repository also contains code written by a number of other authors. Such contributions are marked and the relevant licensing is included where appropriate.

For full terms, see the LICENSE file. If you have any questions, comments, or concerns about licensing please email us at [email protected].

Publications

The following publications have come out of this project:

The following publications by other research groups use this library:

Acknowledgements

We run our experiments on a Kubernetes cluster generously provided by CoreWeave and a SLURM cluster provided by Stability AI.
