  • Stars: 5,973
  • Rank: 6,756 (Top 0.2 %)
  • Language: Python
  • License: Apache License 2.0
  • Created: over 1 year ago
  • Updated: 3 months ago

Repository Details

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
⚡ Lit-LLaMA ️

[Image: Lit-LLaMA and pineapple pizza]

Independent implementation of LLaMA pretraining, finetuning, and inference code that is fully open source under the Apache 2.0 license.

This implementation builds on nanoGPT.

The open-source code in this repository works with the original LLaMA weights that are distributed by Meta under a research-only license.

Looking for LLaMA 2?

Meta AI has since released LLaMA 2. Additionally, new Apache 2.0 licensed weights are being released as part of the Open LLaMA project.

To run LLaMA 2 weights, Open LLaMA weights, or Vicuna weights (among other LLaMA-like checkpoints), check out the Lit-GPT repository.

Why?

We believe that AI should be fully open source and part of the collective knowledge.

The original LLaMA code is GPL-licensed, which means any project using it must also be released under GPL.

This "taints" any other code and prevents integration with the rest of the ecosystem.

Lit-LLaMA solves that for good.

 

Design principles

Lit-LLaMA is:

  • Simple: Single-file implementation without boilerplate.
  • Correct: Numerically equivalent to the original model.
  • Optimized: Runs on consumer hardware or at scale.
  • Open-source: No strings attached.

Get involved!

Join our Discord to build high-performance, truly open-source models for the common benefit of the community.

 

Setup

Clone the repo

git clone https://github.com/Lightning-AI/lit-llama
cd lit-llama

Install dependencies

pip install -r requirements.txt

You are all set! 🎉

 

Use the model

To generate text predictions, you need to download the model weights. If you don't have them, check out our guide.

Run inference:

python generate.py --prompt "Hello, my name is"

This will run the 7B model and require ~26 GB of GPU memory (A100 GPU).

Full guide for generating samples from the model.
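Under the hood, generate.py performs standard autoregressive decoding: the prompt is tokenized, the model predicts a distribution over the next token, one token is sampled (optionally with temperature scaling and top-k filtering), and the process repeats. The sketch below illustrates that loop in isolation; it is not the repository's implementation, and the model_fn callable is a hypothetical stand-in.

# Minimal sketch of temperature / top-k autoregressive sampling.
# Not the repository's generate.py; `model_fn` is a hypothetical stand-in
# that maps a 1-D tensor of token ids to next-token logits.
import torch

def sample_next(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 200) -> int:
    logits = logits / max(temperature, 1e-5)  # sharpen or flatten the distribution
    if top_k is not None:
        kth = torch.topk(logits, min(top_k, logits.numel())).values[-1]
        logits = torch.where(logits < kth, torch.full_like(logits, float("-inf")), logits)
    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

def generate(model_fn, prompt_ids: list[int], max_new_tokens: int = 50) -> list[int]:
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model_fn(torch.tensor(ids))  # logits for the next token
        ids.append(sample_next(logits))
    return ids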

Run Lit-LLaMA on consumer devices

On GPUs with bfloat16 support, the generate.py script will automatically convert the weights and consume about 14 GB. For GPUs with less memory, or ones that don't support bfloat16, enable quantization (--quantize llm.int8):

python generate.py --quantize llm.int8 --prompt "Hello, my name is"

See python generate.py --help for more options.

You can also use GPTQ-style int4 quantization, but this requires converting the weights first:

python quantize/gptq.py --output_path checkpoints/lit-llama/7B/llama-gptq.4bit.pth --dtype bfloat16 --quantize gptq.int4

GPTQ-style int4 quantization brings GPU usage down to about 5 GB. Since only the weights of the Linear layers are quantized, it is useful to also pass --dtype bfloat16 even with quantization enabled.

With the quantized checkpoint generated, generation then works as usual with --quantize gptq.int4 and the newly generated checkpoint file:

python generate.py --quantize gptq.int4 --checkpoint_path checkpoints/lit-llama/7B/llama-gptq.4bit.pth

Full guide for generating samples from the model.
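To make the note above about quantizing only the Linear-layer weights concrete, here is a minimal sketch of weight-only quantization of a single weight matrix: each row is stored as int8 plus a per-row scale and dequantized on the fly for the matmul. This illustrates the general idea only, not the llm.int8 or GPTQ algorithms used by the scripts above.

# Minimal sketch of weight-only int8 quantization of one Linear weight matrix.
# Illustrative only -- not the llm.int8 / GPTQ code used by this repository.
import torch

def quantize_rows(weight: torch.Tensor):
    # One scale per output row; weights are stored as int8, activations stay in float.
    scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale

def quantized_linear(x: torch.Tensor, q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Dequantize just-in-time and apply the usual matmul.
    w = q.to(x.dtype) * scale
    return x @ w.t()

w = torch.randn(256, 128)  # a stand-in Linear weight (out_features, in_features)
q, s = quantize_rows(w)
x = torch.randn(4, 128)
print((quantized_linear(x, q, s) - x @ w.t()).abs().max())  # small quantization error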

Finetune the model

We provide simple training scripts in finetune/lora.py and finetune/adapter.py that instruction-tune a pretrained model on the Alpaca dataset using the LoRA and Adapter techniques.

  1. Download the data and generate an instruction-tuning dataset (an example record is sketched after this list):

    python scripts/prepare_alpaca.py
  2. Run the finetuning script

    python finetune/lora.py

    or

    python finetune/adapter.py
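For reference, each record used for instruction tuning follows the Alpaca format: an instruction, an optional input, and the expected output, rendered into a prompt template before training. The snippet below shows a representative record and a typical Alpaca-style template; treat the exact wording and field handling as an approximation rather than a verbatim copy of what scripts/prepare_alpaca.py produces.

# Representative Alpaca-style record and prompt template for instruction tuning.
# Approximate illustration; see scripts/prepare_alpaca.py for the exact preprocessing.
sample = {
    "instruction": "Name three primary colors.",
    "input": "",
    "output": "Red, blue, and yellow.",
}

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(example: dict) -> str:
    template = PROMPT_WITH_INPUT if example.get("input") else PROMPT_NO_INPUT
    return template.format(instruction=example["instruction"], input=example.get("input", ""))

print(build_prompt(sample) + sample["output"])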

This assumes you have downloaded the pretrained weights as described above. Finetuning requires at least one GPU with ~24 GB of memory (e.g. an RTX 3090). Follow the instructions in the script to fit your GPU memory efficiently. Note: for some GPU models you might need to set torch.backends.cuda.enable_flash_sdp(False) (see the comments at the top of the script).

More details about each finetuning method and how you can apply it to your own data can be found in our technical how-to guides.
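For intuition on why LoRA finetuning fits in so little memory: the pretrained weights stay frozen and only a small low-rank update is trained. The sketch below shows the core idea on a single Linear layer; it is a simplified illustration, not this repository's LoRA implementation, and the LoRALinear name is made up for the example.

# Minimal sketch of the LoRA idea on one Linear layer: freeze W, train a
# low-rank update B @ A. Illustrative only -- not this repository's lora.py.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):  # hypothetical name, used only in this sketch
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.t() @ self.lora_b.t()) * self.scaling

layer = LoRALinear(4096, 4096)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # only the low-rank A and B matrices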

Finetuning How-To Guides

These technical tutorials illustrate how to run the finetuning code.

Understanding Finetuning -- Conceptual Tutorials

Looking for conceptual tutorials and explanations? We have some additional articles below:

Pre-training

We provide a simple training script based on Fabric if you want to venture into pre-training on RedPajama, a reproduction of the original LLaMA dataset. Conversion scripts for our optimized streaming PackedDataset are included.
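If you are curious what a Fabric-based training loop looks like before diving into the pre-training script, here is a minimal sketch. A toy model and random data stand in for LLaMA and RedPajama; the actual script in this repository is more involved.

# Minimal sketch of a Lightning Fabric training loop. Toy model and random
# data stand in for LLaMA and RedPajama; not the repository's pretraining script.
import torch
from lightning.fabric import Fabric

def main():
    fabric = Fabric(accelerator="auto", devices=1)
    fabric.launch()

    model = torch.nn.Linear(128, 128)  # toy stand-in model
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    model, optimizer = fabric.setup(model, optimizer)  # moves to device, applies strategy

    for step in range(100):
        x = torch.randn(8, 128, device=fabric.device)  # stand-in batch
        loss = torch.nn.functional.mse_loss(model(x), x)
        optimizer.zero_grad()
        fabric.backward(loss)  # replaces loss.backward()
        optimizer.step()
        if step % 10 == 0:
            fabric.print(f"step {step}: loss {loss.item():.4f}")

if __name__ == "__main__":
    main()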

Follow this guide to start pre-training on the RedPajama dataset:

Get involved!

We are on a quest towards fully open source AI.

Join us and start contributing, especially in the following areas:

Look at train.py for a starting point towards pre-training / fine-tuning using Lightning Fabric.

We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.

Unsure about contributing? Check out our Contributing to Lit-LLaMA: A Hitchhiker’s Guide to the Quest for Fully Open-Source AI guide.

Don't forget to join our Discord!

Acknowledgements

License

Lit-LLaMA is released under the Apache 2.0 license.

More Repositories

  1. pytorch-lightning — Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes. (Python, 28,208 stars)
  2. litgpt — 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. (Python, 10,409 stars)
  3. LitServe — Lightning-fast serving engine for any AI model of any size. Flexible. Easy. Enterprise-scale. (Python, 2,278 stars)
  4. torchmetrics — Machine learning metrics for distributed, scalable PyTorch applications. (Python, 2,117 stars)
  5. deep-learning-project-template — PyTorch Lightning code guideline for conferences. (Python, 1,236 stars)
  6. lightning-thunder — Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs. (Python, 1,178 stars)
  7. litdata — Transform datasets at scale. Optimize datasets for fast AI model training. (Python, 347 stars)
  8. dl-fundamentals — Deep Learning Fundamentals: code material and exercises. (Jupyter Notebook, 342 stars)
  9. tutorials — Collection of PyTorch Lightning tutorials in the form of rich scripts, automatically transformed into ipython notebooks. (Python, 286 stars)
  10. engineering-class — Lightning Bits: Engineering for Researchers repo. (Python, 131 stars)
  11. utilities — Common Python utilities and GitHub Actions in the Lightning ecosystem. (Python, 50 stars)
  12. lightning-ColossalAI — Large-scale distributed model training with Colossal AI and Lightning AI. (Python, 50 stars)
  13. ecosystem-ci — Automate issue discovery for your projects against Lightning nightly and releases. (Python, 45 stars)
  14. forked-pdb — Python pdb for multiple processes. (Python, 30 stars)
  15. lightning-Habana — Lightning support for Intel Habana accelerators. (Python, 25 stars)
  16. lightning-Hivemind — Lightning training strategy for HiveMind. (Python, 9 stars)
  17. lightning-Graphcore — (Python, 7 stars)
  18. Lightning-multinode-templates — Multinode templates for PyTorch Lightning. (Python, 7 stars)
  19. LAI-E2E-ContinualAI-Emulator — (Python, 6 stars)
  20. lightning-ui — Frontend for Lightning apps and home of the Design System. (TypeScript, 4 stars)
  21. lightning-gan — (Python, 3 stars)
  22. e2e-speed-benchmark-tests — Test scripts which ensure that app startup times do not regress. (Python, 2 stars)
  23. lightning-Horovod — Lightning training strategy for Horovod. (Python, 2 stars)
  24. LAI-lightning-template-jupyterlab-App — (Python, 1 star)
  25. cloud-training-workshop — (Jupyter Notebook, 1 star)