
LlamaGPT

A self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device.
New: Support for Code Llama models and Nvidia GPUs.

umbrel.com (we're hiring) »

Contents

  1. Demo
  2. Supported Models
  3. How to install
  4. OpenAI-compatible API
  5. Benchmarks
  6. Roadmap and contributing
  7. Acknowledgements

Demo

[Demo video: LlamaGPT.mp4]

Supported models

Currently, LlamaGPT supports the following models. Support for running custom models is on the roadmap.

| Model name | Model size | Model download size | Memory required |
| --- | --- | --- | --- |
| Nous Hermes Llama 2 7B Chat (GGML q4_0) | 7B | 3.79GB | 6.29GB |
| Nous Hermes Llama 2 13B Chat (GGML q4_0) | 13B | 7.32GB | 9.82GB |
| Nous Hermes Llama 2 70B Chat (GGML q4_0) | 70B | 38.87GB | 41.37GB |
| Code Llama 7B Chat (GGUF Q4_K_M) | 7B | 4.24GB | 6.74GB |
| Code Llama 13B Chat (GGUF Q4_K_M) | 13B | 8.06GB | 10.56GB |
| Phind Code Llama 34B Chat (GGUF Q4_K_M) | 34B | 20.22GB | 22.72GB |
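Before picking a `--model` flag, you can sanity-check the "Memory required" column against your machine. A minimal sketch; `required_mem_gb` is a hypothetical helper with the table values hardcoded, not part of LlamaGPT:

```shell
# Hypothetical helper: map a --model flag to the "Memory required" column above.
required_mem_gb() {
  case "$1" in
    7b)       echo 6.29 ;;
    13b)      echo 9.82 ;;
    70b)      echo 41.37 ;;
    code-7b)  echo 6.74 ;;
    code-13b) echo 10.56 ;;
    code-34b) echo 22.72 ;;
    *)        echo "unknown model: $1" >&2; return 1 ;;
  esac
}

required_mem_gb 13b    # prints 9.82
```

Compare the printed figure with your free RAM before launching; running a model that doesn't fit will swap heavily or fail to load.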

How to install

Install LlamaGPT on your umbrelOS home server

Installing LlamaGPT on an umbrelOS home server takes a single click. Simply install it from the Umbrel App Store.

[Screenshot: LlamaGPT in the Umbrel App Store]

Install LlamaGPT on M1/M2 Mac

Make sure you have Docker and Xcode installed.

Then, clone this repo and cd into it:

git clone https://github.com/getumbrel/llama-gpt.git
cd llama-gpt

Run LlamaGPT with the following command:

./run-mac.sh --model 7b

You can access LlamaGPT at http://localhost:3000.

To run 13B or 70B chat models, replace 7b with 13b or 70b respectively. To run 7B, 13B or 34B Code Llama models, replace 7b with code-7b, code-13b or code-34b respectively.

To stop LlamaGPT, press Ctrl + C in the terminal.

Install LlamaGPT anywhere else with Docker

You can run LlamaGPT on any x86 or arm64 system. Make sure you have Docker installed.

Then, clone this repo and cd into it:

git clone https://github.com/getumbrel/llama-gpt.git
cd llama-gpt

Run LlamaGPT with the following command:

./run.sh --model 7b

Or if you have an Nvidia GPU, you can run LlamaGPT with CUDA support using the --with-cuda flag, like:

./run.sh --model 7b --with-cuda
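Before passing `--with-cuda`, it can help to confirm that an NVIDIA driver is actually visible on the host. A minimal check (not part of LlamaGPT's scripts):

```shell
# Check for an NVIDIA driver before trying --with-cuda.
if command -v nvidia-smi >/dev/null 2>&1; then
  GPU_STATUS="$(nvidia-smi --query-gpu=name --format=csv,noheader)"
else
  GPU_STATUS="no NVIDIA driver found; run ./run.sh without --with-cuda"
fi
echo "$GPU_STATUS"
```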

You can access LlamaGPT at http://localhost:3000.

To run 13B or 70B chat models, replace 7b with 13b or 70b respectively. To run Code Llama 7B, 13B or 34B models, replace 7b with code-7b, code-13b or code-34b respectively.

To stop LlamaGPT, press Ctrl + C in the terminal.

Note: On the first run, it may take a while for the model to be downloaded to the /models directory. You may also see lots of output like this for a few minutes, which is normal:

llama-gpt-llama-gpt-ui-1       | [INFO  wait] Host [llama-gpt-api-13b:8000] not yet available...

After the model has been automatically downloaded and loaded, and the API server is running, you'll see an output like:

llama-gpt-ui_1   | ready - started server on 0.0.0.0:3000, url: http://localhost:3000

You can then access LlamaGPT at http://localhost:3000.
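If you'd rather script the wait than watch the logs, a small polling helper (hypothetical, not part of the repo) can block until the UI answers on port 3000:

```shell
# Hypothetical helper: poll a URL until it responds, or give up after N attempts.
wait_for_ui() {
  url="${1:-http://localhost:3000}"
  attempts="${2:-60}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fs "$url" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  echo "timeout"
  return 1
}
```

For example, `wait_for_ui http://localhost:3000 120` polls for up to ten minutes, which comfortably covers a first-run model download on most connections.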


Install LlamaGPT with Kubernetes

First, make sure you have a running Kubernetes cluster and kubectl is configured to interact with it.

Then, clone this repo and cd into it.

To deploy to Kubernetes first create a namespace:

kubectl create ns llama

Then apply the manifests under the /deploy/kubernetes directory:

kubectl apply -k deploy/kubernetes/. -n llama

Expose the service however you normally would, e.g. with an Ingress, a LoadBalancer Service, or kubectl port-forward.

OpenAI-compatible API

Thanks to llama-cpp-python, a drop-in replacement for the OpenAI API is available at http://localhost:3001. Open http://localhost:3001/docs to see the API documentation.
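For example, the chat completions endpoint can be exercised with curl. A sketch: the request body follows the standard OpenAI chat format, and whether a "model" field is required depends on the deployment:

```shell
# Build a chat request in the standard OpenAI format.
cat > /tmp/chat-request.json <<'EOF'
{
  "messages": [
    {"role": "user", "content": "How does the universe expand?"}
  ],
  "temperature": 0
}
EOF

# Send it to the local API server (assumes LlamaGPT is running on port 3001).
curl -s http://localhost:3001/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d @/tmp/chat-request.json \
  || echo "LlamaGPT API is not reachable on port 3001"
```

Because the endpoint mirrors the OpenAI API, existing OpenAI client libraries can also be pointed at http://localhost:3001 by overriding their base URL.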

Benchmarks

We've tested LlamaGPT models on the following hardware with the default system prompt and the user prompt "How does the universe expand?", at temperature 0 to guarantee deterministic results. Generation speed is averaged over the first 10 generations.

Feel free to add your own benchmarks to this table by opening a pull request.
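The figures in these tables are simply generated tokens divided by wall-clock time. If you're contributing a benchmark row, the number can be computed with awk (the token and time values below are illustrative, not measurements):

```shell
# Illustrative numbers, not real measurements.
TOKENS=128
ELAPSED=23.7
awk -v t="$TOKENS" -v s="$ELAPSED" 'BEGIN { printf "%.1f tokens/sec\n", t / s }'
# prints: 5.4 tokens/sec
```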

Nous Hermes Llama 2 7B Chat (GGML q4_0)

| Device | Generation speed |
| --- | --- |
| M1 Max MacBook Pro (64GB RAM) | 54 tokens/sec |
| GCP c2-standard-16 vCPU (64 GB RAM) | 16.7 tokens/sec |
| Ryzen 5700G 4.4GHz 4c (16 GB RAM) | 11.50 tokens/sec |
| GCP c2-standard-4 vCPU (16 GB RAM) | 4.3 tokens/sec |
| Umbrel Home (16GB RAM) | 2.7 tokens/sec |
| Raspberry Pi 4 (8GB RAM) | 0.9 tokens/sec |

Nous Hermes Llama 2 13B Chat (GGML q4_0)

| Device | Generation speed |
| --- | --- |
| M1 Max MacBook Pro (64GB RAM) | 20 tokens/sec |
| GCP c2-standard-16 vCPU (64 GB RAM) | 8.6 tokens/sec |
| GCP c2-standard-4 vCPU (16 GB RAM) | 2.2 tokens/sec |
| Umbrel Home (16GB RAM) | 1.5 tokens/sec |

Nous Hermes Llama 2 70B Chat (GGML q4_0)

| Device | Generation speed |
| --- | --- |
| M1 Max MacBook Pro (64GB RAM) | 4.8 tokens/sec |
| GCP e2-standard-16 vCPU (64 GB RAM) | 1.75 tokens/sec |
| GCP c2-standard-16 vCPU (64 GB RAM) | 1.62 tokens/sec |

Code Llama 7B Chat (GGUF Q4_K_M)

| Device | Generation speed |
| --- | --- |
| M1 Max MacBook Pro (64GB RAM) | 41 tokens/sec |

Code Llama 13B Chat (GGUF Q4_K_M)

| Device | Generation speed |
| --- | --- |
| M1 Max MacBook Pro (64GB RAM) | 25 tokens/sec |

Phind Code Llama 34B Chat (GGUF Q4_K_M)

| Device | Generation speed |
| --- | --- |
| M1 Max MacBook Pro (64GB RAM) | 10.26 tokens/sec |

Roadmap and contributing

We're looking to add more features to LlamaGPT. You can see the roadmap here. The highest priorities are:

  • Move the model out of the Docker image and into a separate volume.
  • Add Metal support for M1/M2 Macs.
  • Add support for Code Llama models.
  • Add CUDA support for NVIDIA GPUs.
  • Add the ability to load custom models.
  • Allow users to switch between models.

If you're a developer who'd like to help with any of these, please open an issue to discuss the best way to tackle the challenge. If you're looking to help but not sure where to begin, check out these issues that have specifically been marked as being friendly to new contributors.

Acknowledgements

A massive thank you to the following developers and teams for making LlamaGPT possible:


License

LlamaGPT is licensed under the MIT License.
