Python bindings for llama.cpp

Important

  • The Python API has changed significantly in recent weeks, and as a result I have not had a chance to update cli.py or chat.py to reflect the new changes. The scripts examples/simple.py and examples/simple_low_level.py should give you an idea of how to use the library.

Install

From PyPI

pip install llamacpp

Build from Source

pip install .

Get the model weights

You will need to obtain the weights for LLaMA yourself. There are a few torrents floating around, as well as some Hugging Face repositories (e.g. https://huggingface.co/nyanko7/LLaMA-7B/). Once you have them, copy them into the models folder.

ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model
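Before converting anything, it can help to confirm that the models folder matches the listing above. The following is a minimal sketch; the helper name missing_model_files is hypothetical, and it only checks that the tokenizer files and the chosen size folders exist, not that their contents are valid.

```python
from pathlib import Path

# Files the listing above shows next to the per-size model folders.
TOKENIZER_FILES = ("tokenizer.model", "tokenizer_checklist.chk")


def missing_model_files(models_dir, sizes=("7B",)):
    """Return the expected entries that are absent from models_dir.

    Hypothetical helper: it checks names only and does not validate
    the contents of any file.
    """
    root = Path(models_dir)
    expected = [root / name for name in TOKENIZER_FILES]
    expected += [root / size for size in sizes]
    # Report paths relative to the models folder for readable output.
    return [str(p.relative_to(root)) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_model_files("./models")
    if missing:
        print("Missing from ./models:", ", ".join(missing))
    else:
        print("./models looks complete")
```

Pass sizes=("7B", "13B") (and so on) to check every model size you plan to convert.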

Convert the weights to GGML format using llamacpp-convert, then use llamacpp-quantize to quantize them to INT4. For example, for the 7B-parameter model, run the following to convert, quantize, and then start the command line interface:

llamacpp-convert ./models/7B/ 1
llamacpp-quantize ./models/7B/
llamacpp-cli
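If you convert several model sizes, it can help to script the two conversion commands above. The following is a minimal sketch, assuming llamacpp-convert and llamacpp-quantize are on your PATH; the helper names conversion_commands and convert_and_quantize are hypothetical, and the script passes through only the arguments shown above. The trailing 1 is assumed to select float16 output, as the upstream llama.cpp conversion script did at the time; check the converter's own usage message to confirm.

```python
import subprocess


def conversion_commands(model_dir):
    """Build the convert and quantize command lines for one model folder.

    Mirrors the two commands above. The trailing "1" is passed through
    unchanged; it is ASSUMED to request float16 output before quantization.
    """
    return [
        ["llamacpp-convert", model_dir, "1"],
        ["llamacpp-quantize", model_dir],
    ]


def convert_and_quantize(model_dir, dry_run=True):
    """Print each step and, unless dry_run is set, run it."""
    for cmd in conversion_commands(model_dir):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails,
            # so quantization never runs on a half-converted model.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    convert_and_quantize("./models/7B/", dry_run=True)
```

Run it with dry_run=True first to see the commands it would execute, then with dry_run=False to actually run them.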

Note that running llamacpp-convert requires torch, sentencepiece, and numpy to be installed. These packages are not installed by default when you install llamacpp.
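Because these packages are not pulled in automatically, a quick check before converting can save a failed run. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and it assumes the import names match the package names listed above (which is the case for torch, sentencepiece, and numpy).

```python
import importlib.util

# Packages the note above says llamacpp-convert needs but that are not
# installed together with llamacpp.
CONVERT_DEPS = ("torch", "sentencepiece", "numpy")


def missing_modules(names=CONVERT_DEPS):
    """Return the module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Before running llamacpp-convert: pip install " + " ".join(missing))
    else:
        print("All conversion dependencies are installed")
```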

Command line interface

The package installs the command line entry point llamacpp-cli, which points to llamacpp/cli.py and should provide roughly the same functionality as the main program in the original C++ repository. There is also an experimental llamacpp-chat entry point that is meant to bring up a chat interface, but it does not work correctly yet.

API

Documentation is TBD, but the long and short of it is that there are two interfaces:

  • LlamaInference - a high-level interface that tries to take care of most things for you. The demo script below uses this.
  • LlamaContext - a low-level interface to the underlying llama.cpp API. You can use it in much the same way that the main example in llama.cpp uses the C API. This is a rough implementation and is currently untested beyond compiling successfully.

Demo script

See llamacpp/cli.py for a detailed example. The simplest demo would be something like the following:

import sys
import llamacpp


def progress_callback(progress):
    print("Progress: {:.2f}%".format(progress * 100))
    sys.stdout.flush()


params = llamacpp.InferenceParams.default_with_callback(progress_callback)
params.path_model = './models/7B/ggml-model-q4_0.bin'
model = llamacpp.LlamaInference(params)

prompt = "A llama is a"
prompt_tokens = model.tokenize(prompt, True)
model.update_input(prompt_tokens)

model.ingest_all_pending_input()

model.print_system_info()
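for _ in range(20):
    model.eval()                      # run the model on the pending input
    token = model.sample()            # sample the next token id
    text = model.token_to_str(token)  # convert the token id to text
    print(text, end="")

# Flush stdout so all generated text appears before the timings
sys.stdout.flush()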

model.print_timings()
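The generation loop at the end of the demo can be factored into a small generator so it is easier to reuse and test. The following is a minimal sketch that relies only on the three methods the demo calls in its loop (eval, sample, and token_to_str) and assumes the prompt has already been fed in with update_input() and ingest_all_pending_input(). The names generate and FakeModel are hypothetical; FakeModel is a stand-in so the loop can be exercised without model weights.

```python
def generate(model, n_tokens=20):
    """Yield decoded text pieces, one per sampled token.

    Uses only the methods the demo loop calls on LlamaInference.
    """
    for _ in range(n_tokens):
        model.eval()                     # run the model, as in the demo loop
        token = model.sample()           # pick the next token id
        yield model.token_to_str(token)  # decode it to text


class FakeModel:
    """Hypothetical stand-in that returns canned text pieces in order."""

    def __init__(self, pieces):
        self._pieces = list(pieces)
        self._next = 0

    def eval(self):
        pass  # nothing to evaluate in the stand-in

    def sample(self):
        token = self._next
        self._next += 1
        return token

    def token_to_str(self, token):
        return self._pieces[token]


if __name__ == "__main__":
    fake = FakeModel([" llama", " is", " a", " camelid"])
    print("".join(generate(fake, n_tokens=4)))
```

With the demo's model object, the loop becomes: for text in generate(model): print(text, end="").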

ToDo
