• Stars
    1,167
  • Rank 39,771 (Top 0.8 %)
  • Language
    Python
  • License
    MIT License
  • Created over 1 year ago
  • Updated 3 months ago

Repository Details

Explore large language models in 512MB of RAM

Language Models

Python building blocks to explore large language models on any computer with 512MB of RAM

This package makes using large language models in software as simple as possible. All inference is performed locally to keep your data private by default.

Installation and Getting Started

This package can be installed using the following command:

pip install languagemodels

Once installed, you should be able to interact with the package in Python as follows:

>>> import languagemodels as lm
>>> lm.do("What color is the sky?")
'The color of the sky is blue.'

This will require downloading a significant amount of data (~250MB) on the first run. Models will be cached for later use and subsequent calls should be quick.
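
As a rough illustration of the caching behavior, the same call can be timed twice in one session. This is a minimal sketch (timings vary widely by hardware) rather than part of the package:

import time
import languagemodels as lm

# The first call pays the download and model-load cost;
# the second call reuses the cached model and should return quickly.
for attempt in range(2):
    start = time.perf_counter()
    lm.do("What color is the sky?")
    print(f"call {attempt + 1}: {time.perf_counter() - start:.1f}s")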

Example Usage

Here are some usage examples as Python REPL sessions. These examples work in the REPL, in notebooks, and in traditional scripts and applications.

Instruction Following

>>> import languagemodels as lm

>>> lm.do("Translate to English: Hola, mundo!")
'Hello, world!'

>>> lm.do("What is the capital of France?")
'Paris.'

Adjusting Model Performance

The base model should run quickly on any system with 512MB of memory, but this memory limit can be increased to select more powerful models that will consume more resources. Here's an example:

>>> import languagemodels as lm
>>> lm.do("If I have 7 apples then eat 5, how many apples do I have?")
'You have 8 apples.'
>>> lm.set_max_ram('4gb')
4.0
>>> lm.do("If I have 7 apples then eat 5, how many apples do I have?")
'I have 2 apples left.'

Text Completions

>>> import languagemodels as lm

>>> lm.complete("She hid in her room until")
'she was sure she was safe'

Chat

>>> lm.chat('''
...      System: Respond as a helpful assistant.
...
...      User: What time is it?
...
...      Assistant:
...      ''')
"I'm sorry, but as an AI language model, I don't have access to real-time information. Please provide me with the specific time you are asking for so that I can assist you better."

Code

A model tuned on Python code is included. It can be used to complete code snippets.

>>> import languagemodels as lm
>>> lm.code("""
... a = 2
... b = 5
...
... # Swap a and b
... """)
'a, b = b, a'

External Retrieval

Helper functions are provided to retrieve text from external sources that can be used to augment prompt context.

>>> import languagemodels as lm

>>> lm.get_wiki('Chemistry')
'Chemistry is the scientific study...

>>> lm.get_weather(41.8, -87.6)
'Partly cloudy with a chance of rain...

>>> lm.get_date()
'Friday, May 12, 2023 at 09:27AM'

Here's an example showing how this can be used (compare to previous chat example):

>>> lm.chat(f'''
...      System: Respond as a helpful assistant. It is {lm.get_date()}
...
...      User: What time is it?
...
...      Assistant:
...      ''')
'It is currently Wednesday, June 07, 2023 at 12:53PM.'

Semantic search is provided to retrieve documents that may provide helpful context from a document store.

>>> import languagemodels as lm
>>> lm.store_doc(lm.get_wiki("Python"), "Python")
>>> lm.store_doc(lm.get_wiki("C language"), "C")
>>> lm.store_doc(lm.get_wiki("Javascript"), "Javascript")
>>> lm.get_doc_context("What does it mean for batteries to be included in a language?")
'From Python document: It is often described as a "batteries included" language due to its comprehensive standard library.Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.

From C document: It was designed to be compiled to provide low-level access to memory and language constructs that map efficiently to machine instructions, all with minimal runtime support.'
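
The retrieved context can then be folded back into an instruction prompt to answer questions over the stored documents. The sketch below uses only the functions shown above; the prompt wording is an assumption, not a prescribed format:

import languagemodels as lm

# Build a small document store from Wikipedia extracts (as in the example above)
lm.store_doc(lm.get_wiki("Python"), "Python")
lm.store_doc(lm.get_wiki("C language"), "C")
lm.store_doc(lm.get_wiki("Javascript"), "Javascript")

question = "What does it mean for batteries to be included in a language?"

# Retrieve related passages via semantic search, then answer from that context
context = lm.get_doc_context(question)
print(lm.do(f"Answer using this context.\n\n{context}\n\nQuestion: {question}"))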

Full documentation

Speed

This package currently outperforms Hugging Face transformers for CPU inference thanks to int8 quantization and the CTranslate2 backend. The following table compares CPU inference performance on identical models using the best available quantization, measured on a 20-question test set.

Backend                       Inference Time   Memory Used
Hugging Face transformers     22s              1.77GB
This package                  11s              0.34GB

Note that quantization slightly reduces output quality, but the effect should be negligible at this level.
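
A comparable timing run can be sketched with a short script. This is only an illustration: the original 20-question test set, the transformers baseline, and the memory measurement are not reproduced here, and the prompts below are stand-ins:

import time
import languagemodels as lm

questions = [
    "What is the capital of France?",
    "Translate to English: Hola, mundo!",
    "What color is the sky?",
]

lm.do(questions[0])  # warm-up call so download and model loading are not timed

start = time.perf_counter()
for q in questions:
    lm.do(q)
elapsed = time.perf_counter() - start
print(f"{len(questions)} prompts in {elapsed:.1f}s ({elapsed / len(questions):.2f}s per prompt)")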

Models

This package does allow direct model selection, but sensible defaults are preferred so that the package can improve over time as stronger models become available. The models used are roughly 1000x smaller than the largest models in use today. They are useful as learning tools, but they perform far below the current state of the art.

This package currently uses LaMini-Flan-T5-base as its default model. It is a further fine-tuning of the FLAN-T5 base model (itself an instruction-tuned T5), tuned to respond to instructions in a human-like manner. The following human evaluations were reported in the paper associated with this model family:

(Figure: human-rated model comparison)

For code completions, the CodeT5+ series of models are used.
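
One way to see how the memory budget maps to a concrete model choice is to inspect lm.config['instruct_model'] (also used in the Commercial Use section below) before and after raising the limit. This is a quick sketch; the printed model names depend on the installed version:

import languagemodels as lm

print(lm.config["instruct_model"])  # model selected under the default 512MB budget
lm.set_max_ram("4gb")               # allow a larger, more capable model
print(lm.config["instruct_model"])  # model selected under the 4GB budget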

Commercial Use

This package itself is licensed for commercial use, but some of the models it uses may not be. To use this package commercially, you can filter models by license type using the require_model_license function.

>>> import languagemodels as lm
>>> lm.config['instruct_model']
'LaMini-Flan-T5-248M-ct2-int8'
>>> lm.require_model_license("apache|bsd|mit")
>>> lm.config['instruct_model']
'flan-t5-base-ct2-int8'

It is recommended to confirm that the models used meet the licensing requirements for your software.

Project Ideas

One of the goals for this package is to be a straightforward tool for learners and educators exploring how large language models intersect with modern software development. It can be used to do the heavy lifting for a number of learning projects:

  • CLI Chatbot (see examples/chat.py)
  • Streamlit chatbot (see examples/streamlitchat.py)
  • Chatbot with information retrieval (see the sketch below)
  • Chatbot with access to real-time information
  • Tool use
  • Text classification
  • Extractive question answering
  • Semantic search over documents
  • Document question answering

Several example programs and notebooks are included in the examples directory.
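
For instance, the retrieval-augmented chatbot idea can be prototyped with only the functions shown earlier. This is a minimal sketch; the prompt layout and loop structure are illustrative assumptions, not the approach used in examples/chat.py:

import languagemodels as lm

# Seed the document store with background material
lm.store_doc(lm.get_wiki("Python"), "Python")
lm.store_doc(lm.get_wiki("C language"), "C")

while True:
    question = input("You: ")
    if not question:
        break
    context = lm.get_doc_context(question)  # semantic search over the stored docs
    reply = lm.chat(f'''
        System: Respond as a helpful assistant. Use this context if it is relevant:
        {context}

        User: {question}

        Assistant:
        ''')
    print("Bot:", reply)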

More Repositories

  1. box-line-text - Simple virtual whiteboarding (HTML, 455 stars)
  2. PythonDropboxUploader - Uploads a file to Dropbox (Python, 107 stars)
  3. BWMetaAI - A StarCraft Brood War AI designed to follow the modern 1v1 metagame (Python, 83 stars)
  4. connectiongrammar - This package provides a way to develop text grammars that represent a language of interconnected 3D objects in a Python environment (Python, 73 stars)
  5. rebrickable-sqlite - Set of scripts to create a local copy of the Rebrickable database (Makefile, 27 stars)
  6. minix3 - Branch of the minix3 OS for use in COS421 (Operating Systems) (23 stars)
  7. catdocx - Extracts plain text from docx files (Shell, 21 stars)
  8. MIPS-Lite - A pipelined MIPS-Lite CPU implementation (VHDL, 20 stars)
  9. twitch-powered-up - Interact with LEGO Powered Up elements via Twitch chat and a Raspberry Pi (JavaScript, 13 stars)
  10. gCal-iCal-Sync - Syncs a public iCal URL to a Google Calendar (Python, 10 stars)
  11. rdlgen - SSRS RDL Report Generator (Python, 8 stars)
  12. lego-by-pound (Python, 7 stars)
  13. article-template - A simple template for writing academic papers; it uses bibtex for tracking bibliographic information and Pandoc to convert the content to a correctly formatted document (Makefile, 6 stars)
  14. lgeo (POV-Ray SDL, 6 stars)
  15. pyastsim - Detects similarity between Python source files based on their normalized abstract syntax trees (Python, 5 stars)
  16. pagetext.js - phantomjs script to extract article content from a page (HTML, 4 stars)
  17. doctestfn (Python, 3 stars)
  18. bricki - Application for tracking Lego inventory (Python, 3 stars)
  19. python-socket-chat - A set of Python scripts to implement a chat server and client over simple sockets (Python, 2 stars)
  20. CANParkStatus (C, 2 stars)
  21. racing-adventures (JavaScript, 2 stars)
  22. large-language-models-cpsc2550 (CSS, 2 stars)
  23. rep2ai - Starcraft replay to AI converter (C, 2 stars)
  24. dotfiles - System configuration files (Shell, 2 stars)
  25. inferential - An open inference server for educational use (Python, 2 stars)
  26. x64-asm-hello-world - A basic programming assignment using x64 Linux Assembly (Assembly, 2 stars)
  27. project-template (CSS, 1 star)
  28. nfa - Python NFA implementation (HTML, 1 star)
  29. benji-racer (HTML, 1 star)
  30. presdown - Simple presentation markup (JavaScript, 1 star)
  31. github-jobs (Python, 1 star)
  32. dm - Display mode selector (Shell, 1 star)
  33. lithium-ion-charger - A simple charger for single lithium ion cells (C, 1 star)
  34. gradient-descent (Python, 1 star)
  35. calculator (Python, 1 star)
  36. ubuntu-desktop-vm (Shell, 1 star)
  37. retention-modeling (CSS, 1 star)
  38. hosts - Hosts files for ad and distraction blocking (1 star)
  39. sysinfo (C, 1 star)
  40. pwa-calc - A basic calculator implemented as a Progressive Web Application (HTML, 1 star)
  41. kvlite - A very simple key value store accessible over http (C++, 1 star)
  42. pwa-calc-slides (CSS, 1 star)
  43. gclean - Reduces size of email messages hosted by Gmail (Python, 1 star)
  44. xubuntu-vm (Shell, 1 star)
  45. test-files (Makefile, 1 star)
  46. http-password-extractor (Python, 1 star)
  47. operating-systems-cpsc4420 (Python, 1 star)
  48. ds-examples (Jupyter Notebook, 1 star)
  49. software-engineering-test (1 star)
  50. txtnorm (C, 1 star)
  51. brick-classifier (Jupyter Notebook, 1 star)
  52. philosophers (C, 1 star)
  53. node-mvb (JavaScript, 1 star)
  54. userbwlimit - Limits individual user bandwidth (Shell, 1 star)
  55. memvulnscan (Python, 1 star)
  56. membench - Simple memory benchmark utility (Rust, 1 star)
  57. epubcrush (Python, 1 star)
  58. weensy (C++, 1 star)
  59. robot-inventor-cli (Shell, 1 star)
  60. ubuntu-server-student-env - Scripts and config files for configuring an Ubuntu server for multiuser student use (Shell, 1 star)
  61. gradient-descent-js (HTML, 1 star)
  62. nperf (Python, 1 star)
  63. json-validator (Python, 1 star)
  64. python-docs-zim - Packages Python documentation as a Kiwix ZIM file (Makefile, 1 star)