  • Stars: 348
  • Rank: 121,840 (Top 3%)
  • Language: Python
  • License: MIT License
  • Created: over 1 year ago
  • Updated: over 1 year ago


Repository Details

Supercharge Open-Source AI Models

supercharger

Leverage locally-hosted Large Language Models to write software + unit tests for you.

The scripts are designed to run the Baize-30B model with 8-bit quantization on a cluster of Linux servers, each with two 3090 or 4090 GPUs, using model parallelism.

Several other models are now supported, with some analysis here: https://docs.google.com/spreadsheets/d/1TYBNr_UPJ7wCzJThuk5ysje7K1x-_62JhBeXDbmrjA8/edit?usp=sharing

There is a blog post accompanying this repo: https://catid.io/posts/auto_codegen/

Interesting features:

  • Prompt engineering tailored specifically to code generation, test generation, and evaluation.
  • Generates multiple code candidates and unit tests for a given function signature, then tries every combination until one code+test pair passes its own tests.
  • Uses an AI judge to score the code and tests and decide whether they are good enough.
  • Thorough, unit-tested code cleaning that removes unwanted artifacts from the model output.
  • Executes candidate code and tests inside a virtual machine so untrusted model output runs safely.
  • Uses a load balancer to distribute work across multiple worker nodes.
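The code/test pairing strategy in the second bullet can be sketched as follows. This is a minimal illustration with hypothetical names and toy candidates, not the repo's actual implementation:

```python
def find_passing_pair(codes, tests, run_test):
    """Try every (code, test) combination until one passes.

    codes and tests are lists of candidates; run_test is a callable
    that returns True when the test passes against the code.
    Returns the (code_index, test_index) of the first passing pair,
    or None if no combination passes.
    """
    for ci, code in enumerate(codes):
        for ti, test in enumerate(tests):
            if run_test(code, test):
                return ci, ti
    return None

# Toy usage: each "code" is just a value and each "test" checks it.
codes = [0, 1, 2]
tests = [lambda c: c > 1, lambda c: c == 2]
result = find_passing_pair(codes, tests, lambda c, t: t(c))
print(result)  # (2, 0): code candidate 2 passes test candidate 0
```

In the real system each candidate is generated model output and `run_test` executes the pair inside the sandbox, but the search over combinations has the same shape.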

Set up the environment

Set up docker:

sudo apt install docker.io
sudo usermod -aG docker $USER

# Log out and back in here

# Verify this command succeeds
docker info

Set up this repo:

git clone https://github.com/catid/supercharger
cd ./supercharger/

conda create -n supercharger python=3.10
conda activate supercharger

# Update code and packages
./update.sh

# Check that everything works. If these fail, a reboot may resolve the issue.
./test_code_clean.sh
./test_model.sh

Run a worker server

conda activate supercharger

# Update code and packages
./update.sh

# Run the server
./run_server.sh

Test the worker server

conda activate supercharger

# Test a query on the server
./test_client.sh

Launch all servers in cluster from one shell command

conda activate supercharger

./launch_cluster.sh

The repo must be checked out at the same path, relative to ~, on all machines.

This will read load_balancer_nodes.txt and launch a server process on each node. As a prerequisite, first install and test the server on each node, and set up passwordless SSH access to each node (e.g. with ssh-copy-id).

Known issue: when the script terminates, it leaves zombie processes on the servers. To clean them up, run ./kill_gpu_users.sh on each server.

Run a load balancer

If you have multiple worker machines, you can run a load balancer on any node.

First edit the load_balancer_nodes.txt file to provide node hostnames.

conda activate supercharger

./load_balancer.sh

When running a client, specify the load balancer's port 8000 instead of a worker's port 5000 to use the whole cluster.
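The balancer's job distribution can be sketched as simple round-robin over the node list. This is a simplified illustration with hypothetical names; the repo's balancer also has to track which workers are busy:

```python
from itertools import cycle

def make_dispatcher(nodes):
    """Return a function that assigns each job to the next node in turn."""
    ring = cycle(nodes)
    def dispatch(job):
        node = next(ring)
        # In the real system the job would be forwarded to the node over HTTP.
        return node, job
    return dispatch

# Nodes as they might appear in load_balancer_nodes.txt (illustrative).
dispatch = make_dispatcher(["node1:5000", "node2:5000", "node3:5000"])
assignments = [dispatch(f"job{i}")[0] for i in range(4)]
print(assignments)  # ['node1:5000', 'node2:5000', 'node3:5000', 'node1:5000']
```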

Test codegen

If you have one worker node:

python codegen/codegen.py

If you are using the load balancer on localhost:

# Number of workers should match the number of entries in load_balancer_nodes.txt
python codegen/codegen.py --workers 8 --node localhost --port 8000

Results will be summarized on the console as they come in, and you can review the generated code under ./sources/func_name/.

The codegen script stops when a generated function passes a generated unit test and an evaluator oracle deems the code quality sufficient (set the threshold higher or lower with --threshold).
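The stopping rule combines two signals: the pair must pass its own test, and the judge's score must clear the threshold. A minimal sketch (function name is hypothetical, and the default of 0.9 here is illustrative, not the repo's actual default):

```python
def should_stop(test_passed, judge_score, threshold=0.9):
    """Stop codegen once a passing code/test pair is judged good enough."""
    return test_passed and judge_score >= threshold

print(should_stop(True, 1.0))   # True: pair passed and scored above threshold
print(should_stop(True, 0.5))   # False: passed, but judged too low
print(should_stop(False, 1.0))  # False: high score, but the test failed
```

This is why the example run below keeps queuing "improve with self-reflection" jobs until a pair is judged at score 1.0.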

Example output:

(supercharger) ➜  supercharger git:(main) ✗ python codegen/codegen.py
INFO:root:Input comments: # A function that calculates the factorial of a given non-negative integer
INFO:root:Function prototype: def factorial(n):
INFO:root:Function name: factorial
INFO:root:Setting up VM...
INFO:root:Starting LLM workers...
Deleted sources/factorial/test_factorial_0.py
INFO:root:Work queue empty and detected only 0/4 workers active.  Adding job...
INFO:root:Adding a job to write more tests (tests asked/completed=0/0, codes asked/completed=0/0)
INFO:root:Worker idle... (2 seconds)
INFO:root:Worker idle... (2 seconds)
INFO:root:Worker idle... (2 seconds)
INFO:root:Work queue empty and detected only 1/4 workers active.  Adding job...
INFO:root:Adding a job to write more code (tests asked/completed=0/0, codes asked/completed=1/0)
INFO:root:Work queue empty and detected only 2/4 workers active.  Adding job...
INFO:root:Adding a job to write more tests (tests asked/completed=1/0, codes asked/completed=1/0)
INFO:root:Worker idle... (2 seconds)
INFO:root:Work queue empty and detected only 3/4 workers active.  Adding job...
INFO:root:Adding a job to write more code (tests asked/completed=1/0, codes asked/completed=2/0)
INFO:root:Work queue depth = 0 active workers = 4/4
...
INFO:root:Work queue depth = 0 active workers = 4/4
INFO:root:Generated code len=187 in 22.965782165527344 seconds, with score 0.9 (scored in 1.946894884109497 seconds)
Task ID 1: Generated code (improved=False) with score 0.9 and len=187
INFO:root:Adding a job to improve the code with self-reflection
INFO:root:Work queue depth = 1 active workers = 3/4
...
INFO:root:Work queue depth = 0 active workers = 4/4
INFO:root:Generated test len=246 in 28.84612274169922 seconds
Task ID 2: Generated test (improved=False) len=246
INFO:root:Test passed: code 1 <-> test 2 - Asking judge if we are done
INFO:root:Adding a job to improve the test with self-reflection
INFO:root:Work queue depth = 2 active workers = 3/4
INFO:root:Generated test len=307 in 32.69654178619385 seconds
Task ID 0: Generated test (improved=False) len=307
INFO:root:Test passed: code 1 <-> test 0 - Asking judge if we are done
INFO:root:Adding a job to improve the test with self-reflection
INFO:root:Work queue depth = 2 active workers = 4/4
...
INFO:root:Work queue depth = 2 active workers = 4/4
INFO:root:Generated code len=168 in 68.91890406608582 seconds, with score 0.9 (scored in 1.9692518711090088 seconds)
Task ID 3: Generated code (improved=False) with score 0.9 and len=168
INFO:root:Test passed: code 3 <-> test 2 - Asking judge if we are done
INFO:root:Test passed: code 3 <-> test 0 - Asking judge if we are done
INFO:root:Adding a job to improve the code with self-reflection
INFO:root:Work queue depth = 4 active workers = 4/4
INFO:root:Judged code/test pair with score 1.0 in 42.62114644050598 seconds
Task 5 complete: Judged pair code=1 test=2 with score=1.0
INFO:root:Work queue depth = 3 active workers = 4/4
INFO:root:Found a good code/test pair: code=1 test=2 score=1.0
INFO:root:Wrote final code and test to disk. Exiting...

The results

# A function that calculates the factorial of a given non-negative integer
def factorial(n: int) -> int:
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

import pytest
from factorial import factorial
def test_factorial():
    assert factorial(1) == 1
    assert factorial(2) == 2
    assert factorial(3) == 6
    assert factorial(4) == 24
    assert factorial(5) == 120
    assert factorial(6) == 720

When asked "What's your opinion of this code and unit test?", GPT-4 has this to say about the code:

The code implementation for the factorial function is good. It's an iterative approach, which can be more efficient in terms of memory usage compared to the recursive version. It's also more suitable for larger inputs as it does not have the risk of reaching the recursion limit.

Regarding the unit test, it is also good but has some room for improvement:

  • Add a test case for the base case (0), which is missing from the current test cases. The factorial of 0 is defined to be 1.
  • Add a test case for negative numbers to ensure the function behaves correctly with invalid input. The current implementation does not handle negative numbers, and ideally it should raise an error in such cases.
  • Add more test cases for larger numbers to ensure the function works correctly for a wider range of input values.

With these improvements, the test cases would be more comprehensive and cover a wider range of scenarios.
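Applying the first and third suggestions yields a test like the one below. The negative-input check is deliberately omitted: the generated function, as written, silently returns 1 for negative n rather than raising, so that suggestion would require changing the function first.

```python
# The generated function, reproduced so the example is self-contained.
def factorial(n: int) -> int:
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

def test_factorial_extended():
    assert factorial(0) == 1          # base case GPT-4 noted was missing
    assert factorial(1) == 1
    assert factorial(5) == 120
    assert factorial(10) == 3628800   # larger inputs
    assert factorial(12) == 479001600

test_factorial_extended()
print("all extended tests passed")
```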

Future work

I ran out of time to implement everything I had in mind, but here are some ideas for future work:

  • Check the output for cycles.
  • Add a planning module that breaks up a problem into several functions and generates code for each function.
  • Read the output of unit testing and use it to refine the code/tests.
  • Fine-tune the temperature, context-length, and max-tokens parameters to improve success rate.
  • Check if we can use smaller, faster models to improve code generation speed.
  • Use OpenAI API for some of the tasks in a hybrid of free + paid models.
