  • Stars: 139
  • Rank 262,954 (Top 6 %)
  • Language: C++
  • Created over 1 year ago
  • Updated over 1 year ago


Repository Details

throwaway GPT inference

a1gpt

throwaway C++ GPT-2 inference engine from @a1k0n w/ minimal but optimized BLAS ops for AVX and Apple Silicon, plus custom CUDA kernels.

no external dependencies except for the Accelerate framework on macOS, and CUDA if you have it available.
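To give a feel for the kind of "minimal BLAS op" mentioned above: token-at-a-time transformer inference typically spends most of its time in matrix-vector products. The following is a portable sketch only, not code from a1gpt; the function name `matvec` and the row-major layout are assumptions made for illustration. It uses four independent accumulators so a compiler targeting AVX or NEON has room to vectorize the inner loop.

```cpp
#include <cstddef>
#include <vector>

// y = W * x, where W is a row-major (rows x cols) matrix stored contiguously.
// Hypothetical helper for illustration; not taken from the a1gpt sources.
std::vector<float> matvec(const std::vector<float>& W,
                          const std::vector<float>& x,
                          std::size_t rows, std::size_t cols) {
    std::vector<float> y(rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = W.data() + r * cols;
        // Four independent partial sums shorten the dependency chain,
        // which helps both auto-vectorization and instruction pipelining.
        float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
        std::size_t c = 0;
        for (; c + 4 <= cols; c += 4) {
            acc0 += row[c + 0] * x[c + 0];
            acc1 += row[c + 1] * x[c + 1];
            acc2 += row[c + 2] * x[c + 2];
            acc3 += row[c + 3] * x[c + 3];
        }
        float acc = (acc0 + acc1) + (acc2 + acc3);
        // Handle the remaining columns when cols is not a multiple of 4.
        for (; c < cols; ++c) {
            acc += row[c] * x[c];
        }
        y[r] = acc;
    }
    return y;
}
```

A hand-tuned version would replace the inner loop with explicit SIMD intrinsics or a call into a platform library; the loop structure stays the same.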

build / run

  • First, download the model:

To just grab a copy of the model without using Python or anything:

$ (cd model && wget https://www.a1k0n.net/models/gpt2-weights.bin)

To download from huggingface and convert the model yourself:

$ python3 scripts/download_and_convert_gpt2.py

This requires numpy and huggingface_hub to be installed in your Python environment.

  • CMake and build

note: RelWithDebInfo is the default build type, so it should run pretty quick

$ mkdir build
$ cd build
$ cmake ..

-- The CXX compiler identification is GNU 11.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/lib/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 11.2.67
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/lib/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: a1gpt/build

$ make -j
[ 12%] Building CXX object CMakeFiles/bpe_test.dir/bpe_test.cpp.o
[ 25%] Building CXX object CMakeFiles/bpe_test.dir/bpe.cpp.o
[ 37%] Building CXX object CMakeFiles/gpt2.dir/main.cpp.o
[ 50%] Building CXX object CMakeFiles/gpt2.dir/model_load_gpt2.cpp.o
[ 62%] Building CXX object CMakeFiles/gpt2.dir/model.cpp.o
[ 75%] Building CXX object CMakeFiles/gpt2.dir/bpe.cpp.o
[ 87%] Linking CXX executable bpe_test
[100%] Linking CXX executable gpt2
[100%] Built target bpe_test
[100%] Built target gpt2
$ ./gpt2 -h
Usage: ./gpt2 [-s seed] [-t sampling_temperature] [-p prompt]
  -s seed: random seed (default: time(NULL))
  -t sampling_temperature: temperature for sampling (default: 0.90)
  -p prompt: prompt to start with (default: English-speaking unicorns)
  -n ntokens: number of tokens to generate (default=max: 1024)
  -c cfg_scale: classifier-free guidance scale; 1.0 means no CFG (default: 1.0)

This builds gpt2, plus cugpt2 (the CUDA version) if CUDA is available.
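For readers unfamiliar with the -t and -c options, here is a rough sketch of how a sampling temperature and a classifier-free guidance scale are commonly applied to a model's output logits. This is the textbook formulation, not necessarily how a1gpt implements it; the function names (`apply_cfg`, `softmax_with_temperature`, `sample_token`) are hypothetical, and how the unconditional logits are obtained is left out entirely.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Blend conditional and unconditional logits (classifier-free guidance).
// With cfg_scale == 1.0 this reduces to the conditional logits, i.e. no CFG.
// Assumes both vectors have the same length.
std::vector<float> apply_cfg(const std::vector<float>& cond,
                             const std::vector<float>& uncond,
                             float cfg_scale) {
    std::vector<float> out(cond.size());
    for (std::size_t i = 0; i < cond.size(); ++i) {
        out[i] = uncond[i] + cfg_scale * (cond[i] - uncond[i]);
    }
    return out;
}

// Softmax of (logits / temperature), made numerically stable by
// subtracting the maximum logit first. Assumes temperature > 0.
std::vector<float> softmax_with_temperature(const std::vector<float>& logits,
                                            float temperature) {
    std::vector<float> probs(logits.size());
    if (logits.empty()) return probs;
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}

// Draw one token index according to the given probabilities.
std::size_t sample_token(const std::vector<float>& probs, std::mt19937& rng) {
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return static_cast<std::size_t>(dist(rng));
}
```

Lower temperatures sharpen the distribution toward the most likely token, higher ones flatten it. Seeding the generator with a fixed value, as -s does, makes a run repeatable.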

Example generation on a MacBook Air M2 with the default prompt and temperature:

$ ./gpt2 -s 1688452945 -n 256
a1gpt seed=1688452945 sampling_temperature=0.90 ntokens=301
encoded prompt: 50256 818 257 14702 4917 11 11444 5071 257 27638 286 44986 82 2877 287 257 6569 11 4271 31286 1850 19272 11 287 262 843 274 21124 13 3412 517 6452 284 262 4837 373 262 1109 326 262 44986 82 5158 2818 3594 13
Generating:

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

The unicorn, nicknamed Macalpine in the state of Montana, was the first animal ever to speak the language. The animal was first reported in 1972, during the discovery of the same region by the Inkocroft Rendezvous Lourd system in the Andes. The specimen's linguistic abilities were not extremely rare, but a few unknowns led the bewildering team to believe that the unicorn appeared to be communicating with a group that was silent.

This fluency in a language exam can prevent a unicorn from communicating with a specific person or group, but scientists believe it is rare for a unicorn to mantain such linguistic abilities. In a test they found, thousands of false Mexican translates were sent. This finding, along with other brilliant discoveries in the area, revealed that unicorns communicate with their synapses, essentially the same level of coordination as humans. The unicorn's API was claimed to evolve through a single ancestor known as the Amarr. But they were only known in California, and in many other places, as Amarr.

The legendary Amarr DNA has been widely used as a tool by cosmologists to identify flying squirrels, maple leaves and bees. In the near future, scientists hope that unicorn species and their mitochondrial DNA will

elapsed: 4.091053s, 3.995169ms per token
