Stars: 182 · Rank: 211,154 (top 5%) · Language: Fortran · License: MIT · Created: almost 2 years ago · Updated: 8 months ago

Repository Details

Fast GPT-2 inference written in Fortran

fastGPT

fastGPT continues the progression of GPT-2 codes from the original implementation to the "minimal", "nano", and "pico" variants.

fastGPT is very similar to picoGPT (very small and readable), but it is also fast (see the Benchmarks section below). The speed and readability are achieved by using Fortran. I wrote a blog post introducing fastGPT.

fastGPT features:

  • Fast? ✅
  • Training code? ❌
  • Batch inference? ❌
  • top-p sampling? ❌ top-k? ❌ temperature? ❌ categorical sampling?! ❌ greedy? ✅
  • Readable? ✅
  • Small? ✅
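Greedy decoding, the only sampling mode fastGPT currently supports, simply picks the highest-scoring token at every step. A minimal Python sketch of the idea (illustrative only, not fastGPT's actual code; `toy_logits` is a hypothetical stand-in for a forward pass of the model):

```python
def greedy_decode(logits_fn, tokens, n_generate):
    # Repeatedly append the argmax token; no temperature, top-k, or top-p.
    for _ in range(n_generate):
        logits = logits_fn(tokens)  # scores over the vocabulary
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens = tokens + [next_token]
    return tokens

# Toy stand-in for the model: always favors token (last_token + 1) % 4.
def toy_logits(tokens):
    scores = [0.0, 0.0, 0.0, 0.0]
    scores[(tokens[-1] + 1) % 4] = 1.0
    return scores

print(greedy_decode(toy_logits, [0], 3))  # -> [0, 1, 2, 3]
```

Because greedy decoding is deterministic, the same prompt always produces the same continuation, which is what makes the PyTorch reference comparison in pt.py possible.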

A quick breakdown of each of the files:

  • gpt2.f90: the actual GPT-2 model and a decoder
  • main.f90: the main driver
  • create_model.py: downloads the TensorFlow model and converts it to our own format (model.dat)
  • encode_input.py: encodes the text input into tokens (the input file for gpt2)
  • Matmul implementations:
    • linalg_f.f90: native Fortran
    • linalg_c.f90, linalg_accelerate.c: macOS Accelerate Framework
  • pt.py: a reference script that runs the same model in PyTorch (and returns the same answer)
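To make the conversion step concrete: create_model.py serializes the model's weights into a single flat binary file that the Fortran code can read directly. The sketch below shows the general idea with a hypothetical length-prefixed layout; fastGPT's actual model.dat format is defined by create_model.py and gpt2.f90, not by this code:

```python
import io
import struct

def write_tensor(f, name, values):
    # Hypothetical flat layout: name length, name bytes, element count, float64 data.
    encoded = name.encode()
    f.write(struct.pack("<I", len(encoded)))
    f.write(encoded)
    f.write(struct.pack("<I", len(values)))
    f.write(struct.pack(f"<{len(values)}d", *values))

def read_tensor(f):
    # Read back one tensor written by write_tensor.
    (name_len,) = struct.unpack("<I", f.read(4))
    name = f.read(name_len).decode()
    (count,) = struct.unpack("<I", f.read(4))
    values = list(struct.unpack(f"<{count}d", f.read(8 * count)))
    return name, values

buf = io.BytesIO()
write_tensor(buf, "wte", [0.5, -1.25, 3.0])
buf.seek(0)
print(read_tensor(buf))  # -> ('wte', [0.5, -1.25, 3.0])
```

A flat binary format like this is trivial to read sequentially from Fortran, which is why a custom format can load faster than parsing the original TensorFlow checkpoint.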

Build and Run

Install prerequisites:

mamba env create -f environment.yml
conda activate fastgpt

Configure and build:

FC=gfortran cmake .
make

Create the model.dat file from a given GPT-2 model. Supported sizes (with the corresponding name to use in pt.py and the approximate download size): "124M" (gpt2, 0.5GB), "355M" (gpt2-medium, 1.5GB), "774M" (gpt2-large, 3GB), "1558M" (gpt2-xl, 6GB). This will download the model and cache it for subsequent runs:

python create_model.py --models_dir "models" --model_size "124M"

Now you can modify the input file to change the input string and set other parameters.

Run (requires model.dat and input in the current directory):

./gpt2

Example Output

On an Apple M1 Max, the ./gpt2 command above prints:

$ ./gpt2
Loading the model...
    done. Time:   0.111s

Model parameters:
n_vocab = 50257
n_ctx   =  1024
n_embd  =   768
n_layer =    12
n_head  =    12

Input text
Alan Turing theorized that computers would one day become very powerful, but even he could not imagine

Encoding: tokenizing input text into tokens (currently slow)...
    done. Time:   0.074s

Input parameters:
n_seq                =  19
n_tokens_to_generate =  20

Input tokens:
 36235 39141 18765  1143   326  9061   561   530  1110  1716   845  3665    11   475   772   339   714   407  5967

Decoded input as text:
Alan Turing theorized that computers would one day become very powerful, but even he could not imagine

Running model...
 how they would be able to do so.

"I think that the most important thing is
    done. Time:   0.304s (1.01x)

Output tokens:
   703   484   561   307  1498   284   466   523    13   198   198     1    40   892   326   262   749  1593  1517   318

Decoded output as text:
 how they would be able to do so.

"I think that the most important thing is

Chat interface

Here is an example chat using the largest 1558M model:

$ ./chat
Your name is fastGPT and you are an AI bot. The user will ask you questions and you answer in a nice, truthful, short way.
User: What is the capital of Czechia?
fastGPT: Prague.
User: How many legs does a dog have?
fastGPT: Four.
User: What color does the sky have?
fastGPT: Blue.
User: What can you type a document on?
fastGPT: A typewriter.
User: What can you drive in?
fastGPT: A car.
User: What can you fly in?
fastGPT: A plane.
User: What continent is Germany in?
fastGPT: Europe.
User: When did Second World War start?
fastGPT: 1939.
User: When did it end?
fastGPT: 1945.
User: When did the U.S. enter the Second World War?
fastGPT: 1941.
User: When did the First World War start?
fastGPT: 1914.
User: When did it end?
fastGPT: 1918.
User: When did the Mexican-American war start?
fastGPT: 1846.
User: When did it end?
fastGPT: 1848.
User: What color is snow?
fastGPT: White.
User: What color do plants usually have?
fastGPT: Green.
User: What is your name?
fastGPT: fastGPT.

BLAS Implementation

You can choose which BLAS implementation to use for matmul using:

  • -DFASTGPT_BLAS=OpenBLAS: Use OpenBLAS
  • -DFASTGPT_BLAS=Accelerate: Use the macOS Accelerate Framework
  • -DFASTGPT_BLAS=Fortran: Use Fortran's intrinsic matmul (the default)
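For example, to configure and build against OpenBLAS (combining the flag with the FC compiler selection from the build instructions above):

```shell
FC=gfortran cmake -DFASTGPT_BLAS=OpenBLAS .
make
```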

Benchmarks

On Apple M1 Max, inference of the above input file (20 tokens):

                                1 core  2 cores  4 cores  8 cores

fastGPT (Accelerate, fast_tanh) 0.288s

fastGPT (Accelerate)            0.299s
PyTorch (Accelerate)            0.346s

fastGPT (OpenBLAS)              0.837s  0.514s    0.341s   0.339s
PyTorch (OpenBLAS)              0.873s  0.539s    0.386s   0.392s

fastGPT (Accelerate, no cache)  0.717s
picoGPT (Accelerate, no cache)  0.765s
PyTorch (Accelerate, no cache)  0.787s

fastGPT (OpenBLAS, no cache)    2.343s  1.603s    1.209s   1.018s
PyTorch (OpenBLAS, no cache)    2.356s  1.520s    1.104s   0.997s
picoGPT (OpenBLAS, no cache)    2.427s  1.645s    1.272s   1.081s

Total run (includes loading the model and Python imports):

fastGPT (Accelerate, fast_tanh): 0.401s
picoGPT (8 cores):               3.445s
PyTorch (OpenBLAS, 4 cores):     4.867s
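The "no cache" rows are slower because every generated token re-runs attention over the entire sequence, whereas the cached runs reuse the keys and values of earlier tokens. A toy Python sketch of the difference, counting attention dot products instead of measuring time (illustrative only, not fastGPT's code):

```python
def attention_ops_no_cache(n_prompt, n_generate):
    # Each step re-runs attention over the full sequence: O(n^2) per step.
    ops = 0
    for step in range(n_generate):
        n = n_prompt + step
        ops += n * n  # every query attends to every key
    return ops

def attention_ops_with_cache(n_prompt, n_generate):
    # Only the newest query attends; cached keys/values are reused: O(n) per step.
    ops = n_prompt * n_prompt  # one full pass over the prompt
    for step in range(n_generate):
        ops += n_prompt + step  # a single new query against all keys so far
    return ops

# The benchmark setting above: 19 prompt tokens, 20 generated tokens.
print(attention_ops_no_cache(19, 20))    # -> 16910
print(attention_ops_with_cache(19, 20))  # -> 931
```

Even at this small sequence length the cached version does roughly 18x fewer attention operations, which is consistent with the ~2-3x wall-clock gap in the table (the rest of the time goes to the feed-forward layers, which the cache does not affect).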

TODO

  • Parallelization:
    • Over heads: #2
    • MPI: #5
  • Other sampling methods: #8
  • Batching: #7
  • Improve the UI:
    • Implement the input tokenizer in Fortran: #1
    • Show the words as they are generated: #6
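As an illustration of the "other sampling methods" item: top-k sampling keeps only the k highest-scoring tokens, softmaxes them (optionally after temperature scaling), and draws one at random. A hedged Python sketch, not fastGPT code:

```python
import math
import random

def top_k_sample(logits, k, temperature=1.0, rng=random):
    # Keep the k highest logits, softmax them, then draw one index.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    return rng.choices(top, weights=weights, k=1)[0]

# With k=1 this degenerates to greedy decoding.
print(top_k_sample([0.1, 2.5, -1.0, 0.7], k=1))  # -> 1
```

With k equal to the vocabulary size and temperature 1, this reduces to plain categorical sampling, so one routine could cover several of the missing modes.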
