  • Stars: 164
  • Rank: 230,032 (Top 5%)
  • Language: Python
  • License: Other
  • Created: about 4 years ago
  • Updated: over 1 year ago


Repository Details

OpenAI API webserver

openai-server

openai-server is an implementation of the OpenAI API.

Specifically, we implement the /v1/engines/list and /v1/engines/{model_name}/completions endpoints.

Both endpoints are mostly feature-complete, with a few minor differences. The JSON response format is identical, so any library that works with the OpenAI API should work with this server.

To get started, see the quickstart, the examples, or the JavaScript API.


Quickstart

# grab the code.
git clone https://github.com/shawwn/openai-server
cd openai-server

# install dependencies.
pip3 install -r requirements.txt

# grab a gpt-2 model.
python3 download_model.py 117M # or 345M, 774M, 1558M

# start the server.
MODELS=117M bash prod.sh

# in a new terminal, ask for a completion.
bash 002_test_completion.sh 'Hello there. My name is'

Your server is now serving the OpenAI API at localhost:9000. (You can change the port via export PORT=8000)
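The completions endpoint takes all of its parameters as URL query strings, so request URLs can be assembled with nothing but the Python standard library. A minimal sketch; the host, port, endpoint path, and parameter names are taken from the examples in this README, and nothing is sent over the network here:

```python
from urllib.parse import urlencode

def completions_url(model, prompt, host="localhost", port=9000, **params):
    """Build a URL for the /v1/engines/{model}/completions endpoint."""
    query = urlencode({"prompt": prompt, **params})
    return f"http://{host}:{port}/v1/engines/{model}/completions?{query}"

url = completions_url("117M", "Hello there. My name is",
                      max_tokens=16, temperature=0.8, n=4)
```

From there, urllib.request.urlopen(url) (or curl) returns the JSON response shown in the examples below.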

Examples

Generating completions via the openai SDK

You can grab some completions using the official openai command-line tool:

$ OPENAI_API_BASE=http://localhost:9000 openai api completions.create -e davinci -p 'Hello, world' -t 0.8 -M 16 -n 4
===== Completion 0 =====
Hello, world. It seems like a good idea to make a living. The fact that it
===== Completion 1 =====
Hello, world. This is not the first time you're seeing the same thing at any given
===== Completion 2 =====
Hello, world, please do my best to continue the development of Monad and its conforming
===== Completion 3 =====
Hello, world controlled enemy.

"Be careful. We have come across a near total

Continuously dump completions to terminal

$ bash 003_completions.sh 'Yo dawg, we implemented OpenAI API'
Yo dawg, we implemented OpenAI API. Now, we have the ability to connect to Signal, a cryptographic data store.

We can now make this secure by using new kid on the block chain, OpenAI.

OpenAI is the new block chain protocol for the internet. This is a major milestone. As the internet becomes more open and open for everybody, it is important for us to have a robust, high-quality blockchain. It is also important that we never create an untraceable chain. The blockchain is the only way to guarantee that everyone has the same access to the network.

We are an open consortium and we believe that the blockchain is the bridge between the internet and the rest of the world. We're committed to this project. We believe that the blockchain is a bridge between the internet and
^C

Fetch the JSON endpoint manually

$ curl 'http://localhost:9000/v1/engines/117M/completions?prompt=Hello,%20my%20name%20is&max_tokens=32&n=4&temperature=0.9&echo=true'
{
  "choices": [
    {
      "finish-reason": "length",
      "index": 0,
      "logprobs": null,
      "text": "Hello, my name is Loium Chazz, and I have been far from satisfied with your departure. But I will, at least by some chance, give you permission to decide for"
    },
    {
      "finish-reason": "length",
      "index": 1,
      "logprobs": null,
      "text": "Hello, my name is Tim and my name is Jodie. Yours, Tom.\n\nTim: Oh hello, my name is Tim.\n\nJB: Where?'"
    },
    {
      "finish-reason": "length",
      "index": 2,
      "logprobs": null,
      "text": "Hello, my name is Rosen Sylvan. That's right, Buck Paoli, who was a member of the Board of Governors for George W. Bush in the 2009 Democratic primary\u2014"
    },
    {
      "finish-reason": "length",
      "index": 3,
      "logprobs": null,
      "text": "Hello, my name is Nick Martens, I am an English-speaking Canadian, University of Toronto, Mississauga, Canada. I work in a computer software company located in Canada."
    }
  ],
  "created": 1601701785.777768,
  "id": "cmpl-3qN8kwW1Ya7_qxWz4h8wuIzN",
  "model": "117M",
  "object": "text_completion"
}
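Because the response is plain JSON in the OpenAI format, pulling out the generated texts needs only the standard json module. A sketch using a trimmed-down copy of the response above:

```python
import json

response = """
{
  "choices": [
    {"finish-reason": "length", "index": 0, "logprobs": null,
     "text": "Hello, my name is Loium Chazz"},
    {"finish-reason": "length", "index": 1, "logprobs": null,
     "text": "Hello, my name is Tim"}
  ],
  "model": "117M",
  "object": "text_completion"
}
"""

data = json.loads(response)
texts = [choice["text"] for choice in data["choices"]]
```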

Explore via your browser

You can open the JSON endpoint in your browser and start playing around with the query params.

A simple bash script for dumping completions

$ T=0.8 M=32 bash 002_test_completion.sh 'Hello, my name is'
Hello, my name is Plato and, like many of you, I am very happy with the pre-release.

The primary goal of the pre-release was to provide

The first argument to 002_test_completion.sh is the prompt:

bash 002_test_completion.sh 'Hello there. My name is'

You can set the temperature using T=0.8 and the token count using M=32:

T=0.8 M=32 bash 002_test_completion.sh 'Hello there. My name is'

To read a prompt from a file, pass in the filename: if the first argument is a valid filename, its contents become the prompt:

T=0.8 M=32 bash 002_test_completion.sh README.md

If the prompt is too long, only the last 1023 - M tokens of the prompt are used. For example, if you request 500 completion tokens, only the last 523 prompt tokens (1023 - 500) are kept. Therefore, to let GPT see as much of the prompt as possible, request a small number of tokens (e.g. 16).
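That truncation rule can be written down as a small function: keep only the last 1023 - M prompt tokens, where M is the number of requested completion tokens. A sketch, with plain integers standing in for GPT-2's BPE token ids:

```python
def truncate_prompt(prompt_tokens, max_tokens, context_size=1023):
    """Keep only the last (context_size - max_tokens) prompt tokens."""
    budget = context_size - max_tokens
    return prompt_tokens[-budget:] if budget > 0 else []

# Requesting 500 completion tokens leaves room for only 523 prompt tokens.
kept = truncate_prompt(list(range(2000)), max_tokens=500)
```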

Setting up everything from scratch

A complete example of how to go from zero code to a fully functional OpenAI API server:

# grab the code.
git clone https://github.com/shawwn/openai-server
cd openai-server

# install dependencies.
pip3 install -r requirements.txt

# grab all models (requires ~8GB of disk space; if space is low, just download 117M, which only needs ~550MB)
python3 download_model.py 117M
python3 download_model.py 345M
python3 download_model.py 774M
python3 download_model.py 1558M

# then, do *one* of the following:

# ...serve one specific model:
MODELS=117M bash prod.sh

# ...or serve multiple models:
MODELS=1558M,117M bash prod.sh

# ...or serve all models you've downloaded (the default):
bash prod.sh

The server listens on port 9000 by default. You can change it via PORT:

PORT=8080 bash prod.sh

Now that the server is running, you can start making API requests. See examples.

Notes

A warning about frequency_penalty

For 1558M, the best results seem to come from temperature=0.6 and frequency_penalty=0.9:

curl 'http://localhost:9000/v1/engines/1558M/completions?prompt=Hello,%20my%20name%20is&max_tokens=32&n=4&temperature=0.6&frequency_penalty=0.9&echo=true'

But beware: you shouldn't use frequency_penalty unless your model is the largest (1558M, commonly known as "1.5B"). For some reason, frequency_penalty causes the output to be scrambled when the model is smaller than 1558M.

Running in production

For production usage, consider running it via the following command:

while true; do MODELS=117M bash prod.sh ; sleep 20 ; done

That way, if the server terminates for any reason, it will automatically restart.
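The same restart loop can also be sketched as a tiny Python supervisor, which makes it easier to add logging or a retry cap later. A sketch under assumptions: max_restarts is a hypothetical extra knob, not part of prod.sh, and MODELS is expected to be set in the environment as in the shell version:

```python
import subprocess
import sys
import time

def supervise(cmd, delay=20, max_restarts=None):
    """Re-run cmd every time it exits; stop after max_restarts runs if set."""
    runs = 0
    while max_restarts is None or runs < max_restarts:
        subprocess.run(cmd)
        runs += 1
        time.sleep(delay)
    return runs

# In production: supervise(["bash", "prod.sh"], delay=20)
# Quick demo with a command that exits immediately:
runs = supervise([sys.executable, "-c", "pass"], delay=0, max_restarts=3)
```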

For endpoint monitoring, I recommend updown.io.

Community

Join the ML Discord

If you're an ML enthusiast, join the ML Discord. There are ~800 members, with ~120 online at any given time:


There are a variety of interesting channels:

  • #papers for pointing out interesting research papers
  • #research for discussing ML research
  • #show and #samples for showing off your work
  • #hardware for hardware enthusiasts
  • #ideas for brainstorming
  • #tensorflow and #pytorch
  • #cats, #doggos, and of course #memes
  • Quite a few more.

Support me

If you found this library helpful, consider joining my Patreon.
