  • Stars: 131
  • Rank: 275,867 (Top 6 %)
  • Language: Python
  • License: GNU General Publi...
  • Created over 1 year ago
  • Updated 4 months ago

Repository Details

Maximize your usage of OpenAI models without hitting rate limits

openlimit

A simple tool for maximizing usage of the OpenAI API without hitting the rate limit.

  • Handles both request and token limits
  • Precisely (to the millisecond) enforces rate limits with one line of code
  • Handles synchronous and asynchronous requests
  • Plugs into Redis to track limits across multiple threads, processes, or servers

Implements the generic cell rate algorithm, a variant of the leaky bucket pattern.
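
As a rough illustration of the idea, the sketch below shows one way a generic cell rate algorithm (GCRA) check can be written. It is not openlimit's actual implementation: the class name `GCRA`, its parameters (`rate_per_minute`, `burst`, `clock`), and the `wait_time` method are assumptions made for this example.

```python
import time


class GCRA:
    """Illustrative generic cell rate algorithm; not openlimit's own code."""

    def __init__(self, rate_per_minute, burst=1, clock=time.monotonic):
        # Emission interval: the ideal spacing between two requests.
        self.interval = 60.0 / rate_per_minute
        # Burst tolerance: how far ahead of schedule a request may arrive.
        self.tolerance = self.interval * (burst - 1)
        self.clock = clock
        self.tat = None  # theoretical arrival time of the next request

    def wait_time(self, cost=1):
        """Return 0.0 and record the request if it conforms,
        otherwise return the number of seconds to wait."""
        now = self.clock()
        tat = now if self.tat is None else max(self.tat, now)
        allow_at = tat - self.tolerance
        if now < allow_at:
            return allow_at - now  # too early: nothing is recorded
        # Conforming request: push the schedule forward by its cost.
        self.tat = tat + cost * self.interval
        return 0.0
```

Only a single timestamp is stored per limit, which is one reason this family of algorithms is convenient to keep in an external store. A token limit could be modelled the same way by passing the request's token count as `cost`.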

Installation

You can install openlimit with pip:

$ pip install openlimit

Usage

Define a rate limit

First, define your rate limits for the OpenAI model you're using. For example:

from openlimit import ChatRateLimiter

rate_limiter = ChatRateLimiter(request_limit=200, token_limit=40000)

This sets a rate limit for a chat completion model (e.g. gpt-4, gpt-3.5-turbo). openlimit offers different rate limiter objects for different OpenAI models, all with the same parameters: request_limit and token_limit. Both limits are measured per minute and vary from account to account, so use the values that apply to yours.

Rate limiter          | Supported models
ChatRateLimiter       | gpt-4, gpt-4-0314, gpt-4-32k, gpt-4-32k-0314, gpt-3.5-turbo, gpt-3.5-turbo-0301
CompletionRateLimiter | text-davinci-003, text-davinci-002, text-curie-001, text-babbage-001, text-ada-001
EmbeddingRateLimiter  | text-embedding-ada-002

Apply the rate limit

To apply the rate limit, add a with statement to your API calls:

import openai

chat_params = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
}

with rate_limiter.limit(**chat_params):
    response = openai.ChatCompletion.create(**chat_params)

Ensure that rate_limiter.limit receives the same parameters as the actual API call. This is important for calculating expected token usage.

Alternatively, you can decorate functions that make API calls, as long as the decorated function receives the same parameters as the API call:

@rate_limiter.is_limited()
def call_openai(**chat_params):
    response = openai.ChatCompletion.create(**chat_params)
    return response
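
To make the relationship between the with statement and the decorator concrete, here is a minimal sketch of how a limiter could offer both. It is an assumption-laden illustration, not openlimit's source: `SketchLimiter` and its `clock` and `sleep` parameters are hypothetical, and it only spaces out requests, ignoring token limits.

```python
import functools
import time
from contextlib import contextmanager


class SketchLimiter:
    """Hypothetical request-only limiter; names here are illustrative."""

    def __init__(self, request_limit, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / request_limit  # seconds between requests
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next request may start

    @contextmanager
    def limit(self, **request_params):
        # request_params is accepted so the call mirrors the API call;
        # a fuller version would use it to estimate token usage.
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay:
            self.sleep(delay)
        self.next_allowed = max(now, self.next_allowed) + self.interval
        yield

    def is_limited(self):
        # The decorator simply wraps the call in the same context manager.
        def decorator(func):
            @functools.wraps(func)
            def wrapper(**request_params):
                with self.limit(**request_params):
                    return func(**request_params)
            return wrapper
        return decorator
```

In this sketch the decorator forwards the keyword arguments it receives to `limit`, which shows why the decorated function needs the same parameters as the API call.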

Asynchronous requests

Rate limits can be enforced for asynchronous requests too:

chat_params = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
}

# `async with` is only valid inside a coroutine
async def main():
    async with rate_limiter.limit(**chat_params):
        response = await openai.ChatCompletion.acreate(**chat_params)
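
For intuition, an asynchronous limiter can wait with `asyncio.sleep` so that other tasks keep running while a request is held back. The following is a minimal sketch under that assumption; `AsyncSketchLimiter`, `worker`, and `main` are hypothetical names, and openlimit's own implementation may differ.

```python
import asyncio
import time
from contextlib import asynccontextmanager


class AsyncSketchLimiter:
    """Hypothetical request-only async limiter, for illustration."""

    def __init__(self, request_limit):
        self.interval = 60.0 / request_limit  # seconds between requests
        self.next_allowed = 0.0

    @asynccontextmanager
    async def limit(self, **request_params):
        # No await occurs between reading and updating the schedule,
        # so concurrent tasks on one event loop each reserve their own slot.
        now = time.monotonic()
        delay = max(0.0, self.next_allowed - now)
        self.next_allowed = max(now, self.next_allowed) + self.interval
        if delay:
            await asyncio.sleep(delay)  # yields control instead of blocking
        yield


async def main():
    limiter = AsyncSketchLimiter(request_limit=6000)  # one slot every 10 ms
    start = time.monotonic()

    async def worker(i):
        async with limiter.limit(model="gpt-4"):
            return i  # a real worker would await the API call here

    results = await asyncio.gather(*(worker(i) for i in range(3)))
    return results, time.monotonic() - start
```

Running `asyncio.run(main())` launches three workers at once; the limiter staggers them so that the second and third start roughly 10 ms and 20 ms after the first.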

Distributed requests

By default, openlimit tracks rate limits in an in-memory store. If your application is distributed, you can plug in a Redis store to share limits across multiple threads, processes, or servers.

from openlimit import ChatRateLimiterWithRedis

rate_limiter = ChatRateLimiterWithRedis(
    request_limit=200,
    token_limit=40000,
    redis_url="redis://localhost:5050"
)

# Use `rate_limiter` like you would normally ...

All RateLimiter objects have RateLimiterWithRedis counterparts.

Token counting

Aside from rate limiting, openlimit also provides methods for counting tokens consumed by requests.

Chat requests

To count the maximum number of tokens that could be consumed by a chat request (e.g. gpt-3.5-turbo, gpt-4), pass the request arguments into the following function:

from openlimit.utilities import num_tokens_consumed_by_chat_request

request_args = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "...", "content": "..."}, ...],
    "max_tokens": 15,
    "n": 1
}
num_tokens = num_tokens_consumed_by_chat_request(**request_args)
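
The "maximum" here can be thought of as the prompt size plus the largest possible completions. The sketch below illustrates that arithmetic only. It is not the library's function: `max_tokens_upper_bound` is a hypothetical name, and the whitespace-based `count_tokens` default is a crude stand-in for a real tokenizer, which would also account for per-message overhead.

```python
def max_tokens_upper_bound(messages, max_tokens, n=1, count_tokens=None):
    """Rough upper bound: prompt tokens plus n completions of max_tokens each."""
    if count_tokens is None:
        # Assumption for illustration: one token per whitespace-separated word.
        count_tokens = lambda text: len(text.split())
    prompt_tokens = sum(count_tokens(m["content"]) for m in messages)
    return prompt_tokens + n * max_tokens


example = max_tokens_upper_bound(
    messages=[{"role": "user", "content": "Hello there world"}],
    max_tokens=15,
    n=2,
)
```

With the stand-in tokenizer, the three-word prompt counts as 3 tokens and the two completions can add at most 30, so `example` is 33.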

Completion requests

Similar to chat requests, to count tokens for completion requests (e.g. text-davinci-003), pass the request arguments into the following function:

from openlimit.utilities import num_tokens_consumed_by_completion_request

request_args = {
    "model": "text-davinci-003",
    "prompt": "...",
    "max_tokens": 15,
    "n": 1
}
num_tokens = num_tokens_consumed_by_completion_request(**request_args)

Embedding requests

For embedding requests (e.g. text-embedding-ada-002), pass the request arguments into the following function:

from openlimit.utilities import num_tokens_consumed_by_embedding_request

request_args = {
    "model": "text-embedding-ada-002",
    "input": "..."
}
num_tokens = num_tokens_consumed_by_embedding_request(**request_args)

Contributing

If you want to contribute to the library, get started with Adrenaline. Paste in a link to this repository to familiarize yourself.
