RayLLM - LLMs on Ray

Aviary - Study stochastic parrots in the wild

Go on bird watch right now: 🦜🔍 Aviary 🦜🔍

Aviary is an app that lets you interact with a variety of large language models (LLMs) in a single place. You can compare the outputs of different models directly, rank them by quality, get a cost and latency estimate, and more. In particular, it offers good support for Transformer models hosted on Hugging Face and in many cases also supports DeepSpeed inference acceleration.

Aviary also supports continuous batching by integrating with Hugging Face text-generation-inference (an optional dependency). Continuous batching allows you to get much better throughput and latency than static batching.
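As a rough illustration of why (a toy scheduling model, not Aviary's or text-generation-inference's actual scheduler): with static batching, a slot that finishes early sits idle until the whole batch completes, whereas continuous batching refills it immediately with the next waiting request.

```python
import heapq

def static_batch_time(steps, capacity=2):
    """Total decode iterations when each batch runs until its slowest member finishes."""
    return sum(max(steps[i:i + capacity]) for i in range(0, len(steps), capacity))

def continuous_batch_time(steps, capacity=2):
    """Total decode iterations when a freed slot is refilled immediately."""
    pending = list(steps[capacity:])
    finish_times = steps[:capacity]  # each initial request finishes after its own step count
    heapq.heapify(finish_times)
    done = 0
    while finish_times:
        done = heapq.heappop(finish_times)  # earliest slot to free up
        if pending:
            heapq.heappush(finish_times, done + pending.pop(0))
    return done  # time the last request finishes
```

With request lengths `[8, 1, 1, 1]` and capacity 2, static batching takes 9 iterations while continuous batching takes 8, and the gap grows as request lengths get more skewed.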

Aviary is built on top of Ray by Anyscale. It's an open source project, which means that you can deploy it yourself to a cloud service, or simply use our hosted version. If you would like to use a managed version of Aviary specific to your company, please reach out to us.

Getting Help and Filing Bugs / Feature Requests

We are eager to help you get started with Aviary. You can get help on:

For bugs or for feature requests, please submit them here.

We have people in both US and European time zones who will help answer your questions.

Contributions

We are also interested in accepting contributions. Those could be anything from a new evaluator to integrating a new model via a YAML file, and more. Feel free to post an issue first to get our feedback on a proposal, or just file a PR; we commit to giving you prompt feedback.

Aviary User Guides

For a video introduction, see the intro video below. Note: there have been some minor changes since the video was recorded; the guide below is more up to date.

Watch the video

Deploy Aviary

The guide below walks you through a minimal installation of Aviary for an open-source cloud deployment.

Set up your laptop

You will need ray and aviary installed on your laptop, with ray at the latest nightly version.

# The link below WILL CHANGE depending on your platform and Python version
# See https://docs.ray.io/en/latest/ray-overview/installation.html#daily-releases-nightlies
pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl
pip install "aviary @ git+https://github.com/ray-project/aviary.git"

The default Aviary installation only includes the Aviary CLI and SDK.

To install the Aviary UI, use the following command. This will enable you to run the Aviary frontend on your laptop.

pip install "aviary[frontend] @ git+https://github.com/ray-project/aviary.git"

Start a Ray Cluster

Deployment is currently only supported on AWS. Make sure you have exported your AWS credentials locally.

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...

Start by cloning this repo to your local machine.

You may need to specify your AWS private key in the deploy/ray/aviary-cluster.yaml file. See the Ray on Cloud VMs page in the Ray documentation for more details.

git clone https://github.com/ray-project/aviary.git
cd aviary

# Start a Ray Cluster (This will take a few minutes to start-up)
ray up deploy/ray/aviary-cluster.yaml

If you want to use continuous batching, edit deploy/ray/aviary-cluster.yaml, replacing

docker:
    image: "anyscale/aviary:latest"

with

docker:
    image: "anyscale/aviary:latest-tgi"

Connect to your Cluster

# Connect to the Head node of your Ray Cluster (This will take several minutes to autoscale)
ray attach deploy/ray/aviary-cluster.yaml

# Deploy the LightGPT model. 
aviary run --model ./models/static_batching/amazon--LightGPT.yaml

You can deploy any model in the models directory of this repo, or define your own model YAML file and run that instead.
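A model YAML pairs a model's identity with its scaling settings. The sketch below shows only a hypothetical shape (every field name here is an illustrative assumption, not the real schema); copy an actual file from the models directory, such as amazon--LightGPT.yaml, and edit that instead.

```yaml
# Illustrative shape only -- field names are hypothetical, not the actual schema.
model_id: my-org/my-model        # name you would pass to `aviary query --model`
hf_model_id: my-org/my-model     # Hugging Face checkpoint to load
scaling_config:
  num_workers: 1                 # replicas serving the model
  num_gpus_per_worker: 1
```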

Query Aviary

From the head node, run the following commands.

export AVIARY_URL="http://localhost:8000"

# List the available models
aviary models
amazon/LightGPT

# Query the model
aviary query --model amazon/LightGPT --prompt "How do I make fried rice?"
amazon/LightGPT:
To make fried rice, start by heating up some oil in a large pan over medium-high
heat. Once the oil is hot, add your desired amount of vegetables and/or meat to the
pan. Cook until they are lightly browned, stirring occasionally. Add any other
desired ingredients such as eggs, cheese, or sauce to the pan. Finally, stir
everything together and cook for another few minutes until all the ingredients are
cooked through. Serve with your favorite sides and enjoy!

You can also use aviary query with certain LangChain-compatible APIs. Currently, we support the following APIs:

  • openai (langchain.llms.OpenAIChat)
# langchain is an optional dependency
pip install langchain

export OPENAI_API_KEY=...

# Query an Aviary model and OpenAI model
# [PROVIDER]://[MODEL_NAME]
aviary query --model amazon/LightGPT --model openai://gpt-3.5-turbo --prompt "How do I make fried rice?"
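The --model flag accepts both plain Aviary model names and [PROVIDER]://[MODEL_NAME] URIs. A minimal sketch of how such identifiers can be split (illustrative only, not Aviary's internal parser):

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split '[PROVIDER]://[MODEL_NAME]' into (provider, model_name).

    Names without a scheme are treated as Aviary-hosted models.
    """
    provider, sep, name = model.partition("://")
    if not sep:
        return ("aviary", model)  # no scheme: assume an Aviary-hosted model
    return (provider, name)

print(split_model_id("openai://gpt-3.5-turbo"))  # ('openai', 'gpt-3.5-turbo')
```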

Aviary Reference

Installing Aviary

To install Aviary and its dependencies, run the following command:

pip install "aviary @ git+https://github.com/ray-project/aviary.git"

The default Aviary installation only includes the Aviary API client.

Aviary consists of a backend and a frontend, both of which come with additional dependencies. To install the dependencies for both frontend and backend for local development, run the following commands:

pip install "aviary[frontend,backend] @ git+https://github.com/ray-project/aviary.git"

The backend dependencies are heavyweight and quite large; we recommend installing them only on a cluster.

Running Aviary Frontend locally

Aviary consists of two components, a backend and a frontend. The backend exposes a FastAPI interface, running on a Ray cluster, that allows you to query various LLMs efficiently. The frontend is a Gradio interface that allows you to interact with the models in the backend through a web interface. The Gradio app is served using Ray Serve.

To run the Aviary frontend locally, you need to set the following environment variable:

export AVIARY_URL=<hostname of the backend, eg. 'http://localhost:8000'>

Once you have set this environment variable, you can run the frontend with the following command:

serve run aviary.frontend.app:app

To just use the Gradio frontend without Ray Serve, you can start it with python aviary/frontend/app.py.

If you don't have access to a deployed backend, or would just like to test and develop the frontend, you can run a mock backend locally by setting AVIARY_MOCK=True:

AVIARY_MOCK=True python aviary/frontend/app.py

In any case, the Gradio interface should be accessible at http://localhost:7860 in your browser. If running the frontend yourself is not an option, you can still use our hosted version for your experiments.

Usage stats collection

The Aviary backend collects basic, non-identifiable usage statistics to help us improve the project. The mechanism for collection is the same as in Ray. For more information on what is collected and how to opt out, see the Usage Stats Collection page in the Ray documentation.

Using the Aviary CLI

Aviary comes with a CLI that allows you to interact with the backend directly, without using the Gradio frontend. Installing Aviary as described earlier will install the aviary CLI as well. You can get a list of all available commands by running aviary --help.

Currently, aviary supports a few basic commands, all of which can be used with the --help flag to get more information:

# Get a list of all available models in Aviary
aviary models

# Query a model with a list of prompts
aviary query --model <model-name> --prompt <prompt_1> --prompt <prompt_2>

# Run a query on a text file of prompts
aviary query  --model <model-name> --prompt-file <prompt-file>

# Evaluate the quality of responses with GPT-4 as the evaluator
aviary evaluate --input-file <query-result-file>

# Start a new model in Aviary from provided configuration
aviary run <model>

CLI examples

Listing all available models

aviary models
mosaicml/mpt-7b-instruct
CarperAI/stable-vicuna-13b-delta
databricks/dolly-v2-12b
RWKV/rwkv-raven-14b
mosaicml/mpt-7b-chat
stabilityai/stablelm-tuned-alpha-7b
lmsys/vicuna-13b-delta-v1.1
mosaicml/mpt-7b-storywriter
h2oai/h2ogpt-oasst1-512-12b
OpenAssistant/oasst-sft-7-llama-30b-xor

Running two models on the same prompt

aviary query --model mosaicml/mpt-7b-instruct --model RWKV/rwkv-raven-14b \
  --prompt "what is love?"
mosaicml/mpt-7b-instruct:
love can be defined as feeling of affection, attraction or ...
RWKV/rwkv-raven-14b:
Love is a feeling of strong affection and care for someone or something...

Running a batch-query of two prompts on the same model

aviary query --model mosaicml/mpt-7b-instruct \
  --prompt "what is love?" --prompt "why are we here?"

Running a query on a text file of prompts

aviary query --model mosaicml/mpt-7b-instruct --prompt-file prompts.txt
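The prompt file is assumed here to hold one prompt per line. A small sketch of how such a file can be read (illustrative; the CLI's actual loader may differ):

```python
def read_prompts(path: str) -> list[str]:
    """Return the non-empty, stripped lines of a prompt file (one prompt per line)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```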

Evaluating the quality of responses with GPT-4

aviary evaluate --input-file aviary-output.json --evaluator gpt-4

This will produce a leaderboard-like ranking of responses and also save the results to a file:

What is the best indie band of the 90s?
                                              Evaluation results (higher ranks are better)                                               
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model                    ┃ Rank ┃                                                                                            Response ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ mosaicml/mpt-7b-instruct │ 1    │  The Shins are often considered to be one of the greatest bands from this era, with their album 'Oh │
│                          │      │        Inverted World' being widely regarded as one of the most influential albums in recent memory │
│ RWKV/rwkv-raven-14b      │ 2    │ It's subjective and depends on personal taste. Some people might argue that Nirvana or The Smashing │
│                          │      │                       Pumpkins were the best, while others might prefer Sonic Youth or Dinosaur Jr. │
└──────────────────────────┴──────┴─────────────────────────────────────────────────────────────────────────────────────────────────────┘

You can also use the Gradio API directly by following the instructions provided in the Aviary documentation.

Aviary Model Registry

Aviary allows you to easily add new models by adding a single configuration file. To learn more about how to customize or add new models, see the Aviary Model Registry.

Contributing

If you want to help improve or extend Aviary, please get in touch with us! You can reach us via email for feedback and suggestions, or open an issue on GitHub. Pull requests are also welcome!

We use pre-commit hooks to ensure that all code is formatted correctly. Make sure to pip install pre-commit and then run pre-commit install. You can also run ./format to run the hooks manually.

Running tests

To run the tests, you need to install the test dependencies:

pip install -e .[test]

and then simply run pytest:

pytest .

Known issues

Aviary is still in early development, and there are a few known issues:

  • Latency and throughput are not optimized yet, because we chose to focus on simplicity and readability for the first release. Ray and Ray Serve are framework-agnostic, and Aviary can easily be modified to use FasterTransformer or other high-performance frameworks. We will continue working on improving this.
  • The lmsys/vicuna-13b-delta-v1.1 model sometimes answers English questions in Mandarin.

Future plans

  • LangChain + LlamaIndex Integration (which will make it much easier to compare open and closed LLMs).
  • Better testing.
  • Improved documentation.
