Aviary - Study stochastic parrots in the wild
Go on bird watch right now:
Aviary is an app that lets you interact with a variety of large language models (LLMs) in a single place. You can compare the outputs of different models directly, rank them by quality, get a cost and latency estimate, and more. In particular, it offers good support for Transformer models hosted on Hugging Face and in many cases also supports DeepSpeed inference acceleration.
Aviary also supports continuous batching by integrating with Hugging Face text-generation-inference (an optional dependency). Continuous batching allows you to get much better throughput and latency than static batching.
Aviary is built on top of Ray by Anyscale. It's an open source project, which means that you can deploy it yourself to a cloud service, or simply use our hosted version. If you would like to use a managed version of Aviary specific to your company, please reach out to us.
Getting Help and Filing Bugs / Feature Requests
We are eager to help you get started with Aviary. You can get help on:
For bugs or for feature requests, please submit them here.
We have people in both US and European time zones who will help answer your questions.
Contributions
We are also interested in accepting contributions. These could be anything from a new evaluator to a new model integrated with a YAML file, and more. Feel free to post an issue first to get our feedback on a proposal, or just file a PR; we commit to giving you prompt feedback.
Aviary User Guides
For a video introduction, see the following intro. Note: There have been some minor changes since the video was recorded. The guide below is more up to date.
Deploy Aviary
The guide below walks you through a minimal installation of Aviary for an open source cloud deployment.
Set up your laptop
You will need ray and aviary installed on your laptop. ray has to be the latest nightly version.
# The link below WILL CHANGE depending on your platform and Python version
# See https://docs.ray.io/en/latest/ray-overview/installation.html#daily-releases-nightlies
pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl
pip install "aviary @ git+https://github.com/ray-project/aviary.git"
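Because the nightly wheel URL depends on your platform and Python version, it can help to check which CPython tag your interpreter needs before picking a link. The following is a minimal sketch, assuming python3 is on your PATH and is the interpreter pip installs into; the helper name py_wheel_tag is hypothetical and not part of Ray or Aviary.

```shell
# Print the CPython wheel tag of the current interpreter,
# for example cp310 for Python 3.10.
# Hypothetical helper; assumes python3 is the interpreter you install into.
py_wheel_tag() {
    python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}
```

On Python 3.10 this prints cp310, which corresponds to the cp310-cp310 part of the example URL. Pick the wheel with the matching tag from the Ray nightlies page linked in the comment above.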
The default Aviary installation only includes the Aviary CLI and SDK.
To install the Aviary UI, use the following command. This will enable you to run the Aviary frontend on your laptop.
pip install "aviary[frontend] @ git+https://github.com/ray-project/aviary.git"
Start a Ray Cluster
Deployment is currently supported only on AWS. Make sure you have exported your AWS credentials locally.
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...
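Before starting the cluster, you may want to confirm that the credentials are actually exported in the current shell. The following is a small sketch; the function name check_aws_env is hypothetical, and it checks all three variables listed above (AWS_SESSION_TOKEN is normally only needed for temporary credentials, so adjust the list if you use long-lived keys).

```shell
# Report which of the AWS variables listed above are not exported.
# Hypothetical helper, not part of Aviary or Ray.
check_aws_env() {
    missing=""
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
        # printenv prints nothing when the variable is not exported
        if [ -z "$(printenv "$name")" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing AWS variables:$missing" >&2
        return 1
    fi
    echo "AWS credentials are exported"
}
```

You could then guard the cluster start-up with check_aws_env && ray up deploy/ray/aviary-cluster.yaml.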
Start by cloning this repo to your local machine.
You may need to specify your AWS private key in the deploy/ray/aviary-cluster.yaml file.
See the Ray on Cloud VMs page in the Ray documentation for more details.
git clone https://github.com/ray-project/aviary.git
cd aviary
# Start a Ray Cluster (This will take a few minutes to start-up)
ray up deploy/ray/aviary-cluster.yaml
If you want to use continuous batching, edit deploy/ray/aviary-cluster.yaml, replacing
docker:
    image: "anyscale/aviary:latest"
with
docker:
    image: "anyscale/aviary:latest-tgi"
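If you prefer not to edit the file by hand, the same change can be scripted. The following is a sketch, assuming the image line is quoted exactly as shown above; the function name use_tgi_image is hypothetical.

```shell
# Switch the cluster config to the text-generation-inference image.
# The original file is kept next to it with a .bak suffix.
# Hypothetical helper; assumes the image is written as "anyscale/aviary:latest".
use_tgi_image() {
    config="${1:-deploy/ray/aviary-cluster.yaml}"
    # Matching the closing quote keeps the edit from being applied twice.
    sed -i.bak 's|"anyscale/aviary:latest"|"anyscale/aviary:latest-tgi"|' "$config"
}
```

Run use_tgi_image from the root of the cloned repo, or pass the path to the YAML file as the first argument. The attached -i.bak form is used because it is accepted by both GNU and BSD sed.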
Connect to your Cluster
# Connect to the Head node of your Ray Cluster (This will take several minutes to autoscale)
ray attach deploy/ray/aviary-cluster.yaml
# Deploy the LightGPT model.
aviary run --model ./models/static_batching/amazon--LightGPT.yaml
You can deploy any model in the models directory of this repo, or define your own model YAML file and run that instead.
Query Aviary
From the head node, run the following commands.
export AVIARY_URL="http://localhost:8000"
# List the available models
aviary models
amazon/LightGPT
# Query the model
aviary query --model amazon/LightGPT --prompt "How do I make fried rice?"
amazon/LightGPT:
To make fried rice, start by heating up some oil in a large pan over medium-high
heat. Once the oil is hot, add your desired amount of vegetables and/or meat to the
pan. Cook until they are lightly browned, stirring occasionally. Add any other
desired ingredients such as eggs, cheese, or sauce to the pan. Finally, stir
everything together and cook for another few minutes until all the ingredients are
cooked through. Serve with your favorite sides and enjoy!
You can also use aviary query with certain LangChain-compatible APIs.
Currently, we support the following APIs:
- openai (langchain.llms.OpenAIChat)
# langchain is an optional dependency
pip install langchain
export OPENAI_API_KEY=...
# Query an Aviary model and OpenAI model
# [PROVIDER]://[MODEL_NAME]
aviary query --model amazon/LightGPT --model openai://gpt-3.5-turbo --prompt "How do I make fried rice?"
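The [PROVIDER]://[MODEL_NAME] convention can be illustrated with a small shell helper that splits a model argument into its two parts. This is only a sketch of the naming convention, not Aviary's actual parsing code; the function name split_model_spec and the label aviary for names without a prefix are hypothetical.

```shell
# Split a model argument into "<provider> <model name>".
# Illustration only; Aviary's own handling may differ.
split_model_spec() {
    spec="$1"
    case "$spec" in
        *://*)
            # Text before the first "://" is the provider, the rest is the model name.
            printf '%s %s\n' "${spec%%://*}" "${spec#*://}"
            ;;
        *)
            # No prefix: label the model as one served by Aviary itself (hypothetical label).
            printf 'aviary %s\n' "$spec"
            ;;
    esac
}
```

For example, split_model_spec "openai://gpt-3.5-turbo" prints "openai gpt-3.5-turbo", while split_model_spec "amazon/LightGPT" prints "aviary amazon/LightGPT".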
Aviary Reference
Installing Aviary
To install Aviary and its dependencies, run the following command:
pip install "aviary @ git+https://github.com/ray-project/aviary.git"
The default Aviary installation only includes the Aviary API client.
Aviary consists of a backend and a frontend, both of which come with additional dependencies. To install the dependencies for both frontend and backend for local development, run the following command:
pip install "aviary[frontend,backend] @ git+https://github.com/ray-project/aviary.git"
The backend dependencies are heavyweight and quite large. We recommend installing them only on a cluster.
Running Aviary Frontend locally
Aviary consists of two components, a backend and a frontend. The backend exposes a FastAPI interface, running on a Ray cluster, that lets you query various LLMs efficiently. The frontend is a Gradio interface that lets you interact with the models in the backend through a web interface. The Gradio app is served using Ray Serve.
To run the Aviary frontend locally, you need to set the following environment variable:
export AVIARY_URL=<hostname of the backend, e.g. 'http://localhost:8000'>
Once you have set this environment variable, you can run the frontend with the following command:
serve run aviary.frontend.app:app
To use just the Gradio frontend without Ray Serve, you can start it with python aviary/frontend/app.py.
If you don't have access to a deployed backend, or would just like to test and develop the frontend, you can run a mock backend locally by setting AVIARY_MOCK=True:
AVIARY_MOCK=True python aviary/frontend/app.py
In any case, the Gradio interface should be accessible at http://localhost:7860 in your browser.
If running the frontend yourself is not an option, you can still use our hosted version for your experiments.
Usage stats collection
Aviary backend collects basic, non-identifiable usage statistics to help us improve the project. The mechanism for collection is the same as in Ray. For more information on what is collected and how to opt-out, see the Usage Stats Collection page in Ray documentation.
Using the Aviary CLI
Aviary comes with a CLI that allows you to interact with the backend directly, without using the Gradio frontend.
Installing Aviary as described earlier will install the aviary CLI as well.
You can get a list of all available commands by running aviary --help.
Currently, aviary supports a few basic commands, all of which can be used with the --help flag to get more information:
# Get a list of all available models in Aviary
aviary models
# Query a model with a list of prompts
aviary query --model <model-name> --prompt <prompt_1> --prompt <prompt_2>
# Run a query on a text file of prompts
aviary query --model <model-name> --prompt-file <prompt-file>
# Evaluate the quality of responses, using GPT-4 as the evaluator
aviary evaluate --input-file <query-result-file>
# Start a new model in Aviary from provided configuration
aviary run --model <model>
CLI examples
Listing all available models
aviary models
mosaicml/mpt-7b-instruct
CarperAI/stable-vicuna-13b-delta
databricks/dolly-v2-12b
RWKV/rwkv-raven-14b
mosaicml/mpt-7b-chat
stabilityai/stablelm-tuned-alpha-7b
lmsys/vicuna-13b-delta-v1.1
mosaicml/mpt-7b-storywriter
h2oai/h2ogpt-oasst1-512-12b
OpenAssistant/oasst-sft-7-llama-30b-xor
Running two models on the same prompt
aviary query --model mosaicml/mpt-7b-instruct --model RWKV/rwkv-raven-14b \
--prompt "what is love?"
mosaicml/mpt-7b-instruct:
love can be defined as feeling of affection, attraction or ...
RWKV/rwkv-raven-14b:
Love is a feeling of strong affection and care for someone or something...
Running a batch-query of two prompts on the same model
aviary query --model mosaicml/mpt-7b-instruct \
--prompt "what is love?" --prompt "why are we here?"
Running a query on a text file of prompts
aviary query --model mosaicml/mpt-7b-instruct --prompt-file prompts.txt
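A prompt file for the command above could be created as follows. This sketch assumes the file holds one prompt per line; that layout is an assumption, so check aviary query --help for the expected format.

```shell
# Assumption: one prompt per line.
# Writes prompts.txt in the current directory, overwriting any existing file.
printf '%s\n' \
    "what is love?" \
    "why are we here?" \
    > prompts.txt
```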
Evaluating the quality of responses with GPT-4
aviary evaluate --input-file aviary-output.json --evaluator gpt-4
This prints a leaderboard-like ranking of responses and also saves the results to a file:
What is the best indie band of the 90s?
Evaluation results (higher ranks are better)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model                    ┃ Rank ┃ Response                                                                                            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ mosaicml/mpt-7b-instruct │ 1    │ The Shins are often considered to be one of the greatest bands from this era, with their album 'Oh  │
│                          │      │ Inverted World' being widely regarded as one of the most influential albums in recent memory        │
│ RWKV/rwkv-raven-14b      │ 2    │ It's subjective and depends on personal taste. Some people might argue that Nirvana or The Smashing │
│                          │      │ Pumpkins were the best, while others might prefer Sonic Youth or Dinosaur Jr.                       │
└──────────────────────────┴──────┴─────────────────────────────────────────────────────────────────────────────────────────────────────┘
You can also use the Gradio API directly, by following the instructions provided in the Aviary documentation.
Aviary Model Registry
Aviary allows you to easily add new models by adding a single configuration file. To learn more about how to customize or add new models, see the Aviary Model Registry.
Contributing
If you want to help improve or extend Aviary, please get in touch with us! You can reach us via email for feedback and suggestions, or open an issue on GitHub. Pull requests are also welcome!
We use pre-commit hooks to ensure that all code is formatted correctly.
Make sure to pip install pre-commit and then run pre-commit install.
You can also run ./format to run the hooks manually.
Running tests
To run the tests, you need to install the test dependencies:
# The quotes keep shells such as zsh from expanding the square brackets
pip install -e ".[test]"
and then simply run pytest:
pytest .
Known issues
Aviary is still in early development, and there are a few known issues:
- Latency and throughput are not optimized yet. This is due to the fact that we have chosen to focus on simplicity and readability for the first release. Ray and Ray Serve are framework-agnostic and Aviary can be easily modified to use FasterTransformer or other high-performance frameworks. We will continue working on improving this.
- The lmsys/vicuna-13b-delta-v1.1 model sometimes answers English questions in Mandarin.
Future plans
- LangChain + LlamaIndex Integration (which will make it much easier to compare open and closed LLMs).
- Better testing.
- Improved documentation.