dstack is an open-source toolkit for training, fine-tuning, inference, and development across multiple cloud GPU providers.
Latest news ✨
- [2023/09] Deploying LLMs with API (Example)
- [2023/09] Managed gateways (Release)
- [2023/08] Fine-tuning Llama 2 (Example)
- [2023/08] Serving SDXL with FastAPI (Example)
- [2023/07] Serving LLMs with TGI (Example)
- [2023/07] Serving LLMs with vLLM (Example)
Installation
To use dstack, install it with pip, and start the server.
pip install "dstack[all]" -U
dstack start
Configure clouds
Upon startup, the server sets up the default project called main.
Prior to using dstack, make sure to configure clouds.
Once the server is up, you can orchestrate GPU workloads using either the CLI or the Python API.
Using CLI
Define a configuration
The CLI allows you to define what you want to run as a YAML file and run it via the dstack run CLI command.
Configurations can be of three types: dev-environment, task, and service.
Dev environments
A dev environment is a virtual machine with a pre-configured IDE.
type: dev-environment
python: "3.11" # (Optional) If not specified, your local version is used
setup: # (Optional) Executed once at the first startup
- pip install -r requirements.txt
ide: vscode
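A dev environment like this is launched with the dstack run command covered below. Assuming the file above is saved as .dstack.yml in the working directory (the filename here is only illustrative):
dstack run . -f .dstack.yml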
Tasks
A task can be either a batch job, such as training or fine-tuning a model, or a web application.
type: task
python: "3.11" # (Optional) If not specified, your local version is used
ports:
- 7860
commands:
- pip install -r requirements.txt
- python app.py
While the task is running in the cloud, the CLI forwards its port traffic to localhost for convenient access.
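For example, with the task above still attached to the CLI, the application listening on port 7860 is reachable on the same port locally. A quick check, assuming the app responds to plain HTTP GET requests:
curl http://127.0.0.1:7860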
Services
A service is an application that is accessible through a public endpoint.
type: service
port: 7860
commands:
- pip install -r requirements.txt
- python app.py
Once the service is up, dstack makes it accessible from the Internet through the gateway.
Run a configuration
To run a configuration, use the dstack run command followed by the working directory and the path to the configuration file.
dstack run . -f text-generation-inference/serve.dstack.yml --gpu 80GB -y
RUN BACKEND INSTANCE SPOT PRICE STATUS SUBMITTED
tasty-zebra-1 lambda 200GB, 1xA100 (80GB) no $1.1 Submitted now
Provisioning...
Serving on https://tasty-zebra-1.mydomain.com
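Once provisioning completes, the endpoint printed in the output can be queried directly. A quick check, assuming the TGI configuration referenced above (whose container exposes a /health route) and the illustrative domain from the sample output:
curl https://tasty-zebra-1.mydomain.com/health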
Using API
As an alternative to the CLI, you can run tasks and services programmatically via the Python API.
import sys
import dstack
task = dstack.Task(
    image="ghcr.io/huggingface/text-generation-inference:latest",
    env={"MODEL_ID": "TheBloke/Llama-2-13B-chat-GPTQ"},
    commands=[
        "text-generation-launcher --trust-remote-code --quantize gptq",
    ],
    ports=["8080:80"],
)
resources = dstack.Resources(gpu=dstack.GPU(memory="20GB"))

if __name__ == "__main__":
    print("Initializing the client...")
    client = dstack.Client.from_config(repo_dir="~/dstack-examples")

    print("Submitting the run...")
    run = client.runs.submit(configuration=task, resources=resources)
    print(f"Run {run.name}: " + run.status())

    print("Attaching to the run...")
    run.attach()

    # After the endpoint is up, http://127.0.0.1:8080/health will return 200 (OK).
    try:
        for log in run.logs():
            sys.stdout.buffer.write(log)
            sys.stdout.buffer.flush()
    except KeyboardInterrupt:
        print("Aborting the run...")
        run.stop(abort=True)
    finally:
        run.detach()
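While run.attach() keeps the port forwarding active, the endpoint from the comment above can be checked from a separate terminal. The /health route comes from the TGI container used in the task definition:
curl http://127.0.0.1:8080/health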
More information
For additional information and examples, see the following links: