# Calamity 🌋
Calamity is a lightweight web app that wraps my custom (private) language model API to let me play and experiment with pretrained autoregressive language models easily. It runs Huggingface transformers wrappers of language models (currently Meta's Llama 2 with 13B parameters, 8-bit quantized, because it's cheaper than GPT-3/PaLM/Chinchilla/whatever else) behind a Flask / uWSGI API on the backend, and a Torus/Oak-based frontend that wraps the API. The backend lives in `./services`, and the frontend + frontend server in `./src`.
The architecture here is a little weird. There are two independent web apps: the model and API server, and the web app that wraps and calls the API and serves the client user interface. It's designed this way so that the API itself can be reused across other apps in my personal infrastructure, some of which aren't public yet. The current configuration requires about 15-16GB of VRAM to load the model and serve inference.
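For a concrete sense of what the backend half involves, here's a rough sketch of a single-endpoint Flask service that loads an 8-bit quantized Llama 2 13B through Hugging Face transformers. This is illustrative only, not the actual code in `./services`; the model ID, route name, and request field names are placeholders:

```python
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID; any causal LM checkpoint of similar size works.
MODEL_ID = "meta-llama/Llama-2-13b-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# 8-bit quantization (via bitsandbytes) keeps the 13B weights near 13GB,
# which with activations and CUDA overhead lands around the 15-16GB of
# VRAM mentioned above.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    load_in_8bit=True,
)

app = Flask(__name__)

@app.route("/gen", methods=["POST"])  # placeholder route name
def gen():
    params = request.get_json()
    eos = params.get("eos_token")
    inputs = tokenizer(params["text"], return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        do_sample=True,
        max_new_tokens=params.get("length", 128),
        num_return_sequences=params.get("n", 1),
        temperature=params.get("temperature", 1.0),
        eos_token_id=tokenizer.convert_tokens_to_ids(eos) if eos else tokenizer.eos_token_id,
    )
    return jsonify([tokenizer.decode(seq, skip_special_tokens=True) for seq in outputs])

if __name__ == "__main__":
    app.run(port=5000)  # behind uWSGI and Nginx in the real deployment
```

Keeping the model behind its own API process like this is what lets other apps in my infrastructure reuse it without going through the Calamity frontend.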
I personally run it on a Lambda Cloud VM with an A100 GPU behind an Nginx reverse proxy. Both the backend API service and the web app frontend run as systemd daemons. Currently, the API only lets the client customize the length and number of generated sequences, the sampling temperature, and the `eos_token` that marks the end of text generation, but I might add other parameters like `top_p` down the road.
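As a usage sketch, a client request with those knobs might look roughly like this; the endpoint path and field names are placeholders matching the server sketch above, not the API's real schema:

```python
import requests

# Placeholder endpoint and field names, for illustration only.
resp = requests.post("http://localhost:5000/gen", json={
    "text": "Once upon a time",
    "n": 2,              # number of generated sequences
    "length": 64,        # generated sequence length, in new tokens
    "temperature": 0.8,  # sampling temperature
    "eos_token": "\n",   # token that marks the end of text generation
})
print(resp.json())
```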
## Development
Calamity is a web app written with Oak. To run and build Calamity, you'll need to install the `oak` binary.
Calamity's development is managed through a Makefile:

- `make serve` (default when just `make` is run) starts the web server at `src/main.oak`
- `make build` or `make b` builds the frontend from `src/app.js.oak`
- `make watch` or `make w` runs the frontend build while watching files for changes (using `entr`)
- `make fmt` or `make f` re-formats all changed files tracked by Git. To format all files from scratch, run something like `oak fmt **/*.oak --fix`.