Repository and hands-on workshop on how to develop applications with local LLMs

LLM App Dev Workshop

Introduction

[Image: a bunch of happy local llamas]

This repository demonstrates how to build a simple LLM-based chatbot that answers questions based on your documents (retrieval-augmented generation, RAG) and how to deploy it using Podman or on the OpenShift Container Platform (k8s).
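To make the RAG idea concrete, here is a minimal sketch of the two steps involved: retrieve the most relevant document for a question, then put it into the prompt sent to the LLM. This is not the code used in this repository (which relies on LlamaIndex and real embeddings); the bag-of-words similarity and the function names `retrieve` and `build_prompt` are assumptions chosen purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        docs,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, docs, k=1):
    """Augment the question with the retrieved context."""
    context = "\n".join(retrieve(question, docs, k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
```

A real setup replaces the word counts with embeddings from a model and stores them in a vector index, but the flow of retrieve, then augment, then generate stays the same.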

The corresponding workshop, first run at Red Hat Developers Hands-On Day 2023 in Darmstadt, Germany, teaches participants the basic concepts of LLMs and RAG, and how to adapt this example implementation into their own specific-purpose GPT.

The software stack uses only open source tools: Streamlit, LlamaIndex and local open LLMs via Ollama. Real open AI for the GPU poor.

Everyone is invited to fork this repository, create their own specific-purpose chatbot based on their documents, improve the setup or even hold their own workshop.

Setup

For the local setup, a Mac M1 with 16 GB of unified memory or more is recommended. First download Ollama from ollama.ai and install it.

On Linux you can disable the Ollama service for better debugging:

sudo systemctl disable ollama
sudo systemctl stop ollama

and then manually run ollama serve.

For the local example have a look at the folder streamlit and install the requirements.

Create a virtual environment first:

python -m venv venv
source venv/bin/activate

Install the requirements:

pip install -r requirements.txt

Then start streamlit with:

streamlit run app.py

Modify the system prompt and copy different data sources to docs/ in order to create your own version of the chatbot. You can set the Ollama host via the environment variable OLLAMA_HOST.
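As a rough illustration of how an application might turn OLLAMA_HOST into a usable URL, consider the sketch below. How app.py actually reads the variable is not shown here, so the helper name `ollama_base_url` and its handling of schemes and ports are assumptions; the default host localhost and port 11434 are the values used elsewhere in this README.

```python
import os


def ollama_base_url(env=None, default_host="localhost", port=11434):
    """Build the Ollama base URL from OLLAMA_HOST (hypothetical helper)."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST") or default_host
    # Accept a full URL as-is, only dropping a trailing slash.
    if host.startswith(("http://", "https://")):
        return host.rstrip("/")
    # Append the default port when only a hostname is given.
    if ":" not in host:
        host = f"{host}:{port}"
    return f"http://{host}"
```

With OLLAMA_HOST unset this yields http://localhost:11434; with OLLAMA_HOST=ollama it yields http://ollama:11434, which matches the host name used in the curl examples.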

You can download models locally with ollama pull zephyr or via API:

curl -X POST http://ollama:11434/api/pull -d '{"name": "zephyr"}'

First start the ollama service as described and download the Zephyr model. To test the ollama server you can call the generate API:

curl -X POST http://ollama:11434/api/generate -d '{"model": "zephyr", "prompt": "Why is the sky blue?"}'
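The same call can be made from Python with only the standard library. The sketch below assumes that the generate endpoint answers with newline-delimited JSON objects, each carrying a `response` text fragment and a `done` flag; check this against the Ollama API documentation for your version. The function names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Prepare the POST request, mirroring the curl call above."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def join_stream(lines):
    """Concatenate the 'response' fragments of a streamed answer.

    Assumes one JSON object per line with 'response' and 'done' keys.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(base_url, model, prompt):
    """Send the prompt to the server and return the full answer text."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request) as response:
        return join_stream(raw.decode("utf-8") for raw in response)
```

For example, `generate("http://ollama:11434", "zephyr", "Why is the sky blue?")` requires a running Ollama server with the zephyr model already pulled.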

All of these commands are also documented in our cheat sheet.

Deployment

Podman

Build the container based on UBI9 Python 3.11:

podman build -t linuxbot-app .

If you build on an arm64 Mac and deploy on amd64, remember to add --platform (in this case our base image is amd64 anyway):

podman build --platform="linux/amd64" -t linuxbot-app .

We will create a network for our linuxbot and ollama:

podman network create linuxbot

Check if DNS is enabled (it's not on the default net):

podman network inspect linuxbot
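If you prefer to check this in a script, the JSON printed by the inspect command can be parsed. The sketch below assumes that the output is a JSON array of network objects with `name` and `dns_enabled` fields, as emitted by recent Podman versions; older versions may use a different layout. The helper name `dns_enabled` is hypothetical.

```python
import json


def dns_enabled(inspect_output, name="linuxbot"):
    """Return True if the named network reports DNS as enabled.

    Assumes the 'dns_enabled' field of recent Podman versions.
    """
    for network in json.loads(inspect_output):
        if network.get("name") == name:
            return bool(network.get("dns_enabled", False))
    raise KeyError(f"network {name!r} not found in inspect output")


# Usage, assuming podman is installed and the network exists:
#   import subprocess
#   out = subprocess.run(
#       ["podman", "network", "inspect", "linuxbot"],
#       capture_output=True, text=True, check=True,
#   ).stdout
#   print(dns_enabled(out))
```

Containers on a network with DNS enabled can reach each other by container name, which is why the app is later started with OLLAMA_HOST=ollama.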

Now you can either start Ollama locally with ollama serve or start an Ollama container with:

podman run --net linuxbot --name ollama -p 11434:11434 --rm docker.io/ollama/ollama:latest

Note: We just forward the port so we can curl it more easily locally as well.

This Ollama service won't have GPU support enabled and is much slower than running Ollama locally, for example on a Mac M1.

Since we create the embeddings locally in the Streamlit app, we need to increase shared memory for PyTorch in order to get it running:

podman run --net linuxbot --name linuxbot-app -p 8080:8080 --shm-size=2gb -e OLLAMA_HOST=ollama -it --rm localhost/linuxbot-app

You can set the Ollama server via the environment variable OLLAMA_HOST, the default is localhost.

NOTE: It would be much better to generate the embeddings with the Ollama service, but this is not yet supported in LlamaIndex.

OpenShift

Create a new project (namespace) for your workshop and deploy the ollama service in it:

oc new-project my-workshop
oc apply -f deployments/ollama.yaml

To enable GPU support, install and instantiate the NVIDIA GPU Operator and the Node Feature Discovery (NFD) Operator as described on the AI on OpenShift page, then deploy ollama-gpu.yaml instead:

oc apply -f deployments/ollama-gpu.yaml

The Streamlit application (linuxbot) can be deployed with:

oc apply -f deployments/linuxbot.yaml

We have published a preconfigured container image on quay.io/sroecker that is used in this deployment.

In order to debug your application and ollama service you can deploy a curl image like this:

oc run mycurl --image=curlimages/curl -it -- sh
oc attach mycurl -c mycurl -i -t
oc delete pod mycurl

References