  • Stars
    3,210
  • Rank 13,418 (Top 0.3 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created about 1 year ago
  • Updated 6 months ago

Repository Details

LLM as a Chatbot Service

UPDATE

  • Internet search support: you can enable internet search in the Gradio application and the Discord bot. For Gradio, there is an internet mode option in the control panel. For Discord, you need to specify the --internet option in your prompt. In both cases, you need a Serper API Key, which you can get from serper.dev. Signing up gives you 2,500 free Google searches, which is sufficient for a long-term test.
  • Discord Bot support: you can serve any model from the model zoo as a Discord Bot. Find out how to do this in the instruction section below.

💬🚀 LLM as a Chatbot Service

The purpose of this repository is to let people use many open-source, instruction-following fine-tuned LLMs as a Chatbot service. Because different models behave differently and require differently formatted prompts, I made a very simple library, Ping Pong, for model-agnostic conversation and context management.

Also, I made GradioChat, a UI that looks similar to HuggingChat but is built entirely in Gradio. Those two projects are fully integrated to power this project.

Easiest way to try out ( ✅ Gradio, 🚧 Discord Bot )

Jarvislabs.ai

This project has become one of the default frameworks at jarvislabs.ai. Jarvislabs.ai is one of the cloud GPU VM providers with the cheapest GPU prices. Furthermore, all the weights of the supported popular open-source LLMs are pre-downloaded. You don't need to waste money and time waiting to download hundreds of GBs just to try out a collection of LLMs. In less than 10 minutes, you can try out any model.

  • For further instructions on how to run the Gradio application, please follow the official documentation on the llmchat framework.

dstack

dstack is an open-source tool that lets you run LLM-based apps in a cloud of your choice via a single command. dstack supports AWS, GCP, Azure, Lambda Cloud, etc.

Use the gradio.dstack.yml and discord.dstack.yml configurations to run the Gradio app and Discord bot via dstack.

Instructions

Standalone Gradio app

  1. Prerequisites

    Note that the code only works with Python >= 3.9 and gradio >= 3.32.0.

    $ conda create -n llm-serve python=3.9
    $ conda activate llm-serve
  2. Install dependencies.

    $ cd LLM-As-Chatbot
    $ pip install -r requirements.txt
  3. Run Gradio application

    There are no required parameters to run the Gradio application. However, a few details are worth noting. When --local-files-only is set, the application won't try to look up the Hugging Face Hub (remote). Instead, it will only use the files already downloaded and cached.

    Hugging Face libraries store downloaded contents under ~/.cache by default, and this application assumes so. However, if you downloaded the weights to a different location for some reason, you can set the HF_HOME environment variable. Find more about the environment variables here

    To use the internet search capability, you need a Serper API Key. You can set it manually in the control panel or on the CLI. When the Serper API Key is specified on the CLI, it is injected into the corresponding UI control. If you don't have one yet, please get one from serper.dev. Signing up gives you 2,500 free Google searches, which is sufficient for a long-term test.

    $ python app.py --root-path "" \
                    --local-files-only \
                    --share \
                    --debug \
                    --serper-api-key "YOUR SERPER API KEY"
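
The cache lookup described above can be sketched in Python. This is a simplified illustration, not code from this repository: resolve_hf_home is a hypothetical helper, and it only considers the HF_HOME variable and the ~/.cache/huggingface default, while the real Hugging Face libraries honour additional environment variables.

```python
import os
from pathlib import Path


def resolve_hf_home(env=None):
    """Return the directory assumed to be the Hugging Face cache root.

    Hypothetical helper: only HF_HOME and the default location are checked.
    """
    env = os.environ if env is None else env
    custom = env.get("HF_HOME")
    if custom:
        # Weights were downloaded somewhere else and HF_HOME points there.
        return Path(custom).expanduser()
    # Default location under ~/.cache.
    return Path.home() / ".cache" / "huggingface"


if __name__ == "__main__":
    print(resolve_hf_home())
```

Running it before starting app.py with --local-files-only is a quick way to see which directory the cached weights are expected to live in.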

Discord Bot

  1. Prerequisites

    Note that the code only works with Python >= 3.9.

    $ conda create -n llm-serve python=3.9
    $ conda activate llm-serve
  2. Install dependencies.

    $ cd LLM-As-Chatbot
    $ pip install -r requirements.txt
  3. Run Discord Bot application. Choose one of the modes in --mode-[cpu|mps|8bit|4bit|full-gpu]. full-gpu will be chosen by default ("full" actually means half; consider this a naming typo to be fixed later).

    The --token is a required parameter, and you can get it from the Discord Developer Portal. If you have not set up a Discord Bot in the Discord Developer Portal yet, please follow the How to Create a Discord Bot Account section of the tutorial from freeCodeCamp to get the token.

    The --model-name is a required parameter, and you can find the list of supported models in model_cards.json.

    --max-workers determines how many requests are handled concurrently. It simply sets the number of workers of the ThreadPoolExecutor.

    When --local-files-only is set, the application won't try to look up the Hugging Face Hub (remote). Instead, it will only use the files already downloaded and cached.

    To use the internet search capability, you need a Serper API Key. If you don't have one yet, please get one from serper.dev. Signing up gives you 2,500 free Google searches, which is sufficient for a long-term test. Once you have the Serper API Key, you can specify it with the --serper-api-key option.

    • Hugging Face libraries store downloaded contents under ~/.cache by default, and this application assumes so. However, if you downloaded the weights to a different location for some reason, you can set the HF_HOME environment variable. Find more about the environment variables here
    $ python discord_app.py --token "DISCORD BOT TOKEN" \
                            --model-name "alpaca-lora-7b" \
                            --max-workers 1 \
                            --mode-[cpu|mps|8bit|4bit|full-gpu] \
                            --local-files-only \
                            --serper-api-key "YOUR SERPER API KEY"
  4. Supported Discord Bot commands

    There are no slash commands. The only way to interact with the deployed Discord bot is to mention the bot. However, you can pass some special strings while mentioning the bot.

    • @bot_name help: it will display a simple help message
    • @bot_name model-info: it will display the information of the currently selected (deployed) model from model_cards.json.
    • @bot_name default-params: it will display the default parameters used in the model's generate method. That is the GenerationConfig, and it holds parameters such as temperature, top_p, and so on.
    • @bot_name user message --max-new-tokens 512 --temperature 0.9 --top-p 0.75 --do_sample --max-windows 5 --internet: all parameters except max-windows dynamically control the text generation behaviour, as in GenerationConfig. max-windows determines how many past conversations to look up as a reference. The default value is 3, but you can increase it as the conversation grows longer. --internet will try to answer your prompt by aggregating information scraped from Google search. To use the --internet option, you need to specify --serper-api-key when booting up the program.
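
To illustrate how a mention such as the last one could be split into the user message and its options, here is a minimal Python sketch. It is an assumption for illustration only, not the parser used by discord_app.py: parse_mention and build_parser are hypothetical names, the text is assumed to have the @bot_name prefix already stripped, and the defaults other than max-windows (3, as stated above) are placeholders.

```python
import argparse


def build_parser():
    # Flags mirror the ones listed above; defaults other than
    # --max-windows are placeholders for this sketch.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--max-new-tokens", type=int, default=None)
    parser.add_argument("--temperature", type=float, default=None)
    parser.add_argument("--top-p", type=float, default=None)
    parser.add_argument("--do_sample", action="store_true")
    parser.add_argument("--max-windows", type=int, default=3)
    parser.add_argument("--internet", action="store_true")
    return parser


def parse_mention(text):
    """Split 'user message --flag value ...' into (message, options)."""
    tokens = text.split()
    # Everything before the first token starting with "--" is the message.
    split_at = next(
        (i for i, tok in enumerate(tokens) if tok.startswith("--")),
        len(tokens),
    )
    message = " ".join(tokens[:split_at])
    options = build_parser().parse_args(tokens[split_at:])
    return message, options


if __name__ == "__main__":
    msg, opts = parse_mention(
        "tell me a joke --max-new-tokens 512 --temperature 0.9 --internet"
    )
    print(msg, vars(opts))
```

Note that argparse exits the process on unknown flags, so a real bot would need to catch that case and reply with the help message instead.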

Context management

Different models might use different strategies to manage context, so if you want to know the exact strategy applied to each model, take a look at the chats directory. However, here are the basic ideas I came up with initially. I found that long prompts eventually slow down the generation process a lot, so the prompts should be kept as short and concise as possible. In the previous version, I accumulated all the past conversations, and that didn't go well.

  • In every turn of the conversation, the past N conversations are kept. Think of N as a hyper-parameter. As an experiment, only the past 2-3 conversations are currently kept for all models.
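
The windowing idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ping Pong implementation: build_prompt is a hypothetical function, and the "User:" / "Assistant:" labels are placeholders, since each model requires its own prompt format.

```python
def build_prompt(history, new_message, max_windows=3):
    """Build a prompt from the last `max_windows` exchanges plus the new message.

    `history` is a list of (user_message, bot_response) pairs, oldest first.
    """
    # history[-0:] would return the whole list, so handle 0 explicitly.
    recent = history[-max_windows:] if max_windows > 0 else []
    lines = []
    for user_msg, bot_msg in recent:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {bot_msg}")
    lines.append(f"User: {new_message}")
    lines.append("Assistant:")
    return "\n".join(lines)


if __name__ == "__main__":
    past = [(f"question {i}", f"answer {i}") for i in range(1, 6)]
    print(build_prompt(past, "question 6", max_windows=2))
```

Older exchanges are simply dropped here, which keeps the prompt length bounded regardless of how long the conversation runs.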

Currently supported models

Check out the list of models

Todos

  • Gradio components to control the configurations of the generation
  • Multiple conversation management
  • Internet search capability (by integrating ChromaDB, intfloat/e5-large-v2)
  • Implement server only option w/ FastAPI

Acknowledgements

  • I am thankful to Jarvislabs.ai, who generously provided free GPU resources to experiment with Alpaca-LoRA deployment and share it with the community to try out.
  • I am thankful to AI Network, who generously provided an A100(40G) x 8 DGX workstation for fine-tuning and serving the models.
