  • Stars: 3,273
  • Rank: 13,716 (Top 0.3%)
  • Language: Python
  • License: Apache License 2.0
  • Created: almost 2 years ago
  • Updated: about 1 year ago


UPDATE

  • Internet search support: you can enable internet search capability in both the Gradio application and the Discord bot. For Gradio, there is an internet mode option in the control panel, while for Discord you need to specify the --internet option in your prompt. In both cases, you need a Serper API Key, which you can get from serper.dev. By signing up, you get 2,500 free Google searches, which is plenty for a long-term test. A request sketch follows this list.
  • Discord Bot support: you can serve any model from the model zoo as a Discord bot. Find out how in the instruction section below.
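
For reference, here is roughly what a single Serper query looks like. This is a minimal standalone sketch (not this repository's internal code), assuming serper.dev's https://google.serper.dev/search endpoint and the requests package:

    import requests

    SERPER_API_KEY = "YOUR SERPER API KEY"  # issued after signing up at serper.dev

    def google_search(query):
        """Send one query to Serper's Google search endpoint and return organic hits."""
        response = requests.post(
            "https://google.serper.dev/search",
            headers={"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"},
            json={"q": query},
            timeout=10,
        )
        response.raise_for_status()
        # Each organic hit typically carries "title", "link", and "snippet" fields.
        return response.json().get("organic", [])

    for hit in google_search("open source LLM chatbot"):
        print(hit["title"], "->", hit["link"])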

💬🚀 LLM as a Chatbot Service

The purpose of this repository is to let people use a variety of open-source, instruction-following, fine-tuned LLMs as a chatbot service. Because different models behave differently and require differently formatted prompts, I made a very simple library, Ping Pong, for model-agnostic conversation and context management.
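
To illustrate the core idea, here is a hypothetical sketch with made-up names (not Ping Pong's actual API): the conversation history is stored in a format-agnostic way, and each model family renders it with its own prompt template.

    # Hypothetical sketch of model-agnostic conversation management.
    # Names are illustrative; see the Ping Pong repository for the real API.
    ALPACA_TEMPLATE = "### Instruction:\n{user}\n\n### Response:\n{bot}"
    VICUNA_TEMPLATE = "USER: {user}\nASSISTANT: {bot}"

    class Conversation:
        """Stores turns independently of any model-specific prompt format."""

        def __init__(self):
            self.turns = []

        def add(self, user, bot=""):
            self.turns.append({"user": user, "bot": bot})

        def build_prompt(self, template):
            # The same history can be rendered for any model by swapping templates.
            return "\n\n".join(template.format(**t) for t in self.turns)

    conv = Conversation()
    conv.add("Hi!", "Hello! How can I help?")
    conv.add("Tell me a joke.")
    print(conv.build_prompt(ALPACA_TEMPLATE))  # or VICUNA_TEMPLATE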

Also, I made a GradioChat UI that has a similar shape to HuggingChat but is built entirely in Gradio. Those two projects are fully integrated to power this project.

Easiest way to try out ( ✅ Gradio, 🚧 Discord Bot )

Jarvislabs.ai

This project has become one of the default frameworks at jarvislabs.ai. Jarvislabs.ai is a cloud GPU VM provider with some of the cheapest GPU prices. Furthermore, all the weights of the supported popular open-source LLMs are pre-downloaded, so you don't need to waste money and time waiting for hundreds of GBs to download before trying out a collection of LLMs. In less than 10 minutes, you can try out any model.

  • For further instructions on how to run the Gradio application, please follow the official documentation on the llmchat framework.

dstack

dstack is an open-source tool that allows you to run LLM-based apps in a cloud of your choice with a single command. dstack supports AWS, GCP, Azure, Lambda Cloud, etc.

Use the gradio.dstack.yml and discord.dstack.yml configurations to run the Gradio app and Discord bot via dstack.

Instructions

Standalone Gradio app

  1. Prerequisites

    Note that the code only works with Python >= 3.9 and gradio >= 3.32.0

    $ conda create -n llm-serve python=3.9
    $ conda activate llm-serve
  2. Install dependencies.

    $ cd LLM-As-Chatbot
    $ pip install -r requirements.txt
  3. Run the Gradio application

    There are no required parameters to run the Gradio application. However, a few small details are worth noting. When --local-files-only is set, the application won't look up the Hugging Face Hub (remote); instead, it will only use files that are already downloaded and cached.

    Hugging Face libraries store downloaded contents under ~/.cache by default, and this application assumes so. However, if you downloaded weights to a different location for some reason, you can set the HF_HOME environment variable (see the short sketch after the command below). Find out more about the environment variables here

    In order to leverage the internet search capability, you need a Serper API Key. You can set it manually in the control panel or via the CLI. When you specify the Serper API Key in the CLI, it will be injected into the corresponding UI control. If you don't have one yet, please get one from serper.dev. By signing up, you get 2,500 free Google searches, which is plenty for a long-term test.

    $ python app.py --root-path "" \
                    --local-files-only \
                    --share \
                    --debug \
                    --serper-api-key "YOUR SERPER API KEY"
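
    As a quick illustration of the cache note above (a sketch with an assumed example path, not code from this repository): HF_HOME only takes effect if it is set before the Hugging Face libraries are imported.

    import os

    # Illustrative custom cache location; adjust to wherever the weights live.
    os.environ["HF_HOME"] = "/data/hf-cache"

    from transformers import AutoTokenizer

    # Mirroring what --local-files-only does: no remote lookup on the Hub;
    # only files already present in the cache are used.
    tokenizer = AutoTokenizer.from_pretrained("gpt2", local_files_only=True)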

Discord Bot

  1. Prerequisites

    Note that the code only works with Python >= 3.9

    $ conda create -n llm-serve python=3.9
    $ conda activate llm-serve
  2. Install dependencies.

    $ cd LLM-As-Chatbot
    $ pip install -r requirements.txt
  3. Run the Discord bot application. Choose one of the modes in --mode-[cpu|mps|8bit|4bit|full-gpu]. full-gpu will be chosen by default (full actually means half precision; consider this a typo to be fixed later).

    The --token is a required parameter, and you can get it from the Discord Developer Portal. If you have not set up a Discord bot in the Discord Developer Portal yet, please follow the How to Create a Discord Bot Account section of the tutorial from freeCodeCamp to get the token.

    The --model-name is a required parameter; you can browse the list of supported models in model_cards.json.

    --max-workers determines how many requests are handled concurrently. It simply sets the number of workers of the ThreadPoolExecutor (see the sketch at the end of these instructions).

    When --local-files-only is set, the application won't look up the Hugging Face Hub (remote); instead, it will only use files that are already downloaded and cached.

    In order to leverage the internet search capability, you need a Serper API Key. If you don't have one yet, please get one from serper.dev. By signing up, you get 2,500 free Google searches, which is plenty for a long-term test. Once you have the Serper API Key, you can specify it with the --serper-api-key option.

    • Hugging Face libraries store downloaded contents under ~/.cache by default, and this application assumes so. However, if you downloaded weights to a different location for some reason, you can set the HF_HOME environment variable. Find out more about the environment variables here
    $ python discord_app.py --token "DISCORD BOT TOKEN" \
                            --model-name "alpaca-lora-7b" \
                            --max-workers 1 \
                            --mode-[cpu|mps|8bit|4bit|full-gpu] \
                            --local-files-only \
                            --serper-api-key "YOUR SERPER API KEY"
  4. Supported Discord Bot commands

    There are no slash commands. The only way to interact with the deployed Discord bot is to mention the bot. However, you can pass some special strings while mentioning it.

    • @bot_name help: it will display a simple help message
    • @bot_name model-info: it will display information about the currently selected (deployed) model from model_cards.json.
    • @bot_name default-params: it will display the default parameters used in the model's generate method, i.e. the GenerationConfig, which holds parameters such as temperature, top_p, and so on.
    • @bot_name user message --max-new-tokens 512 --temperature 0.9 --top-p 0.75 --do_sample --max-windows 5 --internet: all parameters except max-windows are used to dynamically determine the text generation behaviour, as in GenerationConfig. max-windows determines how many past conversations to look up as a reference. The default value is 3, but you can increase it as the conversation grows longer. --internet will try to answer your prompt by aggregating information scraped from Google search. To use the --internet option, you need to specify --serper-api-key when booting up the program. A parsing sketch follows this list.
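
    To make the last two pieces concrete, here is a hedged sketch (a hypothetical helper, not the bot's actual code) of how mention options could be parsed into a transformers GenerationConfig, and how --max-workers maps onto a ThreadPoolExecutor. Boolean flags like --internet are omitted for brevity.

    import shlex
    from concurrent.futures import ThreadPoolExecutor
    from transformers import GenerationConfig

    # --max-workers bounds how many generation requests run concurrently.
    executor = ThreadPoolExecutor(max_workers=1)

    def parse_mention(text):
        """Split a mention into the user message and generation settings."""
        tokens, opts, words = shlex.split(text), {}, []
        i = 0
        while i < len(tokens):
            if tokens[i] == "--do_sample":
                opts["do_sample"] = True
            elif tokens[i].startswith("--"):
                key = tokens[i][2:].replace("-", "_")
                opts[key] = float(tokens[i + 1])
                i += 1
            else:
                words.append(tokens[i])
            i += 1
        max_windows = int(opts.pop("max_windows", 3))  # past turns to keep
        config = GenerationConfig(
            max_new_tokens=int(opts.pop("max_new_tokens", 256)), **opts
        )
        return " ".join(words), config, max_windows

    msg, config, max_windows = parse_mention(
        "tell me a joke --max-new-tokens 512 --temperature 0.9 --top-p 0.75 --do_sample"
    )
    future = executor.submit(lambda: f"(model.generate would run here for: {msg!r})")
    print(future.result())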

Context management

Different models might have different strategies to manage context, so if you want to know the exact strategy applied to each model, take a look at the chats directory. However, here are the basic ideas I came up with initially. I found that long prompts eventually slow down the generation process a lot, so I thought prompts should be kept as short and concise as possible. In the previous version, I accumulated all the past conversations, and that didn't go well.

  • In every turn of the conversation, the past N conversations are kept. Think of N as a hyper-parameter. As an experiment, only the past 2-3 conversations are currently kept for all models, as sketched below.
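
A minimal sketch of this sliding-window idea (illustrative only; the per-model strategies live in the chats directory):

    def window(turns, n=3):
        """Keep only the last N turns; N is the hyper-parameter mentioned above."""
        return turns[-n:]

    history = [
        {"user": "Hi", "bot": "Hello!"},
        {"user": "What is LoRA?", "bot": "A parameter-efficient fine-tuning method."},
        {"user": "Who proposed it?", "bot": "Hu et al., 2021."},
        {"user": "Summarize our chat.", "bot": ""},
    ]
    print(window(history))  # the oldest turn is dropped before prompt building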

Currently supported models

Check out the list of models

Todos

  • Gradio components to control the configurations of the generation
  • Multiple conversation management
  • Internet search capability (by integrating ChromaDB, intfloat/e5-large-v2)
  • Implement server-only option w/ FastAPI

Acknowledgements

  • I am thankful to Jarvislabs.ai, who generously provided free GPU resources to experiment with Alpaca-LoRA deployment and share it with the community.
  • I am thankful to AI Network, who generously provided an A100(40G) x 8 DGX workstation for fine-tuning and serving the models.
