• Stars
    232
  • Rank 172,070 (Top 4 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created over 1 year ago
  • Updated about 2 months ago

Repository Details

wandbot is a technical support bot for Weights & Biases' AI developer tools that can run in Discord, Slack, ChatGPT and Zendesk

wandbot

Wandbot is a question-answering bot designed specifically for Weights & Biases documentation. Built on llama-index, it uses FAISS to retrieve relevant context (RAG) and OpenAI's gpt-4 to generate precise, context-aware responses.

Features

  • Wandbot employs Retrieval Augmented Generation (RAG) with a FAISS backend: it retrieves the documents relevant to each user query and grounds its response in them.
  • It features periodic data ingestion and report generation, contributing to the bot's continuous improvement. You can view the latest data ingestion report here.
  • The bot is integrated with Discord and Slack, so users can ask questions directly from these popular collaboration platforms.
  • Performance monitoring and continuous improvement are made possible through logging and analysis with Weights & Biases Tables. Visit the workspace for more details here.
  • Wandbot has a fallback mechanism for model selection, which is used when GPT-4 fails to generate a response.
  • The bot's performance is evaluated using a mix of metrics, including retrieval accuracy, string similarity, and the correctness of model-generated responses.
  • Curious about the custom system prompt used by the bot? You can view the full prompt here.
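
The fallback mechanism mentioned in the list above can be illustrated with a minimal sketch. This is not wandbot's actual code: the function `generate_with_fallback` and the two stand-in model callables are hypothetical, and the sketch assumes only that each model is tried in order until one returns a response.

```python
from typing import Callable, Sequence


def generate_with_fallback(
    question: str,
    models: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, generate) pair in order.

    Returns (model_name, answer) from the first model that succeeds and
    raises RuntimeError if every model fails.
    """
    errors = []
    for name, generate in models:
        try:
            return name, generate(question)
        except Exception as exc:  # any failure triggers the next model
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Hypothetical stand-ins for real model clients.
def flaky_primary(question: str) -> str:
    raise TimeoutError("simulated failure")


def backup(question: str) -> str:
    return f"backup answer to: {question}"


used, answer = generate_with_fallback(
    "How do I log a table?",
    [("primary", flaky_primary), ("backup", backup)],
)
print(used, "->", answer)
```

Keeping the models as an ordered list makes the fallback order explicit and easy to change.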

Installation

The project is built with Python version >=3.10.0,<3.11 and utilizes poetry for managing dependencies. Follow the steps below to install the necessary dependencies:

git clone git@github.com:wandb/wandbot.git
pip install poetry
cd wandbot
poetry install --all-extras
# Alternatively, install only the extras for the platform you want to run on:
# poetry install --extras discord # for discord
# poetry install --extras slack # for slack
# poetry install --extras api # for api

Usage

Data Ingestion

The data ingestion module pulls code and markdown from the Weights & Biases repositories docodile and examples and ingests them into vectorstores for the retrieval augmented generation pipeline. To ingest the data, run the following command from the root of the repository:

poetry run python -m src.wandbot.ingestion

The ingested data is written to the data/cache directory, in three subdirectories (including raw_data and vectorstore), with individual files for each step of the ingestion process. These datasets are also stored as wandb artifacts in the project named by the WANDB_PROJECT environment variable and can be accessed from the wandb dashboard.

Running the Q&A Bot

Before running the Q&A bot, ensure the following environment variables are set:

OPENAI_API_KEY
COHERE_API_KEY
SLACK_EN_APP_TOKEN
SLACK_EN_BOT_TOKEN
SLACK_EN_SIGNING_SECRET
SLACK_JA_APP_TOKEN
SLACK_JA_BOT_TOKEN
SLACK_JA_SIGNING_SECRET
WANDB_API_KEY
DISCORD_BOT_TOKEN
WANDBOT_API_URL="http://localhost:8000"
WANDB_TRACING_ENABLED="true"
WANDB_PROJECT="wandbot-dev"
WANDB_ENTITY="wandbot"
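
A quick pre-flight check can catch a missing variable before any of the applications start. The following is a minimal sketch, not part of wandbot; the helper `missing_env_vars` is a hypothetical name, and the variable list is copied from the list above.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "COHERE_API_KEY",
    "SLACK_EN_APP_TOKEN",
    "SLACK_EN_BOT_TOKEN",
    "SLACK_EN_SIGNING_SECRET",
    "SLACK_JA_APP_TOKEN",
    "SLACK_JA_BOT_TOKEN",
    "SLACK_JA_SIGNING_SECRET",
    "WANDB_API_KEY",
    "DISCORD_BOT_TOKEN",
    "WANDBOT_API_URL",
    "WANDB_TRACING_ENABLED",
    "WANDB_PROJECT",
    "WANDB_ENTITY",
]


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

If you only run one platform (for example, Discord without Slack), trim the list to the variables that platform needs.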

Once these environment variables are set, you can start the Q&A bot application using the following commands:

(poetry run uvicorn wandbot.api.app:app --host="0.0.0.0" --port=8000 > api.log 2>&1) & \
(poetry run python -m wandbot.apps.slack -l en > slack_en_app.log 2>&1) & \
(poetry run python -m wandbot.apps.slack -l ja > slack_ja_app.log 2>&1) & \
(poetry run python -m wandbot.apps.discord > discord_app.log 2>&1)

For more detailed instructions on installing and running the bot, please refer to the run.sh file located in the root of the repository.

Executing these commands will launch the API, Slackbot, and Discord bot applications, enabling you to interact with the bot and ask questions related to the Weights & Biases documentation.

Evaluation

We evaluated the performance of the Q&A bot both manually and with automated evaluation strategies. The following W&B reports document the steps taken to evaluate the Q&A bot:

Evaluation Results

Manual Evaluation

We manually evaluated the Q&A bot's responses to establish a baseline score.

| Evaluation Metric | Comment | Score |
|---|---|---|
| Accuracy | measures the correctness of the Q&A bot's responses | 66.67 % |
| URL Hallucination | measures the validity and relevancy of the links | 10.61 % |
| Query Relevancy | measures if the query is relevant to W&B | 88.64 % |

Auto Evaluation (LLM evaluates LLM)

We employed a few auto evaluation strategies to speed up the iteration process of the bot's development:

| Evaluation Metric | Comment | Score |
|---|---|---|
| Faithfulness Accuracy | measures if the response from a RAG pipeline matches any retrieved chunk | 53.78 % |
| Relevancy Accuracy | measures if the generated response is in line with the context | 61.36 % |
| Hit Rate | measures if the correct chunk is present in the retrieved chunks | 0.79 |
| Mean Reciprocal Ranking (MRR) | measures the quality of the retriever | 0.74 |
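
To make the two retriever metrics concrete, here is a minimal sketch that uses their standard definitions: hit rate is the fraction of queries whose correct chunk appears among the retrieved chunks, and MRR averages 1/rank of the correct chunk (0 when it is absent). The chunk IDs and rankings are made-up toy data, and wandbot's own evaluation code may compute these differently.

```python
def hit_rate(ranked_results, relevant_ids):
    """Fraction of queries whose relevant chunk appears in the retrieved list."""
    hits = sum(
        1 for ranked, rel in zip(ranked_results, relevant_ids) if rel in ranked
    )
    return hits / len(relevant_ids)


def mean_reciprocal_rank(ranked_results, relevant_ids):
    """Average of 1/rank of the relevant chunk; a miss contributes 0."""
    total = 0.0
    for ranked, rel in zip(ranked_results, relevant_ids):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)  # ranks start at 1
    return total / len(relevant_ids)


# Toy data: one retrieved list (best first) and one correct chunk per query.
ranked = [
    ["c1", "c2", "c3"],  # correct chunk at rank 1
    ["c4", "c5", "c6"],  # correct chunk at rank 2
    ["c7", "c8", "c9"],  # correct chunk not retrieved
    ["c1", "c5", "c9"],  # correct chunk at rank 3
]
relevant = ["c1", "c5", "c0", "c9"]

print(f"hit rate: {hit_rate(ranked, relevant):.2f}")          # 0.75
print(f"MRR: {mean_reciprocal_rank(ranked, relevant):.3f}")   # 0.458
```

Hit rate ignores position, while MRR rewards placing the correct chunk near the top, so the two are reported together.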

Overview of the Implementation

  1. Creating Document Embeddings with FAISS
  2. Constructing the Q&A Pipeline using llama-index
  3. Selection of Models and Implementation of Fallback Mechanism
  4. Deployment of the Q&A Bot on FastAPI, Discord, and Slack
  5. Utilizing Weights & Biases Tables for Logging and Analysis
  6. Evaluating the Performance of the Q&A Bot
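
The first two steps (embedding documents, then retrieving context for the Q&A prompt) can be sketched in a few lines. This toy version is an illustration only: it swaps in bag-of-words vectors and a brute-force cosine search for real embeddings and FAISS, and the names `embed`, `retrieve`, and `build_prompt` as well as the sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query (brute-force search)."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Assemble a prompt that grounds the answer in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Toy documents for illustration.
DOCS = [
    "wandb.init starts a new run",
    "wandb.log records metrics for a run",
    "Artifacts version datasets and models",
]

print(build_prompt("how do I log metrics", DOCS, k=1))
```

In a real pipeline the prompt would then be sent to the language model, and a vector index would replace the brute-force search so that retrieval stays fast as the corpus grows.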

You can monitor the usage of the bot in the following project: https://wandb.ai/wandbot/wandbot_public

More Repositories

1

openui

OpenUI lets you describe UI using your imagination, then see it rendered live.
TypeScript
18,465
star
2

wandb

The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Python
8,850
star
3

examples

Example deep learning projects that use wandb's features.
Jupyter Notebook
1,113
star
4

weave

Weave is a toolkit for developing AI-powered applications, built by Weights & Biases.
TypeScript
664
star
5

edu

Educational materials on deep learning by Weights & Biases
Jupyter Notebook
492
star
6

awesome-dl-projects

This is a collection of the code that accompanies the reports in The Gallery by Weights & Biases.
Jupyter Notebook
322
star
7

server

W&B Server is the self-hosted version of Weights & Biases
HCL
243
star
8

Groundbreaking-Papers

ML Research paper summaries, annotated papers and implementation walkthroughs
111
star
9

llm-leaderboard

Project for evaluating LLMs on Japanese tasks
Python
66
star
10

droughtwatch

Weights & Biases benchmark for drought prediction
Jupyter Notebook
54
star
11

gitbook

Documentation synced with GitBook. For all issues with the wandb library, please use https://github.com/wandb/client/issues
JavaScript
41
star
12

programmer

Python
36
star
13

sweeps

W&B Hyperparameter Sweep Engine. File sweeps related issues at the W&B client: https://github.com/wandb/client
Python
34
star
14

witness

Deep learning model for recognizing puzzle patterns in The Witness.
Python
27
star
15

Hemm

A holistic evaluation library for multi-modal generative models using Weave
Python
20
star
16

superres

Project to make a higher resolution version of existing images
Python
19
star
17

layoutlm_sroie_demo

Finetune LayoutLM on SROIE dataset using W&B tools
Python
18
star
18

terraform-aws-wandb

A terraform module for deploying Weights & Biases on AWS.
HCL
17
star
19

helm-charts

Our official helm charts for deploying wandb into k8s
Mustache
17
star
20

terraform-google-dagster

HCL
17
star
21

launch-jobs

🚀💼
Python
16
star
22

client-ng

Experimental wandb CLI and Python API - See Experimental section below.
Python
16
star
23

lit_utils

Utilities for working with W&B and PyTorch Lightning in an educational context
Python
15
star
24

catz

A machine learning contest to predict the behavior of catz
Python
15
star
25

llm-workshop-fc2024

Resources for the FC 2024 LLM workshop
Jupyter Notebook
15
star
26

terraform-google-wandb

A Terraform module for deploying Weights & Biases on GCP.
HCL
12
star
27

artifacts-examples

W&B Artifacts examples
Python
12
star
28

nb_helpers

A set of tools to work with notebooks
Jupyter Notebook
9
star
29

parallel

Easy & robust parallelism in golang
Go
9
star
30

wandb-js

The W&B SDK for TypeScript, Node, and modern Web Browsers
TypeScript
8
star
31

qualcomm-contest

Jupyter Notebook
7
star
32

wandb-workspaces

Programmatically edit the W&B UI
Python
7
star
33

SageMakerStudio

A repo showcasing SMSL and W&B
Jupyter Notebook
6
star
34

assets

Weights & Biases logos, branding, and assets to use and share
6
star
35

react-vis

Fork of github.com/uber/react-vis with bugfixes and extensions
JavaScript
5
star
36

terraform-azurerm-wandb

HCL
5
star
37

server-cli

Go
3
star
38

terraform-kubernetes-wandb

HCL
3
star
39

weaveflow

Jupyter Notebook
3
star
40

wandbmon

wandb wrapper for production monitoring and evaluation use cases
Python
3
star
41

wandb-uat

User acceptance testing for the Weights & Biases python SDK library.
Python
3
star
42

codesearchnet

Python
2
star
43

awesome-dl-resources

2
star
44

docugen

Reference documentation generator for Weights & Biases
Python
2
star
45

client-java

Java
2
star
46

davis-contest

Materials for the DAVIS Video Segmentation Contest
Jupyter Notebook
2
star
47

sampled-log-example

Python
2
star
48

weave-analysis

Jupyter Notebook
2
star
49

connections

Solving NYTimes Connections puzzle
Python
2
star
50

wandb-content-navigator

LLM-powered RAG slackbot and endpoint to suggest Weights & Biases content
Python
2
star
51

runchain

Example of Run Chaining
Python
1
star
52

hub

Default files and setup scripts for the hub
Shell
1
star
53

yea

Yea functional test harness
Python
1
star
54

jetson-webhook

Using WandB Webhooks on Edge Devices
Python
1
star
55

dsviz-demo

Jupyter Notebook
1
star
56

pong

A reinforcement learning contest to master the game of pong
Python
1
star
57

terraform-google-assume-aws-role

HCL
1
star
58

auto-release-notes

TypeScript
1
star
59

tiny-ml

TinyML tools for and with WandB
Jupyter Notebook
1
star
60

wandb-testing

Repo to store testing related tools
Python
1
star
61

text-extraction

Python
1
star
62

mixeval-weave

Evaluating LLMs on the MixEval dataset using W&B Weave
Python
1
star
63

libwandb-cpp

1
star
64

mon-sdk-dev

Python
1
star
65

yea-wandb

Python
1
star
66

nexus

Go
1
star
67

gpu_dashboard

extract gpu usage across the teams
Python
1
star