WikiChat

Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

Online demo: https://wikichat.genie.stanford.edu

Introduction

Large language model (LLM) chatbots like ChatGPT and GPT-4 get things wrong a lot, especially if the information you are looking for is recent ("Tell me about the 2024 Super Bowl.") or about less popular topics ("What are some good movies to watch from [insert your favorite foreign director]?"). WikiChat uses Wikipedia and the following 7-stage pipeline to make sure its responses are factual.

WikiChat Pipeline

Check out our paper for more details: Sina J. Semnani, Violet Z. Yao*, Heidi C. Zhang*, and Monica S. Lam. 2023. WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore. Association for Computational Linguistics.

🚨 Announcements

  • (January 8, 2024) Distilled LLaMA-2 models are released. You can run these models locally for a cheaper and faster alternative to paid APIs.
  • (December 8, 2023) We present our work at EMNLP 2023.
  • (October 27, 2023) The camera-ready version of our paper is now available on arXiv.
  • (October 06, 2023) Our paper is accepted to the Findings of EMNLP 2023.

Installation

Everything has been tested using Python 3.8 on Ubuntu 20.04, but should run on other Linux distributions as well. If you want to use this on Windows or Mac, or with a newer Python version, expect to do some troubleshooting in a few of the installation steps. We recommend using the conda environment in conda_env.yaml.

Clone the repo:

git clone https://github.com/stanford-oval/WikiChat.git
cd WikiChat

Create and activate the Python environment:

conda env create --file conda_env.yaml
conda activate wikichat
python -m spacy download en_core_web_sm

Make sure this environment is activated whenever you run any of the following commands.

Set up the LLM endpoint:

WikiChat is compatible with various LLMs, but we recommend OpenAI models, accessed via either openai.com or Azure. Alternatively, you can use Together.ai or host your own model locally, but these options often result in lower-quality outputs.

Fill out the appropriate fields in llm_config.yaml.

Then create a file named API_KEYS (which is included in .gitignore), copy the following into it and fill out the fields for the LLM endpoint you want to use:

# Fill in the following values with your own keys for the API you are using. Make sure there is no extra space after the key.
# Changes to this file are ignored by git, so that you can safely store your keys here during development.
export AZURE_OPENAI_API_KEY=[Your Azure OpenAI API key from "Keys and Endpoint" section of your deployment]
export OPENAI_API_KEY=[Your OpenAI API key from https://platform.openai.com/api-keys]
export TOGETHER_API_KEY=[Your Together.ai API key from https://api.together.xyz/settings/api-keys]

# Fill in this value if you are using COSMOS DB to store user data via a front end.
export COSMOS_CONNECTION_STRING=[Your COSMOS connection string]
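
These environment variables need to be present in the shell you run WikiChat from. If your setup does not load API_KEYS automatically (whether the Makefile sources it is an assumption worth verifying), load it manually before running any of the commands below:

source API_KEYS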

Set up the retrieval index

To retrieve reliable information, WikiChat uses ColBERT to retrieve from a text corpus. You can use our pre-built Wikipedia index, or create your own index using your own data (see below). Following most prior work on Wikipedia, we discard tables and lists (including bullet points) when indexing Wikipedia.

Download the English Wikipedia index from the HuggingFace Hub:

make download-colbert-index language=en wiki_date=11_06_2023

Start a ColBERT inference server that accepts HTTP requests:

make start-colbert-gunicorn wiki_date=11_06_2023 language=en

No GPU is needed to run ColBERT, as it is configured to use the CPU. The entire index is loaded into RAM, which requires about 100GB. If you don't have that much RAM, you can enable memory mapping by adding colbert_memory_map=true to this command. This reduces RAM usage to about 35GB, but makes retrieval slower.

You need to keep this process running so that the chatbot can communicate with the ColBERT index. For that, either keep this terminal open, or use tmux/screen. By default, the server listens on port 5000. You can test this server by running a curl command like this in a new terminal:

curl http://127.0.0.1:5000/search -d '{"query": "who is the current monarch of the united kingdom?", "evi_num": 1}' -X GET -H 'Content-Type: application/json'
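
If you prefer to query the index programmatically, here is a minimal Python sketch that sends the same /search request as the curl command above (the exact shape of the JSON response depends on the server, so it is printed as-is):

import requests

# Query the local ColBERT inference server (assumes it is running on port 5000).
response = requests.get(
    "http://127.0.0.1:5000/search",
    json={"query": "who is the current monarch of the united kingdom?", "evi_num": 1},
)
print(response.json())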

Interactive Chat via Command Line

You can run different versions of the WikiChat pipeline. Here are a few configurations:

make demo pipeline=early_combine do_refine=true engine=gpt-4 # the WikiChat_G4 pipeline from the paper
make demo pipeline=early_combine do_refine=true engine=text-davinci-003 # the WikiChat_G3.5 pipeline from the paper
make demo pipeline=early_combine do_refine=false engine=local # the WikiChat_L pipeline from the paper, when used in conjunction with our distilled LLaMA-7B model
make demo pipeline=generate do_refine=false engine=... # the baseline, only LLM generation (Stage 3 of the pipeline)
make demo pipeline=generate_and_correct do_refine=false engine=... # only fact-check the LLM's output (Stages 3, 4, 5, 6 of the pipeline)
make demo pipeline=retrieve_and_generate do_refine=false engine=... # Stages 1, 2, 6 of the pipeline
make demo pipeline=retrieve_only do_refine=false engine=... # Stage 1 of the pipeline

make demo pipeline=early_combine do_refine=true engine=gpt-35-turbo-instruct # a faster version of WikiChat (not in the paper)
make demo pipeline=early_combine do_refine=false engine=gpt-35-turbo-instruct draft_engine=gpt-4 # a balanced version of WikiChat, hallucinates less than the full gpt-35-turbo-instruct version, and has about the same latency (not in the paper)

engine can be any value on the left-hand side of an engine_map entry in llm_config.yaml.

See pipelines/pipeline_arguments.py for more details on the different pipeline configurations.

Run a distilled model for lower latency and cost

First, make sure to set the api_base field in llm_config.yaml, and note the IP address and port number.

Then start the inference server in a separate terminal using HuggingFace's text-generation-inference library. We recommend using their provided Docker image given its ease of use. Download one of the available models, then run:

docker run --gpus all --shm-size 1g -p <port>:80 -v ./:/data ghcr.io/huggingface/text-generation-inference:1.3.4 --model-id /data/<path-to-model-directory> --hostname <ip> --num-shard <number-of-gpus>

When the inference server is running on an NVIDIA A100 GPU, each chatbot response should take just a few seconds, plus the time needed for retrieval via ColBERT.
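
Once the container is running, you can sanity-check it before pointing WikiChat at it. text-generation-inference exposes a /generate endpoint; the sketch below assumes you mapped the server to port 8080 on localhost:

import requests

# Minimal smoke test for a text-generation-inference server.
# Port 8080 is a placeholder; use whatever you passed to docker run via -p.
response = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": "The capital of France is", "parameters": {"max_new_tokens": 20}},
)
print(response.json())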

The following models are available on the HuggingFace Hub.

  • stanford-oval/Llama-2-7b-WikiChat: A slightly improved version of the distilled LLaMA model described in our paper. As in the paper, stages 6 (draft) and 7 (refine) are fused together. The differences from the paper are:
  1. In addition to fine-tuning on simulated conversations about Wikipedia topics, we also include several less knowledge-intensive (i.e. more chit-chat) simulated conversations to prevent failure modes.
  2. We fine-tune LLaMA-2-7B instead of LLaMA-7B.
  Run make demo engine=local do_refine=false if you are using this model.
  • stanford-oval/Llama-2-7b-WikiChat-fused: Similar to the previous model, except that stages 3 (generate) and 4 (extract claims) are also fused together, which makes this model almost twice as fast. Run make demo engine=local do_refine=false fuse_claim_splitting=true if you are using this model.

Run user simulator

To evaluate a chatbot, you can simulate conversations with a user simulator. subset can be one of head, tail, or recent, corresponding to the three subsets introduced in the paper. This script reads the topic (i.e. a Wikipedia title and article) from the corresponding benchmark/topics/$(subset)_articles.json file. num_output_dialogs is the number of simulated dialogs to generate, and num_output_turns is the number of turns in each dialog.

make simulate-users user_engine=gpt-4 user_temperature=1 subset=head num_output_dialogs=1 num_output_turns=2

Depending on the engine you are using, this might take some time. The simulated dialogs and the log file will be saved in benchmark/simulated_dialogs/. You can also provide any of the pipeline parameters from above. You can experiment with different user characteristics by modifying user_characteristics in benchmark/scripts/user_simulator.py.

Evaluate factuality and conversationality of any conversation

Coming soon!

Interactive Chat via a Web Interface

You can try out our online demo at wikichat.genie.stanford.edu, or host your own. The following code requires Cosmos DB (Azure's MongoDB-compatible database) to store user data, but you can easily modify it to use a different database.

Bring up the backend API:

make start-backend-gunicorn do_refine=true ssl_key_file=/path/to/privkey.pem ssl_certificate_file=/path/to/fullchain.pem

This script requires generating an SSL certificate. You can use a self-signed certificate for development or disable SSL by modifying the code. However, for production and if you are hosting the frontend and backend on separate machines, you should use a valid certificate. Otherwise, user inputs are transferred from the frontend to the backend API unencrypted. You can obtain a free certificate from Let's Encrypt.

Note that engine and pipeline are not important here, as the frontend sends the appropriate parameters to the backend. This brings up a gunicorn server (on port 5001) and a Redis server (on port 5003) to persist Flask's rate-limit data to disk. To shut down the gunicorn server, simply press Ctrl+C. To shut down the Redis server, run redis-cli -p 5003 shutdown.

Run this curl command to test the backend:

curl http://127.0.0.1:5001/chat -d '{"experiment_id": "test_experiment", "dialog_id": "test_dialog", "turn_id": 0, "system_name": "early_combine", "new_user_utterance": "Who stars in the wandering earth 2?"}' -X POST -H 'Content-Type: application/json'
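
The same endpoint can also drive a short scripted conversation. Here is a minimal Python sketch reusing the payload fields from the curl command above (the field semantics are inferred from that example):

import requests

# Send two consecutive user turns in the same dialog; turn_id orders the turns,
# while experiment_id and dialog_id group them into one conversation.
for turn_id, utterance in enumerate(
    ["Who stars in the wandering earth 2?", "When was it released?"]
):
    response = requests.post(
        "http://127.0.0.1:5001/chat",
        json={
            "experiment_id": "test_experiment",
            "dialog_id": "test_dialog",
            "turn_id": turn_id,
            "system_name": "early_combine",
            "new_user_utterance": utterance,
        },
    )
    print(response.json())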

The frontend is located at https://github.com/stanford-oval/ovalchat, which you can deploy separately on another VM or via vercel.com. Follow the instructions there to set up the frontend.

Create your own retrieval index

You can change the default values of the following parameters in the Makefile if needed:

nbits ?= 1 # encode each dimension with 1 bit
max_block_words ?= 100 # maximum number of words in each paragraph
doc_maxlen ?= 140 # maximum number of tokens per passage, to account for the 100 "words" plus the title included in each Wikipedia paragraph
wiki_date ?= 11_06_2023 # the date on which the Wikipedia dump was downloaded
language ?= en # the English Wikipedia
nranks ?= 8 # number of GPUs to use for indexing
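
Since these use Make's ?= conditional assignment, you can also override any of them on the command line instead of editing the Makefile, for example:

make index-wiki language=en nranks=4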

If you are supplying your own data, make sure you have a file named collection_all.tsv with the format id \t title | passage on each line, then skip to make index-wiki below.
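
For reference, here is a minimal Python sketch that writes records in this format (the ids, titles, and passages are made-up placeholders):

# Write a toy collection_all.tsv: one "id \t title | passage" record per line.
docs = [
    (0, "Queen Elizabeth II", "Elizabeth II was Queen of the United Kingdom from 1952 to 2022."),
    (1, "Super Bowl", "The Super Bowl is the annual championship game of the NFL."),
]
with open("collection_all.tsv", "w", encoding="utf-8") as f:
    for doc_id, title, passage in docs:
        f.write(f"{doc_id}\t{title} | {passage}\n")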

Download the latest English Wikipedia dump:

make download-latest-wiki

Run wikiextractor:

make extract-wiki

This will extract the pages into a set of sharded files, which will be located in the text/ directory. This step takes a few hours.

Run:

make split-wiki

This script splits the Wikipedia documents into blocks, with each block containing up to max_block_words words. It writes these blocks into collection_all.tsv, which is then used for building the ColBERT index.
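
Conceptually, the splitting works like the following simplified sketch (a greedy word-count splitter; the repository's actual implementation may differ, e.g. in how it handles sentence boundaries):

def split_into_blocks(text: str, max_block_words: int = 100) -> list[str]:
    """Greedily group consecutive words into blocks of at most max_block_words words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_block_words])
        for i in range(0, len(words), max_block_words)
    ]

print(split_into_blocks("one two three four five", max_block_words=2))
# ['one two', 'three four', 'five']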

Run this command to start ColBERT indexing. This step should be run on GPUs:

make index-wiki

Optionally, you can merge all the smaller index files into a single file. This enables memory mapping when loading the index, which reduces RAM usage at the cost of slower retrieval. This step requires enough RAM to load the entire index (about 100GB), but afterwards the resulting index can be used on a machine with as little as 35GB of RAM.

make coalesce-index

Citation

If you have used code or data from this repository, please cite this paper:

@inproceedings{semnani-etal-2023-wikichat,
    title = "{W}iki{C}hat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on {W}ikipedia",
    author = "Semnani, Sina  and
      Yao, Violet  and
      Zhang, Heidi  and
      Lam, Monica",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.157",
    pages = "2387--2413",
}

This repository also contains code from https://github.com/attardi/wikiextractor for preprocessing Wikipedia, and https://github.com/stanford-futuredata/ColBERT/ for ColBERT. If you use code from these repositories, please cite their respective repository and paper as well.
