  • Stars: 11,955
  • Rank: 2,743 (Top 0.06%)
  • Language: Python
  • License: MIT License
  • Created: 8 months ago
  • Updated: about 2 months ago


Repository Details

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking

This repository contains the code for our NAACL 2024 paper Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models by Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, and Monica S. Lam.

STORM is an LLM system that writes Wikipedia-like articles from scratch based on Internet search.

While the system cannot produce publication-ready articles, which often require many further edits, experienced Wikipedia editors have found it helpful in their pre-writing stage.

Try out our live demo to see how STORM can help your knowledge exploration journey, and please provide feedback to help us improve the system 🙏!

Research Before Writing

STORM breaks down generating long articles with citations into two steps:

  1. Pre-writing stage: The system conducts Internet-based research to collect references and generates an outline.
  2. Writing stage: The system uses the outline and references to generate the full-length article with citations.
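The two-stage breakdown above can be sketched as a minimal pipeline skeleton. Note that `run_storm`, `search`, and `llm` here are illustrative placeholders, not the repository's actual API:

```python
def run_storm(topic, search, llm):
    """Illustrative two-stage STORM pipeline (hypothetical helpers, not the repo's API)."""
    # Pre-writing stage: research the topic, then draft an outline.
    references = search(topic)  # collect Internet references for the topic
    outline = llm(f"Draft an outline for '{topic}' using: {references}")

    # Writing stage: expand the outline into a full-length, cited article.
    article = llm(f"Write a full article following {outline}, citing {references}")
    return outline, article
```

The key design point is that the writing stage consumes only the outline and the collected references, so either stage can be run (or swapped out) independently.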

STORM identifies the core of automating the research process as automatically coming up with good questions to ask. Directly prompting the language model to ask questions does not work well. To improve the depth and breadth of the questions, STORM adopts two strategies:

  1. Perspective-Guided Question Asking: Given the input topic, STORM discovers different perspectives by surveying existing articles from similar topics and uses them to control the question-asking process.
  2. Simulated Conversation: STORM simulates a conversation between a Wikipedia writer and a topic expert grounded in Internet sources to enable the language model to update its understanding of the topic and ask follow-up questions.

Based on the separation of the two stages, STORM is implemented in a highly modular way (see engine.py) using dspy.

Setup

We view STORM as an example of automated knowledge curation. We are working on enhancing our codebase to increase its extensibility. Stay tuned!

Below, we provide a quick start guide to run STORM locally to reproduce our experiments.

  1. Install the required packages.
    conda create -n storm python=3.11
    conda activate storm
    pip install -r requirements.txt
  2. Set up the OpenAI API key and the You.com search API key. Create a file secrets.toml under the root directory and add the following content (TOML string values must be quoted):
    # Set up OpenAI API key.
    OPENAI_API_KEY="<your_openai_api_key>"
    # If you are using the API service provided by OpenAI, include the following line:
    OPENAI_API_TYPE="openai"
    # If you are using the API service provided by Microsoft Azure, include the following lines instead:
    OPENAI_API_TYPE="azure"
    AZURE_API_BASE="<your_azure_api_base_url>"
    AZURE_API_VERSION="<your_azure_api_version>"
    # Set up You.com search API key.
    YDC_API_KEY="<your_youcom_api_key>"

Paper Experiments

The FreshWiki dataset used in our experiments can be found in ./FreshWiki.

Run the following commands under ./src.

Pre-writing Stage

For batch experiment on FreshWiki dataset:

python -m scripts.run_prewriting --input-source file --input-path ../FreshWiki/topic_list.csv  --engine gpt-4 --do-research --max-conv-turn 5 --max-perspective 5
  • --engine (choices=[gpt-4, gpt-35-turbo]): the LLM engine used for generating the outline
  • --do-research: if True, simulate conversation to research the topic; otherwise, load the results.
  • --max-conv-turn: the maximum number of questions for each information-seeking conversation
  • --max-perspective: the maximum number of perspectives to be considered; each perspective corresponds to an information-seeking conversation.
    • STORM also uses a general conversation to collect basic information about the topic, so the maximum number of QA pairs is max_conv_turn * (max_perspective + 1). 💡 Reducing max_conv_turn or max_perspective can speed up the process and reduce the cost, but may result in a less comprehensive outline.
    • This parameter has no effect if --disable-perspective is set (perspective-guided question asking is disabled).
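The cost/latency budget implied by these flags can be captured in a one-line helper (the function name is illustrative):

```python
def max_qa_pairs(max_conv_turn: int, max_perspective: int) -> int:
    """Upper bound on QA pairs during research: each of the max_perspective
    perspective-guided conversations, plus one general conversation, asks up
    to max_conv_turn questions."""
    return max_conv_turn * (max_perspective + 1)


# With the defaults above: max_qa_pairs(5, 5)  # → 30 QA pairs at most
```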

To run the experiment on a single topic:

python -m scripts.run_prewriting --input-source console --engine gpt-4 --max-conv-turn 5 --max-perspective 5 --do-research
  • The script will ask you to enter the Topic and a Ground truth url to exclude from the search results. If you have no URL to exclude, leave that field empty.

The generated outline will be saved in {output_dir}/{topic}/storm_gen_outline.txt and the collected references will be saved in {output_dir}/{topic}/raw_search_results.json.

Writing Stage

For batch experiment on FreshWiki dataset:

python -m scripts.run_writing --input-source file --input-path ../FreshWiki/topic_list.csv --engine gpt-4 --do-polish-article --remove-duplicate
  • --do-polish-article: if set, polish the article by adding a summarization section and, if --remove-duplicate is also set, removing duplicate content.

To run the experiment on a single topic:

python -m scripts.run_writing --input-source console --engine gpt-4 --do-polish-article --remove-duplicate
  • The script will ask you to enter the Topic. Please enter the same topic as the one used in the pre-writing stage.

The generated article will be saved in {output_dir}/{topic}/storm_gen_article.txt and the references corresponding to citation index will be saved in {output_dir}/{topic}/url_to_info.json. If --do-polish-article is set, the polished article will be saved in {output_dir}/{topic}/storm_gen_article_polished.txt.
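To trace a citation index like [3] in the generated article back to its source, you can invert url_to_info.json. The JSON schema below (a URL-keyed mapping whose values carry a "citation_index" field) is an assumption for illustration, not documented behavior:

```python
import json


def citation_urls(path):
    """Illustrative reader for url_to_info.json.

    Assumes (hypothetically) a mapping of URL -> info dict that includes a
    "citation_index" field, and inverts it to {citation_index: url}.
    """
    with open(path) as f:
        info = json.load(f)
    return {entry["citation_index"]: url for url, entry in info.items()}
```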

Customize the STORM Configurations

We set up the default LLM configuration in LLMConfigs in src/modules/utils.py. You can use set_conv_simulator_lm(), set_question_asker_lm(), set_outline_gen_lm(), set_article_gen_lm(), and set_article_polish_lm() to override the defaults. These functions take an instance of dspy.dsp.LM or dspy.dsp.HFModel.

💡 As good practice:

  • Choose a cheaper/faster model for conv_simulator_lm, which is used to split queries and synthesize answers in the conversation.
  • If you need to run the actual writing step, choose a more powerful model for article_gen_lm; based on our experiments, weak models are bad at generating text with citations.
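The per-role setter pattern described above can be sketched with a toy stand-in (the real LLMConfigs lives in src/modules/utils.py and has five setters; this class only illustrates the shape of the override):

```python
class LLMConfigsSketch:
    """Toy stand-in for LLMConfigs, illustrating the per-role setter pattern."""

    def __init__(self, default_lm):
        # Every pipeline role starts on the same default model.
        self.conv_simulator_lm = default_lm
        self.article_gen_lm = default_lm

    def set_conv_simulator_lm(self, lm):
        self.conv_simulator_lm = lm  # cheaper/faster model goes here

    def set_article_gen_lm(self, lm):
        self.article_gen_lm = lm  # stronger model for cited writing


# cfg = LLMConfigsSketch(default_lm="gpt-4")
# cfg.set_conv_simulator_lm("gpt-35-turbo")  # cheap model for conversation
```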

Automatic Evaluation

In our paper, we break down the evaluation into two parts: outline quality and full-length article quality.

Outline Quality

We introduce heading soft recall and heading entity recall to evaluate the outline quality. This makes it easier to prototype methods for pre-writing.
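The idea behind heading soft recall can be sketched as follows: for each ground-truth heading, take its best similarity score among the predicted headings, then average. The paper's metric scores similarity with semantic embeddings; difflib's SequenceMatcher is used here only as a crude stand-in:

```python
from difflib import SequenceMatcher


def heading_soft_recall(gt_headings, pred_headings):
    """Illustrative 'soft recall' over outline headings.

    For each ground-truth heading, score its best match among the predicted
    headings, then average over ground-truth headings. SequenceMatcher is a
    crude string-level stand-in for the embedding similarity used in the paper.
    """
    if not gt_headings or not pred_headings:
        return 0.0

    def sim(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    return sum(max(sim(g, p) for p in pred_headings) for g in gt_headings) / len(gt_headings)
```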

Run the following command under ./eval to compute the metrics on FreshWiki dataset:

python eval_outline_quality.py --input-path ../FreshWiki/topic_list.csv --gt-dir ../FreshWiki --pred-dir ../results --pred-file-name storm_gen_outline.txt --result-output-path ../results/storm_outline_quality.csv

Full-length Article Quality

eval/eval_article_quality.py provides the entry point for evaluating full-length article quality using ROUGE, entity recall, and rubric grading. Run the following command under ./eval to compute the metrics:

python eval_article_quality.py --input-path ../FreshWiki/topic_list.csv --gt-dir ../FreshWiki --pred-dir ../results --output-dir ../results/storm_article_eval_results --pred-file-name storm_gen_article_polished.txt

Use the Metric Yourself

The similarity-based metrics (i.e., ROUGE, entity recall, and heading entity recall) are implemented in eval/metrics.py.
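Of these, entity recall has the simplest shape: the fraction of ground-truth entities that show up in the generated text. A minimal sketch, assuming the entity lists are already extracted (the real eval/metrics.py implementation extracts entities with an NER model):

```python
def entity_recall(gt_entities, pred_text):
    """Illustrative entity recall: fraction of ground-truth entities that
    appear (case-insensitively) in the generated text. Assumes entities are
    already extracted; the repo's metric uses an NER model for extraction."""
    gt = {e.lower() for e in gt_entities}
    if not gt:
        return 0.0
    text = pred_text.lower()
    found = {e for e in gt if e in text}
    return len(found) / len(gt)
```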

For rubric grading, we use the prometheus-13b-v1.0 model introduced in this paper. eval/evaluation_prometheus.py provides the entry point for using the metric.

Contributions

If you have any questions or suggestions, please feel free to open an issue or pull request. We welcome contributions to improve the system and the codebase!

Contact persons: Yijia Shao and Yucheng Jiang

Citation

Please cite our paper if you use this code or part of it in your work:

@inproceedings{shao2024assisting,
      title={{Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models}}, 
      author={Yijia Shao and Yucheng Jiang and Theodore A. Kanell and Peter Xu and Omar Khattab and Monica S. Lam},
      year={2024},
      booktitle={Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)}
}
