• Stars
    star
    601
  • Rank 73,989 (Top 2 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 3 years ago
  • Updated over 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624

DensePhrases

Getting Started | Lee et al., ACL 2021 | Lee et al., EMNLP 2021 | Demo | References | License

DensePhrases is a text retrieval model that can return phrases, sentences, passages, or documents for your natural language inputs. Using billions of dense phrase vectors from the entire Wikipedia, DensePhrases searches phrase-level answers to your questions in real-time or retrieves passages for downstream tasks.

DensePhrases Demo

Please see our ACL paper (Learning Dense Representations of Phrases at Scale) for details on how to learn dense representations of phrases and the EMNLP paper (Phrase Retrieval Learns Passage Retrieval, Too) on how to perform multi-granularity retrieval.

***** Try out our online demo of DensePhrases here! *****

Updates

Getting Started

After installing DensePhrases and dowloading a phrase index you can easily retrieve phrases, sentences, paragraphs, or documents for your query.

densephrases-interactive.mp4

See here for more examples such as using CPU-only mode, creating a custom index, and more.

You can also use DensePhrases to retrieve relevant documents for a dialogue or run entity linking over given texts.

>>> from densephrases import DensePhrases

# Load DensePhrases for dialogue and entity linking
>>> model = DensePhrases(
...     load_dir='princeton-nlp/densephrases-multi-query-kilt-multi',
...     dump_dir='/path/to/densephrases-multi_wiki-20181220/dump',
... )

# Retrieve relevant documents for a dialogue
>>> model.search('I love rap music.', retrieval_unit='document', top_k=5)
['Rapping', 'Rap metal', 'Hip hop', 'Hip hop music', 'Hip hop production']

# Run entity linking for the target phrase denoted as [START_ENT] and [END_ENT]
>>> model.search('[START_ENT] Security Council [END_ENT] members expressed concern on Thursday', retrieval_unit='document', top_k=1)
['United Nations Security Council']

We provide more examples, which includes training a state-of-the-art open-domain question answering model called Fusion-in-Decoder by Izacard and Grave, 2021.

Quick Link

Installation

# Install torch with conda (please check your CUDA version)
conda create -n densephrases python=3.7
conda activate densephrases
conda install pytorch=1.9.0 cudatoolkit=11.0 -c pytorch

# Install apex
git clone https://www.github.com/nvidia/apex.git
cd apex
python setup.py install
cd ..

# Install DensePhrases
git clone -b v1.0.0 https://github.com/princeton-nlp/DensePhrases.git
cd DensePhrases
pip install -r requirements.txt
python setup.py develop

main branch uses python==3.7 and transformers==2.9.0. See below for other versions of DensePhrases.

Release Note Description
v1.0.0 link transformers==2.9.0, same as main
v1.1.0 link transformers==4.13.0

Resources

Before downloading the required files below, please set the default directories as follows and ensure that you have enough storage to download and unzip the files:

# Running config.sh will set the following three environment variables:
# DATA_DIR: for datasets (including 'kilt', 'open-qa', 'single-qa', 'truecase', 'wikidump')
# SAVE_DIR: for pre-trained models or index; new models and index will also be saved here
# CACHE_DIR: for cache files from Huggingface Transformers
source config.sh

To download the resources described below, you can use download.sh as follows:

# Use bash script to download data (change data to models or index accordingly)
source download.sh
Choose a resource to download [data/wiki/models/index]: data
data will be downloaded at ...
...
Downloading data done!

1. Datasets

  • Datasets (1GB) - Pre-processed datasets including reading comprehension, generated questions, open-domain QA and slot filling. Download and unzip it under $DATA_DIR or use download.sh.
  • Wikipedia dumps (5GB) - Pre-processed Wikipedia dumps in different sizes. See here for more details. Download and unzip it under $DATA_DIR or use download.sh.
# Check if the download is complete
ls $DATA_DIR
kilt  open-qa  single-qa  truecase  wikidump

2. Pre-trained Models

Huggingface Transformers

You can use pre-trained models from the Huggingface model hub. Any model name that starts with princeton-nlp (specified in load_dir) will be automatically translated as a model in our Huggingface model hub.

>>> from densephrases import DensePhrases

# Load densephraes-multi-query-nq from the Huggingface model hub
>>> model = DensePhrases(
...     load_dir='princeton-nlp/densephrases-multi-query-nq',
...     dump_dir='/path/to/densephrases-multi_wiki-20181220/dump',
... )

Model list

Model Query-FT. NQ WebQ TREC TriviaQA SQuAD Description
densephrases-multi None 31.9 25.5 35.7 44.4 29.3 EM before any Query-FT.
densephrases-multi-query-multi Multiple 40.8 35.0 48.8 53.3 34.2 Used for demo
Model Query-FT. & Eval EM Prediction (Test) Description
densephrases-multi-query-nq NQ 41.3 link -
densephrases-multi-query-wq WebQ 41.5 link -
densephrases-multi-query-trec TREC 52.9 link --regex required
densephrases-multi-query-tqa TriviaQA 53.5 link -
densephrases-multi-query-sqd SQuAD 34.5 link -

Important: all models except densephrases-multi are query-side fine-tuned on the specified dataset (Query-FT.) using the phrase index densephrases-multi_wiki-20181220. Also note that our pre-trained models are case-sensitive models and the best results are obtained when --truecase is on for any lowercased queries (e.g., NQ).

  • densephrases-multi: trained on mutiple reading comprehension datasets (NQ, WebQ, TREC, TriviaQA, SQuAD).
  • densephrases-multi-query-multi: densephrases-multi query-side fine-tuned on multiple open-domain QA datasets (NQ, WebQ, TREC, TriviaQA, SQuAD).
  • densephrases-multi-query-*: densephrases-multi query-side fine-tuned on each open-domain QA dataset.

For pre-trained models in other tasks (e.g., slot filling), see examples. Note that most pre-trained models are the results of query-side fine-tuning densephrases-multi.

Download manually

  • Pre-trained models (8GB) - All pre-trained DensePhrases models (including cross-encoder teacher models spanbert-base-cased-*). Download and unzip it under $SAVE_DIR or use download.sh.
# Check if the download is complete
ls $SAVE_DIR
densephrases-multi  densephrases-multi-query-nq  ...  spanbert-base-cased-squad
>>> from densephrases import DensePhrases

# Load densephraes-multi-query-nq locally
>>> model = DensePhrases(
...     load_dir='/path/to/densephrases-multi-query-nq',
...     dump_dir='/path/to/densephrases-multi_wiki-20181220/dump',
... )

3. Phrase Index

Please note that you don't need to download this phrase index unless you want to work on the full Wikipedia scale.

  • densephrases-multi_wiki-20181220 (74GB) - Original phrase index (1048576_flat_OPQ96) + metadata for the entire Wikipedia (2018.12.20). Download and unzip it under $SAVE_DIR or use download.sh.

We also provide smaller phrase indexes based on more aggresive filtering (optional).

These smaller indexes should be placed under $SAVE_DIR/densephrases-multi_wiki-20181220/dump/start along with any other indexes you downloaded. If you only use a smaller phrase index and don't want to download the large index (74GB), you need to download metadata (20GB) and place it under $SAVE_DIR/densephrases-multi_wiki-20181220/dump folder as shown below. The structure of the files should look like:

$SAVE_DIR/densephrases-multi_wiki-20181220
โ””โ”€โ”€ dump
    โ”œโ”€โ”€ meta_compressed.pkl
    โ””โ”€โ”€ start
        โ”œโ”€โ”€ 1048576_flat_OPQ96
        โ”œโ”€โ”€ 1048576_flat_OPQ96_medium
        โ””โ”€โ”€ 1048576_flat_OPQ96_small

All phrase indexes are created from the same model (densephrases-multi) and you can use all of pre-trained models above with any of these phrase indexes. To change the index, simply set index_name (or --index_name in densephrases/options.py) as follows:

>>> from densephrases import DensePhrases

# Load DensePhrases with a smaller index
>>> model = DensePhrases(
...     load_dir='princeton-nlp/densephrases-multi-query-multi',
...     dump_dir='/path/to/densephrases-multi_wiki-20181220/dump',
...     index_name='start/1048576_flat_OPQ96_small'
... )

The performance of densephrases-multi-query-nq on Natural Questions (test) with different phrase indexes is shown below.

Phrase Index Open-Domain QA (EM) Sentence Retrieval (Acc@1/5) Passage Retrieval (Acc@1/5) Size Description
1048576_flat_OPQ96 41.3 48.7 / 66.4 52.6 / 71.5 60GB evaluated with eval-index-psg
1048576_flat_OPQ96_medium 39.9 48.3 / 65.8 52.2 / 70.9 39GB
1048576_flat_OPQ96_small 38.0 47.2 / 64.0 50.7 / 69.1 20GB

Note that the passage retrieval accuracy (Acc@1/5) is generally higher than the reported numbers in the paper since these phrase indexes return natural paragraphs instead of fixed-sized text blocks (i.e., 100 words).

Playing with a DensePhrases Demo

You can run the Wikipedia-scale demo on your own server. For your own demo, you can change the phrase index (obtained from here) or the query encoder (e.g., to densephrases-multi-query-nq).

The resource requirement for running the full Wikipedia scale demo is:

  • 50 ~ 100GB RAM (depending on the size of a phrase index)
  • Single 11GB GPU (optional)

Note that you no longer need an SSD to run the demo unlike previous phrase retrieval models (DenSPI, DenSPI+Sparc). The following commands serve exactly the same demo as here on your http://localhost:51997.

# Serve a query encoder on port 1111
nohup python run_demo.py \
    --run_mode q_serve \
    --cache_dir $CACHE_DIR \
    --load_dir princeton-nlp/densephrases-multi-query-multi \
    --cuda \
    --max_query_length 32 \
    --query_port 1111 > $SAVE_DIR/logs/q-serve_1111.log &

# Serve a phrase index on port 51997 (takes several minutes)
nohup python run_demo.py \
    --run_mode p_serve \
    --index_name start/1048576_flat_OPQ96 \
    --cuda \
    --truecase \
    --dump_dir $SAVE_DIR/densephrases-multi_wiki-20181220/dump/ \
    --query_port 1111 \
    --index_port 51997 > $SAVE_DIR/logs/p-serve_51997.log &

# Below are the same but simplified commands using Makefile
make q-serve MODEL_NAME=densephrases-multi-query-multi Q_PORT=1111
make p-serve DUMP_DIR=$SAVE_DIR/densephrases-multi_wiki-20181220/dump/ Q_PORT=1111 I_PORT=51997

Please change --load_dir or --dump_dir if necessary and remove --cuda for CPU-only version. Once you set up the demo, the log files in $SAVE_DIR/logs/ will be automatically updated whenever a new question comes in. You can also send queries to your server using mini-batches of questions for faster inference.

# Test on NQ test set
python run_demo.py \
    --run_mode eval_request \
    --index_port 51997 \
    --test_path $DATA_DIR/open-qa/nq-open/test_preprocessed.json \
    --eval_batch_size 64 \
    --save_pred \
    --truecase

# Same command with Makefile
make eval-demo I_PORT=51997

# Result
(...)
INFO - eval_phrase_retrieval -   {'exact_match_top1': 40.83102493074792, 'f1_score_top1': 48.26451418695196}
INFO - eval_phrase_retrieval -   {'exact_match_top10': 60.11080332409972, 'f1_score_top10': 68.47386731458751}
INFO - eval_phrase_retrieval -   Saving prediction file to $SAVE_DIR/pred/test_preprocessed_3610_top10.pred

For more details (e.g., changing the test set), please see the targets in Makefile (q-serve, p-serve, eval-demo, etc).

DensePhrases: Training, Indexing and Inference

In this section, we introduce a step-by-step procedure to train DensePhrases, create phrase vectors and indexes, and run inferences with the trained model. All of our commands here are simplified as Makefile targets, which include exact dataset paths, hyperparameter settings, etc.

If the following test run completes without an error after the installation and the download, you are good to go!

# Test run for checking installation (takes about 10 mins; ignore the performance)
make draft MODEL_NAME=test
DensePhrases Steps
  • A figure summarizing the overall process below

1. Training phrase and query encoders

To train DensePhrases from scratch, use run-rc-nq in Makefile, which trains DensePhrases on NQ (pre-processed for the reading comprehension task) and evaluate it on reading comprehension as well as on (semi) open-domain QA. You can simply change the training set by modifying the dependencies of run-rc-nq (e.g., nq-rc-data => sqd-rc-data and nq-param => sqd-param for training on SQuAD). You'll need a single 24GB GPU for training DensePhrases on reading comprehension tasks, but you can use smaller GPUs by setting --gradient_accumulation_steps properly.

# Train DensePhrases on NQ with Eq. 9 in Lee et al., ACL'21
make run-rc-nq MODEL_NAME=densephrases-nq

run-rc-nq is composed of the six commands as follows (in case of training on NQ):

  1. make train-rc ...: Train DensePhrases on NQ with Eq. 9 (L = lambda1 L_single + lambda2 L_distill + lambda3 L_neg) with generated questions.
  2. make train-rc ...: Load trained DensePhrases in the previous step and further train it with Eq. 9 with pre-batch negatives.
  3. make gen-vecs: Generate phrase vectors for D_small (= set of all passages in NQ dev).
  4. make index-vecs: Build a phrase index for D_small.
  5. make compress-meta: Compresss metadata for faster inference.
  6. make eval-index ...: Evaluate the phrase index on the development set questions.

At the end of step 2, you will see the performance on the reading comprehension task where a gold passage is given (about 72.0 EM on NQ dev). Step 6 gives the performance on the semi-open-domain setting (denoted as D_small; see Table 6 in the paper) where the entire passages from the NQ development set is used for the indexing (about 62.0 EM with NQ dev questions). The trained model will be saved under $SAVE_DIR/$MODEL_NAME. Note that during the single-passage training on NQ, we exclude some questions in the development set, whose annotated answers are found from a list or a table.

2. Creating a phrase index

Let's assume that you have a pre-trained DensePhrases named densephrases-multi, which can also be downloaded from here. Now, you can generate phrase vectors for a large-scale corpus like Wikipedia using gen-vecs-parallel. Note that you can just download the phrase index for the full Wikipedia scale and skip this section.

# Generate phrase vectors in parallel for a large-scale corpus (default = wiki-dev)
make gen-vecs-parallel MODEL_NAME=densephrases-multi START=0 END=8

The default text corpus for creating phrase vectors is wiki-dev located in $DATA_DIR/wikidump. We have three options for larger text corpora:

  • wiki-dev: 1/100 Wikipedia scale (sampled), 8 files
  • wiki-dev-noise: 1/10 Wikipedia scale (sampled), 500 files
  • wiki-20181220: full Wikipedia (20181220) scale, 5621 files

The wiki-dev* corpora also contain passages from the NQ development set, so that you can track the performance of your model with an increasing size of the text corpus (usually decreases as it gets larger). The phrase vectors will be saved as hdf5 files in $SAVE_DIR/$(MODEL_NAME)_(data_name)/dump (e.g., $SAVE_DIR/densephrases-multi_wiki-dev/dump), which will be referred to $DUMP_DIR below.

Parallelization

START and END specify the file index in the corpus (e.g., START=0 END=8 for wiki-dev and START=0 END=5621 for wiki-20181220). Each run of gen-vecs-parallel only consumes 2GB in a single GPU, and you can distribute the processes with different START and END using slurm or shell script (e.g., START=0 END=200, START=200 END=400, ..., START=5400 END=5621). Distributing 28 processes on 4 24GB GPUs (each processing about 200 files) can create phrase vectors for wiki-20181220 in 8 hours. Processing the entire Wikiepdia requires up to 500GB and we recommend using an SSD to store them if possible (a smaller corpus can be stored in a HDD).

After generating the phrase vectors, you need to create a phrase index for the sublinear time search of phrases. Here, we use IVFOPQ for the phrase index.

# Create IVFOPQ index for a set of phrase vectors
make index-vecs DUMP_DIR=$SAVE_DIR/densephrases-multi_wiki-dev/dump/

For wiki-dev-noise and wiki-20181220, you need to modify the number of clusters to 101,372 and 1,048,576, respectively (simply change medium1-index in รฌndex-vecs to medium2-index or large-index). For wiki-20181220 (full Wikipedia), this takes about 1~2 days depending on the specification of your machine and requires about 100GB RAM. For IVFSQ as described in the paper, you can use index-add and index-merge to distribute the addition of phrase vectors to the index.

You also need to compress the metadata (saved in hdf5 files together with phrase vectors) for a faster inference of DensePhrases. This is mandatory for the IVFOPQ index.

# Compress metadata of wiki-dev
make compress-meta DUMP_DIR=$SAVE_DIR/densephrases-multi_wiki-dev/dump

For evaluating the performance of DensePhrases with your phrase indexes, use eval-index.

# Evaluate on the NQ test set questions
make eval-index MODEL_NAME=densephrases-multi DUMP_DIR=$SAVE_DIR/densephrases-multi_wiki-dev/dump/

3. Query-side fine-tuning

Query-side fine-tuning makes DensePhrases a versatile tool for retrieving multi-granularity text for different types of input queries. While query-side fine-tuning can also improve the performance on QA datasets, it can be used to adapt DensePhrases to non-QA style input queries such as "subject [SEP] relation" to retrieve object entities or "I love rap music." to retrieve relevant documents on rapping.

First, you need a phrase index for the full Wikipedia (wiki-20181220), which can be simply downloaded here, or a custom phrase index as described here. Given your query-answer or query-document pairs pre-processed as json files in $DATA_DIR/open-qa or $DATA_DIR/kilt, you can easily query-side fine-tune your model. For instance, the training set of T-REx ($DATA_DIR/kilt/trex/trex-train-kilt_open_10000.json) looks as follows:

{
    "data": [
        {
            "id": "111ed80f-0a68-4541-8652-cb414af315c5",
            "question": "Effie Germon [SEP] occupation",
            "answers": [
                "actors",
                ...
            ]
        },
        ...
    ]
}

The following command query-side fine-tunes densephrases-multi on T-REx.

# Query-side fine-tune on T-REx (model will be saved as MODEL_NAME)
make train-query MODEL_NAME=densephrases-multi-query-trex DUMP_DIR=$SAVE_DIR/densephrases-multi_wiki-20181220/dump/

Note that the pre-trained query encoder is specified in train-query as --load_dir $(SAVE_DIR)/densephrases-multi and a new model will be saved as densephrases-multi-query-trex as specified in MODEL_NAME. You can also train on different datasets by changing the dependency trex-open-data to *-open-data (e.g., ay2-kilt-data for entity linking).

4. Inference

With any DensePhrases query encoders (e.g., densephrases-multi-query-nq) and a phrase index (e.g., densephrases-multi_wiki-20181220), you can test your queries as follows and the retrieval results will be saved as a json file with the --save_pred option:

# Evaluate on Natural Questions
make eval-index MODEL_NAME=densephrases-multi-query-nq DUMP_DIR=$SAVE_DIR/densephrases-multi_wiki-20181220/dump/

# If the demo is being served on http://localhost:51997
make eval-demo I_PORT=51997

For the evaluation on different datasets, simply change the dependency of eval-index (or eval-demo) accordingly (e.g., nq-open-data to trec-open-data for the evaluation on CuratedTREC).

Pre-processing

At the bottom of Makefile, we list commands that we used for pre-processing the datasets and Wikipedia. For training question generation models (T5-large), we used https://github.com/patil-suraj/question_generation (see also here for QG). Note that all datasets are already pre-processed including the generated questions, so you do not need to run most of these scripts. For creating test sets for custom (open-domain) questions, see preprocess-openqa in Makefile.

Questions?

Feel free to email Jinhyuk Lee ([email protected]) for any questions related to the code or the paper. You can also open a Github issue. Please try to specify the details so we can better understand and help you solve the problem.

References

Please cite our paper if you use DensePhrases in your work:

@inproceedings{lee2021learning,
    title={Learning Dense Representations of Phrases at Scale},
    author={Lee, Jinhyuk and Sung, Mujeen and Kang, Jaewoo and Chen, Danqi},
    booktitle={Association for Computational Linguistics (ACL)},
    year={2021}
}
@inproceedings{lee2021phrase,
    title={Phrase Retrieval Learns Passage Retrieval, Too},
    author={Lee, Jinhyuk and Wettig, Alexander and Chen, Danqi},
    booktitle={Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year={2021},
}

License

Please see LICENSE for details.

More Repositories

1

SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.
Python
12,189
star
2

tree-of-thought-llm

[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Python
4,416
star
3

SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
Python
3,310
star
4

SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
Python
1,554
star
5

MeZO

[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
Python
1,002
star
6

PURE

[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812
Python
777
star
7

LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723
Python
712
star
8

SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward
Python
510
star
9

LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Python
492
star
10

ALCE

[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627
Python
414
star
11

LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
Jupyter Notebook
319
star
12

AutoCompressors

[EMNLP 2023] Adapting Language Models to Compress Long Contexts
Python
262
star
13

WebShop

[NeurIPS 2022] ๐Ÿ›’WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Python
247
star
14

TRIME

[EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674
Python
189
star
15

CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
Python
187
star
16

intercode

[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
Python
179
star
17

OptiPrompt

[NAACL 2021] Factual Probing Is [MASK]: Learning vs. Learning to Recall https://arxiv.org/abs/2104.05240
Python
167
star
18

TransformerPrograms

[NeurIPS 2023] Learning Transformer Programs
Python
154
star
19

EntityQuestions

EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers https://arxiv.org/abs/2109.08535
Python
134
star
20

QuRating

[ICML 2024] Selecting High-Quality Data for Training Language Models
Python
119
star
21

CEPE

[ACL 2024] Long-Context Language Modeling with Parallel Encodings
Python
117
star
22

DinkyTrain

Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration ๐Ÿšƒ
Python
109
star
23

LLMBar

[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
Python
95
star
24

MQuAKE

[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
Jupyter Notebook
86
star
25

USACO

Can Language Models Solve Olympiad Programming?
Python
86
star
26

NLProofS

EMNLP 2022: Generating Natural Language Proofs with Verifier-Guided Search https://arxiv.org/abs/2205.12443
Python
80
star
27

MADE

EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering
Python
70
star
28

LM-Kernel-FT

A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
Python
68
star
29

calm-textgame

[EMNLP 2020] Keep CALM and Explore: Language Models for Action Generation in Text-based Games
Python
64
star
30

CharXiv

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Python
63
star
31

c-sts

[EMNLP 2023] C-STS: Conditional Semantic Textual Similarity
Python
61
star
32

DataMUX

[NeurIPS 2022] DataMUX: Data Multiplexing for Neural Networks
Jupyter Notebook
58
star
33

ShortcutGrammar

EMNLP 2022: Finding Dataset Shortcuts with Grammar Induction https://arxiv.org/abs/2210.11560
Jupyter Notebook
58
star
34

LitSearch

A Retrieval Benchmark for Scientific Literature Search
Python
53
star
35

Collie

[ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks
Jupyter Notebook
51
star
36

EvalConvQA

[ACL 2022] Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Python
45
star
37

MABEL

EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975
Python
35
star
38

LM-Science-Tutor

Python
32
star
39

rationale-robustness

NAACL 2022: Can Rationalization Improve Robustness? https://arxiv.org/abs/2204.11790
Python
26
star
40

PTP

Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
Python
23
star
41

InstructEval

[NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.
Jupyter Notebook
23
star
42

WhatICLLearns

[ACL 2023 Findings] What In-Context Learning โ€œLearnsโ€ In-Context: Disentangling Task Recognition and Task Learning
Python
21
star
43

Cognac

Repo for paper: Controllable Text Generation with Language Constraints
Python
19
star
44

corpus-poisoning

[EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156
Python
18
star
45

semsup

Semantic Supervision: Enabling Generalization over Output Spaces
Python
16
star
46

ELIZA-Transformer

Representing Rule-based Chatbots with Transformers
Python
15
star
47

SRL-NLC

Safe Reinforcement Learning with Natural Language Constraints
14
star
48

Edge-Pruning

Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
Python
14
star
49

datamux-pretraining

MUX-PLMs: Pretraining LMs with Data Multiplexing
Python
14
star
50

XTX

[ICLR 2022 Spotlight] Multi-Stage Episodic Control for Strategic Exploration in Text Games
Python
13
star
51

MultilingualAnalysis

Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"
Python
13
star
52

blindfold-textgame

[NAACL 2021] Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents
Python
12
star
53

align-mlm

Python
11
star
54

dyck-transformer

[ACL 2021] Self-Attention Networks Can Process Bounded Hierarchical Languages
Python
11
star
55

metric-wsd

NAACL'2021: Non-Parametric Few-Shot Learning for Word Sense Disambiguation
Python
10
star
56

semsup-xc

SemSup-XC: Semantic Supervision for Extreme Classification
Jupyter Notebook
10
star
57

lwm

We develop world models that can be adapted with natural language. Intergrating these models into artificial agents allows humans to effectively control these agents through verbal communication.
Python
9
star
58

benign-data-breaks-safety

Python
7
star
59

CopyCat

Python
7
star
60

Heuristic-Core

[ACL 2024] The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models - https://arxiv.org/abs/2403.03942
Python
6
star
61

CARETS

Python
6
star
62

SPARTAN

SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
Python
5
star
63

attribute-tagging

[LaReL 2022] Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment
Python
4
star
64

NegotiationToM

Code release for Improving Dialog Systems for Negotiation with Personality Modeling.
Python
4
star
65

il-scaling-in-games

Official code repo of "Scaling Laws for Imitation Learning in NetHack"
Python
4
star
66

MoQA

Python
3
star