• Stars
    star
    273
  • Rank 150,780 (Top 3 %)
  • Language
    Python
  • Created over 1 year ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

[EMNLP 2023] Adapting Language Models to Compress Long Contexts

Adapting Language Models to Compress Long Contexts

This is the official implementation of the paper Adapting Language Models to Compress Long Contexts, in which we train AutoCompressors which are language models with the new capability to (1) compress context information into a small set of summary vectors and (2) reason over these summary vectors which are passed to the model as soft prompts.

Now supporting .generate() and releasing an AutoCompressor based on Llama-2-7b.



Example:

Example use of the API with a pre-trained AutoCompressor model:

import torch
from transformers import AutoTokenizer
from auto_compressor import LlamaAutoCompressorModel, AutoCompressorModel

# Load AutoCompressor trained by compressing 6k tokens in 4 compression steps
tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/AutoCompressor-Llama-2-7b-6k")
# Need bfloat16 + cuda to run Llama model with flash attention
model = LlamaAutoCompressorModel.from_pretrained("princeton-nlp/AutoCompressor-Llama-2-7b-6k", torch_dtype=torch.bfloat16).eval().cuda()

prompt = 'The first name of the current US president is "'
prompt_tokens = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").input_ids.cuda()

context = """Joe Biden, born in Scranton, Pennsylvania, on November 20, 1942, had a modest upbringing in a middle-class family. He attended the University of Delaware, where he double-majored in history and political science, graduating in 1965. Afterward, he earned his law degree from Syracuse University College of Law in 1968.\nBiden's early political career began in 1970 when he was elected to the New Castle County Council in Delaware. In 1972, tragedy struck when his wife Neilia and 1-year-old daughter Naomi were killed in a car accident, and his two sons, Beau and Hunter, were injured. Despite this devastating loss, Biden chose to honor his commitment and was sworn in as a senator by his sons' hospital bedsides.\nHe went on to serve as the United States Senator from Delaware for six terms, from 1973 to 2009. During his time in the Senate, Biden was involved in various committees and was particularly known for his expertise in foreign affairs, serving as the chairman of the Senate Foreign Relations Committee on multiple occasions.\nIn 2008, Joe Biden was selected as the running mate for Barack Obama, who went on to win the presidential election. As Vice President, Biden played an integral role in the Obama administration, helping to shape policies and handling issues such as economic recovery, foreign relations, and the implementation of the Affordable Care Act (ACA), commonly known as Obamacare.\nAfter completing two terms as Vice President, Joe Biden decided to run for the presidency in 2020. He secured the Democratic nomination and faced the incumbent President Donald Trump in the general election. Biden campaigned on a platform of unity, promising to heal the divisions in the country and tackle pressing issues, including the COVID-19 pandemic, climate change, racial justice, and economic inequality.\nIn the November 2020 election, Biden emerged victorious, and on January 20, 2021, he was inaugurated as the 46th President of the United States. At the age of 78, Biden became the oldest person to assume the presidency in American history.\nAs President, Joe Biden has worked to implement his agenda, focusing on various initiatives, such as infrastructure investment, climate action, immigration reform, and expanding access to healthcare. He has emphasized the importance of diplomacy in international relations and has sought to rebuild alliances with global partners.\nThroughout his long career in public service, Joe Biden has been recognized for his commitment to bipartisanship, empathy, and his dedication to working-class issues. He continues to navigate the challenges facing the nation, striving to bring the country together and create positive change for all Americans."""
context_tokens = tokenizer(context, add_special_tokens=False, return_tensors="pt").input_ids.cuda()

summary_vectors = model(context_tokens, output_softprompt=True).softprompt
print(f"Compressing {context_tokens.size(1)} tokens to {summary_vectors.size(1)} summary vectors")
# >>> Compressing 660 tokens to 50 summary vectors

generation_with_summary_vecs = model.generate(prompt_tokens, do_sample=False, softprompt=summary_vectors, max_new_tokens=12)[0]
print("Generation w/ summary vectors:\n" + tokenizer.decode(generation_with_summary_vecs))
# >>> The first name of the current US president is "Joe" and the last name is "Biden".

next_tokens_without_context = model.generate(prompt_tokens, do_sample=False, max_new_tokens=11)[0]
print("Generation w/o context:\n" + tokenizer.decode(next_tokens_without_context))
# >>> The first name of the current US president is "Donald" and the last name is "Trump".

Install

Setup a new environment and install the most recent version of pytorch, followed by these libraries

pip install packaging
pip install transformers==4.35.0 datasets==2.14.4 sentencepiece==0.1.99 flash-attn==2.3.3 wandb
# Flash rotary embeddings (requires setting correct CUDA_HOME variable)
pip install git+https://github.com/Dao-AILab/flash-attention.git#subdirectory=csrc/rotary

Training

train.sh is the main method for training AutoCompressors and defines the most important hyperparameters for train.py. You may have adjust some setting, like the number GPUs, depending on the system. The script should be easy to get started with, since it uses pre-tokenized datasets from the huggingface hub.

Notes on Flash Attention

We use Flash Attention which lowers the memory requirements during training substantially.

Llama architecture: We implement flash attention via the Flash Attention package. These kernels require training and running the model on cuda in mixed or half precision.

OPT architecture: We implement flash attention via torch.nn.functional.scaled_dot_product_attention, which you can use by adding --fast_attention to train.sh. Note that this is experimental and requires the preview version of pytorch. We have encountered some issues with using fast attention during evaluation, especially with use_cache=True, so we recommend only using the fast attention during training.

Pre-trained Models

All the fine-tuned models from our paper can be found on Huggingface hub:

Link Base model Fine-tuning seq. length Fine-tuning data #Summary vectors Summmary accumulation Randomized segmenting Softprompt stop gradient
princeton-nlp/AutoCompressor-Llama-2-7b-6k Llama-2-7b 6144 tokens in 4 compression steps 15B tokens from RedPajama 50 ✔️ ✔️ ✔️
princeton-nlp/FullAttention-Llama-2-7b-6k Llama-2-7b 6144 tokens without compression 15B tokens from RedPajama -
princeton-nlp/AutoCompressor-2.7b-6k OPT-2.7b 6144 tokens in 4 compression steps 2B tokens from the Pile 50 ✔️ ✔️ ✔️
princeton-nlp/RMT-2.7b-8k OPT-2.7b 8192 tokens in 4 compression steps 2B tokens from the Pile 50
princeton-nlp/FullAttention-2.7b-4k OPT-2.7b 4092 tokens without compression 2B tokens from the Pile -
princeton-nlp/AutoCompressor-2.7b-30k OPT-2.7b 30720 tokens in 20 compression steps 2B tokens from Books3 from the Pile 50 ✔️ ✔️ ✔️
princeton-nlp/AutoCompressor-1.3b-30k OPT-1.3b 30720 tokens in 20 compression steps 2B tokens from Books3 from the Pile 50 ✔️ ✔️ ✔️
princeton-nlp/AutoCompressor-1.3b-30k OPT-1.3b 30720 tokens in 15 compression steps 2B tokens from Books3 from the Pile 50

Loading Models

To load Llama-20based AutoCompressor models, import LlamaAutoCompressModel:

from transformers import AutoTokenizer
from auto_compressor import LlamaAutoCompressorModel

tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/AutoCompressor-Llama-2-7b-6k")
model = LlamaAutoCompressModel.from_pretrained("princeton-nlp/AutoCompressor-Llama-2-7b-6k")

To load OPT-based AutoCompressor models, import OPTAutoCompressorModel:

from transformers import AutoTokenizer
from auto_compressor import OPTAutoCompressorModel

tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/AutoCompressor-2.7b-6k")
model = OPTAutoCompressorModel.from_pretrained("princeton-nlp/AutoCompressor-2.7b-6k")

Summary Vectors

The summary vectors for a given context can be obtained in two ways:

  1. Explicitly: Call the model with out = model(input_ids, attention_mask, ..., output_softprompt=True) and obtain the summary vectors as summary_vectors = out.softprompt which can be passed to further calls by model(..., softprompt=sumary_vectors).
  2. Implicitly: Call the model with out = model(input_ids, segment_lengths=segment_lengths), where segment_lengths is a list of integers that should add up to the overall sequence length input_ids.size(1). After each segment, the model will automatically generate the summary vectors and prepend them to the next segment. This can still be combined with output_softprompt=True to generate the final summary vectors for the entire input. This is convenient for multi-step compression of long inputs, which would otherwise exceed the model's maximum position.

Bug or Questions?

If you have any questions related to the code or the paper, feel free to email Alexis and Alexander ([email protected], [email protected]). If you encounter a problem or bug when using the code, you can open an issue. Please try to specify the problem with detail so we can help you quickly!

Citation

@article{chevalier2023adapting,
   title={Adapting Language Models to Compress Contexts},
   author={Chevalier, Alexis and Wettig, Alexander and Ajith, Anirudh and Chen, Danqi},
   journal={arXiv preprint 2305.14788},
   year={2023}
}

More Repositories

1

SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
Python
13,504
star
2

tree-of-thought-llm

[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Python
4,726
star
3

SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
Python
3,381
star
4

SWE-bench

[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
Python
1,846
star
5

MeZO

[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
Python
1,031
star
6

PURE

[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812
Python
788
star
7

LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723
Python
714
star
8

SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward
Python
672
star
9

DensePhrases

[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624
Python
605
star
10

LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Python
546
star
11

ALCE

[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627
Python
450
star
12

LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
Jupyter Notebook
354
star
13

WebShop

[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Python
264
star
14

TRIME

[EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674
Python
192
star
15

intercode

[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
Python
191
star
16

CoFiPruning

[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
Python
188
star
17

OptiPrompt

[NAACL 2021] Factual Probing Is [MASK]: Learning vs. Learning to Recall https://arxiv.org/abs/2104.05240
Python
167
star
18

TransformerPrograms

[NeurIPS 2023] Learning Transformer Programs
Python
157
star
19

EntityQuestions

EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers https://arxiv.org/abs/2109.08535
Python
139
star
20

QuRating

[ICML 2024] Selecting High-Quality Data for Training Language Models
Python
137
star
21

CEPE

[ACL 2024] Long-Context Language Modeling with Parallel Encodings
Python
135
star
22

DinkyTrain

Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃
Python
111
star
23

LLMBar

[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
Python
108
star
24

MQuAKE

[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
Jupyter Notebook
97
star
25

USACO

Can Language Models Solve Olympiad Programming?
Python
96
star
26

ProLong

Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
Python
82
star
27

NLProofS

EMNLP 2022: Generating Natural Language Proofs with Verifier-Guided Search https://arxiv.org/abs/2205.12443
Python
81
star
28

CharXiv

[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Python
72
star
29

MADE

EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering
Python
70
star
30

LM-Kernel-FT

A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
Python
68
star
31

c-sts

[EMNLP 2023] C-STS: Conditional Semantic Textual Similarity
Python
66
star
32

calm-textgame

[EMNLP 2020] Keep CALM and Explore: Language Models for Action Generation in Text-based Games
Python
65
star
33

DataMUX

[NeurIPS 2022] DataMUX: Data Multiplexing for Neural Networks
Jupyter Notebook
58
star
34

ShortcutGrammar

EMNLP 2022: Finding Dataset Shortcuts with Grammar Induction https://arxiv.org/abs/2210.11560
Jupyter Notebook
57
star
35

LitSearch

A Retrieval Benchmark for Scientific Literature Search
Python
54
star
36

Collie

[ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks
Jupyter Notebook
52
star
37

EvalConvQA

[ACL 2022] Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Python
45
star
38

HELMET

The HELMET Benchmark
Python
42
star
39

MABEL

EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975
Python
37
star
40

LM-Science-Tutor

Python
34
star
41

rationale-robustness

NAACL 2022: Can Rationalization Improve Robustness? https://arxiv.org/abs/2204.11790
Python
26
star
42

PTP

Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
Python
25
star
43

corpus-poisoning

[EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156
Python
25
star
44

InstructEval

[NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.
Jupyter Notebook
23
star
45

Edge-Pruning

Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
Python
22
star
46

WhatICLLearns

[ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning
Python
21
star
47

Cognac

Repo for paper: Controllable Text Generation with Language Constraints
Python
19
star
48

lwm

We develop world models that can be adapted with natural language. Intergrating these models into artificial agents allows humans to effectively control these agents through verbal communication.
Python
18
star
49

ELIZA-Transformer

Representing Rule-based Chatbots with Transformers
Python
18
star
50

semsup

Semantic Supervision: Enabling Generalization over Output Spaces
Python
16
star
51

benign-data-breaks-safety

Python
16
star
52

SRL-NLC

Safe Reinforcement Learning with Natural Language Constraints
14
star
53

datamux-pretraining

MUX-PLMs: Pretraining LMs with Data Multiplexing
Python
14
star
54

XTX

[ICLR 2022 Spotlight] Multi-Stage Episodic Control for Strategic Exploration in Text Games
Python
13
star
55

MultilingualAnalysis

Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"
Python
13
star
56

dyck-transformer

[ACL 2021] Self-Attention Networks Can Process Bounded Hierarchical Languages
Python
12
star
57

blindfold-textgame

[NAACL 2021] Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents
Python
12
star
58

align-mlm

Python
11
star
59

metric-wsd

NAACL'2021: Non-Parametric Few-Shot Learning for Word Sense Disambiguation
Python
10
star
60

semsup-xc

SemSup-XC: Semantic Supervision for Extreme Classification
Jupyter Notebook
10
star
61

Heuristic-Core

[ACL 2024] The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models - https://arxiv.org/abs/2403.03942
Python
9
star
62

CopyCat

Python
9
star
63

NegotiationToM

Code release for Improving Dialog Systems for Negotiation with Personality Modeling.
Python
7
star
64

CARETS

Python
6
star
65

SPARTAN

SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
Python
5
star
66

il-scaling-in-games

Official code repo of "Scaling Laws for Imitation Learning in Single-Agent Games"
Python
5
star
67

attribute-tagging

[LaReL 2022] Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment
Python
4
star
68

MoQA

Python
3
star