  • Stars: 154
  • Rank: 242,095 (Top 5%)
  • Language: Python
  • Created: about 1 year ago
  • Updated: about 1 year ago


Repository Details

LongQLoRA: Extend Context Length of LLMs Efficiently

LongQLoRA: Efficient and Effective Method to Extend Context Length of LLMs

Technical Report

Technical Report: LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models

Introduction

LongQLoRA is a memory-efficient and effective method to extend the context length of large language models with fewer training GPUs. On a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192, and even to 12k. After only 1000 finetuning steps, LongQLoRA achieves competitive perplexity on the PG19 and Proof-pile datasets: it outperforms LongLoRA and is very close to MPT-7B-8K.

Evaluation perplexity on the PG19 validation and Proof-pile test datasets at an evaluation context length of 8192:

| Model               | PG19  | Proof-pile |
|---------------------|-------|------------|
| LLaMA2-7B           | >1000 | >1000      |
| MPT-7B-8K           | 7.98  | 2.67       |
| LongLoRA-LoRA-7B-8K | 8.20  | 2.78       |
| LongLoRA-Full-7B-8K | 7.93  | 2.73       |
| LongQLoRA-7B-8K     | 7.96  | 2.73       |

Evaluation perplexity of 7B models on the PG19 validation and Proof-pile test datasets at evaluation context lengths from 1024 to 8192 is reported as a figure in the repository.

Dataset

We sample about 54k long texts from the RedPajama dataset, with token lengths ranging from 4096 to 32768, to finetune the pretrained models.

We also build a long-context instruction dataset for supervised finetuning of chat models. It contains 39k instructions, mainly consisting of book summarization, Natural Questions, a subset of LongQA, and Evol-Instruct data from WizardLM. To match the target context length of 8192, each example contains at most 8192 tokens. The released datasets are as follows.

| Dataset                       | Description                                             |
|-------------------------------|---------------------------------------------------------|
| 🤗LongQLoRA-Pretrain-Data-54k | 54,212 examples, used to finetune the pretrained model  |
| 🤗LongQLoRA-SFT-Data-39k      | 38,821 examples, used to finetune the chat model        |
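
Both datasets are hosted on the Hugging Face Hub. A minimal loading sketch is below; the YeungNLP organization and the "train" split name are assumptions here (the organization matches the model ids used in the evaluation commands further down).

```python
# Minimal sketch: load the released finetuning data from the Hugging Face Hub.
# Assumption: the datasets live under the YeungNLP organization and expose a
# "train" split; adjust the repo ids/split if they differ.
from datasets import load_dataset

pretrain_data = load_dataset("YeungNLP/LongQLoRA-Pretrain-Data-54k", split="train")
sft_data = load_dataset("YeungNLP/LongQLoRA-SFT-Data-39k", split="train")

print(len(pretrain_data))  # expected around 54,212 examples
print(len(sft_data))       # expected around 38,821 examples
```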

Model

| Model                          | Context Length | Description                                                               |
|--------------------------------|----------------|---------------------------------------------------------------------------|
| 🤗LongQLoRA-Llama2-7b-8k       | 8192           | Finetuned from LLaMA2-7B with LongQLoRA-Pretrain-Data-54k for 1k steps     |
| 🤗LongQLoRA-Vicuna-13b-8k      | 8192           | Finetuned from Vicuna-13B-V1.5 with LongQLoRA-SFT-Data-39k for 1.7k steps  |
| 🤗LongQLoRA-Llama2-7b-8k-lora  | 8192           | LoRA weights                                                               |
| 🤗LongQLoRA-Vicuna-13b-8k-lora | 8192           | LoRA weights                                                               |

Training

The training configs are saved in the train_args directory. Some of the parameters are as follows (an illustrative sketch of such a config follows the list):

  • sft: If True, run the SFT task; otherwise run the pretraining task.
  • model_max_length: The target context length.
  • max_seq_length: The maximum sequence length during training; should be less than or equal to model_max_length.
  • logging_steps: Log the training loss every n steps.
  • save_steps: Save the model every n steps.
  • lora_rank: The LoRA rank used in training.
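
The sketch below reads one of the YAML configs and checks the max_seq_length constraint described above; the flat key layout is an assumption based on the parameter names listed here.

```python
# Minimal sketch: read a training config from train_args/ and sanity-check it.
# Assumption: the listed parameters appear as top-level keys in the YAML file.
import yaml

with open("train_args/llama2-7b-pretrain.yaml") as f:
    args = yaml.safe_load(f)

# max_seq_length must not exceed the target context length (model_max_length).
assert args["max_seq_length"] <= args["model_max_length"]

print("sft task:", args.get("sft", False))
print("LoRA rank:", args["lora_rank"])
```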

Extend the context length of the pretrained model LLaMA2-7B:

deepspeed train.py --train_args_file ./train_args/llama2-7b-pretrain.yaml

Extend the context length of the chat model Vicuna-13B:

deepspeed train.py --train_args_file ./train_args/vicuna-13b-sft.yaml

Inference

You can merge the LoRA weights into the base model:

cd script
python merge_lora.py
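
The exact contents of merge_lora.py are not shown here; a minimal PEFT-based sketch of such a merge looks roughly as follows, assuming meta-llama/Llama-2-7b-hf as the base model and the YeungNLP organization for the LoRA weights.

```python
# Minimal sketch of a LoRA merge with PEFT (script/merge_lora.py may differ).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"              # assumed base model
lora_id = "YeungNLP/LongQLoRA-Llama2-7b-8k-lora"  # assumed Hub id of the LoRA weights

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, lora_id)
merged = model.merge_and_unload()                 # fold the LoRA deltas into the base weights

tokenizer = AutoTokenizer.from_pretrained(base_id)
merged.save_pretrained("LongQLoRA-Llama2-7b-8k-merged")
tokenizer.save_pretrained("LongQLoRA-Llama2-7b-8k-merged")
```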

Inference with the pretrained model:

cd script/inference
python inference.py
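
As a rough picture of what such an inference script does, the sketch below loads the released 8k model with transformers and generates a completion; the prompt and generation settings are assumptions (script/inference.py may differ).

```python
# Minimal sketch: greedy generation with the released 8k model
# (script/inference.py may use a different prompt and sampling settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YeungNLP/LongQLoRA-Llama2-7b-8k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "Summarize the following chapter:\n<up to ~8k tokens of text>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```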

Chat with the chat model:

cd script/inference
python chat.py
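
Since LongQLoRA-Vicuna-13b-8k is based on Vicuna-13B-V1.5, a single-turn chat sketch using the standard Vicuna prompt format might look as follows; the template actually used by script/chat.py may differ.

```python
# Minimal single-turn chat sketch. The Vicuna v1.5 prompt format is assumed;
# check script/chat.py for the template the repository actually uses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YeungNLP/LongQLoRA-Vicuna-13b-8k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

user_message = "Summarize the main plot points of the book above."
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    f"USER: {user_message} ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```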

Evaluation

Download the evaluation datasets, tokenized with the LLaMA2 tokenizer and released by LongLoRA.

Dataset
🤗PG19-validation.bin
🤗PG19-test.bin
🤗Proof-pile-test.bin

Evaluate the perplexity of the models. You can set load_in_4bit to True to save memory:

cd script/evaluate
python evaluate.py \
      --batch_size 1 \
      --base_model YeungNLP/LongQLoRA-Llama2-7b-8k \
      --seq_len 8192 \
      --context_size 8192 \
      --sliding_window 8192 \
      --data_path pg19-validation.bin

Evaluate the perplexity of models with LoRA weights:

cd script/evaluate
python evaluate.py \
      --batch_size 1 \
      --base_model YeungNLP/LongQLoRA-Llama2-7b-8k \
      --peft_model LongQLoRA-Llama2-7b-8k-lora \
      --seq_len 8192 \
      --context_size 8192 \
      --sliding_window 8192 \
      --data_path pg19-validation.bin
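
For reference, the sketch below shows the kind of computation evaluate.py performs: 4-bit loading via bitsandbytes to save memory and perplexity over non-overlapping 8192-token windows. The assumption that the .bin files store raw uint16 token ids follows LongLoRA's evaluation data format.

```python
# Minimal sketch of 4-bit loading and windowed perplexity (evaluate.py may differ,
# e.g. in how it handles the sliding window).
import math
import numpy as np
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "YeungNLP/LongQLoRA-Llama2-7b-8k",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # saves memory
    device_map="auto",
)

# Assumption: the .bin file is a flat array of uint16 LLaMA2 token ids.
data = np.memmap("pg19-validation.bin", dtype=np.uint16, mode="r")

seq_len, nlls, n_tokens = 8192, [], 0
with torch.no_grad():
    for start in range(0, (len(data) // seq_len) * seq_len, seq_len):
        ids = torch.from_numpy(data[start:start + seq_len].astype(np.int64)).unsqueeze(0).to(model.device)
        loss = model(ids, labels=ids).loss        # mean next-token NLL over this window
        nlls.append(loss.float() * (seq_len - 1))
        n_tokens += seq_len - 1

print("perplexity:", math.exp(torch.stack(nlls).sum().item() / n_tokens))
```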

Examples

The examples generated by LongQLoRA-Vicuna-13b-8k are as follows.

Examples of long-context generation, where the input context lengths are between 4096 and 8192, exceeding the original context length of LLaMA2.

Examples of short-context generation, showing that the model keeps its instruction-following performance on short inputs.

Citation

@misc{yang2023longqlora,
      title={LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models}, 
      author={Jianxin Yang},
      year={2023},
      eprint={2311.04879},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgement

More Repositories

| #  | Repository                   | Description                                                                                                                                                                                                                    | Language | Stars |
|----|------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|
| 1  | Firefly                      | Firefly: a training toolkit for large models, supporting Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and more | Python   | 5,493 |
| 2  | GPT2-chitchat                | GPT2 for Chinese chitchat (implements DialoGPT's MMI idea)                                                                                                                                                                       | Python   | 2,972 |
| 3  | CPM                          | Easy-to-use CPM for Chinese text generation                                                                                                                                                                                      | Python   | 525   |
| 4  | Firefly-LLaMA2-Chinese       | Firefly Chinese LLaMA-2 models, supporting continued pretraining of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and more                                                                                  | Python   | 390   |
| 5  | CLIP-Chinese                 | Chinese CLIP pretrained model                                                                                                                                                                                                    | Python   | 374   |
| 6  | QQMusicSpider                | Scrapy-based QQ Music spider that crawls song metadata, lyrics, and featured comments, and shares a corpus of 490k+ songs from the top 6,400 mainland, Hong Kong, and Taiwan artists on QQ Music                                 | Python   | 310   |
| 7  | LLMPruner                    |                                                                                                                                                                                                                                  | Python   | 282   |
| 8  | ClipCap-Chinese              | ClipCap-based image captioning model                                                                                                                                                                                             | Python   | 267   |
| 9  | LEBERT-NER-Chinese           | Chinese NER model based on lexicon information fusion                                                                                                                                                                            | Python   | 160   |
| 10 | SimCSE                       | Reproduction of SimCSE supervised and unsupervised experiments                                                                                                                                                                   | Python   | 138   |
| 11 | OFA-Chinese                  | Chinese OFA model in the transformers format                                                                                                                                                                                     | Python   | 118   |
| 12 | PAMAE                        | Python implementation of SIGKDD 2017's PAMAE algorithm (parallel k-medoids)                                                                                                                                                      | Python   | 33    |
| 13 | TankBattle                   | Netty-based online multiplayer Tank Battle game                                                                                                                                                                                  | Java     | 17    |
| 14 | Shopee-Price-Match-Guarantee | Contrastive learning for matching identical products on Shopee                                                                                                                                                                   | Python   | 13    |