Code for "Lion: Adversarial Distillation of Proprietary Large Language Models (EMNLP 2023)"

Lion: Adversarial Distillation of Closed-Source Large Language Model

[📄 Paper] | [🤗 Lion Weights] | [🖥️ Demo]


Tuned on 70k instruction-following examples, Lion (7B) achieves 95% of ChatGPT's capability!

News

We are currently working on training larger versions (13B, 33B, and, if feasible, 65B). Thank you for your patience.

  • [June 10, 2023] We released instructions for addressing OOM during fine-tuning; see Training Process.
  • [May 26, 2023] We released the model weights. Check out the 7B model!
  • [May 25, 2023] We released an online demo, try our model here!
  • [May 23, 2023] We released the code for training and inference.

Contents

  1. Overview
  2. Online Demo
  3. Recovering Lion weights
  4. Inference
  5. Training Process
  6. Evaluation
  7. Citation
  8. Disclaimer

Overview

A high-level overview of our adversarial distillation framework: we craft a compact student LLM based on a superior closed-source LLM that serves three roles, the Teacher, the Referee, and the Generator. From left to right, an iteration consists of three stages:

  1. an imitation stage to align the student’s response with the teacher’s response;
  2. a discrimination stage to identify hard samples;
  3. a generation stage to produce new hard samples for escalating the challenges presented to the student model.

Online Demo

We will keep our latest model available to try for as long as possible. Feel free to ask Lion some questions; we are happy to hear your feedback!

Demo Link (it will expire in 72 hours, so we regularly update the link)

Since the training data consist of English instruction-following examples, it is best to ask questions in English. However, we found that Lion can also understand instructions in other languages to some extent. See the following case:

Recovering Lion weights

We release Lion weights as delta weights to comply with the LLaMA model license.

You can add our delta to the original LLaMA weights to obtain the Lion weights. Instructions:

  1. Get the original LLaMA weights in the Hugging Face format by following the instructions here
  2. Download our delta model from Hugging Face
  3. Run the following script to recover the Lion weights by applying our delta:
python src/weight_diff.py recover --path_raw huggyllama/llama-7b --path_diff YuxinJiang/Lion --path_tuned <path_to_store_recovered_weights>
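
Conceptually, recovery just adds each delta tensor to the corresponding base tensor. Below is a minimal sketch of that idea in Python (illustrative only; src/weight_diff.py is the authoritative procedure and may include extra checks or normalization):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b", torch_dtype=torch.float32)
delta = AutoModelForCausalLM.from_pretrained("YuxinJiang/Lion", torch_dtype=torch.float32)

# Add each delta tensor to its base counterpart in place.
delta_sd = delta.state_dict()
for name, tensor in base.state_dict().items():
    tensor.add_(delta_sd[name])

base.save_pretrained("<path_to_store_recovered_weights>")
AutoTokenizer.from_pretrained("YuxinJiang/Lion").save_pretrained("<path_to_store_recovered_weights>")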

Inference

For inference and training of Lion, please first install the requirements:

pip install -r requirements.txt

We provide the decoding script for Lion, which reads an input file, generates a corresponding response for each sample, and consolidates the results into an output file. It can be run on a single machine with a 16 GB GPU.

python src/lion_inference.py \
    --model_dir <path_to_hf_converted_lion_ckpt_and_tokenizer> \
    --data_dir <path_to_input_json_file> \
    --output_dir <path_to_output_json_file> \
    --num_gpus 1
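
For reference, the core of such a decoding loop looks roughly like the following, assuming a Hugging Face-format checkpoint. The JSON field names ("instruction", "response") and the generation settings are illustrative; the authoritative I/O format is whatever src/lion_inference.py defines.

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "<path_to_hf_converted_lion_ckpt_and_tokenizer>"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16, device_map="auto")

with open("<path_to_input_json_file>") as f:
    samples = json.load(f)

for sample in samples:
    inputs = tokenizer(sample["instruction"], return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    # Keep only the newly generated tokens, not the echoed prompt.
    sample["response"] = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

with open("<path_to_output_json_file>", "w") as f:
    json.dump(samples, f, indent=2)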

Training Process

One iteration of our adversarial distillation framework proceeds as follows.

1. Imitation Stage

1.1 Acquire the teacher's response on the Train Pool

python src/chatgpt_inference.py \
    -q <path_to_json_file_for_the_Train_Pool> \
    -o <path_to_chatgpt_inference_for_the_Train_Pool> \
    --api_key <your_openai_api_key>
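
Per instruction, the script essentially makes one chat-completion call to the teacher (ChatGPT). A minimal sketch of such a call, assuming the legacy openai<1.0 SDK that was current when this repo was released:

import openai

openai.api_key = "<your_openai_api_key>"

def teacher_response(instruction: str) -> str:
    # Ask the teacher model for a response to one instruction.
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": instruction}],
        temperature=0.7,  # illustrative; the script's decoding settings may differ
    )
    return completion["choices"][0]["message"]["content"]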

1.2 Instruction-tuning the student based on the teacher’s response on the Train Pool

Fine-tuning was conducted on a machine with 8 A100 80 GB GPUs.

torchrun --nproc_per_node=8 --master_port=<your_random_port> src/train.py \
    --model_name_or_path <path_to_hf_converted_ckpt_and_tokenizer> \
    --data_path <path_to_chatgpt_inference_for_the_Train_Pool> \
    --bf16 True \
    --output_dir result \
    --num_train_epochs 3 \
    --model_max_length 1024 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True

Addressing OOM

Naively, fine-tuning a 7B model requires about 7 x 8 x 2 = 112 GB of VRAM. The commands given above enable parameter sharding, so no redundant model copy is stored on any GPU. If you'd like to further reduce the memory footprint, here are some options:

  • Turn on CPU offload for FSDP with --fsdp "full_shard auto_wrap offload". This saves VRAM at the cost of longer runtime.

  • In our experience, DeepSpeed stage 3 (with offload) can at times be more memory-efficient than FSDP with offload. Here's an example of using DeepSpeed stage 3 on 8 GPUs with both parameter and optimizer offloading:

    deepspeed src/train_deepspeed.py \
        --model_name_or_path <path_to_hf_converted_ckpt_and_tokenizer> \
        --data_path <path_to_chatgpt_inference_for_the_Train_Pool> \
        --output_dir result \
        --num_train_epochs 3 \
        --model_max_length 1024 \
        --per_device_train_batch_size 16 \
        --per_device_eval_batch_size 1 \
        --gradient_accumulation_steps 1 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps 600 \
        --save_total_limit 1 \
        --learning_rate 2e-5 \
        --warmup_ratio 0.03 \
        --logging_steps 1 \
        --lr_scheduler_type "cosine" \
        --report_to "tensorboard" \
        --gradient_checkpointing True \
        --deepspeed src/configs/deepspeed_config.json \
        --fp16 True
    • The DeepSpeed library also provides some helpful functions to estimate memory usage.
  • LoRA fine-tunes low-rank slices of the query, key, and value embedding heads. This can reduce the total memory footprint from 112 GB to about 7 x 4 = 28 GB. We may release our re-implementation of this in the future, but for now the peft codebase can be a useful resource; a minimal sketch follows this list.
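
A minimal sketch of attaching LoRA adapters with peft. The target module names below match the Hugging Face LLaMA implementation; the rank and other hyperparameters are illustrative, not values from this repo:

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "<path_to_hf_converted_ckpt_and_tokenizer>", torch_dtype=torch.float16)

# Inject trainable low-rank adapters into the attention projections;
# the base weights stay frozen, so only the adapters need optimizer state.
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],
    bias="none", task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require grad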

2. Discrimination Stage

2.1 Acquire the teacher's response on the Cache Pool

python src/chatgpt_inference.py \
    -q <path_to_json_file_for_the_Cache_Pool> \
    -o <path_to_chatgpt_inference_for_the_Cache_Pool> \
    --api_key <your_openai_api_key>

2.2 Acquire the student's response on the Cache Pool

python src/lion_inference.py \
    --model_dir <path_to_hf_converted_lion_ckpt_and_tokenizer> \
    --data_dir <path_to_json_file_for_the_Cache_Pool> \
    --output_dir <path_to_lion_inference_for_the_Cache_Pool> \
    --num_gpus 8

2.3 Ask the referee to output two scores according to the response quality of the teacher and the student

python src/chatgpt_referee.py \
    -a <path_to_chatgpt_inference_for_the_Cache_Pool> <path_to_lion_inference_for_the_Cache_Pool> \
    -o <path_to_output_review_file> \
    --api_key <your_openai_api_key>

2.4 Discriminate hard instructions and easy instructions

python src/discrimination.py \
    --review_path <path_to_output_review_file> \
    --chatgpt_inference_path <path_to_chatgpt_inference_for_the_Cache_Pool> \
    --lion_inference_path <path_to_lion_inference_for_the_Cache_Pool> \
    --hard_save_path <path_to_identified_hard_instructions> \
    --easy_save_path <path_to_identified_easy_instructions>
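
To make the criterion concrete: one plausible rule, assuming the review file pairs a teacher score with a student score per instruction, is to label an instruction hard when the teacher outscores the student by some margin. The field names, threshold, and exact rule below are illustrative; src/discrimination.py defines the real ones.

import json

THRESHOLD = 1.0  # illustrative margin, not necessarily the repo's value

with open("<path_to_output_review_file>") as f:
    reviews = json.load(f)  # assumed: [{"instruction": ..., "scores": [teacher, student]}, ...]

hard, easy = [], []
for review in reviews:
    teacher_score, student_score = review["scores"]
    # The larger the teacher-student gap, the harder the instruction for Lion.
    (hard if teacher_score - student_score >= THRESHOLD else easy).append(review)

with open("<path_to_identified_hard_instructions>", "w") as f:
    json.dump(hard, f, indent=2)
with open("<path_to_identified_easy_instructions>", "w") as f:
    json.dump(easy, f, indent=2)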

3. Generation Stage

3.1 Generate new hard instructions

python -m src.generate_hard_instruction generate_instruction_following_data \
    --seed_tasks_path <path_to_identified_hard_instructions> \
    --output_dir <path_to_generated_hard_instructions> \
    --num_instructions_to_generate 3000 \
    --api_key <your_openai_api_key>

3.2 Generate new easy instructions

python -m src.generate_easy_instruction generate_instruction_following_data \
    --seed_tasks_path <path_to_identified_easy_instructions> \
    --output_dir <path_to_generated_easy_instructions> \
    --num_instructions_to_generate 3000 \
    --api_key <your_openai_api_key>
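
Both generators follow a Self-Instruct-style loop: sample a few of the identified instructions as in-context seeds, then ask ChatGPT to write new instructions in the same vein. A hedged sketch of one generation step, again assuming the legacy openai<1.0 SDK; the seed-file field name and the prompt wording are assumptions, and the actual prompt templates live in the two scripts above.

import json
import random
import openai

openai.api_key = "<your_openai_api_key>"

with open("<path_to_identified_hard_instructions>") as f:
    seed_tasks = json.load(f)

# Show a few seed instructions and ask for one more of similar difficulty.
seeds = random.sample(seed_tasks, 3)
prompt = ("Here are some example instructions:\n"
          + "\n".join("- " + task["instruction"] for task in seeds)
          + "\nWrite one new instruction of comparable difficulty.")
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,  # high temperature encourages diverse new instructions
)
print(completion["choices"][0]["message"]["content"])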

Evaluation

Automatic Evaluation with GPT-4

We leverage GPT-4 to automatically rate the response quality (with scores from 1 to 10) of two models on 80 unseen Vicuna-Instructions. ChatGPT is chosen as the reference model, and the relative capability of each LLM is estimated against it. The relative score is reported as a percentage, computed as the ratio of the two models' summed scores.
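
For example (illustrative numbers only), if GPT-4's ratings over the 80 instructions sum to 608 for the candidate model and 640 for ChatGPT, the candidate's relative score is 608 / 640 = 95%. In code:

def relative_score(model_scores, reference_scores):
    # Ratio of summed GPT-4 ratings, reported as a percentage.
    return 100 * sum(model_scores) / sum(reference_scores)

# e.g. relative_score([8, 7, 9], [9, 8, 9]) -> 92.3 (approximately)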

Relative Overall Response Quality:

Relative Response Quality of Diverse Task Categories:

Human Evaluation with Alignment Criteria

We employ the alignment criteria proposed by Askell et al. (2021), which define an assistant as aligned if it is helpful, honest, and harmless (HHH). We performed a human evaluation on 252 user-oriented instructions. To estimate the win rate, we compare the frequency of win, tie, and loss outcomes between each pair of models below.

Citation

Please cite our paper if you use the code in this repo.

@article{DBLP:journals/corr/abs-2305-12870,
  author       = {Yuxin Jiang and
                  Chunkit Chan and
                  Mingyang Chen and
                  Wei Wang},
  title        = {Lion: Adversarial Distillation of Closed-Source Large Language Model},
  journal      = {CoRR},
  volume       = {abs/2305.12870},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2305.12870},
  doi          = {10.48550/arXiv.2305.12870},
  eprinttype   = {arXiv},
  eprint       = {2305.12870},
  timestamp    = {Fri, 26 May 2023 11:29:33 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2305-12870.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

Disclaimer

⚠️ Lion is intended and licensed for research use ONLY. Commercial use is strictly prohibited. The content produced by any version of Lion is influenced by uncontrollable variables such as randomness, and therefore, the accuracy of the output cannot be guaranteed by this project. This project does not accept any legal liability for the content of the model output, nor does it assume responsibility for any losses incurred due to the use of associated resources and output results.
