• This repository has been archived on 12/Oct/2023
  • Stars: 3,349
  • Rank 12,870 (Top 0.3 %)
  • Language: Python
  • License: Apache License 2.0
  • Created about 1 year ago
  • Updated 7 months ago

Repository Details

Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT

ChatGLM Efficient Tuning


Fine-tuning 🤖ChatGLM-6B model with 🤗PEFT.

👋 Join our WeChat.


Changelog

[23/06/05] We now support 4-bit LoRA training (aka QLoRA). Use the --quantization_bit 4 argument to work with a 4-bit quantized model. (experimental feature)

[23/06/01] We implemented a framework supporting the efficient tuning of LLaMA and BLOOM models. Please follow LLaMA-Efficient-Tuning if you are interested.

[23/05/19] We now support evaluating the model on a development set during training. Use the --dev_ratio argument to specify the size of the development set.

[23/04/29] We now support training ChatGLM with Reinforcement Learning from Human Feedback (RLHF)! We provide several examples of RLHF training; please refer to the examples folder for details.

[23/04/20] Our repo achieved 100 stars within 12 days! Congratulations!

[23/04/19] We now support merging the weights of fine-tuned models trained by LoRA! Use the --checkpoint_dir checkpoint1,checkpoint2 argument to continue fine-tuning the models.

[23/04/18] We now support training quantized models with all three fine-tuning methods! Use the --quantization_bit argument to train the model in 4 or 8 bits.

[23/04/12] We now support training from checkpoints! Use the --checkpoint_dir argument to specify the checkpoint to fine-tune from.

[23/04/11] We now support training with combined datasets! Use the --dataset dataset1,dataset2 argument to train with multiple datasets.

Datasets

Our script now supports the following datasets:

Please refer to data/README.md for details.

Some datasets require confirmation before using them, so we recommend logging in with your HuggingFace account using these commands.

pip install --upgrade huggingface_hub
huggingface-cli login

Fine-Tuning Methods

Our script now supports the following fine-tuning methods:

  • LoRA
    • Fine-tuning the low-rank adapters of the model.
  • P-Tuning V2
    • Fine-tuning the prefix encoder of the model.
  • Freeze
    • Fine-tuning the MLPs in the last n blocks of the model.
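
To make the LoRA entry above concrete: a rank-r adapter on a weight matrix of shape d_out x d_in adds two small matrices, B (d_out x r) and A (r x d_in), and only those are trained. The sketch below counts the added parameters. The 4096 x 4096 shape is an illustrative assumption, not a figure taken from this repository.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one rank-r LoRA adapter (A: r x d_in, B: d_out x r)."""
    return r * d_in + d_out * r


def lora_fraction(d_in: int, d_out: int, r: int) -> float:
    """Adapter parameters as a fraction of the frozen d_out x d_in weight."""
    return lora_param_count(d_in, d_out, r) / (d_in * d_out)


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection with r=8.
    print(lora_param_count(4096, 4096, 8))
    print(f"{lora_fraction(4096, 4096, 8):.4%}")
```

Because the count grows linearly in r while the frozen weight grows with d_in x d_out, the trainable share stays small for modest ranks.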

Requirements

  • Python 3.8+ and PyTorch 1.13.1
  • 🤗Transformers, Datasets, Accelerate, PEFT and TRL
  • protobuf, cpm_kernels and sentencepiece
  • jieba, rouge_chinese and nltk (used at evaluation)
  • gradio and mdtex2html (used in web_demo.py)

And powerful GPUs!

Getting Started

Data Preparation (optional)

See data/example_dataset for details on the format of dataset files. To create a custom dataset, you can use either a single .json file or a dataset loading script with multiple files.

Note: please update data/dataset_info.json to use your custom dataset. For the format of this file, please refer to data/README.md.
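
As a rough illustration of the single-file option, the sketch below writes a small .json dataset and returns a matching registry entry. The field names (instruction, input, output), the file_name key and the dataset name my_dataset are assumptions modelled on the common Alpaca layout; check data/example_dataset and data/README.md for the format the scripts actually expect.

```python
import json
import tempfile
from pathlib import Path


def write_custom_dataset(data_dir: Path, name: str, records: list) -> dict:
    """Write records to <name>.json and return a registry entry for it.

    The record fields and the "file_name" key are assumed, not confirmed.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    file_name = f"{name}.json"
    with open(data_dir / file_name, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return {name: {"file_name": file_name}}


if __name__ == "__main__":
    records = [
        {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        entry = write_custom_dataset(Path(tmp), "my_dataset", records)
        print(json.dumps(entry, indent=2))
```

The returned entry would then be merged by hand into data/dataset_info.json.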

Dependency Installation (optional)

git clone https://github.com/hiyouga/ChatGLM-Efficient-Tuning.git
conda create -n chatglm_etuning python=3.10
conda activate chatglm_etuning
cd ChatGLM-Efficient-Tuning
pip install -r requirements.txt

To enable quantized LoRA or Freeze training on Windows, you need to install a pre-built version of the bitsandbytes library, which supports CUDA 11.6 or 11.7.

pip install https://github.com/acpopescu/bitsandbytes/releases/download/v0.37.2-win.1/bitsandbytes-0.37.2-py3-none-any.whl

Fine-tuning with a Single GPU

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --do_train \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --output_dir path_to_sft_checkpoint \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --fp16

Please refer to our Wiki for details of the arguments.
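
The batch-related flags in the command above interact: each optimizer step consumes per_device_train_batch_size x gradient_accumulation_steps examples per GPU. A small sketch of that arithmetic follows. The dataset size of 52,000 examples is a made-up figure for illustration only, and the step count is an approximation rather than the trainer's exact bookkeeping.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_gpus


def optimizer_steps(num_examples: int, epochs: float, batch: int) -> int:
    """Approximate optimizer steps, rounding each epoch up to whole batches."""
    return math.ceil(math.ceil(num_examples / batch) * epochs)


if __name__ == "__main__":
    batch = effective_batch_size(per_device=4, grad_accum=4)  # flags from the command above
    print(batch)
    print(optimizer_steps(52_000, epochs=3.0, batch=batch))  # 52,000 is hypothetical
```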

Distributed Fine-tuning with Multiple GPUs

accelerate config # configure the environment
accelerate launch src/train_sft.py # arguments (same as above)

Note: if you are fine-tuning with the LoRA method, please pass the --ddp_find_unused_parameters False argument to avoid a runtime error.
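
Since the launch line above elides the arguments, here is a hedged sketch that assembles a full command as a Python list, reusing only flags that appear in this README. The helper name build_launch_command is hypothetical; it builds the command but does not run it.

```python
import shlex


def build_launch_command(script: str, args: dict) -> list:
    """Turn a flag dict into an `accelerate launch` argument list.

    A value of True emits a bare flag (e.g. --do_train); other values are stringified.
    """
    cmd = ["accelerate", "launch", script]
    for flag, value in args.items():
        cmd.append(f"--{flag}")
        if value is not True:
            cmd.append(str(value))
    return cmd


if __name__ == "__main__":
    cmd = build_launch_command(
        "src/train_sft.py",
        {
            "do_train": True,
            "dataset": "alpaca_gpt4_en",
            "finetuning_type": "lora",
            "output_dir": "path_to_sft_checkpoint",
            "ddp_find_unused_parameters": False,  # recommended with LoRA in the note above
            "fp16": True,
        },
    )
    print(shlex.join(cmd))
```

Keeping the flags in a dict makes it easy to reuse one configuration for both the single-GPU and the multi-GPU invocation.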

Training Reward Model

CUDA_VISIBLE_DEVICES=0 python src/train_rm.py \
    --do_train \
    --dataset comparison_gpt4_en \
    --finetuning_type lora \
    --output_dir path_to_rm_checkpoint \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --fp16
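
Reward models trained on comparison data are commonly fitted with a pairwise ranking loss that pushes the score of the preferred response above that of the rejected one. The sketch below shows that loss in plain Python. Whether train_rm.py uses exactly this formulation is an assumption, not something this README states.

```python
import math


def pairwise_ranking_loss(chosen: list, rejected: list) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over a batch of score pairs."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need equally sized, non-empty score lists")
    total = 0.0
    for r_c, r_r in zip(chosen, rejected):
        # -log(sigmoid(x)) == log(1 + exp(-x)); log1p stays accurate when exp(-x) is tiny.
        total += math.log1p(math.exp(-(r_c - r_r)))
    return total / len(chosen)


if __name__ == "__main__":
    print(pairwise_ranking_loss([2.0, 0.5], [1.0, 1.5]))
```

The loss shrinks as the preferred response's score pulls ahead, and equals log 2 when the two scores tie.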

Training with RLHF

CUDA_VISIBLE_DEVICES=0 python src/train_ppo.py \
    --do_train \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_sft_checkpoint \
    --reward_model path_to_rm_checkpoint \
    --output_dir path_to_ppo_checkpoint \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --fp16

Evaluation (BLEU and ROUGE_CHINESE)

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --do_eval \
    --dataset alpaca_gpt4_en \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_eval_result \
    --per_device_eval_batch_size 8 \
    --max_samples 50 \
    --predict_with_generate
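
For readers unfamiliar with the metrics, the sketch below computes a simplified ROUGE-1 F-score as clipped unigram overlap between whitespace-separated tokens. It is a teaching aid only: the evaluation relies on jieba, rouge_chinese and nltk, which are listed among the requirements, and its tokenization and exact scoring will differ.

```python
from collections import Counter


def rouge1_f(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on whitespace tokens."""
    pred, ref = Counter(prediction.split()), Counter(reference.split())
    overlap = sum((pred & ref).values())  # multiset intersection clips repeated tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge1_f("the cat sat on the mat", "the cat lay on the mat"))
```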

Predict

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --do_predict \
    --dataset alpaca_gpt4_en \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_predict_result \
    --per_device_eval_batch_size 8 \
    --max_samples 50 \
    --predict_with_generate

CLI Demo

python src/cli_demo.py \
    --checkpoint_dir path_to_checkpoint

Web Demo

python src/web_demo.py \
    --checkpoint_dir path_to_checkpoint

Export model

python src/export_model.py \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_export

Hardware Requirements

Fine-tune method    Batch size   Mode   GPU memory   Speed
LoRA (r=8)          16           FP16   28GB         8ex/s
LoRA (r=8)          8            FP16   24GB         8ex/s
LoRA (r=8)          4            FP16   20GB         8ex/s
LoRA (r=8)          4            INT8   10GB         8ex/s
LoRA (r=8)          4            INT4   8GB          8ex/s
P-Tuning (p=16)     4            FP16   20GB         8ex/s
P-Tuning (p=16)     4            INT8   16GB         8ex/s
P-Tuning (p=16)     4            INT4   12GB         8ex/s
Freeze (l=3)        4            FP16   24GB         8ex/s
Freeze (l=3)        4            INT8   12GB         8ex/s

RM method           Batch size   Mode   GPU memory   Speed
LoRA (r=8) + rm     4            FP16   22GB         -
LoRA (r=8) + rm     1            INT8   11GB         -

RLHF method         Batch size   Mode   GPU memory   Speed
LoRA (r=8) + ppo    4            FP16   23GB         -
LoRA (r=8) + ppo    1            INT8   12GB         -

Note: r is the LoRA rank, p is the number of prefix tokens, l is the number of trainable layers, and ex/s is the number of examples processed per second during training. gradient_accumulation_steps is set to 1. All figures were measured on a single Tesla V100 (32G) GPU; they are approximate and may vary across GPUs.
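
A back-of-the-envelope check helps explain why the quantized rows need less memory: the weights alone occupy parameters x bits / 8 bytes. The sketch below assumes roughly 6 billion parameters (suggested by the model name, not stated in this README) and counts weights only, so the measured figures above, which also cover activations, gradients and optimizer state, are higher.

```python
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Memory for the raw weights in GB (1 GB = 1e9 bytes); excludes everything else."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    for mode, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
        # 6e9 parameters is an assumption based on the "6B" in ChatGLM-6B.
        print(mode, weight_memory_gb(6e9, bits))
```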

Fine-tuning ChatGLM: A Case

Training Results

We use the whole alpaca_gpt4_zh dataset to fine-tune the ChatGLM model with LoRA (r=8) for one epoch, using the default hyper-parameters. The loss curve during training is presented below.

[Figure: training loss curve]

Evaluation Results

We select 100 instances in the alpaca_gpt4_zh dataset to evaluate the fine-tuned ChatGLM model and compute the BLEU and ROUGE scores. The results are presented below.

Score        Original   FZ (l=2)   PT (p=16)   LoRA (r=8)
BLEU-4       15.75      16.85      16.06       17.01 (+1.26)
Rouge-1      34.51      36.62      34.80       36.77 (+2.26)
Rouge-2      15.11      17.04      15.32       16.83 (+1.72)
Rouge-l      26.18      28.17      26.35       28.86 (+2.68)
Params (%)   /          4.35%      0.06%       0.06%

FZ: freeze tuning, PT: P-Tuning V2 (we use pre_seq_len=16 for a fair comparison with LoRA), Params: the percentage of trainable parameters.
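
The gains shown in parentheses are the LoRA score minus the original score. The sketch below recomputes them from the table and applies the same subtraction to the other two methods, so the columns can be compared on equal terms.

```python
# Scores copied from the table above: (Original, FZ, PT, LoRA).
SCORES = {
    "BLEU-4": (15.75, 16.85, 16.06, 17.01),
    "Rouge-1": (34.51, 36.62, 34.80, 36.77),
    "Rouge-2": (15.11, 17.04, 15.32, 16.83),
    "Rouge-l": (26.18, 28.17, 26.35, 28.86),
}
METHODS = ("FZ", "PT", "LoRA")


def gains(scores: dict) -> dict:
    """Improvement of each method over the original model, rounded to 2 decimals."""
    return {
        metric: {m: round(v - row[0], 2) for m, v in zip(METHODS, row[1:])}
        for metric, row in scores.items()
    }


if __name__ == "__main__":
    for metric, delta in gains(SCORES).items():
        print(metric, delta)
```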

Compared with Existing Implementations

  • THUDM/ChatGLM-6B
    • Official implementation of fine-tuning ChatGLM with P-Tuning v2 on the ADGEN dataset.
    • Our fine-tuning script largely depends on it. We additionally implement the LoRA tuning method, and we dynamically pad the inputs to the longest sequence in the batch instead of the maximum length to accelerate fine-tuning.
  • mymusise/ChatGLM-Tuning
    • An unofficial implementation of fine-tuning ChatGLM with LoRA on the Stanford Alpaca dataset.
    • We borrowed some ideas from it. Our fine-tuning script integrates the data pre-processing part into the training procedure, so we need not generate a pre-processed dataset before training.
  • ssbuild/chatglm_finetuning
  • lich99/ChatGLM-finetune-LoRA
  • liucongg/ChatGLM-Finetuning
    • An unofficial implementation of fine-tuning ChatGLM with several methods, including Freeze, LoRA and P-Tuning, on an industrial dataset.
    • We aim to incorporate more instruction-following datasets for fine-tuning the ChatGLM model.
  • yanqiangmiffy/InstructGLM
    • An unofficial implementation of fine-tuning ChatGLM that explores ChatGLM's ability on instruction-following datasets.
    • Our fine-tuning script integrates the data pre-processing part into the training procedure.
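
The dynamic padding mentioned in the first comparison can be illustrated in a few lines: each batch is padded only to its own longest sequence rather than to a global maximum. This is a minimal sketch; the pad id of 0 and right-side padding are assumptions chosen for illustration, and the actual collator may pad differently.

```python
def pad_to_longest(batch: list, pad_id: int = 0) -> list:
    """Right-pad every sequence to the length of the longest one in this batch."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]


def pad_to_max_length(batch: list, max_length: int, pad_id: int = 0) -> list:
    """Baseline: pad (or truncate) every sequence to a fixed maximum length."""
    return [(seq + [pad_id] * max_length)[:max_length] for seq in batch]


if __name__ == "__main__":
    batch = [[5, 6, 7], [8]]
    print(pad_to_longest(batch))  # width follows the batch
    print(len(pad_to_max_length(batch, 512)[0]))  # width is always max_length
```

With short inputs, the dynamically padded batch is far narrower than a fixed-width one, which is where the speed-up comes from.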

TODO

  • Employing LangChain to easily build applications that are capable of leveraging external knowledge upon fine-tuned ChatGLM models.
  • Implementing alignment algorithms to align with human preferences.
  • Incorporating Chinese datasets into the training sets.
  • Incorporating ChatGPT & GPT-4 self-chat data into the training sets.
  • Implementing the Freeze-Tuning and P-Tuning methods.
  • Supporting multi-GPU fine-tuning.
  • Adding script for evaluation.
  • Loading from checkpoint.
  • Fine-tuning the quantized model.
  • Writing a guidebook about how to fine-tune ChatGLM with this framework.
  • Combining with state-of-the-art model editing algorithms. (e.g. MEND)
  • Incorporating the OpenAssistant Conversations Dataset for SFT and alignment.
  • Incorporating the high quality Chinese instruction dataset COIG.

License

This repository is licensed under the Apache-2.0 License. Please follow the Model License when using the ChatGLM-6B model.

Citation

If this work is helpful, please cite as:

@Misc{chatglm-efficient-tuning,
  title = {ChatGLM Efficient Tuning},
  author = {hiyouga},
  howpublished = {\url{https://github.com/hiyouga/ChatGLM-Efficient-Tuning}},
  year = {2023}
}

Acknowledgement

This repo benefits from ChatGLM-6B, ChatGLM-Tuning and yuanzhoulvpi2017/zero_nlp. Thanks for their wonderful work.
