  • Stars: 207
  • Rank: 188,645 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created over 1 year ago
  • Updated 4 months ago


Repository Details

A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. An implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture: essentially ChatGPT, but with Vicuna.

Vicuna-LoRA-RLHF-PyTorch

A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware.


Environment Setup

Budget ("poor man's") GPU: 2080Ti 12G
torch==2.0.0
cuda==11.8
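
Before running any step it can help to confirm that the installed packages match the pins above. The following is a minimal sketch and not part of the repository: `installed_version` and `check_pins` are hypothetical helper names, and the comparison is a simple prefix match on the version string reported by `importlib.metadata`. The CUDA version is not a pip package, so it is not covered here.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Compare expected version prefixes against what `lookup` reports.

    Returns a dict mapping package name -> (expected, found) for mismatches.
    """
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        # Local build tags such as "2.0.0+cu118" still count as a match.
        if found is None or not found.startswith(expected):
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Pin taken from the environment notes above; an empty dict means no mismatch.
    print(check_pins({"torch": "2.0.0"}))
```

The `lookup` parameter exists only so the check can be exercised without the real packages installed.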

Todo List

  • Download Vicuna Weights
  • SFT: Supervised Finetune
  • Merge Adapter into Model
  • RLHF
    • train reward model
    • tuning with RL

Run


Download Vicuna Weights

python apply_delta.py --base 'decapoda-research/llama-7b-hf' --target './weights/vicuna-7b' --delta lmsys/vicuna-7b-delta-v1.1
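
Conceptually, applying a delta adds each delta tensor to the base tensor of the same name. The sketch below illustrates that idea on plain Python lists. `apply_delta_sketch` is a hypothetical name, and the real `apply_delta.py` works on full model checkpoints and may handle details that are omitted here.

```python
def apply_delta_sketch(base, delta):
    """Return target weights where target[name] = base[name] + delta[name].

    `base` and `delta` map parameter names to flat lists of floats.
    """
    if base.keys() != delta.keys():
        raise KeyError("base and delta must contain the same parameter names")
    target = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise sum recovers the fine-tuned weights from base + delta.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target
```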

Supervised Finetune

First check src/peft/utils/save_and_load.py in the installed peft package. If line 52 contains the statement below, comment it out so that it reads:

# #to_return = {k: v for k, v in to_return.items() if (("lora_" in k and adapter_name in k) or ("bias" in k))}

Then run:

python supervised_finetune.py --data_path './data/merge_sample.json' --output_path 'lora-Vicuna' --model_path './weights/vicuna-7b' --eval_steps 200 --save_steps 200 --test_size 1
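
Because line numbers shift between peft versions, the manual edit to save_and_load.py can also be done by matching the statement itself. This is a minimal sketch under that assumption: `comment_out_lora_filter` is a hypothetical helper, it works on the file contents as a string, and it leaves lines that are already commented out untouched.

```python
def comment_out_lora_filter(source):
    """Return (new_source, changed) with the adapter-key filter line commented out."""
    lines = source.splitlines(keepends=True)
    changed = False
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        # Only match the active statement; a leading "#" fails the startswith test.
        if stripped.startswith("to_return = {") and '"lora_" in k' in stripped:
            indent = line[: len(line) - len(stripped)]
            lines[i] = f"{indent}# {stripped}"
            changed = True
    return "".join(lines), changed
```

To use it, read the installed save_and_load.py into a string, pass it through the function, and write the result back only if `changed` is true. Keeping a backup copy of the original file is advisable.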

Merge PEFT adapter into Model

Check the peft version first. If it is not 0.2.0, install peft==0.2.0:

pip uninstall peft -y
pip install peft==0.2.0  # 0.3.0.dev0 has many errors
python merge_peft_adapter.py --model_name 'lora-Vicuna'

pip uninstall peft -y
pip install git+https://github.com/huggingface/peft.git  # then comment out line 52 of peft/utils/save_and_load.py
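
For intuition about what merging an adapter means: in the standard LoRA formulation, a layer's merged weight is W + (lora_alpha / r) * B A, after which the adapter matrices are no longer needed at inference time. The sketch below shows this on small nested lists. `matmul` and `merge_lora` are hypothetical names, and what `merge_peft_adapter.py` does internally is not shown in this README.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A) for one linear layer.

    weight: d_out x d_in, lora_b: d_out x r, lora_a: r x d_in.
    """
    scaling = lora_alpha / r
    update = matmul(lora_b, lora_a)
    return [
        [w + scaling * u for w, u in zip(w_row, u_row)]
        for w_row, u_row in zip(weight, update)
    ]
```

Applying the merged matrix to an input gives the same result as running the base layer and the scaled adapter path separately and summing them, which is why the merged model can be saved as an ordinary checkpoint.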

Train Reward Model

python train_reward_model.py --model_name './weights/vicuna-7b' --gradient_accumulation_steps 32 --per_device_train_batch_size 1 --train_subset 100 --eval_subset 10 --local_rank 0 --bf16 False
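
Two ideas behind this command can be sketched in a few lines. Reward models for RLHF are commonly trained with a pairwise ranking loss, -log(sigmoid(r_chosen - r_rejected)), and gradient accumulation multiplies the per-device batch size into a larger effective batch. The code below is an illustration under those assumptions; the function names are hypothetical and the actual loss used by `train_reward_model.py` is not shown in this README.

```python
import math


def pairwise_reward_loss(reward_chosen, reward_rejected):
    """Return -log(sigmoid(reward_chosen - reward_rejected)), computed stably."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of samples contributing to each optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices
```

With the flags above (`--per_device_train_batch_size 1 --gradient_accumulation_steps 32`) on a single GPU, each optimizer step would see 32 samples while only one sample's activations are held in memory at a time.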

Merge Reward adapter into Model

python merge_peft_adapter.py --model_name ./reward_model_vicuna-7b

Tuning LM with PPO

python tuning_lm_with_rl.py --model_name './lora-Vicuna-adapter-merged' --reward_model_name './reward_model_vicuna-7b-adapter-merged' --adafactor False --tokenizer_name 'decapoda-research/llama-7b-hf' --save_freq 100 --output_max_length 128 --batch_size 1 --gradient_accumulation_steps 1 --batched_gen True --ppo_epochs 1 --seed 0 --learning_rate 1.4e-5 --early_stopping True --output_dir './tuning_llama_rl_checkpoints'
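
In PPO-based RLHF, the reward model's score is usually combined with a KL penalty that keeps the tuned policy close to the reference (SFT) model. The sketch below shows one common formulation of that per-token reward. `kl_penalized_rewards` is a hypothetical name, the default `kl_coef` is an arbitrary illustrative value, and the exact shaping used by `tuning_lm_with_rl.py` is not shown in this README.

```python
def kl_penalized_rewards(score, policy_logprobs, ref_logprobs, kl_coef=0.2):
    """Return per-token rewards for one generated response.

    Each token gets -kl_coef * (log pi(token) - log pi_ref(token)); the reward
    model's scalar `score` is added to the final token only.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-probability sequences must have equal length")
    if not policy_logprobs:
        raise ValueError("response must contain at least one token")
    rewards = [-kl_coef * (p - q) for p, q in zip(policy_logprobs, ref_logprobs)]
    rewards[-1] += score
    return rewards
```

When the policy and reference assign identical log-probabilities, the penalty vanishes and only the final token carries the reward model's score.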

Topics

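  1. The Vicuna model weights are not on the Hugging Face Hub, so you need to download them first by running the apply_delta.py script.
  2. Before SFT, be sure to check the installed peft code in src/peft/utils/save_and_load.py. If line 52 contains the statement #to_return = {k: v for k, v in to_return.items() if (("lora_" in k and adapter_name in k) or ("bias" in k))}, comment it out. Otherwise the adapter model's parameters cannot be saved after fine-tuning. Do not forget this!
  3. PEFT version: installing from git currently gives 0.3.0.dev0, which fails in merge_peft_adapter, so switch to peft==0.2.0 (0.3.0.dev0 does not have the _get_submodules() function).
  4. Training the reward model raises another problem: ValueError: weight is on the meta device, we need a value to put in on 0. Refer to the latest transformers code on GitHub. The day after I hit this problem, I found that it had been fixed in the transformers GitHub repository only 8 hours earlier.
  5. The code for the last step is basically OK, but I only have a 2080Ti. After loading the fine-tuned model, loading the reward model ran straight into CUDA out of memory, so I did not actually run this step.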
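
The out-of-memory failure in the last item can be made plausible with a rough estimate of weight memory alone. The sketch below assumes a nominal 7e9 parameters per model and 1 byte per parameter (8-bit weights); neither figure is stated in this README, and activations, KV cache, and optimizer state are ignored. The helper names are hypothetical.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GiB (no activations or optimizer state)."""
    return num_params * bytes_per_param / GIB


def fits_on_gpu(models, gpu_gib):
    """`models` is a list of (num_params, bytes_per_param) pairs loaded together.

    Returns (fits, total_gib).
    """
    total = sum(weight_memory_gib(n, b) for n, b in models)
    return total <= gpu_gib, total


if __name__ == "__main__":
    # Assumed sizes: one policy model and one reward model, both 7e9 params at 8-bit.
    print(fits_on_gpu([(7e9, 1)], gpu_gib=12))
    print(fits_on_gpu([(7e9, 1), (7e9, 1)], gpu_gib=12))
```

Under these assumptions a single model fits within 12 GiB, while the policy and reward model together do not, which is consistent with the failure described above.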

Reference

apply_delta.py comes from FastChat.

The requirements mainly follow alpaca-lora for environment setup.




Donation

If this project helps you save development time, you can buy me a cup of coffee :)

AliPay(支付宝)

ali_pay

WechatPay(微信)

wechat_pay

License

MIT © Kun
