🐳 Aurora is a Chinese-language MoE model. It builds on Mixtral-8x7B and activates the model's open-domain chat capability in Chinese.

Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning

Rongsheng Wang, Haoming Chen, Ruizhe Zhou, Yaofei Duan, Kunyan Cai, Han Ma, Jiaxi Cui, Jian Li, Patrick Cheong-Iao Pang, Yapeng Wang, Tao Tan☨

☨Corresponding author

Important

We highly recommend using our DPO-based Aurora! 👉Here. If you lack a suitable GPU or a tutorial for running it, you can run it with one click using the 👉Xian Gong Cloud Aurora image. You can also check out our 👉tutorial videos.

Overview

Existing research has demonstrated that fine-tuning large language models (LLMs) on machine-generated instruction-following data gives these models impressive zero-shot capabilities on novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets with the aim of enhancing the Chinese conversational capabilities of the Mixtral-8x7B sparse Mixture-of-Experts model. Through instruction fine-tuning on this carefully processed dataset, we construct a Mixtral-8x7B sparse Mixture-of-Experts model named "Aurora". To assess the performance of Aurora, we use three widely recognized benchmarks: C-Eval, MMLU, and CMMLU. Empirical studies validate the effectiveness of instruction fine-tuning applied to the Mixtral-8x7B sparse Mixture-of-Experts model. This work is a pioneering application of instruction fine-tuning to a sparse mixture-of-experts model, marking a significant step in enhancing the capabilities of this model architecture.

Evaluation

It is known that LLM evaluation remains a significant challenge. We use three public benchmarks in our study.

Scores of different checkpoints on BLEU and ROUGE.

Model Checkpoints    BLEU-4    ROUGE-1   ROUGE-2   ROUGE-L
checkpoints-6000     18.4134   38.2669   18.9526   26.572
checkpoints-8000     18.3351   38.4327   19.058    26.6573
checkpoints-8000     18.5638   38.5497   19.1992   26.8305
checkpoints-12000    18.7156   38.7787   19.3347   27.0613
checkpoints-14000    18.5194   38.6898   19.2032   26.8863
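
As a quick reading aid, the sketch below picks the best checkpoint for each metric from the table above. The row labels and scores are copied verbatim from the table (including the repeated `checkpoints-8000` label); the variable names are illustrative.

```python
# Scores copied from the table above. Rows are kept in a list rather than a
# dict because one checkpoint label appears twice in the source.
METRICS = ("BLEU-4", "ROUGE-1", "ROUGE-2", "ROUGE-L")
ROWS = [
    ("checkpoints-6000", (18.4134, 38.2669, 18.9526, 26.572)),
    ("checkpoints-8000", (18.3351, 38.4327, 19.058, 26.6573)),
    ("checkpoints-8000", (18.5638, 38.5497, 19.1992, 26.8305)),
    ("checkpoints-12000", (18.7156, 38.7787, 19.3347, 27.0613)),
    ("checkpoints-14000", (18.5194, 38.6898, 19.2032, 26.8863)),
]

# For each metric column, find the row with the highest score.
best = {}
for i, metric in enumerate(METRICS):
    name, scores = max(ROWS, key=lambda row: row[1][i])
    best[metric] = (name, scores[i])

for metric, (name, score) in best.items():
    print(f"{metric}: {name} ({score})")
```

On these numbers, checkpoints-12000 scores highest on all four metrics.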

We also tested Aurora on the medical evaluation benchmark CMB.

Model        Avg. Scores
Aurora       29.87
Mistral-7B   22.26

More details:
{
    "accuracy_per_category": {
        "医师考试": 0.305,
        "护理考试": 0.33875,
        "药师考试": 0.289375,
        "医技考试": 0.30666666666666664,
        "专业知识考试": 0.27875,
        "医学考研": 0.27625
    },
    "accuracy_per_subcategory": {
        "医师考试": {
            "规培结业": 0.295,
            "执业助理医师": 0.3175,
            "执业医师": 0.3375,
            "中级职称": 0.3125,
            "高级职称": 0.2625
        },
        "护理考试": {
            "护士执业资格": 0.4,
            "护师执业资格": 0.325,
            "主管护师": 0.355,
            "高级护师": 0.275
        },
        "药师考试": {
            "执业西药师": 0.3075,
            "执业中药师": 0.2925,
            "初级药士": 0.325,
            "初级药师": 0.2925,
            "初级中药士": 0.2475,
            "初级中药师": 0.2775,
            "主管药师": 0.305,
            "主管中药师": 0.2675
        },
        "医技考试": {
            "医技士": 0.31,
            "医技师": 0.2775,
            "主管技师": 0.3325
        },
        "专业知识考试": {
            "基础医学": 0.25,
            "临床医学": 0.27,
            "预防医学与公共卫生学": 0.3575,
            "中医学与中药学": 0.2375
        },
        "医学考研": {
            "护理学": 0.2475,
            "考研政治": 0.3225,
            "西医综合": 0.2925,
            "中医综合": 0.2425
        }
    }
}
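
One property of the breakdown above can be checked directly: each value in `accuracy_per_category` equals the unweighted mean of its subcategory accuracies. The sketch below recomputes this from the numbers in the JSON (subcategory names are dropped for brevity; only their values are needed).

```python
import math

# Values copied from the JSON above.
per_category = {
    "医师考试": 0.305, "护理考试": 0.33875, "药师考试": 0.289375,
    "医技考试": 0.30666666666666664, "专业知识考试": 0.27875, "医学考研": 0.27625,
}
per_subcategory = {
    "医师考试": [0.295, 0.3175, 0.3375, 0.3125, 0.2625],
    "护理考试": [0.4, 0.325, 0.355, 0.275],
    "药师考试": [0.3075, 0.2925, 0.325, 0.2925, 0.2475, 0.2775, 0.305, 0.2675],
    "医技考试": [0.31, 0.2775, 0.3325],
    "专业知识考试": [0.25, 0.27, 0.3575, 0.2375],
    "医学考研": [0.2475, 0.3225, 0.2925, 0.2425],
}

# Recompute each category score as the plain mean of its subcategories.
recomputed = {name: sum(vals) / len(vals) for name, vals in per_subcategory.items()}
for name, reported in per_category.items():
    assert math.isclose(recomputed[name], reported, abs_tol=1e-9), name

# Unweighted mean over the six categories, for comparison with the table.
overall = sum(per_category.values()) / len(per_category)
print(f"unweighted mean over categories: {overall:.4f}")
```

The unweighted mean over the six categories comes out close to, but not exactly, the 29.87 reported in the table; the source does not state how that average is aggregated.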

Below are reference figures for GPU memory usage during training and inference. Please note that we did all training and inference on a single GPU.

Stage GPU Memory Usage
Training ~43 GiB
Inference ~25 GiB
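
The inference figure is consistent with a back-of-the-envelope estimate for 4-bit weights. The sketch below assumes Mixtral-8x7B has about 46.7 billion parameters in total (a figure from Mistral AI's public description, not from this README) and ignores quantization metadata, activations, the KV cache, and the LoRA adapter.

```python
TOTAL_PARAMS = 46.7e9  # assumption: total parameter count of Mixtral-8x7B
GIB = 2**30


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


four_bit = weight_memory_gib(TOTAL_PARAMS, 4)
sixteen_bit = weight_memory_gib(TOTAL_PARAMS, 16)
print(f"4-bit weights:  {four_bit:.1f} GiB")
print(f"16-bit weights: {sixteen_bit:.1f} GiB")
```

Under these assumptions the 4-bit weights alone take a little under 22 GiB, which leaves a few GiB of the ~25 GiB for everything else, while 16-bit weights would need roughly four times as much.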

Quick-Use

Thanks to the inference code from @fouvy, now you can quickly use Aurora with the following code.

Inference with Gradio
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer
from threading import Thread
from peft import PeftModel
import time

# download base model weights
# https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
# or
# https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1
model_name_or_path = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# download lora model weights
# https://huggingface.co/wangrongsheng/Aurora
# or
# https://modelscope.cn/models/wangrongsheng/Aurora-Mixtral-8x7B
lora_weights = "wangrongsheng/Aurora"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model0 = AutoModelForCausalLM.from_pretrained(model_name_or_path, load_in_4bit=True, device_map="auto", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(
    model0,
    lora_weights,
)

class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [0,]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

def convert_history_to_text(history):
    # Build a Mistral-style prompt. Completed turns are wrapped in <s> ... </s>;
    # the latest user message is left open for the model to answer.
    text = ""
    if len(history) > 1:
        text = "<s> " + "".join(
            f"[INST]{item[0]}[/INST] {item[1]} " for item in history[:-1]
        ) + "</s> "
    text += f"[INST]{history[-1][0]}[/INST]"
    return text

def predict(message, history):
    history_transformer_format = history + [[message, ""]]
    stop = StopOnTokens()

    messages = convert_history_to_text(history_transformer_format)

    model_inputs = tokenizer([messages], return_tensors="pt").to("cuda")
    streamer = TextIteratorStreamer(tokenizer, timeout=10., skip_prompt=True, skip_special_tokens=True)
    generate_kwargs = dict(
        model_inputs,
        streamer=streamer,
        max_new_tokens=4096,
        do_sample=True,
        top_p=0.95,
        top_k=1000,
        temperature=1.0,
        num_beams=1,
        pad_token_id=tokenizer.eos_token_id,
        stopping_criteria=StoppingCriteriaList([stop])
        )
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    partial_message  = ""
    t1 = time.time()
    count = 0
    for new_token in streamer:
        if new_token != '<':
            partial_message += new_token
            count += 1
            yield partial_message
    t2 = time.time()
    speed = count/(t2-t1)
    print("inference speed: %f tok/s" % speed)

gr.ChatInterface(predict,chatbot=gr.Chatbot(height=600,),title="MoE").queue().launch()
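
To make the prompt format concrete, the history-to-text step from the script above can be run on its own, without loading the model. The function below is a standalone restatement of `convert_history_to_text`; the name `build_prompt` and the sample conversation are illustrative.

```python
def build_prompt(history):
    """history is a list of [user_message, assistant_reply] pairs.

    The last pair holds the new user message with an empty reply.
    """
    text = ""
    if len(history) > 1:
        finished = "".join(
            f"[INST]{user}[/INST] {reply} " for user, reply in history[:-1]
        )
        text = "<s> " + finished + "</s> "
    return text + f"[INST]{history[-1][0]}[/INST]"


history = [["Hello", "Hi there!"], ["Who are you?", ""]]
prompt = build_prompt(history)
print(prompt)
# <s> [INST]Hello[/INST] Hi there! </s> [INST]Who are you?[/INST]
```

With a single-turn history, the result is just the new message wrapped in `[INST]...[/INST]`.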
Test 1 (Mixtral-8x7B-Instruct-v0.1)
inference speed: 13.004695 tok/s
After inference:
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    639547      C   python                                    12230MiB |
|    3   N/A  N/A    639547      C   python                                    15450MiB |
+---------------------------------------------------------------------------------------+

Test 2 (Aurora-Mixtral-8x7B + Mixtral-8x7B-Instruct-v0.1)
inference speed: 11.221806 tok/s
After inference:
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    640109      C   python                                    12196MiB |
|    3   N/A  N/A    640109      C   python                                    15406MiB |
+---------------------------------------------------------------------------------------+
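
The two test logs can be compared with a few lines of arithmetic. The figures below are copied from the logs above; this is a single pair of runs, so treat the result as indicative only.

```python
# Figures copied from the two test logs above.
base = {"tok_per_s": 13.004695, "mem_mib": [12230, 15450]}    # Test 1
aurora = {"tok_per_s": 11.221806, "mem_mib": [12196, 15406]}  # Test 2

# Relative throughput loss when the LoRA adapter is attached.
slowdown_pct = (1 - aurora["tok_per_s"] / base["tok_per_s"]) * 100

# Total GPU memory across the listed devices, converted from MiB to GiB.
base_gib = sum(base["mem_mib"]) / 1024
aurora_gib = sum(aurora["mem_mib"]) / 1024

print(f"slowdown with adapter: {slowdown_pct:.1f}%")
print(f"memory: {base_gib:.2f} GiB (base) vs {aurora_gib:.2f} GiB (with adapter)")
```

In these runs, attaching the adapter costs roughly 14% of throughput, while total GPU memory is essentially unchanged at about 27 GiB.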

Easy-to-Use

1. Clone and Set up

git clone https://github.com/WangRongsheng/Aurora.git
cd Aurora
pip install -r requirements.txt

2. Download Model

Base Model:

Model Download
Mixtral-8x7B-Instruct-v0.1 [HuggingFace] [HuggingFace-mirror] [ModelScope]

LoRA Model:

Model Download
Aurora [HuggingFace] [ModelScope] [WiseModel]
Aurora-Plus [HuggingFace] [WiseModel]

Note

Aurora-Plus is a bilingual Chinese and English MoE model that we highly recommend for any testing!

The full model is too large to distribute and manage conveniently, so we provide only the LoRA weights. They are merged with the base model automatically before inference, so no manual step is needed.
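
Conceptually, merging LoRA weights means adding a scaled low-rank update to each adapted weight matrix: W' = W + (alpha / r) * B @ A. The sketch below shows that arithmetic on tiny matrices in plain Python. The shapes, the rank, and the alpha value are illustrative and are not Aurora's actual settings, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # base weight, 2 x 3
lora_a = [[1.0, 2.0, 3.0]]                   # A: rank 1 x 3 inputs
lora_b = [[0.5], [-1.0]]                     # B: 2 outputs x rank 1
merged = merge_lora(weight, lora_a, lora_b, alpha=2.0, rank=1)
print(merged)
# [[2.0, 2.0, 3.0], [-2.0, -3.0, -6.0]]
```

Because the update has rank r, only A and B need to be stored and shared, which is why the adapter download is small compared with the base model.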

3. Inference

Web:

CUDA_VISIBLE_DEVICES=0 python src/web_demo.py \
    --model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
    --checkpoint_dir Aurora \
    --finetuning_type lora \
    --quantization_bit 4 \
    --template mistral

Then you can visit: http://127.0.0.1:7860/

CLI:

CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
    --model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
    --checkpoint_dir Aurora \
    --finetuning_type lora \
    --quantization_bit 4 \
    --template mistral

API:

CUDA_VISIBLE_DEVICES=0 python src/api_demo.py \
    --model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
    --checkpoint_dir Aurora \
    --finetuning_type lora \
    --quantization_bit 4 \
    --template mistral

If you need to load weights for specific checkpoints, you can set them up like this: --checkpoint_dir Aurora/checkpoint-6000.
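
Since the three demo commands differ only in the script name, a small launcher can assemble them. This is a minimal sketch: `build_demo_command` is a hypothetical helper, and it uses only the script paths and flags shown in the commands above.

```python
import shlex

# Script paths as used in the Web, CLI, and API commands above.
DEMO_SCRIPTS = {
    "web": "src/web_demo.py",
    "cli": "src/cli_demo.py",
    "api": "src/api_demo.py",
}


def build_demo_command(mode, base_model="./Mixtral-8x7B-Instruct-v0.1",
                       checkpoint_dir="Aurora"):
    """Return the argument list for one of the demo scripts."""
    if mode not in DEMO_SCRIPTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        "python", DEMO_SCRIPTS[mode],
        "--model_name_or_path", base_model,
        "--checkpoint_dir", checkpoint_dir,
        "--finetuning_type", "lora",
        "--quantization_bit", "4",
        "--template", "mistral",
    ]


cmd = build_demo_command("cli", checkpoint_dir="Aurora/checkpoint-6000")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set in the environment as in the commands above.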

Train

If you have a single GPU with more than 48 GB of memory, you can train your own model.

Train your MoE model
CUDA_VISIBLE_DEVICES=5 python src/train_bash.py \
    --stage sft \
    --model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
    --do_train \
    --dataset alpaca_zh,alpaca_gpt4_zh,sharegpt \
    --finetuning_type lora \
    --quantization_bit 4 \
    --overwrite_cache \
    --output_dir output/ \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 100 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16 \
    --template mistral \
    --lora_target q_proj,v_proj

--quantization_bit 4 means you will use QLoRA. If you have more GPU memory, you can remove this flag and use plain LoRA.
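
The batch-related flags in the training command combine as follows. This sketch assumes that a training "step" means one optimizer update, as in the Hugging Face Trainer, and that a single GPU is visible, as in the command above.

```python
# Values taken from the training command above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1        # CUDA_VISIBLE_DEVICES exposes a single device
save_steps = 1000

# Examples consumed per optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

# Examples seen between two consecutive saved checkpoints.
examples_per_checkpoint = effective_batch_size * save_steps

print(f"effective batch size: {effective_batch_size}")
print(f"examples between checkpoints: {examples_per_checkpoint}")
```

Under that assumption, each update uses 8 examples and a checkpoint is written every 8,000 examples.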

Evaluate your MoE model
# --task: cmmlu, mmlu, or ceval
# --lang: zh or en
CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
    --model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
    --checkpoint_dir Aurora/checkpoint-5000 \
    --finetuning_type lora \
    --quantization_bit 4 \
    --template mistral \
    --task cmmlu \
    --split test \
    --lang en \
    --n_shot 5 \
    --batch_size 8

Results

Acknowledgments

This work is mainly done by the Faculty of Applied Sciences of the Macao Polytechnic University. The computational resources used in this work were obtained from AWS servers. The fine-tuning framework we used is LLaMA-Factory, which brings a lot of convenience to our work. We also thank the public datasets from the open source community, such as shareAI, stanford_alpaca and GPT-4-LLM. Most importantly we are very grateful to Mistral AI, who are leading a new technology boom that will dramatically change the future of technology development.

Citation

If you find our work helpful, please cite our paper.

@misc{wang2023auroraactivating,
      title={Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning},
      author={Rongsheng Wang and Haoming Chen and Ruizhe Zhou and Yaofei Duan and Kunyan Cai and Han Ma and Jiaxi Cui and Jian Li and Patrick Cheong-Iao Pang and Yapeng Wang and Tao Tan},
      year={2023},
      eprint={2312.14557},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

Please follow the Apache 2.0 License.
