  • Stars: 2,922
  • Rank: 15,515 (Top 0.4%)
  • Language: Python
  • License: Apache License 2.0
  • Created: over 1 year ago
  • Updated: 12 months ago


Repository Details


LLM Zoo: democratizing ChatGPT


LLM Zoo is a project that provides data, models, and an evaluation benchmark for large language models. [Tech Report]

Latest News

  • [05/05/2023]: Released the training code. Now you can replicate a multilingual instruction-following LLM by yourself. :-)
  • [04/24/2023]: Added more results (e.g., MOSS) to the evaluation benchmark.
  • [04/08/2023]: Released the Phoenix (for all languages) and Chimera (for Latin languages) models.

🤔 Motivation

  • Break "AI supremacy" and democratize ChatGPT

"AI supremacy" is understood as a company's absolute leadership and monopoly position in an AI field, which may even include exclusive capabilities beyond general artificial intelligence. This is unacceptable for AI community and may even lead to individual influence on the direction of the human future, thus bringing various hazards to human society.

  • Make ChatGPT-like LLM accessible across countries and languages
  • Make AI open again. Every person, regardless of their skin color or place of birth, should have equal access to the technology gifted by the creator. For example, many pioneers have made great efforts to spread the use of light bulbs and vaccines to developing countries. Similarly, ChatGPT, one of the greatest technological advancements in modern history, should also be made available to all.

🎬 Get started

Install

Run the following command to install the required packages:

pip install -r requirements.txt

CLI Inference

python -m llmzoo.deploy.cli --model-path /path/to/weights/

For example, for Phoenix, run

python -m llmzoo.deploy.cli --model-path FreedomIntelligence/phoenix-inst-chat-7b

and it will download the model from Hugging Face automatically. For Chimera, please follow this instruction to prepare the weights.
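
If you prefer to script inference directly instead of using the CLI, the released Phoenix checkpoint can also be loaded with the Hugging Face transformers library. Below is a minimal sketch, not the project's CLI internals; it assumes the accelerate package is installed for device_map="auto", and the checkpoint's expected chat prompt format may differ from the plain prompt used here.

# Minimal sketch: load the Phoenix checkpoint with transformers and generate a reply.
# Assumes `pip install transformers accelerate`; the prompt format is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FreedomIntelligence/phoenix-inst-chat-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Hello! Who are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))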

Check here for deploying a web application.

📚 Data

Overview

We used the following two types of data for training Phoenix and Chimera:

Instruction data

  • Multilingual instructions (language-agnostic instructions with post-translation; see the illustrative sketch after this list):

      Self-instructed / translated (Instruction, Input) in language A
        --(Step 1: Translate)--> (Instruction, Input) in language B
                                 (B is randomly sampled w.r.t. the probability distribution of real-world language usage)
        --(Step 2: Generate)--> Output in language B

  • User-centered instructions:

      (Role, Instruction, Input) seeds
        --(Step 1: Self-Instruct)--> (Role, Instruction, Input) samples
        --(Step 2: Generate output)--> (Role, Instruction, Input) ---> Output

Conversation data

  • User-shared conversations:

      ChatGPT conversations shared on the Internet
        --(Step 1: Crawl)--> multi-round conversation data
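
For illustration only, the sketch below mirrors the two-step multilingual instruction pipeline above: translate an (Instruction, Input) pair into a sampled language B, then generate the output in B. The prompts, the language weights, and the use of an OpenAI-compatible client are assumptions for illustration, not the project's actual data-generation code.

# Hypothetical sketch of the translate-then-generate pipeline (Steps 1-2 above).
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative target-language weights; the real distribution is a design choice.
LANGUAGES = {"zh": 0.3, "es": 0.2, "fr": 0.2, "ar": 0.15, "hi": 0.15}

def chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def build_sample(instruction: str, inp: str) -> dict:
    # Step 1: translate the (instruction, input) pair into a sampled language B.
    lang = random.choices(list(LANGUAGES), weights=list(LANGUAGES.values()))[0]
    translated = chat(f"Translate the following into {lang}:\n{instruction}\n{inp}")
    # Step 2: ask the model to follow the translated instruction in language B.
    output = chat(translated)
    return {"language": lang, "prompt": translated, "output": output}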

Check InstructionZoo for the collection of instruction datasets.

Check GPT-API-Accelerate Tool for faster data generation using ChatGPT.

Download

🐼 Models

Overview of existing models

Model | Backbone | #Params | Claimed language | Post-training (instruction) | Post-training (conversation) | Release date
----- | -------- | ------- | ---------------- | --------------------------- | ---------------------------- | ------------
ChatGPT | - | - | multi | | | 11/30/22
Wenxin | - | - | zh | | | 03/16/23
ChatGLM | GLM | 6B | en, zh | | | 03/16/23
Alpaca | LLaMA | 7B | en | 52K, en | | 03/13/23
Dolly | GPT-J | 6B | en | 52K, en | | 03/24/23
BELLE | BLOOMZ | 7B | zh | 1.5M, zh | | 03/26/23
Guanaco | LLaMA | 7B | en, zh, ja, de | 534K, multi | | 03/26/23
Chinese-LLaMA-Alpaca | LLaMA | 7/13B | en, zh | 2M/3M, en/zh | | 03/28/23
LuoTuo | LLaMA | 7B | zh | 52K, zh | | 03/31/23
Vicuna | LLaMA | 7/13B | en | | 70K, multi | 03/13/23
Koala | LLaMA | 13B | en | 355K, en | 117K, en | 04/03/23
BAIZE | LLaMA | 7/13/30B | en | 52K, en | 111.5K, en | 04/04/23
Phoenix (Ours) | BLOOMZ | 7B | multi | 40+ | 40+ | 04/08/23
Latin Phoenix: Chimera (Ours) | LLaMA | 7/13B | multi (Latin) | Latin | Latin | 04/08/23

The key difference between existing models and ours:

The key difference in our models is that we utilize two sets of data, instructions and conversations, which were previously used only by Alpaca and Vicuna, respectively. We believe that incorporating both types of data is essential to the recipe for a proficient language model: the instruction data tames language models to adhere to human instructions and fulfill users' information needs, while the conversation data develops the model's conversational skills. Together, the two types of data complement each other to create a more well-rounded language model.

Phoenix (LLM across Languages)

The philosophy to name

The first model is named Phoenix. In Chinese culture, the phoenix is commonly regarded as the king of birds; as the saying "百鸟朝凤" ("all birds pay homage to the phoenix") suggests, it can coordinate with all birds, even if they speak different languages. We refer to Phoenix as the one capable of understanding and speaking hundreds of (bird) languages. More importantly, the phoenix is the totem of the Chinese University of Hong Kong, Shenzhen (CUHKSZ); it goes without saying that this also holds for the Chinese University of Hong Kong (CUHK).

Model | Backbone | Data | Link
----- | -------- | ---- | ----
Phoenix-chat-7b | BLOOMZ-7b1-mt | Conversation | parameters
Phoenix-inst-chat-7b | BLOOMZ-7b1-mt | Instruction + Conversation | parameters
Phoenix-inst-chat-7b-int4 | BLOOMZ-7b1-mt | Instruction + Conversation | parameters

Chimera (LLM mainly for Latin and Cyrillic languages)

The philosophy to name

The biggest barrier to naming an LLM is that we are running out of candidate names: LLaMA, Guanaco, Vicuna, and Alpaca are already taken, and there are no more members of the camel family. We therefore picked a similar hybrid creature from Greek mythology: Chimera, a beast of Lycia in Asia Minor composed of parts of different animals. Coincidentally, it is also a hero in DOTA (and in Warcraft III), so the name commemorates nights spent gaming during high school and undergraduate years.

Model | Backbone | Data | Link
----- | -------- | ---- | ----
Chimera-chat-7b | LLaMA-7b | Conversation | parameters (delta)
Chimera-chat-13b | LLaMA-13b | Conversation | parameters (delta)
Chimera-inst-chat-7b | LLaMA-7b | Instruction + Conversation | parameters (delta)
Chimera-inst-chat-13b | LLaMA-13b | Instruction + Conversation | parameters (delta)

Due to LLaMA's license restrictions, we follow FastChat to release our delta weights. To use Chimera, download the original LLaMA weights and run the script:

python tools/apply_delta.py \
 --base /path/to/llama-13b \
 --target /output/path/to/chimera-inst-chat-13b \
 --delta FreedomIntelligence/chimera-inst-chat-13b-delta
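
Conceptually, applying the delta just adds each released delta tensor to the corresponding base LLaMA tensor. The sketch below illustrates that idea; the repository's tools/apply_delta.py is the authoritative implementation and may differ in details such as tokenizer handling or memory-efficient loading.

# Rough sketch of delta application: target = base + delta, tensor by tensor.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def apply_delta(base_path: str, delta_path: str, target_path: str) -> None:
    base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
    delta = AutoModelForCausalLM.from_pretrained(delta_path, torch_dtype=torch.float16)

    base_state = base.state_dict()
    for name, param in delta.state_dict().items():
        # Assumes both checkpoints share parameter names and shapes.
        param.data += base_state[name]

    delta.save_pretrained(target_path)  # the delta model now holds base + delta
    AutoTokenizer.from_pretrained(delta_path).save_pretrained(target_path)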

CAMEL (Chinese And Medically Enhanced Language models)

The philosophy to name

Its Chinese name is HuatuoGPT or 华佗GPT, to commemorate the great Chinese physician Hua Tuo (华佗), who lived around 200 AD. Training has already finished; we will release it in about two weeks, after some work to deploy it on public cloud servers that can handle potentially massive request volumes.

Check our models in HuatuoGPT or try our demo (API key required). Similar biomedical models can be found in biomedical LLMs.

More models in the future

Legal GPT (coming soon)

Vision-Language Models (coming soon)

Retrieval-augmented Models (coming soon)

🧐 Evaluation and Benchmark

We provide a bilingual, multidimensional comparison between different open-source models and ours.

Chinese

  • Automatic Evaluation Using GPT-4:
Model | Ratio
----- | -----
Phoenix-inst-chat-7b vs. ChatGPT | 85.2%
Phoenix-inst-chat-7b vs. ChatGLM-6b | 94.6%
Phoenix-inst-chat-7b vs. Baidu-Wenxin | 96.8%
Phoenix-inst-chat-7b vs. MOSS-moon-003-sft | 109.7%
Phoenix-inst-chat-7b vs. BELLE-7b-2m | 122.7%
Phoenix-inst-chat-7b vs. Chinese-Alpaca-7b | 135.3%
Phoenix-inst-chat-7b vs. Chinese-Alpaca-13b | 125.2%

Observation: Phoenix-inst-chat-7b achieves 85.2% of ChatGPT's performance in Chinese. It slightly underperforms Baidu-Wenxin (96.8%) and ChatGLM-6b (94.6%), neither of which is fully open-source; ChatGLM-6b, for example, provides only model weights, without training data or details. Although Phoenix is a multilingual LLM, it achieves state-of-the-art performance among open-source Chinese LLMs.
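
For context, ratios like these are typically computed with a Vicuna-style pairwise protocol: GPT-4 scores both answers to the same question, and the ratio is one model's total score divided by the other's. The sketch below illustrates that protocol; the exact judging prompt, scoring scale, and aggregation behind the numbers above may differ.

# Illustrative sketch of pairwise GPT-4 judging; prompt and parsing are assumptions.
import re
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "Rate the two assistant answers to the question on a scale of 1-10.\n"
    "Question: {q}\nAnswer A: {a}\nAnswer B: {b}\n"
    "Reply with two numbers separated by a space (score for A, then score for B)."
)

def judge(question: str, answer_a: str, answer_b: str) -> tuple[float, float]:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(q=question, a=answer_a, b=answer_b)}],
    )
    scores = re.findall(r"\d+(?:\.\d+)?", resp.choices[0].message.content)
    return float(scores[0]), float(scores[1])

def relative_ratio(pairs):
    """pairs: iterable of (question, candidate_answer, reference_answer) triples."""
    cand_total = ref_total = 0.0
    for q, a, b in pairs:
        sa, sb = judge(q, a, b)
        cand_total += sa
        ref_total += sb
    return cand_total / ref_total  # e.g., 0.852 is reported as 85.2%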

  • Human Evaluation:
Comparison | Win | Tie | Lose
---------- | --- | --- | ----
Phoenix vs. ChatGPT | 12 | 35 | 53
Phoenix vs. ChatGLM-6b | 36 | 11 | 53
Phoenix vs. Baidu-Wenxin | 29 | 25 | 46
Phoenix vs. BELLE-7b-2m | 55 | 31 | 14
Phoenix vs. Chinese-Alpaca-13b | 56 | 31 | 13

Observation: The human evaluation results show the same trend as the automatic evaluation results.

English

  • Automatic Evaluation Using GPT-4:
Model | Ratio
----- | -----
Chimera-chat-7b vs. ChatGPT | 85.2%
Chimera-chat-13b vs. ChatGPT | 92.6%
Chimera-inst-chat-13b vs. ChatGPT | 96.6%

👾 Quantization

We offer int8 and int4 quantizations, which greatly reduce GPU memory consumption, e.g., from ~28 GB to ~7 GB for Phoenix.

Int8

You can directly obtain the int8 version of Phoenix by passing --load-8bit to the CLI inference command, e.g.,

python -m llmzoo.deploy.cli --model-path FreedomIntelligence/phoenix-inst-chat-7b --load-8bit
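
Equivalently, if you load the checkpoint yourself with transformers, 8-bit loading is handled by bitsandbytes. A minimal sketch, assuming the bitsandbytes and accelerate packages are installed (this bypasses the llmzoo CLI):

# Sketch: 8-bit loading of Phoenix via transformers + bitsandbytes.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "FreedomIntelligence/phoenix-inst-chat-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # place layers automatically across available devices
)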

Int4

For the int4 version, we take advantage of GPTQ. You can directly obtain the int4 version of Phoenix by passing the int4 model path and --load-4bit to the CLI inference command. This requires the AutoGPTQ package to be installed. E.g.,

python -m llmzoo.deploy.cli --model-path FreedomIntelligence/phoenix-inst-chat-7b-int4 --load-4bit

We use AutoGPTQ to support Phoenix; install it via:

BUILD_CUDA_EXT=0 pip install auto-gptq[triton]

For Chimera, we cannot share the int4 parameters due to license restrictions, but you can follow the example in our patched AutoGPTQ to run the quantization yourself; a rough sketch is shown below.
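
As a rough illustration of such a quantization run with AutoGPTQ (the calibration text, bit width, and group size below are placeholder choices; see the patched AutoGPTQ example for the exact recipe):

# Illustrative GPTQ quantization of a merged Chimera checkpoint with AutoGPTQ.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = "/output/path/to/chimera-inst-chat-13b"  # merged base + delta weights
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config)

# A real run would use a larger, more representative calibration set.
calibration = [tokenizer("Hello, how can I help you today?", return_tensors="pt")]
model.quantize(calibration)
model.save_quantized(model_path + "-gptq-4bit")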

Thanks to yhyu13: the merged weights and GPTQ-quantized weights for Chimera are available as chimera-inst-chat-13b-hf and chimera-inst-chat-13b-gptq-4bit.

Inference in pure C/C++: You can refer to this link to run Chimera or Phoenix on your PC.

🏭 Deployment

Launch a controller

python -m llmzoo.deploy.webapp.controller

Launch a model worker

python -m llmzoo.deploy.webapp.model_worker --model-path /path/to/weights/

Launch a gradio web server

python -m llmzoo.deploy.webapp.gradio_web_server

Now, you can open your browser and chat with a model.

😀 Training by yourself

Prepare the data

You can either download the phoenix-sft-data-v1 dataset or prepare your own data. Put your data at the path data/data.json (an illustrative record format is sketched below).
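
For illustration, a FastChat-style multi-turn record is sketched below. This schema is an assumption made for the example; check the phoenix-sft-data-v1 dataset for the exact format the training scripts expect.

# Illustrative only: write one hypothetical multi-turn record to data/data.json.
import json

sample = {
    "id": "example-0",
    "conversations": [
        {"from": "human", "value": "What is instruction tuning?"},
        {"from": "gpt", "value": "Instruction tuning fine-tunes a language model on instruction-response pairs..."},
    ],
}

with open("data/data.json", "w", encoding="utf-8") as f:
    json.dump([sample], f, ensure_ascii=False, indent=2)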

Training

For Phoenix, run

bash scripts/train_phoenix_7b.sh

For Chimera, prepare the LLaMA weights following this instruction and run

bash scripts/train_chimera_7b.sh
bash scripts/train_chimera_13b.sh

🤖 Limitations

Our goal in releasing our models is to help the community better replicate ChatGPT/GPT-4. We are not aiming to compete with others, as benchmarking models is a challenging task. Our models have limitations similar to those of ChatGPT/GPT-4, which include:

  • Lack of common sense: our models may not always be able to apply common-sense knowledge to a situation, which can lead to nonsensical or inappropriate responses.

  • Limited knowledge domain: our models' knowledge is based on the data they were trained on, and they may not be able to provide accurate or relevant responses outside that domain.

  • Biases: our models may carry biases that reflect the biases in their training data, which can result in unintended consequences or unfair treatment.

  • Inability to understand emotions: while our models can understand language, they may not always grasp the emotional tone behind it, which can lead to inappropriate or insensitive responses.

  • Misunderstandings due to context: our models may misunderstand the context of a conversation, leading to misinterpretation and incorrect responses.

🙌 Contributors

LLM Zoo is mainly contributed by:

As an open-source project, we welcome contributions. Feel free to contribute if you have any ideas or find any issues.

Acknowledgement

We are aware that our work is inspired by the following works, including but not limited to:

Without these works, this repository would not have been possible.

Citation

@article{phoenix-2023,
  title={Phoenix: Democratizing ChatGPT across Languages},
  author={Zhihong Chen and Feng Jiang and Junying Chen and Tiannan Wang and Fei Yu and Guiming Chen and Hongbo Zhang and Juhao Liang and Chen Zhang and Zhiyi Zhang and Jianquan Li and Xiang Wan and Benyou Wang and Haizhou Li},
  journal={arXiv preprint arXiv:2304.10453},
  year={2023}
}
@misc{llm-zoo-2023,
  title={LLM Zoo: democratizing ChatGPT},
  author={Zhihong Chen and Junying Chen and Hongbo Zhang and Feng Jiang and Guiming Chen and Fei Yu and Tiannan Wang and Juhao Liang and Chen Zhang and Zhiyi Zhang and Jianquan Li and Xiang Wan and Haizhou Li and Benyou Wang},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/FreedomIntelligence/LLMZoo}},
}

We are from the School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHKSZ), and the Shenzhen Research Institute of Big Data (SRIBD).

Star History

Star History Chart

More Repositories

 1. Medical_NLP - Medical NLP Competition, dataset, large models, paper (2,066 stars)
 2. TextClassificationBenchmark - A Benchmark of Text Classification in PyTorch (Python, 601 stars)
 3. HuatuoGPT - HuatuoGPT, Towards Taming Language Models To Be a Doctor (An Open Medical GPT) (Python, 527 stars)
 4. InstructionZoo (184 stars)
 5. crosstalk-generation - Code and data for crosstalk text generation tasks, exploring whether large models and pre-trained language models can understand humor (Python, 163 stars)
 6. CMB - A Comprehensive Medical Benchmark in Chinese (Python, 124 stars)
 7. Evaluation-of-ChatGPT-on-Information-Extraction - An evaluation of ChatGPT on information extraction tasks, including Named Entity Recognition (NER), Relation Extraction (RE), Event Extraction (EE), and Aspect-Based Sentiment Analysis (ABSA) (Python, 121 stars)
 8. qnn (Python, 112 stars)
 9. ReasoningNLP - Paper list on reasoning in NLP (109 stars)
10. GrammarGPT - The code and data for GrammarGPT (Python, 87 stars)
11. complex-order (Python, 83 stars)
12. Huatuo-26M - The largest-scale Chinese medical QA dataset, with 26,000,000 question-answer pairs (67 stars)
13. FastLLM - Fast LLM training codebase with dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] (Python, 32 stars)
14. DPTDR - Code for the COLING 2022 paper "DPTDR: Deep Prompt Tuning for Dense Passage Retrieval" (Python, 25 stars)
15. GPT-API-Accelerate - A set of Python classes for accelerating the generation of responses to prompts using the OpenAI GPT-3.5 API (Python, 19 stars)
16. REMOP - Code for the paper "Modular Retrieval for Generalization and Interpretation" (Python, 11 stars)
17. ReaLM - A trainable user simulator (Python, 9 stars)
18. ChatGPT-Detection-PR-HPPT - Code and dataset for the paper "Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text" (Python, 9 stars)
19. Reading-list-of-ChatGPT (7 stars)
20. Autonomous_Learning - LLMs Could Autonomously Learn Without External Supervision (an autonomous learning method) (Python, 5 stars)
21. DotaGPT - Chinese medical instruction-tuning dataset (Python, 5 stars)
22. HuatuoGPT-R - RAG to reduce medical hallucination (5 stars)
23. MultilingualSIFT - Multilingual Supervised Instruction Fine-tuning (5 stars)
24. MindedWheeler - Embodied AI with a car as demo (C++, 5 stars)
25. MedJamba - Multilingual medical model based on Jamba (Python, 4 stars)
26. finetune_chatgpt - An example of fine-tuning ChatGPT (Python, 3 stars)
27. ChatZoo - Chat data for training LLMs (3 stars)
28. LLMZOO-API-SDK (Python, 3 stars)
29. Overview-of-ChatGPT (3 stars)
30. LLMFactory - A factory to standardize LLM adaptation through modularization (Python, 2 stars)
31. OpenChatGPT (2 stars)
32. try_Phoenix2 - Phoenix2 code in development (Python, 1 star)
33. MLLM-Bench - Evaluating multi-modal LLMs using GPT-4V (HTML, 1 star)