Efficient, Low-Resource, Distributed transformer implementation based on BMTrain

ModelCenter

Efficient Low-Resource Implementations of Big Models

Overview

ModelCenter implements pre-trained language models (PLMs) on top of the OpenBMB/BMTrain backend. It supports efficient, low-resource, and extensible model usage as well as distributed training.

Our main advantages are:

  • Easy to use. Compared to DeepSpeed and Megatron, we offer better and more flexible code packaging and easy-to-configure Python environments, and the training code follows the PyTorch style.
  • More efficient memory utilization. Models with large memory footprints can cause OOM (out of memory) errors before the computational power of the GPU is fully utilized. Our implementation reduces the memory footprint severalfold, allowing more efficient use of the GPU's computational power with a larger batch size.
  • Efficient distributed training with low resources. With the support of OpenBMB/BMTrain, we can easily extend the ZeRO optimization to any PLM, and we optimize communication and time scheduling for faster distributed training.
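
As a rough illustration of the memory argument, the sketch below estimates model-state memory per GPU under ZeRO-style partitioning. It assumes the accounting commonly cited for mixed-precision Adam training (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states) and ignores activations and temporary buffers. The function name is hypothetical, and which partitioning stage BMTrain applies is not stated here.

```python
def zero_bytes_per_gpu(num_params: int, world_size: int, stage: int) -> float:
    """Estimate model-state bytes per GPU for mixed-precision Adam training.

    stage 0: no partitioning (plain data parallelism)
    stage 1: optimizer states are partitioned across GPUs
    stage 2: optimizer states and gradients are partitioned
    stage 3: optimizer states, gradients and parameters are partitioned
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2 or 3")
    if world_size < 1:
        raise ValueError("world_size must be at least 1")

    # Assumed bytes per parameter for each kind of model state.
    params, grads, optim = 2.0, 2.0, 12.0
    if stage >= 1:
        optim /= world_size
    if stage >= 2:
        grads /= world_size
    if stage >= 3:
        params /= world_size
    return num_params * (params + grads + optim)


# Illustrative numbers only: a 7.5-billion-parameter model on 64 GPUs.
for stage in range(4):
    gigabytes = zero_bytes_per_gpu(7_500_000_000, 64, stage) / 1e9
    print(f"stage {stage}: {gigabytes:.2f} GB per GPU")
```

Under these assumptions the estimate falls from 120 GB per GPU without partitioning to under 2 GB with full partitioning, which is the kind of headroom that makes a larger batch size possible.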

Documentation

Our documentation provides more information about the package.

Installation

1. From PyPI (Recommended)

$ pip install model-center

2. From Source

$ git clone https://github.com/OpenBMB/ModelCenter.git
$ cd ModelCenter
$ pip install -r requirements.txt
$ python3 setup.py install
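
After either install route, one quick way to check that the packages can be imported is sketched below. It assumes the import names bmtrain and model_center used in the Quick Start; the is_installed helper is a hypothetical convenience and not part of either package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two packages the Quick Start imports.
for name in ("bmtrain", "model_center"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

The check only locates the modules; it does not import them, so it will not catch a broken CUDA or PyTorch setup.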

Quick Start

In the quick start, you will walk through how to fine-tune a BERT model on a classification task.

1. Initialize bmtrain backend

First, import bmtrain and call bmtrain.init_distributed() at the beginning of your code to initialize the distributed environment.

import bmtrain as bmt
bmt.init_distributed(seed=0)

2. Prepare the model

Next, get a pre-trained BERT model from model_center, e.g., bert-base-uncased. To fine-tune BERT on a classification task, a feed-forward layer needs to be appended after the last layer.

import torch
from model_center.model import Bert, BertConfig
from model_center.layer import Linear

class BertModel(torch.nn.Module):
    def __init__(self, config):
        super().__init__()
        self.bert = Bert.from_pretrained("bert-base-uncased")
        self.dense = Linear(config.dim_model, 2)
        bmt.init_parameters(self.dense)

    def forward(self, input_ids, attention_mask):
        pooler_output = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        logits = self.dense(pooler_output)
        return logits

config = BertConfig.from_pretrained("bert-base-uncased")
model = BertModel(config)

If you only need the config rather than a pre-trained checkpoint, you can initialize a model as follows:

config = BertConfig.from_json_file("your/path/to/config.json")
model = Bert(config)
bmt.init_parameters(model)
# bmt.load(model, "your/path/to/pytorch_model.pt")

3. Prepare the dataset

The next step is to prepare the dataset used for training and evaluation. Here, we use the BoolQ dataset from the SuperGLUE benchmark. You need to download the dataset and put the unzipped folder in your_path_to_dataset.

from model_center.dataset.bertdataset import DATASET
from model_center.dataset import DistributedDataLoader
from model_center.tokenizer import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
splits = ['train', 'dev']
dataset = {}

for split in splits:
    dataset[split] = DATASET['BoolQ']('your_path_to_dataset', split, bmt.rank(), bmt.world_size(), tokenizer, max_encoder_length=512)

batch_size = 64
train_dataloader = DistributedDataLoader(dataset['train'], batch_size=batch_size, shuffle=True)
dev_dataloader = DistributedDataLoader(dataset['dev'], batch_size=batch_size, shuffle=False)
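
The dataset constructor receives bmt.rank() and bmt.world_size() so that each process can work on its own slice of the data. How ModelCenter shards internally is not shown here. The sketch below illustrates one common scheme, round-robin assignment with wrap-around padding so that every rank gets the same number of samples, which is the idea behind PyTorch's DistributedSampler. shard_indices is a hypothetical helper.

```python
import math
from typing import List


def shard_indices(num_samples: int, rank: int, world_size: int) -> List[int]:
    """Return the sample indices that process `rank` would handle."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    if num_samples == 0:
        return []

    # Round up so that every rank receives the same number of samples.
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size

    # Pad by wrapping around to the start of the dataset.
    padded = [i % num_samples for i in range(total)]

    # Round-robin: rank r takes positions r, r + world_size, ...
    return padded[rank::world_size]


# Ten samples split across four processes.
for r in range(4):
    print(r, shard_indices(10, r, 4))
```

Equal shard sizes matter because collective operations such as gathering results expect every process to take the same number of steps.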

4. Train the model

Now, select an optimizer, a learning rate scheduler, and a loss function, and then start training the model! Here, we train BERT for 5 epochs and evaluate it at the end of each epoch.
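
As background for the scheduler used in this step, here is a minimal sketch of a generic Noam-style schedule: linear warmup followed by inverse-square-root decay, as introduced with the original Transformer. The exact formula of bmt.lr_scheduler.Noam and its handling of end_iter = -1 are not shown in this README, so treating start_lr as the base_lr below is an assumption, and noam_lr is a hypothetical function.

```python
def noam_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Generic Noam schedule: linear warmup, then inverse-sqrt decay."""
    if step < 1 or warmup_steps < 1:
        raise ValueError("step and warmup_steps must be at least 1")
    # The two branches meet at step == warmup_steps, where the rate peaks.
    return base_lr * min(step ** -0.5, step * warmup_steps ** -1.5)


# Same illustrative values as in the training code: 1e-5 and 100 warmup steps.
for step in (1, 50, 100, 400, 1600):
    print(step, noam_lr(step, base_lr=1e-5, warmup_steps=100))
```

With this formula the peak learning rate is base_lr divided by the square root of warmup_steps, not base_lr itself, which is worth keeping in mind when choosing the starting value.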

optimizer = bmt.optim.AdamOffloadOptimizer(model.parameters())

lr_scheduler = bmt.lr_scheduler.Noam(
    optimizer, 
    start_lr = 1e-5,
    warmup_iter = 100, 
    end_iter = -1)

loss_func = bmt.loss.FusedCrossEntropy(ignore_index=-100)

optim_manager = bmt.optim.OptimManager(loss_scale=1024)
optim_manager.add_optimizer(optimizer, lr_scheduler)

for epoch in range(5):
    model.train()
    for data in train_dataloader:
        input_ids = data['input_ids']
        attention_mask = data['attention_mask']
        labels = data['labels']

        # model forward
        logits = model(input_ids, attention_mask)

        # calculate loss
        loss = loss_func(logits.view(-1, logits.shape[-1]), labels.view(-1))

        # use bmt.sum_loss(loss) to gather all loss information from all distributed processes
        global_loss = bmt.sum_loss(loss).item()

        # zero grad
        optim_manager.zero_grad()

        # scale loss before backward to avoid precision underflow of fp16
        optim_manager.backward(loss)

        # clip gradient norm
        grad_norm = optim_manager.clip_grad_norm(optimizer.param_groups, max_norm=10.0, scale = optimizer.scale, norm_type = 2)

        # step for all optimizer inside optim_manager
        optim_manager.step()

        # print information only on rank 0 when distributed training
        bmt.print_rank(
            "loss: {:.4f} | lr: {:.4e}, scale: {:10.4f} | grad_norm: {:.4f} |".format(
                global_loss,
                lr_scheduler.current_lr,
                int(optimizer.scale),
                grad_norm,
            )
        )

    # evaluate model
    model.eval()
    with torch.no_grad():
        pd = [] # prediction
        gt = [] # ground_truth
        for data in dev_dataloader:
            input_ids = data["input_ids"]
            attention_mask = data["attention_mask"]
            labels = data["labels"]

            logits = model(input_ids, attention_mask)
            loss = loss_func(logits.view(-1, logits.shape[-1]), labels.view(-1))

            logits = logits.argmax(dim=-1)

            pd.extend(logits.cpu().tolist())
            gt.extend(labels.cpu().tolist())

        # gather results from all distributed processes
        pd = bmt.gather_result(torch.tensor(pd).int()).cpu().tolist()
        gt = bmt.gather_result(torch.tensor(gt).int()).cpu().tolist()

        # calculate metric
        from sklearn.metrics import accuracy_score
        acc = accuracy_score(gt, pd)
        bmt.print_rank(f"accuracy: {acc*100:.2f}")
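
The training loop scales the loss before the backward pass to avoid fp16 underflow. The standalone sketch below shows why that helps, using only the Python standard library: the struct format character "e" rounds a value to IEEE 754 half precision. It is an illustration of the numeric effect under that assumption, not of how BMTrain implements loss scaling, and to_fp16 is a hypothetical helper.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


LOSS_SCALE = 1024  # same value as loss_scale in the training code above

tiny_grad = 1e-8                           # too small to represent in fp16
unscaled = to_fp16(tiny_grad)              # rounds to zero
scaled = to_fp16(tiny_grad * LOSS_SCALE)   # survives as a nonzero fp16 value
recovered = scaled / LOSS_SCALE            # unscaled again in higher precision

print("without scaling:", unscaled)
print("with scaling:   ", scaled)
print("recovered:      ", recovered)
```

Because the gradient is linear in the loss, multiplying the loss by a constant multiplies every gradient by the same constant, and dividing it out before the optimizer step recovers the original magnitude.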

5. Run your code

You can run the above code using the same launch command as the distributed module of PyTorch.

Choose one of the following commands depending on your version of PyTorch.

  • ${MASTER_ADDR} means the IP address of the master node.
  • ${MASTER_PORT} means the port of the master node.
  • ${NNODES} means the total number of nodes.
  • ${GPU_PER_NODE} means the number of GPUs per node.
  • ${NODE_RANK} means the rank of this node.

torch.distributed.launch (more suitable for torch < 1.10)

$ python3 -m torch.distributed.launch --master_addr ${MASTER_ADDR} \
                                      --master_port ${MASTER_PORT} \
                                      --nproc_per_node ${GPU_PER_NODE} \
                                      --nnodes ${NNODES} \
                                      --node_rank ${NODE_RANK} \
                                      train.py

torchrun (more suitable for torch >= 1.10)

$ torchrun --nnodes=${NNODES} \
           --nproc_per_node=${GPU_PER_NODE} \
           --rdzv_id=1 \
           --rdzv_backend=c10d \
           --rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} \
           train.py

More information can be found in the documentation.
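
Both launchers start one process per GPU and pass the process layout to each of them. The sketch below reads the environment variables that torchrun documents for its workers (RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT). It does not call BMTrain; the LaunchInfo class, the read_launch_info helper, and the single-process fallback values are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LaunchInfo:
    rank: int         # global rank of this process
    local_rank: int   # rank of this process on its own node
    world_size: int   # total number of processes
    master_addr: str  # address of the master node
    master_port: int  # port of the master node


def read_launch_info(env: Optional[Mapping[str, str]] = None) -> LaunchInfo:
    """Parse launcher variables, falling back to single-process defaults."""
    env = os.environ if env is None else env
    return LaunchInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
        master_addr=env.get("MASTER_ADDR", "localhost"),
        master_port=int(env.get("MASTER_PORT", "29500")),
    )


# Example: the fourth of eight processes, second GPU on its node.
example_env = {
    "RANK": "3",
    "LOCAL_RANK": "1",
    "WORLD_SIZE": "8",
    "MASTER_ADDR": "10.0.0.1",
    "MASTER_PORT": "12345",
}
print(read_launch_info(example_env))
```

With the commands above, the expected world size is ${NNODES} multiplied by ${GPU_PER_NODE}, so printing these values at startup is a quick sanity check for a multi-node launch.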

Supported Models

  • CPM-1 [paper]. The following checkpoint can be loaded via CPM1.from_pretrained(identifier):

    • cpm1-large
  • CPM-2 [paper]. The following checkpoint can be loaded via CPM2.from_pretrained(identifier):

    • cpm2-large
  • BERT [paper]. The following checkpoints can be loaded via Bert.from_pretrained(identifier):

    • bert-base-cased
    • bert-base-uncased
    • bert-large-cased
    • bert-large-uncased
    • bert-base-chinese
    • bert-base-multilingual-cased
    • kv-plm
  • RoBERTa [paper]. The following checkpoints can be loaded via Roberta.from_pretrained(identifier):

    • roberta-base
    • roberta-large
  • T5 [paper]. The following checkpoints can be loaded via T5.from_pretrained(identifier):

    • t5-small
    • t5-base
    • t5-large
    • t5-3b
    • t5-11b
    • t5-v1_1-small
    • t5-v1_1-base
    • t5-v1_1-large
    • t5-v1_1-xl
    • t5-v1_1-xxl
    • mt5-small
    • mt5-base
    • mt5-large
    • mt5-xl
    • mt5-xxl
    • mengzi-t5-base
    • flan-t5-small
    • flan-t5-base
    • flan-t5-large
    • flan-t5-xl
    • flan-t5-xxl
  • GPT-2 [paper]. The following checkpoints can be loaded via GPT2.from_pretrained(identifier):

    • gpt2-base
    • gpt2-medium
    • gpt2-large
    • gpt2-xl
    • wenzhong-gpt2-3.5b
  • GPT-J [paper]. The following checkpoint can be loaded via GPTj.from_pretrained(identifier):

    • gptj-6b
  • Longformer [paper]. The following checkpoint can be loaded via Longformer.from_pretrained(identifier):

    • lawformer
  • GLM [paper]. The following checkpoint can be loaded via GLM.from_pretrained(identifier):

    • glm-10b-zh
  • ViT [paper]. The following checkpoint can be loaded via ViT.from_pretrained(identifier):

    • vit-base-patch16-224
  • LLaMA [paper]. Convert the checkpoint via transfer/hugLLaMa_bmtrainLLaMa.py.
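
For scripts that accept a checkpoint identifier from the command line, the list above can be captured in a small lookup table. The sketch below is a hypothetical helper, not part of model_center: it only maps each identifier listed above to the name of the class whose from_pretrained loads it, and it returns the class name as a string so that it runs without the package installed.

```python
SUPPORTED_CHECKPOINTS = {
    "CPM1": ["cpm1-large"],
    "CPM2": ["cpm2-large"],
    "Bert": [
        "bert-base-cased", "bert-base-uncased", "bert-large-cased",
        "bert-large-uncased", "bert-base-chinese",
        "bert-base-multilingual-cased", "kv-plm",
    ],
    "Roberta": ["roberta-base", "roberta-large"],
    "T5": [
        "t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b",
        "t5-v1_1-small", "t5-v1_1-base", "t5-v1_1-large",
        "t5-v1_1-xl", "t5-v1_1-xxl",
        "mt5-small", "mt5-base", "mt5-large", "mt5-xl", "mt5-xxl",
        "mengzi-t5-base",
        "flan-t5-small", "flan-t5-base", "flan-t5-large",
        "flan-t5-xl", "flan-t5-xxl",
    ],
    "GPT2": ["gpt2-base", "gpt2-medium", "gpt2-large", "gpt2-xl",
             "wenzhong-gpt2-3.5b"],
    "GPTj": ["gptj-6b"],
    "Longformer": ["lawformer"],
    "GLM": ["glm-10b-zh"],
    "ViT": ["vit-base-patch16-224"],
}


def loader_class_for(identifier: str) -> str:
    """Return the name of the class listed above for a checkpoint identifier."""
    for class_name, identifiers in SUPPORTED_CHECKPOINTS.items():
        if identifier in identifiers:
            return class_name
    raise KeyError(f"unsupported checkpoint identifier: {identifier!r}")


print(loader_class_for("bert-base-uncased"))
print(loader_class_for("flan-t5-xl"))
```

LLaMA is left out of the table on purpose, since the list above describes a conversion script for it and no from_pretrained identifier.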

Performance

You can find more performance metrics in the repo OpenBMB/BMTrain.

Community

We welcome everyone to contribute code following our contributing guidelines.

License

The package is released under the Apache 2.0 License.
