

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training

Introduction

LiBai is an open-source toolbox for large-scale model training, built on OneFlow. The main branch works with OneFlow 0.7.0.

Highlights
  • Support a collection of parallel training components

    LiBai provides multiple parallelisms such as Data Parallelism, Tensor Parallelism, and Pipeline Parallelism. It's also extensible for other new parallelisms.

  • Varied training techniques

    LiBai provides many out-of-the-box training techniques such as Distributed Training, Mixed Precision Training, Activation Checkpointing, Recomputation, Gradient Accumulation, and Zero Redundancy Optimizer (ZeRO).

  • Support for both CV and NLP tasks

    LiBai has predefined data processing for both CV and NLP datasets such as CIFAR, ImageNet, and BERT Dataset.

  • Easy to use

    LiBai's components are designed to be modular for easier usage as follows:

    • LazyConfig system for more flexible syntax and no predefined structures
    • Friendly trainer and engine
    • Usable as a library for building research projects. See projects/ for examples built on LiBai
  • High Efficiency
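
The "LazyConfig" idea listed above (configs written as ordinary Python, with no predefined structure) can be illustrated with a small standalone sketch. This is not LiBai's API: `LazySpec`, `lazy`, `build`, `Optimizer`, and `Trainer` are hypothetical names invented for this example. The point is only that a config records *what to call and with which arguments*, and construction is deferred until the values are final.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class LazySpec:
    """Hypothetical record of a deferred call: a target plus keyword arguments."""
    target: Callable[..., Any]
    kwargs: Dict[str, Any] = field(default_factory=dict)


def lazy(target):
    """Return a factory that records arguments instead of calling `target`."""
    def record(**kwargs):
        return LazySpec(target, dict(kwargs))
    return record


def build(spec):
    """Recursively turn a LazySpec (and nested LazySpec values) into real objects."""
    if isinstance(spec, LazySpec):
        kwargs = {name: build(value) for name, value in spec.kwargs.items()}
        return spec.target(**kwargs)
    return spec


@dataclass
class Optimizer:  # stand-in class, for illustration only
    lr: float
    weight_decay: float = 0.0


@dataclass
class Trainer:  # stand-in class, for illustration only
    optimizer: Optimizer
    max_iter: int


# The config is plain Python data; nothing has been constructed yet.
cfg = lazy(Trainer)(optimizer=lazy(Optimizer)(lr=1e-3), max_iter=100)

# Any field can be overridden before the objects are built.
cfg.kwargs["optimizer"].kwargs["lr"] = 5e-4
cfg.kwargs["max_iter"] = 200

trainer = build(cfg)
print(trainer)
```

Because the config holds no live objects until `build` runs, it can be edited, copied, or overridden freely. See LiBai's documentation for the actual config system.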

Installation

See Installation instructions.

Getting Started

See Quick Run for the basic usage of LiBai.

Documentation

See LiBai's documentation for full API documentation and tutorials.

ChangeLog

Beta 0.2.0 was released on 07/07/2022. The general changes in version 0.2.0 are as follows:

Features:

  • Support enabling evaluation and setting eval_iter
  • Support customized sampler in config.py
  • Support RDMA for pipeline-model-parallel
  • Support multiple fused kernels
    • fused_scale_mask_softmax_dropout
    • fused_scale_tril_softmax_mask_scale
    • fused_self_attention in branch libai_bench
  • User Experience Optimization
  • Optimization for training throughput, see benchmark for more details
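
The fused kernel names in the list above suggest a scale, mask, softmax, and dropout sequence executed as a single kernel. As a rough reference for what such a sequence computes when written out step by step, here is a pure-Python sketch for one row of attention scores. It is not OneFlow's implementation: the function name `scale_mask_softmax` and the mask convention (True means "keep") are assumptions made for this example, and dropout is left out to keep the result deterministic.

```python
import math


def scale_mask_softmax(scores, mask, scale):
    """Unfused reference: scale one row of scores, hide masked positions, softmax.

    Assumed convention: mask[i] is True where position i may be attended to.
    """
    # Steps 1 and 2: scale kept positions, send masked positions to -inf.
    scaled = [s * scale if keep else -math.inf for s, keep in zip(scores, mask)]
    peak = max(scaled)
    if peak == -math.inf:
        raise ValueError("at least one position must be unmasked")
    # Step 3: numerically stable softmax (subtract the row maximum first).
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probs = scale_mask_softmax([2.0, 4.0, 6.0], [True, True, False], scale=0.5)
print(probs)
```

Written this way, each step reads and writes a full intermediate row. Fusing the steps into one kernel is a common way to avoid those extra memory round trips.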

Supported Models:

  • Support 3D parallel RoBERTa model
  • Support 2D parallel (data parallel + tensor model parallel) SimCSE model
  • Support data parallel MAE model
  • Support data parallel MoCo v3 model
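
To make "2D parallel (data parallel + tensor model parallel)" concrete, the sketch below arranges device ranks into a grid: ranks in the same tensor-parallel group share one sharded copy of the model, and ranks in the same data-parallel group hold the same shard and see different data. The function `parallel_groups` is hypothetical, and the layout (consecutive ranks form a tensor-parallel group) is an assumption for illustration, not necessarily the placement LiBai uses.

```python
def parallel_groups(world_size, tensor_parallel_size):
    """Split ranks 0..world_size-1 into tensor-parallel and data-parallel groups.

    Assumption: consecutive ranks form one tensor-parallel group.
    """
    if world_size % tensor_parallel_size != 0:
        raise ValueError("world_size must be divisible by tensor_parallel_size")
    data_parallel_size = world_size // tensor_parallel_size

    # Each tensor-parallel group jointly holds one model replica, split in shards.
    tp_groups = [
        list(range(d * tensor_parallel_size, (d + 1) * tensor_parallel_size))
        for d in range(data_parallel_size)
    ]
    # Each data-parallel group holds the same shard and averages its gradients.
    dp_groups = [
        list(range(t, world_size, tensor_parallel_size))
        for t in range(tensor_parallel_size)
    ]
    return tp_groups, dp_groups


tp_groups, dp_groups = parallel_groups(world_size=8, tensor_parallel_size=2)
print(tp_groups)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(dp_groups)  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

With 8 devices and a tensor-parallel size of 2, this gives 4 model replicas, each split across 2 devices.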

See changelog for details and release history.

Contributing

We appreciate all contributions to improve LiBai. See CONTRIBUTING for the contributing guideline.

License

This project is released under the Apache 2.0 license.

Citation

If you find this project useful for your research, consider citing:

@misc{of2021libai,
  author =       {Xingyu Liao and Peng Cheng and Tianhe Ren and Depeng Liang and
                  Kai Dang and Yi Wang and Xiaoyu Xu},
  title =        {LiBai},
  howpublished = {\url{https://github.com/Oneflow-Inc/libai}},
  year =         {2021}
}
