hpcaitech/PaLM-colossalai

This repository has been archived on 16/Oct/2023
Stars: 192
Rank: 202,019 (Top 4 %)
Language: Python
License: Apache License 2.0
Created over 2 years ago
Updated almost 2 years ago


Scalable PaLM implementation in PyTorch

Pathways Language Model (PaLM) based on PyTorch

A PyTorch implementation of the model architecture described in "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". We use Colossal-AI to apply several optimization strategies, e.g. data parallelism, tensor parallelism, mixed precision and ZeRO, so that training scales across multiple GPUs.

You are very welcome to contribute in any way to help us enhance the usability of this project.

Preparation

Install the requirements, including Colossal-AI, a PyTorch-based large-scale model training system with various efficient parallelization techniques.

pip install -r requirements.txt

Use HuggingFace datasets to download the Wikitext-2 dataset. The /PATH/TO/DATA argument is optional and defaults to ./wiki_dataset.

python ./tools/download_wiki.py -o </PATH/TO/DATA>

Download the tokenizer files with the following command. The /PATH/TO/TOKENIZER/ argument is optional and defaults to ./token.

bash ./tools/download_token.sh </PATH/TO/TOKENIZER/>

Usage

Configure your settings in CONFIG_FILE.py as shown below. Some example configurations are also provided in ./configs.

SEQ_LENGTH = 512
BATCH_SIZE = 8
NUM_EPOCHS = 10

parallel = dict(
    tensor=dict(mode='1d', size=2),
)

model = dict(type="palm_small")
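
As a rough illustration of how the `parallel` setting relates to the number of GPUs given to the launcher: in Colossal-AI, the GPUs not consumed by tensor (or pipeline) parallelism form data-parallel replicas, so the tensor-parallel size has to divide the GPU count. The helper `data_parallel_size` below is hypothetical and is not part of this repository or of Colossal-AI; it only sketches that arithmetic so a configuration can be sanity-checked before launching.

```python
def data_parallel_size(world_size: int, tensor_size: int = 1, pipeline_size: int = 1) -> int:
    """Return the number of data-parallel replicas for a given GPU count.

    Hypothetical helper for illustration: world_size is the total number of
    GPUs, and tensor_size / pipeline_size mirror the `size` fields of the
    `parallel` dict in the config file.
    """
    model_parallel = tensor_size * pipeline_size
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_size * pipeline_size = {model_parallel}"
        )
    return world_size // model_parallel


# Same shape as the config in this README: 1D tensor parallelism over 2 GPUs.
parallel = dict(tensor=dict(mode="1d", size=2))

# With 4 GPUs, each model copy spans 2 GPUs, leaving 2 data-parallel replicas.
print(data_parallel_size(4, tensor_size=parallel["tensor"]["size"]))  # 2
```

Under this reading, running the example config on 2 GPUs gives a single model replica split across both devices, while an odd GPU count would be rejected.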

Set dataset & tokenizer paths

export DATA=</PATH/TO/DATA/>
export TOKENIZER=</PATH/TO/TOKENIZER/>
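
A missing `DATA` or `TOKENIZER` variable is easy to overlook until training has already started. The sketch below shows one way to check both up front; `resolve_data_paths` is a hypothetical helper, and how `train.py` actually reads these variables is not shown in this README.

```python
import os
from pathlib import Path


def resolve_data_paths(env=None):
    """Return (data_path, tokenizer_path) from DATA and TOKENIZER.

    Hypothetical helper: `env` defaults to os.environ, and a KeyError is
    raised naming every variable that is unset or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("DATA", "TOKENIZER") if not env.get(name)]
    if missing:
        raise KeyError("unset environment variable(s): " + ", ".join(missing))
    return Path(env["DATA"]), Path(env["TOKENIZER"])


# Example with the default locations used by the download tools.
data_path, tokenizer_path = resolve_data_paths(
    {"DATA": "./wiki_dataset", "TOKENIZER": "./token"}
)
print(data_path, tokenizer_path)
```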

Run

env OMP_NUM_THREADS=12 torchrun --nproc_per_node NUM_GPUS \
    train.py --from_torch --config CONFIG_FILE.py
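
When the launch is scripted, the same command can be assembled in Python with `NUM_GPUS` and `CONFIG_FILE.py` filled in. This is a minimal sketch: `build_launch` is a hypothetical helper, and it only reuses the flags and the `OMP_NUM_THREADS=12` setting from the command above.

```python
import os
import shlex


def build_launch(num_gpus: int, config_file: str, omp_threads: int = 12):
    """Return (env, argv) for the torchrun command used in this README.

    Hypothetical helper: it only fills in the NUM_GPUS and CONFIG_FILE.py
    placeholders and sets OMP_NUM_THREADS in a copy of the environment.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    env = dict(os.environ, OMP_NUM_THREADS=str(omp_threads))
    argv = [
        "torchrun", "--nproc_per_node", str(num_gpus),
        "train.py", "--from_torch", "--config", config_file,
    ]
    return env, argv


env, argv = build_launch(2, "CONFIG_FILE.py")
print(shlex.join(argv))
# To actually start training: subprocess.run(argv, env=env, check=True)
```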

Run With Docker

A Dockerfile is provided in this repository, so you can run PaLM in Docker with the following commands.

# build docker image
docker build -t palm .

# exec training
docker run -ti --gpus all --rm palm \
    torchrun --nproc_per_node NUM_GPUS \
        train.py --from_torch --config CONFIG_FILE.py

Acknowledgement

This project draws on PaLM-Pytorch by lucidrains.
