  • Stars: 141
  • Rank: 259,971 (Top 6 %)
  • Language: Python
  • Created about 4 years ago
  • Updated over 1 year ago

Repository Details

GPT2 Quickly

Build your own GPT2 quickly, without a lot of unnecessary work.

Build

This project is based on 🤗 Transformers. This tutorial shows you how to train a GPT2 model for your own language (such as Chinese or Japanese) in a few lines of code with TensorFlow 2.

You can try this project in Colab right now.

Main files


├── configs
│   ├── test.py
│   └── train.py
├── build_tokenizer.py
├── predata.py
├── predict.py
└── train.py

Preparation

virtualenv

git clone git@github.com:mymusise/gpt2-quickly.git
cd gpt2-quickly
python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt

Install google/sentencepiece
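
Before moving on, it can help to confirm that the packages this tutorial relies on are importable inside the virtualenv. The sketch below is not part of the repository. The module names `tensorflow`, `transformers`, and `sentencepiece` are the usual import names of those packages and are assumed to match what requirements.txt and the sentencepiece install provide.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust them to match requirements.txt.
    missing = missing_modules(["tensorflow", "transformers", "sentencepiece"])
    print("missing:", missing or "none")
```

An empty result means every listed module was found; anything printed as missing still needs to be installed.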

0x00. Prepare your raw dataset

This is an example of a raw dataset: raw.txt
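
A quick look at the raw text before building a vocab can catch an empty or wrongly encoded file early. The sketch below assumes the raw dataset is plain UTF-8 text; the helper name `summarize_raw_text` is hypothetical and does not come from the repository.

```python
def summarize_raw_text(text):
    """Count non-empty lines, characters, and distinct characters in a block of text."""
    lines = [line for line in text.splitlines() if line.strip()]
    return {
        "lines": len(lines),
        "chars": sum(len(line) for line in lines),
        "distinct_chars": len(set("".join(lines))),
    }
```

For a file on disk, the text could be read first with `pathlib.Path("raw.txt").read_text(encoding="utf-8")` and passed to the function.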

0x01. Build vocab

python cut_words.py
python build_tokenizer.py
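
To illustrate what a vocab is, here is a minimal frequency-based sketch. It is not what `cut_words.py` or `build_tokenizer.py` do; their internals are not shown here. It assumes the text has already been segmented into whitespace-separated tokens, and the special tokens `<pad>` and `<unk>` are hypothetical placeholders.

```python
from collections import Counter


def build_vocab(segmented_lines, max_size=50000, specials=("<pad>", "<unk>")):
    """Map tokens to integer ids: special tokens first, then tokens by descending frequency."""
    counts = Counter(tok for line in segmented_lines for tok in line.split())
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, _ in counts.most_common():
        if len(vocab) >= max_size:
            break
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab
```

Frequent tokens get small ids, and `max_size` caps the vocab so rare tokens fall back to the unknown token at encoding time.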

0x02. Tokenize

python predata.py --n_processes=2
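
The `--n_processes` flag suggests the tokenization work is split across worker processes. The sketch below shows one common way to do that with the standard library, assuming whitespace-separated tokens and a token-to-id dictionary. The function names and the `unk_id=1` default are hypothetical, and `predata.py` may work differently.

```python
from functools import partial
from multiprocessing import Pool


def split_chunks(items, n):
    """Split `items` into `n` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def encode_chunk(lines, vocab, unk_id=1):
    """Turn whitespace-separated tokens into ids, falling back to `unk_id`."""
    return [[vocab.get(tok, unk_id) for tok in line.split()] for line in lines]


def encode_parallel(lines, vocab, n_processes=2):
    """Encode one chunk per worker process and keep the original line order."""
    chunks = split_chunks(lines, n_processes)
    with Pool(n_processes) as pool:
        parts = pool.map(partial(encode_chunk, vocab=vocab), chunks)
    return [ids for part in parts for ids in part]
```

When calling `encode_parallel` from a script, put the call under `if __name__ == "__main__":` so that worker processes can import the module safely on platforms that do not fork.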

0x03. Train

python train.py
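
A GPT2-style model is trained to predict each token from the tokens before it. The sketch below shows how a stream of token ids can be cut into fixed-length examples whose labels are the inputs shifted by one position. It is illustrative only: `make_training_pairs` and `block_size` are hypothetical names, and `train.py` may leave the shift to the library instead.

```python
def make_training_pairs(token_ids, block_size):
    """Cut a token stream into (inputs, labels) pairs; labels are inputs shifted by one."""
    pairs = []
    # Each example needs block_size + 1 tokens: the last one is only ever a label.
    for start in range(0, len(token_ids) - block_size, block_size):
        window = token_ids[start:start + block_size + 1]
        pairs.append((window[:-1], window[1:]))
    return pairs
```

Trailing tokens that do not fill a complete window are dropped in this sketch.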

0x04. Predict

python predict.py
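
Text generation with a language model is a loop: score the possible next tokens, pick one, append it, and repeat. The sketch below uses greedy selection, which is the simplest strategy. `predict.py` may use sampling or another method. The callable `next_scores` is a hypothetical stand-in for a trained model that returns one score per vocab id.

```python
def greedy_generate(next_scores, prompt_ids, max_new_tokens, eos_id=None):
    """Repeatedly append the highest-scoring next token to the prompt."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_scores(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break  # stop early once the end-of-sequence token is produced
    return ids
```

Greedy decoding is deterministic, which makes it handy for checking that a trained model behaves sensibly before trying sampling-based generation.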

0x05. Fine-Tune

ENV=FINETUNE python finetune.py
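
The `ENV=FINETUNE` prefix sets an environment variable for that one command. How the repository reads it is not shown here. A minimal sketch of selecting settings by environment variable, with hypothetical config names and values, could look like this:

```python
import os


def select_config(configs, env_var="ENV", default="TRAIN"):
    """Pick a config by the value of an environment variable (case-insensitive)."""
    name = os.environ.get(env_var, default).upper()
    if name not in configs:
        raise KeyError(f"unknown {env_var}={name!r}; expected one of {sorted(configs)}")
    return configs[name]


# Hypothetical settings, for illustration only.
CONFIGS = {
    "TRAIN": {"learning_rate": 1e-4, "from_checkpoint": False},
    "FINETUNE": {"learning_rate": 1e-5, "from_checkpoint": True},
}
```

With this pattern, `ENV=FINETUNE python some_script.py` would pick the `FINETUNE` entry, while running without `ENV` set would fall back to `TRAIN`.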