  • Language
    Jupyter Notebook
  • License
    Apache License 2.0

Crosslingual Generalization through Multitask Finetuning

This repository provides an overview of all components used for the creation of BLOOMZ & mT0 and xP3 introduced in the paper Crosslingual Generalization through Multitask Finetuning.

Data

| Name | Explanation | Example models |
| --- | --- | --- |
| xP3x | Mixture of 17 tasks in 277 languages with English prompts | WIP - Join us at Project Aya @C4AI to help! |
| xP3 | Mixture of 13 training tasks in 46 languages with English prompts | BLOOMZ & mT0-13B |
| xP3mt | Mixture of 13 training tasks in 46 languages with prompts in 20 languages (machine-translated from English) | BLOOMZ-MT & mT0-13B-MT |
| xP3all | xP3 + our evaluation datasets, adding 3 tasks for a total of 16 tasks in 46 languages with English prompts | |
| xP3megds | Megatron-DeepSpeed processed version of xP3 | BLOOMZ |
| P3 | Re-preprocessed version of the English-only P3 with 8 training tasks | BLOOMZ-P3 & mT0-13B-P3 |

Models

  • Multitask finetuned on xP3. Recommended for prompting in English.
  • Multitask finetuned on xP3mt. Recommended for prompting in non-English.
  • Multitask finetuned on P3. Released for research purposes only. Strictly inferior to the above models!
  • Original pretrained checkpoints. Not recommended.

| Parameters | Finetuned on xP3 | Finetuned on xP3mt | Finetuned on P3 | Original pretrained |
| --- | --- | --- | --- | --- |
| 300M | mt0-small | | | mt5-small |
| 580M | mt0-base | | | mt5-base |
| 1.2B | mt0-large | | | mt5-large |
| 3.7B | mt0-xl | | | mt5-xl |
| 13B | mt0-xxl | mt0-xxl-mt | mt0-xxl-p3 | mt5-xxl |
| 560M | bloomz-560m | | | bloom-560m |
| 1.1B | bloomz-1b1 | | | bloom-1b1 |
| 1.7B | bloomz-1b7 | | | bloom-1b7 |
| 3B | bloomz-3b | | | bloom-3b |
| 7.1B | bloomz-7b1 | bloomz-7b1-mt | bloomz-7b1-p3 | bloom-7b1 |
| 176B | bloomz | bloomz-mt | bloomz-p3 | bloom |
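
The two model families in the table above load through different Hugging Face transformers classes: mT0 is finetuned from mT5 (encoder-decoder), while BLOOMZ is finetuned from BLOOM (decoder-only). The sketch below illustrates that split. The helper names and the `bigscience/` hub prefix are assumptions made for illustration and are not stated in this section; check the model cards before relying on them.

```python
def auto_class_name(model_name: str) -> str:
    """Return the name of the transformers auto class for a model family.

    mT0 is an encoder-decoder model; BLOOM(Z) is decoder-only.
    """
    name = model_name.lower()
    if name.startswith("mt0"):
        return "AutoModelForSeq2SeqLM"
    if name.startswith("bloom"):
        return "AutoModelForCausalLM"
    raise ValueError(f"Unknown model family: {model_name}")


def generate(model_name: str, prompt: str,
             hub_prefix: str = "bigscience/",  # assumed hub namespace
             max_new_tokens: int = 20) -> str:
    """Load a checkpoint and generate a completion for one prompt."""
    # Imported lazily so the helper above works without transformers installed.
    import transformers

    checkpoint = hub_prefix + model_name
    tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
    model_cls = getattr(transformers, auto_class_name(model_name))
    model = model_cls.from_pretrained(checkpoint)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # For decoder-only models the decoded text also contains the prompt.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

A call such as `generate("bloomz-560m", "Translate to English: Je t'aime.")` would download the checkpoint on first use, so start with the smallest sizes.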

Create xP3(x)

We have processed & uploaded xP3. If you want to recreate it, follow these steps:

  1. Get promptsource: for xP3mt, git clone -b xp3mt https://github.com/Muennighoff/promptsource.git; for xP3, git clone -b tr13 https://github.com/Muennighoff/promptsource.git. Then install it: cd promptsource; pip install -e .
  2. Get packages: pip install -q datasets iso-639
  3. Get the creation script & edit it if necessary:
     • For xP3mt, set USE_ENGLISH_PROMPTS = False at the beginning of the script
     • For xP3, set USE_ENGLISH_PROMPTS = True at the beginning of the script
  4. Run the script, such as via python prepare_xp3.py or a SLURM script

For the new extension of xP3, xP3x, the process is largely the same except:

  1. Install the xp3x branch instead i.e. pip install git+https://github.com/Muennighoff/promptsource.git@xp3x
  2. The creation script is in this repository & named create_xp3x.py.

xP3x is a superset of xP3, so unless you want to reproduce the paper, we recommend always using xP3x (or xP3mt if you want machine-translated prompts).
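
The per-variant differences described above can be summarized in a small helper that only builds the command strings and does not run anything. This is a sketch: the function names are hypothetical, and it assumes that the package-install step also applies to xP3x, since the text says the process is "largely the same".

```python
from typing import List

PROMPTSOURCE_URL = "https://github.com/Muennighoff/promptsource.git"


def xp3_setup_commands(variant: str) -> List[str]:
    """Return the shell commands for recreating one dataset variant."""
    variant = variant.lower()
    if variant in ("xp3", "xp3mt"):
        # xP3 uses the tr13 branch, xP3mt uses the xp3mt branch.
        branch = "tr13" if variant == "xp3" else "xp3mt"
        install = [
            f"git clone -b {branch} {PROMPTSOURCE_URL}",
            "cd promptsource; pip install -e .",
        ]
        script = "prepare_xp3.py"
    elif variant == "xp3x":
        # xP3x installs the xp3x branch directly via pip.
        install = [f"pip install git+{PROMPTSOURCE_URL}@xp3x"]
        script = "create_xp3x.py"
    else:
        raise ValueError(f"Unknown variant: {variant}")
    # Assumption: the extra packages are needed for every variant.
    return install + ["pip install -q datasets iso-639", f"python {script}"]


def use_english_prompts(variant: str) -> bool:
    """Value to set for USE_ENGLISH_PROMPTS in the xP3 / xP3mt script."""
    variant = variant.lower()
    if variant not in ("xp3", "xp3mt"):
        raise ValueError("The flag is only described for xP3 and xP3mt")
    return variant == "xp3"


for command in xp3_setup_commands("xp3mt"):
    print(command)
```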

Train models

BLOOMZ

  1. Download the pretrained model checkpoint, which is of shape PP=12, TP=4, DP=4. If you'd like to reshape the model you will also need to download the universal checkpoint. If you want to continue finetuning, you should use our finetuned checkpoint, which is of shape PP=72, TP=1, DP=4.
  2. Set up the training code: git clone -b t0loading https://github.com/bigscience-workshop/Megatron-DeepSpeed & follow its setup guide to create an environment with the necessary packages.
  3. Download the Megatron-DeepSpeed processed xP3megds, or re-preprocess it for Megatron-DeepSpeed yourself by downloading xP3, removing the merged_{lang}.jsonl files & preprocessing it using the script here.
  4. Set up & run the training script: We use SLURM scripts available at bigscience-workshop/bigscience/train/tr13-mtf and referred to as xp3capmixnewcodelonglossseq. E.g. this is the script launched to train bloomz. Important parts of the script to modify are:
  • #SBATCH variables, such as nodes, gpus, time, etc. - Our SLURM guide is here
  • source $six_ALL_CCFRWORK/start-tr13f-6B3-ml-t0 to point to your own conda environment setup via Megatron-DeepSpeed
  • PATH environment variables, notably
    • TRAIN_DATA_PATH & VALID_DATA_PATH, which point to files pointing to your processed training and validation data. We provide our files in this repository (xp3capmixnewcodelong_train.txt & xp3capmixnewcodelong_validation.txt), but you will likely want to change the paths inside. The percentages per language are based on how much each language makes up in xP3 with code being slightly upsampled.
  • PP_SIZE=72, TP_SIZE=1 & BATCH SIZE & co specifying the layout. This will depend on the hardware available to you. If you change these values, you may have to reshape the model. To reshape, use the universal checkpoint and pass the --universal flag in the script. We recommend saving a new checkpoint right after reshaping & then continuing training without --universal, which will be faster.
  • If you want to restart from a saved checkpoint (e.g. after training a few steps like above), make sure to remove the --no-load-optim & --reset-progress flags
  • After training, you can convert the checkpoint to transformers format using the script here
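
To make the checkpoint shapes above concrete, the following sketch computes how many GPUs each layout occupies, assuming the usual 3D-parallelism convention that the GPU count is the product PP × TP × DP. The `Layout` class and `may_need_reshape` helper are hypothetical names, and treating any difference in layout as a possible reshape is a conservative assumption rather than a rule taken from Megatron-DeepSpeed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    pp: int  # pipeline-parallel size
    tp: int  # tensor-parallel size
    dp: int  # data-parallel size

    @property
    def gpus(self) -> int:
        # Assumption: one GPU per (PP, TP, DP) coordinate.
        return self.pp * self.tp * self.dp


PRETRAINED = Layout(pp=12, tp=4, dp=4)  # shape of the pretrained checkpoint
FINETUNED = Layout(pp=72, tp=1, dp=4)   # shape of the finetuned checkpoint


def may_need_reshape(checkpoint: Layout, target: Layout) -> bool:
    """Conservatively flag any layout difference as a possible reshape."""
    return checkpoint != target


print(PRETRAINED.gpus)  # 12 * 4 * 4 = 192
print(FINETUNED.gpus)   # 72 * 1 * 4 = 288
print(may_need_reshape(PRETRAINED, FINETUNED))  # True
```

If your hardware cannot host the layout of the checkpoint you download, this kind of check is a quick reminder that the universal checkpoint and the --universal flag come into play.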

mT0

Follow the finetuning instructions here making sure to use pretrained mT5 models & the xP3 dataset.

Evaluate models

Evaluation results are all available in this repository: https://huggingface.co/datasets/bigscience/evaluation-results under the respective models. Below we explain how to run evaluation.

Rank Evaluation

We evaluate the models on Rank Evaluation on XCOPA, XNLI, XStoryCloze & XWinograd:

  1. Get promptsource fork: git clone -b xp3mt https://github.com/Muennighoff/promptsource.git & cd promptsource; pip install -e .
  2. Get t-zero fork: git clone -b muennighoff/upgrdps https://github.com/Muennighoff/t-zero.git & cd t-zero; pip install -e .
  3. Download model & run evaluation script, for example for bloomz.
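
For readers unfamiliar with rank evaluation, the idea is to score every answer choice under the model and pick the highest-scoring one, instead of generating free text. The sketch below shows only that selection step. The scoring function is a hypothetical stand-in with made-up log-likelihoods, and the actual t-zero evaluation script may differ in details such as length normalization.

```python
from typing import Callable, Sequence


def rank_choices(score: Callable[[str, str], float],
                 prompt: str,
                 choices: Sequence[str]) -> str:
    """Return the answer choice with the highest score for the prompt."""
    return max(choices, key=lambda choice: score(prompt, choice))


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)


# Made-up log-likelihoods standing in for a real model (illustration only).
fake_logprobs = {"entailment": -2.3, "neutral": -0.7, "contradiction": -1.9}


def fake_score(prompt: str, choice: str) -> float:
    return fake_logprobs[choice]


prediction = rank_choices(fake_score, "Premise ... Hypothesis ...",
                          list(fake_logprobs))
print(prediction)  # neutral
```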

Generation Evaluation

We evaluate generation on translation & summarization during training for validation:

  1. Get promptsource fork: git clone -b xp3mt https://github.com/Muennighoff/promptsource & cd promptsource; pip install -e .
  2. Get bigscience-workshop/lm-evaluation-harness: git clone https://github.com/bigscience-workshop/lm-evaluation-harness. The script for the 7.1B model, for example, is here.

We also evaluate code generation on HumanEval:

  1. Get the code evaluation code: git clone https://github.com/loubnabnl/bloom-code-evaluation & go through its setup.
  2. Set prepend_eos to False in code_eval.py at complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=True, **gen_kwargs) i.e. complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=False, **gen_kwargs).
  3. Download model & run evaluation script swapping out MODEL_CKPT for your path, for example for bloomz use this.
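
HumanEval results are commonly reported as pass@k, the probability that at least one of k sampled completions passes the unit tests. This section does not say which metric the evaluation script reports, so the following is only a sketch of the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples with c correct ones."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 10 samples and 3 correct, pass@1 equals the plain success rate 3/10.
print(round(pass_at_k(n=10, c=3, k=1), 4))  # 0.3
```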

Plots & Tables

Plots

  • Figure 1: plotstables/xp3_taxonomy.drawio & plotstables/xp3_taxonomy.pdf
  • Figure 2: plotstables/xp3_languages.ipynb & colab
  • Figure 3: plotstables/xp3_variants.pdf & drawings
  • Figure 4: plotstables/xp3_generalization_bar.pdf & colab
  • Figure 5: plotstables/lang_generalization & colab
  • Figure 6: plotstables/scale.pdf & colab
  • Figure 7: plotstables/validation.pdf & colab
  • Figure 8: plotstables/pretraining_sizes.pdf & colab
  • Figure 9: plotstables/english_task_generalization.pdf & colab
  • Figure 10: plotstables/task_generalization.pdf & colab
  • Figure 11: plotstables/roots_xp3_languages.pdf & colab requiring some of the files in plotstables/contamination
  • Figure 12: plotstables/examples/bloom_code_example.py & plotstables/examples/bloom_code_light.pdf & plotstables/examples/bloomz_code_light.pdf; the raw code files can be found here & here
  • Figure 13 - Figure 16: plotstables/examples/*.pdf & plotstables/examples/generations.drawio

Tables

Citation

@article{muennighoff2022crosslingual,
  title={Crosslingual generalization through multitask finetuning},
  author={Muennighoff, Niklas and Wang, Thomas and Sutawika, Lintang and Roberts, Adam and Biderman, Stella and Scao, Teven Le and Bari, M Saiful and Shen, Sheng and Yong, Zheng-Xin and Schoelkopf, Hailey and others},
  journal={arXiv preprint arXiv:2211.01786},
  year={2022}
}
