bigscience
Research workshop on large language models - The Summer of Language Models 21
At the moment we have 2 code repos:
- https://github.com/bigscience-workshop/Megatron-DeepSpeed - this is our flagship code base
- https://github.com/bigscience-workshop/bigscience - (this repo) for everything else - docs, experiments, etc.
Currently, the most active segments of this repo are:
- JZ - Lots of information about our work environment which helps evaluate, plan and get things done
- Experiments - many experiments are being done. Documentation, result tables, scripts and logs are all there
- Datasets info
- Train - all the information about the current trainings (see below for the most important ones)
We have READMEs for specific aspects, such as:
Trainings
While we keep detailed chronicles of experiments and findings for some of the main trainings, here is a doc that contains a summary of the most important findings: Lessons learned
Train 1 - 13B - unmodified Megatron gpt2 - baseline
- the full spec and discussions
- the training script
- checkpoints and logs:
- chronicles
You can watch the training logs live by running this tail -f like script over the remote log file, which gets synced to the hub once an hour:
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
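For readers who find the one-liner above hard to follow, here is a rough Python sketch of the same polling idea: ask the server for the current file size, fetch only the bytes not yet seen, then sleep. It is not part of the repo. The function names (`next_range`, `fetch_remote_size`, `follow`) are made up for illustration, and the sketch assumes the server reports `Content-Length` and honours `Range` requests. HTTP byte ranges are inclusive, so this version requests bytes `offset` through `size - 1`.

```python
import time
import urllib.request


def next_range(offset, size):
    """Return the Range header value for the unread bytes, or None if nothing is new."""
    if size <= offset:
        return None
    # Byte ranges are inclusive, so the last valid index is size - 1.
    return f"bytes={offset}-{size - 1}"


def fetch_remote_size(url):
    """Ask the server for the current size of the file (urllib follows redirects)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers.get("Content-Length", 0))


def follow(url, interval=300):
    """Print bytes appended to the remote file, polling every `interval` seconds."""
    offset = 0
    while True:
        size = fetch_remote_size(url)
        rng = next_range(offset, size)
        if rng is not None:
            req = urllib.request.Request(url, headers={"Range": rng})
            with urllib.request.urlopen(req) as resp:
                text = resp.read().decode("utf-8", errors="replace")
            print(text, end="", flush=True)
            offset = size
        time.sleep(interval)
```

Calling `follow(...)` with the log URL above would loop until interrupted. Since the log is synced to the hub only about once an hour, the 300-second default (taken from the `sleep 300` in the one-liner) is more than frequent enough.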
Train 3
Architecture and scaling baseline runs: no fancy tricks, just GPT2. Here are links to the respective tensorboards:
Size | 1B3 | 760M | 350M | 125M |
---|---|---|---|---|
C4 + low warmup | a | b | c | |
OSCAR + low warmup | f | | | |
C4 + high warmup | e | | | |
OSCAR + high warmup | d (current baseline) | g | h | i |
Pile + high warmup | m | j | k | l |
Train 8
104B - unmodified Megatron gpt2 - with extra-wide hidden size to learn how to deal with training instabilities
- the full spec and discussions
- the training script
- checkpoints and logs:
- chronicles
You can watch the training logs live by running this tail -f like script over the remote log file, which gets synced to the hub once an hour:
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://cdn-lfs.huggingface.co/bigscience/tr8-104B-logs/b2cc478d5ae7c9ec937ea2db1d2fe09de593fa2ec38c171d6cc5dca094cd79f9
Train 11
This is the current main training
tr11-176B-ml
- the full spec and discussions
- the training script
- checkpoints and logs:
- chronicles-prequel
- chronicles
You can watch the training logs live by running this tail -f like script over the remote log file, which gets synced to the hub once an hour:
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -LsI $u]=~/2 200.*?content-length: (\d+)/s; \
print qx[curl -Lsr $b-$e $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr11-176B-ml-logs/resolve/main/logs/main/main_log.txt
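This version of the script adds `-L` to follow redirects, so `curl -LsI` prints one header block per hop. The regex in the one-liner skips ahead to the `200` response before reading `content-length`, so the size of an intermediate redirect response is not mistaken for the log size. The Python sketch below mirrors that regex on a made-up header dump. The sample headers and the `cdn.example` host are hypothetical, not real server output.

```python
import re


def final_content_length(raw_headers):
    """Return the content-length that follows the first '2 200' status line, or None."""
    # re.S lets '.' cross line breaks, like the /s flag in the Perl regex.
    match = re.search(r"2 200.*?content-length: (\d+)", raw_headers, flags=re.S)
    return int(match.group(1)) if match else None


# Hypothetical two-hop header dump: a redirect followed by the real file.
sample = (
    "HTTP/2 302 \r\n"
    "content-length: 1125\r\n"
    "location: https://cdn.example/log\r\n"
    "\r\n"
    "HTTP/2 200 \r\n"
    "content-type: text/plain\r\n"
    "content-length: 4096\r\n"
    "\r\n"
)

print(final_content_length(sample))  # 4096, not the 1125 of the redirect
```

Like the original, the pattern is case-sensitive and keyed to the `HTTP/2 200` status line, so it returns nothing for an HTTP/1.1 response or for capitalised header names.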