📖 The Large Language Model Training Handbook
An open collection of methodologies to help with successful training of large language models.
This is technical material suitable for LLM training engineers and operators. That is, the content here contains lots of scripts and copy-and-paste commands to enable you to quickly solve your problems.
If you are not interested in the technical details and would rather have a detailed overview of the concepts, please refer to the sister project, The Large Language Model Training Playbook, instead.
Note: The list of topics will expand over time; at the moment only a subset has been filled in.
Model parallelism
Maximizing throughput
Tensor precision / Data types
Training hyper-parameters and model initializations
Instabilities
Debugging software and hardware failures
SLURM
Resources
License
The content of this site is distributed under Attribution-ShareAlike 4.0 International.
Unless specified otherwise the code in this repo is licensed under Apache License, Version 2.0.