  • Stars: 286
  • Rank: 144,727 (Top 3 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 3 years ago
  • Updated about 1 month ago

Repository Details

Collection of PyTorch Lightning tutorials in the form of rich scripts that are automatically transformed into IPython notebooks.

PytorchLightning Tutorials

This is the Lightning Library, a collection of Lightning-related notebooks that are pulled back into the main repo as a submodule and rendered inside the main documentation. The key features/highlights:

  • we keep the repo lightweight: notebooks are stored in rich script format
  • all scripts/notebooks are tested to be fully executable
  • notebooks are fully reproducible, since runtime environment details are saved

For more details, read our blog post: Best Practices for Publishing PyTorch Lightning Tutorial Notebooks

Adding/Editing notebooks

On the main branch, this repo contains only Python scripts with Markdown extensions; notebooks are generated in a special publication branch, so raw notebooks are not accepted as PRs. That said, we highly recommend creating a notebook and converting it to a script with jupytext:

jupytext --set-formats ipynb,py:percent my-notebook.ipynb
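
For orientation, a script in the py:percent format is plain Python in which `# %%` comments mark code cells and `# %% [markdown]` comments mark text cells. The sketch below shows that layout; the cell contents are made up for illustration and do not come from any tutorial in this repo.

```python
# %% [markdown]
# # My tutorial
# A short introduction, rendered as a Markdown cell in the notebook.

# %%
# A regular code cell: everything until the next "# %%" marker belongs to it.
values = [1, 2, 3]
total = sum(values)
print(total)

# %% [markdown]
# ## Next section
# More explanatory text goes here.
```

Because the file is ordinary Python, it can be diffed and reviewed like any other script.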

Contribution structure

Each addition has to be formed as a new folder containing:

  • the folder name is used as the name of the future notebook
  • a single Python script with the converted notebook (the file name does not matter)
  • a metadata file named .meta.yaml including the following info:
    title: Sample notebooks
    author: [User](contact)
    created: YYYY-MM-DD
    updated: YYYY-MM-DD
    license: CC BY-SA
    # multi-line
    description: |
      This notebook will walk you through ...
    requirements:
      - package  # with version if needed
    # define supported - CPU|GPU|TPU
    accelerator:
      - CPU
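
A quick sanity check of the metadata before opening a PR could look like the sketch below. It works on an already-parsed dictionary to stay dependency-free. The helper `check_meta` and the choice of which keys count as required are assumptions made for illustration, based only on the sample above; they are not the repository's actual validation logic.

```python
# Assumption: keys treated as required here are simply those shown in the sample.
REQUIRED_KEYS = {"title", "author", "created", "updated", "license", "description"}
# Accelerators listed in the sample's comment: CPU|GPU|TPU.
ALLOWED_ACCELERATORS = {"CPU", "GPU", "TPU"}


def check_meta(meta: dict) -> list:
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(meta))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    unknown = sorted(set(meta.get("accelerator", [])) - ALLOWED_ACCELERATORS)
    if unknown:
        problems.append("unknown accelerators: " + ", ".join(unknown))
    return problems


sample = {
    "title": "Sample notebooks",
    "author": "[User](contact)",
    "created": "2021-01-01",
    "updated": "2021-01-02",
    "license": "CC BY-SA",
    "description": "This notebook will walk you through ...",
    "requirements": ["package"],
    "accelerator": ["CPU"],
}
print(check_meta(sample))
```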

Using datasets

It is quite common to use a public or competition dataset for your example. We facilitate this by defining the data sources in the meta file. There are two basic options: download a file from the web or pull a Kaggle dataset:

datasets:
  web:
    - https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
  kaggle:
    - titanic

In both cases, the downloaded archive (a Kaggle dataset is originally downloaded as a zip file) is extracted to the default dataset folder, under a sub-folder with the same name as the downloaded file. To get the path to this dataset folder, use the environment variable PATH_DATASETS in your script:

import os

data_path = os.environ.get("PATH_DATASETS", "_datasets")
path_titanic = os.path.join(data_path, "titanic")
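
If you want the script to fail early with a clear message when a dataset is missing, the lookup can be wrapped in a small helper. This is a minimal sketch; `dataset_dir` is a hypothetical name and is not provided by this repo.

```python
import os


def dataset_dir(name: str) -> str:
    """Return the extracted folder for `name` under PATH_DATASETS (hypothetical helper)."""
    root = os.environ.get("PATH_DATASETS", "_datasets")
    path = os.path.join(root, name)
    if not os.path.isdir(path):
        # Fail early instead of hitting an obscure error deep inside data loading.
        raise FileNotFoundError(f"Dataset folder not found: {path}")
    return path


try:
    path_titanic = dataset_dir("titanic")
except FileNotFoundError as err:
    print(err)
```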

Warning: some Kaggle datasets can be quite large, and the process involves both downloading and extracting, which means the runner needs free space for roughly double the dataset size. For this reason, the CPU runner is limited to 3GB datasets.
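
The space requirement can be estimated locally before adding a dataset. The sketch below assumes that the extracted copy takes about as much space as the archive and that "3GB" means 3 * 1024**3 bytes; both are assumptions, and `fits_on_runner` is a hypothetical helper, not part of the CI.

```python
import shutil

# Assumption: the 3GB CPU-runner limit interpreted as 3 GiB.
CPU_RUNNER_LIMIT_BYTES = 3 * 1024**3


def fits_on_runner(archive_bytes: int, path: str = ".") -> bool:
    """Check the size limit and that archive plus extracted copy fit on disk."""
    required = 2 * archive_bytes  # archive + extracted data (rough estimate)
    free = shutil.disk_usage(path).free
    return archive_bytes <= CPU_RUNNER_LIMIT_BYTES and required <= free


print(fits_on_runner(4 * 1024**3))  # a 4 GiB archive exceeds the assumed limit
```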

Suggestions

  • To insert images into text cells, use Markdown formatting, so we can inline the images directly into the notebooks and drop any dependency on an internet connection; the generated notebooks can then be shared offline more easily
  • If your images need special sizes, use ![Caption](my-image.png){height="60px" width="240px"}
  • If your notebook is computationally or otherwise resource (CPU/RAM) demanding, use only the GPU accelerator option in the meta config

Known limitations

  • Nothing major at this moment

Meantime notes

Behind the publishing workflow there are, in principle, these three steps:

# 1) convert script to notebooks
jupytext --set-formats ipynb,py:percent notebook.py

# 2) testing the created notebook
pytest -v notebook.ipynb --nbval

# 3) generating notebooks outputs
papermill in-notebook.ipynb out-notebook.ipynb
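
To try the same three steps locally for one script, they can be assembled from Python. This is a rough sketch: `publish_commands` and the `-out` suffix for the executed notebook are hypothetical choices, and running the commands requires jupytext, pytest with the nbval plugin, and papermill to be installed.

```python
import shlex


def publish_commands(script: str) -> list:
    """Build the three command lines for one script (hypothetical helper)."""
    stem = script[:-3] if script.endswith(".py") else script
    notebook = stem + ".ipynb"
    executed = stem + "-out.ipynb"  # assumed output name, not prescribed by the repo
    return [
        ["jupytext", "--set-formats", "ipynb,py:percent", script],  # 1) convert
        ["pytest", "-v", notebook, "--nbval"],                      # 2) test
        ["papermill", notebook, executed],                          # 3) generate outputs
    ]


for cmd in publish_commands("notebook.py"):
    # Printing only; pass each list to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```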
