
🐸 - A general purpose model trainer, as flexible as it gets

👟 Trainer

An opinionated general purpose model trainer on PyTorch with a simple code base.

Installation

From GitHub:

git clone https://github.com/coqui-ai/Trainer
cd Trainer
make install

From PyPI:

pip install trainer

Prefer installing from GitHub, as it is more stable.

Implementing a model

Subclass TrainerModel and overload its functions as needed.
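
A minimal sketch of what such a subclass can look like, assuming the train_step / eval_step / get_criterion hook names used in the bundled MNIST example (check the TrainerModel source for the authoritative interface):

from torch import nn
from trainer import TrainerModel

class MyModel(TrainerModel):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

    def forward(self, x):
        return self.net(x)

    def train_step(self, batch, criterion):
        x, y = batch
        logits = self.forward(x)
        # Return model outputs plus a dict of losses; in auto-optimization
        # mode the trainer runs backward() and the optimizer step for you.
        return {"model_outputs": logits}, {"loss": criterion(logits, y)}

    def eval_step(self, batch, criterion):
        return self.train_step(batch, criterion)

    def get_criterion(self):
        return nn.CrossEntropyLoss()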

Training a model with auto-optimization

See the MNIST example.
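
In outline, training then reduces to constructing a Trainer and calling fit(). A sketch, assuming the Trainer / TrainerArgs entry points and the MyModel class above; config, train_samples, and eval_samples stand in for your own objects:

from trainer import Trainer, TrainerArgs

trainer = Trainer(
    TrainerArgs(),
    config,  # your training config object
    output_path="runs/",
    model=MyModel(),
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()  # backward, optimizer steps, logging, and checkpoints are handled for you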

Training a model with advanced optimization

With 👟 you can define the whole optimization cycle as you want, as in the GAN example below. It enables more under-the-hood control and flexibility for more advanced training loops.

You only need to use the scaled_backward() function to handle mixed-precision training.

...

def optimize(self, batch, trainer):
    imgs, _ = batch

    # sample noise
    z = torch.randn(imgs.shape[0], 100)
    z = z.type_as(imgs)

    # train discriminator
    imgs_gen = self.generator(z)
    logits = self.discriminator(imgs_gen.detach())
    fake = torch.zeros(imgs.size(0), 1)
    fake = fake.type_as(imgs)
    loss_fake = trainer.criterion(logits, fake)

    valid = torch.ones(imgs.size(0), 1)
    valid = valid.type_as(imgs)
    logits = self.discriminator(imgs)
    loss_real = trainer.criterion(logits, valid)
    loss_disc = (loss_real + loss_fake) / 2

    # step discriminator
    _, _ = self.scaled_backward(loss_disc, None, trainer, trainer.optimizer[0])

    if trainer.total_steps_done % trainer.grad_accum_steps == 0:
        trainer.optimizer[0].step()
        trainer.optimizer[0].zero_grad()

    # train generator
    imgs_gen = self.generator(z)

    valid = torch.ones(imgs.size(0), 1)
    valid = valid.type_as(imgs)

    logits = self.discriminator(imgs_gen)
    loss_gen = trainer.criterion(logits, valid)

    # step generator
    _, _ = self.scaled_backward(loss_gen, None, trainer, trainer.optimizer[1])
    if trainer.total_steps_done % trainer.grad_accum_steps == 0:
        trainer.optimizer[1].step()
        trainer.optimizer[1].zero_grad()
    return {"model_outputs": logits}, {"loss_gen": loss_gen, "loss_disc": loss_disc}

...

See the GAN training example with gradient accumulation.

Training with Batch Size Finder

See the test script here for training with the batch size finder.

The batch size finder starts from a default batch size (2048, unless you define your own) and searches for the largest batch size that fits on your hardware; expect it to run several trainings until it finds one. To use it, call trainer.fit_with_largest_batch_size(starting_batch_size=2048) instead of trainer.fit(), where starting_batch_size is the batch size the search starts from. This is very useful when you want to use as much GPU memory as possible.
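
In code, the switch is a one-liner; starting_batch_size is optional if the 2048 default suits you:

# instead of: trainer.fit()
trainer.fit_with_largest_batch_size(starting_batch_size=2048)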

Training with DDP

$ python -m trainer.distribute --script path/to/your/train.py --gpus "0,1"

We don't use .spawn() to initiate multi-gpu training since it causes certain limitations:

  • Everything must be picklable.
  • .spawn() trains the model in subprocesses, and the model in the main process is not updated.
  • DataLoader with N processes gets really slow when N is large.

Training with Accelerate

Setting use_accelerate in TrainerArgs to True will enable training with Accelerate.
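
For example, a sketch in which the args object is passed to the Trainer as in the other examples:

from trainer import Trainer, TrainerArgs

args = TrainerArgs(use_accelerate=True)  # run the training loop through Accelerate
trainer = Trainer(args, config, output_path="runs/", model=model)
trainer.fit()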

You can also use it for multi-gpu or distributed training.

CUDA_VISIBLE_DEVICES="0,1,2" accelerate launch --multi_gpu --num_processes 3 train_recipe_autoregressive_prompt.py

See the Accelerate docs.

Adding a callback

👟 supports callbacks to customize your runs. You can either set callbacks in your model implementations or pass them explicitly to the Trainer.

Please check trainer.utils.callbacks to see available callbacks.

Here is how you provide an explicit callback to a 👟 Trainer object:

def my_callback(trainer):
    print(" > My callback was called.")

trainer = Trainer(..., callbacks={"on_init_end": my_callback})
trainer.fit()

Profiling example

  • Create the torch profiler as you like and pass it to the trainer.
    import torch
    profiler = torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA,
        ],
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
        on_trace_ready=torch.profiler.tensorboard_trace_handler("./profiler/"),
        record_shapes=True,
        profile_memory=True,
        with_stack=True,
    )
    prof = trainer.profile_fit(profiler, epochs=1, small_run=64)
  • Run TensorBoard:
    tensorboard --logdir="./profiler/"

Supported Experiment Loggers

To add a new logger, you must subclass BaseDashboardLogger and overload its functions.
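
A rough sketch of the shape of such a subclass; the import path and method name below are illustrative assumptions, so check the BaseDashboardLogger source for the actual abstract methods to overload:

from trainer.logging.base_dash_logger import BaseDashboardLogger  # import path is an assumption

class MyDashboardLogger(BaseDashboardLogger):
    # Hypothetical hook: forward scalar metrics to your logging backend.
    def add_scalar(self, title, value, step):
        ...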

Anonymized Telemetry

We constantly seek to improve 🐸 for the community. To understand the community's needs better and address them accordingly, we collect stripped-down anonymized usage stats when you run the trainer.

Of course, if you prefer not to share usage data, you can opt out by setting the environment variable TRAINER_TELEMETRY=0.
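
For example, when launching a run from the shell:

TRAINER_TELEMETRY=0 python train.py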
