• Stars: 354
• Rank: 120,042 (Top 3%)
• Language: Python
• License: MIT License
• Created: over 4 years ago
• Updated: over 3 years ago

Repository Details

Helps you write algorithms in PyTorch that adapt to the available (CUDA) memory

Torch Memory-adaptive Algorithms (TOMA)

A collection of helpers to make it easier to write code that adapts to the available (CUDA) memory. Specifically, it retries code that fails due to OOM (out-of-memory) conditions and lowers batchsizes automatically.

To avoid failing over repeatedly, a simple cache is implemented that memorizes the last successful batchsize for a given call and the available free memory.

Installation

To install using pip, use:

pip install toma

To run the tests, use:

python setup.py test

Example

from toma import toma

@toma.batch(initial_batchsize=512)
def run_inference(batchsize, model, dataset):
    # ...

run_inference(model, dataset)

This will try to execute run_inference with batchsize=512. If a memory error is thrown, it will decrease the batchsize until it succeeds.

Note: This batch size can be different from the batch size used to accumulate gradients, which you can keep fixed by only calling optimizer.step() every few batches, as sketched below.
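
For illustration, here is a minimal, hedged sketch of that pattern (the toy model, dataset, the EFFECTIVE_BATCHSIZE constant, and the train_epoch name are made up for this example and are not part of toma): toma may shrink the per-forward batchsize after an OOM, while the gradient-accumulation batch size stays fixed.

import torch
from torch.utils.data import DataLoader, TensorDataset
from toma import toma

EFFECTIVE_BATCHSIZE = 512  # batch size used for gradient accumulation (kept fixed)

@toma.batch(initial_batchsize=512)
def train_epoch(batchsize, model, dataset, optimizer):
    # toma supplies `batchsize` and may lower it on OOM; we still only step the
    # optimizer once per EFFECTIVE_BATCHSIZE samples.
    accumulation_steps = max(EFFECTIVE_BATCHSIZE // batchsize, 1)
    loader = DataLoader(dataset, batch_size=batchsize)
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(loader):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        (loss / accumulation_steps).backward()
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

model = torch.nn.Linear(16, 1)
dataset = TensorDataset(torch.randn(2048, 16), torch.randn(2048, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_epoch(model, dataset, optimizer)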

To make it easier to loop over ranges, there are also toma.range and toma.chunked:

@toma.chunked(initial_step=512)
def compute_result(out: torch.Tensor, start: int, end: int):
    # ...

result = torch.empty((8192, ...))
compute_result(result)

This will chunk result and pass the chunks to compute_result one by one. Again, if it fails due to OOM, the step size will be halved, and so on. Compared to toma.batch, this allows the step size to be reduced while looping over the chunks, which can save computation.
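
As an illustration, a minimal filled-in sketch (the data and weight tensors are made up, and it is assumed that out is the chunk view result[start:end], so writing into it fills result):

import torch
from toma import toma

data = torch.randn(8192, 64)
weight = torch.randn(64, 32)

@toma.chunked(initial_step=512)
def compute_result(out: torch.Tensor, start: int, end: int):
    # `out` is the current chunk of `result`; compute the rows for this chunk only.
    out[:] = data[start:end] @ weight

result = torch.empty(8192, 32)
compute_result(result)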

@toma.range(initial_step=32)
def reduce_data(start: int, end: int, out: torch.Tensor, dataA: torch.Tensor, dataB: torch.Tensor):
    # ...

reduce_data(0, 1024, result, dataA, dataB)

toma.range iterates over range(start, end, step) with step=initial_step. If it fails due to OOM, it will lower the step size and continue.
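
For example, a minimal filled-in version of the snippet above (with made-up tensors):

import torch
from toma import toma

@toma.range(initial_step=32)
def reduce_data(start: int, end: int, out: torch.Tensor, dataA: torch.Tensor, dataB: torch.Tensor):
    # process one adaptively-sized slice of the 0..1024 range per call
    out[start:end] = dataA[start:end] * dataB[start:end]

result = torch.empty(1024)
dataA = torch.randn(1024)
dataB = torch.randn(1024)
reduce_data(0, 1024, result, dataA, dataB)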

toma.execute

To make it easier to just execute a block without having to extract it into a function and then call it, we also provide toma.execute.batch, toma.execute.range and toma.execute.chunked, which are somewhat unorthodox and call the function that is passed to them right away. (Mainly because there is no support for anonymous functions in Python beyond lambda expressions.)

def function():
    # ... other code

    @toma.execute.chunked(batched_data, initial_step=128)
    def compute(chunk, start, end):
        # ...
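
As a rough, filled-in sketch (embed_all, projection, and embeddings are hypothetical names; collecting results through a closure is just one possible pattern):

import torch
from toma import toma

def embed_all():
    batched_data = torch.randn(4096, 128)
    projection = torch.randn(128, 16)
    embeddings = []

    @toma.execute.chunked(batched_data, initial_step=128)
    def compute(chunk, start, end):
        # runs immediately, once per chunk of batched_data
        embeddings.append(chunk @ projection)

    return torch.cat(embeddings)

all_embeddings = embed_all()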

Cache

There are three cache types available at the moment. The cache can be changed either by setting toma.DEFAULT_CACHE_TYPE or by passing cache_type to the individual calls.

For example:

@toma.batch(initial_batchsize=512, cache_type=toma.GlobalBatchsizeCache)

or

toma.explicit.batch(..., toma_cache_type=toma.GlobalBatchsizeCache)

StacktraceMemoryBatchsizeCache: Stacktrace & Available Memory (the default)

This memorizes the successful batchsizes for a given call trace and the available memory at that point. For most machine learning code, this is sufficient to remember the right batchsize without having to look at the actual arguments or understand more of their semantics.

The implicit assumption is that, after a few iterations, a stable state is reached with regard to GPU and CPU memory usage.

To limit the CPU memory of the process, toma provides:

import toma.cpu_memory

toma.cpu_memory.set_cpu_memory_limit(8)

This can also be useful to avoid accidental swap thrashing.

GlobalBatchsizeCache: Global per Function

This reuses the last successful batchsize independently of where the call happened.

NoBatchsizeCache: No Caching

Always starts with the suggested batchsize and fails over if necessary.

Benchmark/Overhead

There is some overhead involved, so toma should only be used for operations that are otherwise time- or memory-consuming.

---------------------------------------------------------------------------------- benchmark: 5 tests ----------------------------------------------------------------------------------
Name (time in ms)          Min                Max               Mean            StdDev             Median                IQR            Outliers       OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_native             2.1455 (1.0)       3.7733 (1.0)       2.3037 (1.0)      0.1103 (1.0)       2.2935 (1.0)       0.1302 (1.0)          81;5  434.0822 (1.0)         448           1
test_simple            17.4657 (8.14)     27.0049 (7.16)     21.0453 (9.14)     2.6233 (23.79)    20.4881 (8.93)      3.4384 (26.42)        13;0   47.5165 (0.11)         39           1
test_toma_no_cache     31.4380 (14.65)    40.8567 (10.83)    33.2749 (14.44)    2.2530 (20.43)    32.2698 (14.07)     2.8210 (21.67)         4;1   30.0527 (0.07)         25           1
test_explicit          33.0759 (15.42)    52.1866 (13.83)    39.6956 (17.23)    6.9620 (63.14)    38.4929 (16.78)    11.2344 (86.31)         4;0   25.1917 (0.06)         20           1
test_toma              36.9633 (17.23)    57.0220 (15.11)    43.5201 (18.89)    6.7318 (61.05)    41.6034 (18.14)     7.2173 (55.45)         2;2   22.9779 (0.05)         13           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Thanks

Thanks to @y0ast for feedback and discussion.

More Repositories

1. tfpyth (Python, 641 stars): Putting TensorFlow back in PyTorch, back in TensorFlow (differentiable TensorFlow PyTorch adapters).
2. llm-strategy (Python, 376 stars): Directly Connecting Python to LLMs via Strongly-Typed Functions, Dataclasses, Interfaces & Generic Types.
3. BatchBALD (Python, 219 stars): Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning.
4. dart_repl (Dart, 81 stars): Proof of concept REPL shell for Dart.
5. batchbald_redux (Jupyter Notebook, 71 stars): Reusable BatchBALD implementation.
6. mdp (Python, 37 stars): Make it easy to specify simple MDPs that are compatible with the OpenAI Gym.
7. mnist_by_zip (Jupyter Notebook, 35 stars): Compression algorithms (like the well-known zip file compression) can be used for machine learning purposes, specifically for classifying hand-written digits (MNIST).
8. llmtracer (Python, 12 stars): Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp.
9. player_of_jeopardy (Python, 10 stars): ChatGPT can solve Jeopardy! clues really well!
10. chatplayground (Python, 9 stars): Chat Playground for LLMs.
11. ddu_dirty_mnist (Jupyter Notebook, 7 stars): Dirty-MNIST dataset introduced in "Deterministic Neural Networks with Inductive Biases Capture Epistemic and Aleatoric Uncertainty" (https://arxiv.org/abs/2102.11582).
12. pbt (Jupyter Notebook, 7 stars): Jupyter notebooks to play around with population based training, as described in https://arxiv.org/abs/1711.09846.
13. blackboard-pagi (Python, 7 stars)
14. 2302.08981 (Jupyter Notebook, 5 stars)
15. batch_pong_poc (Jupyter Notebook, 4 stars): Instead of running one environment at a time or one per thread, run everything in batch using numpy on a single core.
16. hello-slurm (Shell, 4 stars)
17. pytorch_datadiet (Python, 3 stars)
18. laaos (Python, 3 stars): Logs as append-only source.
19. implicit_lambda (Python, 3 stars): This package adds support for implicit lambdas, so you can write `map(_ + 5, a_list)` instead of `map(lambda x: x + 5, a_list)`.
20. dlb_chapter2 (TeX, 1 star)
21. WML (C++, 1 star): Whitespace Markup Language.
22. 2020_ebm_presentation (HTML, 1 star): A presentation about EBMs and Hopfield networks @ OATML.
23. 2202.01851 (Jupyter Notebook, 1 star): Repository for 'A Note on "Assessing Generalization of SGD via Disagreement"'.
24. algo_fairness (Jupyter Notebook, 1 star)
25. 2208.00549 (Jupyter Notebook, 1 star): Unifying Approaches in Data Subset Selection - Experiments.