  • Stars
    1,077
  • Rank 42,945 (Top 0.9 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created about 7 years ago
  • Updated 5 months ago

Repository Details

A framework for data augmentation for 2D and 3D image classification and segmentation

batchgenerators by MIC@DKFZ

batchgenerators is a python package for data augmentation. It is developed jointly between the Division of Medical Image Computing at the German Cancer Research Center (DKFZ) and the Applied Computer Vision Lab of the Helmholtz Imaging Platform.

It is not (yet) perfect, but we feel it is good enough to be shared with the community. If you encounter a bug, feel free to contact us or open a GitHub issue.

If you use it, please cite the following work:

Isensee Fabian, Jäger Paul, Wasserthal Jakob, Zimmerer David, Petersen Jens, Kohl Simon, 
Schock Justus, Klein Andre, Roß Tobias, Wirkert Sebastian, Neher Peter, Dinkelacker Stefan, 
Köhler Gregor, Maier-Hein Klaus (2020). batchgenerators - a python framework for data 
augmentation. doi:10.5281/zenodo.3632567

Supported Augmentations

We support a variety of augmentations, all of which are compatible with 2D and 3D input data! (This is something that was missing in most other frameworks.)

  • Spatial Augmentations
    • mirroring
    • channel translation (to simulate registration errors)
    • elastic deformations
    • rotations
    • scaling
    • resampling
  • Color Augmentations
    • brightness (additive, multiplicative)
    • contrast
    • gamma (like gamma correction in photo editing)
  • Noise Augmentations
    • Gaussian Noise
    • Rician Noise
    • ...will be expanded in future commits
  • Cropping
    • random crop
    • center crop
    • padding

Note: Stack transforms by using batchgenerators.transforms.abstract_transforms.Compose. Finish it up by plugging the composed transform into our multithreader: batchgenerators.dataloading.multi_threaded_augmenter.MultiThreadedAugmenter
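
The idea of stacking transforms can be illustrated without the library. The sketch below is a stand-in, not the batchgenerators implementation: compose, mirror_x and add_brightness are hypothetical names, and the only assumption is that each transform receives the batch dictionary as keyword arguments and returns it. It also shows a spatial transform touching both 'data' and 'seg' while a color transform touches 'data' only.

```python
import numpy as np


def mirror_x(**data_dict):
    """Toy spatial transform: flip the x axis of 'data' and, if present, 'seg'."""
    data_dict["data"] = data_dict["data"][:, :, ::-1].copy()
    if "seg" in data_dict:
        data_dict["seg"] = data_dict["seg"][:, :, ::-1].copy()
    return data_dict


def add_brightness(offset=0.1):
    """Toy color transform factory: shifts the intensities of 'data' only."""
    def transform(**data_dict):
        data_dict["data"] = data_dict["data"] + offset
        return data_dict
    return transform


def compose(transforms):
    """Chain transforms so the dictionary returned by one feeds the next."""
    def composed(**data_dict):
        for t in transforms:
            data_dict = t(**data_dict)
        return data_dict
    return composed


batch = {
    "data": np.zeros((2, 1, 4, 4), dtype=np.float32),  # (b, c, x, y)
    "seg": np.arange(32).reshape(2, 1, 4, 4),          # (b, c, x, y)
    "label": np.array([0, 1]),                         # not touched by any transform
}
pipeline = compose([mirror_x, add_brightness(0.5)])
out = pipeline(**batch)
```

In real use, the library's own Compose and transform classes would take the place of these toy functions.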

How to use it

The working principle is simple: derive from the DataLoaderBase class, reimplement the generate_train_batch member function and use it to stack your augmentations! For a simple example see batchgenerators/examples/example_ipynb.ipynb

A heavily commented example for using SlimDataLoaderBase and MultiThreadedAugmenter is available at: batchgenerators/examples/multithreaded_with_batches.ipynb. It gives an idea of the interplay between the SlimDataLoaderBase and the MultiThreadedAugmenter. The example uses the MultiThreadedAugmenter for loading and augmentation on multiple processes, while covering the entire dataset only once per epoch (basically sampling without replacement).
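
To make "once per epoch, without replacement" concrete, here is a minimal single-process sketch. EpochLoader is a hypothetical plain-Python class that does not derive from the library's base classes, and it does not show how several worker processes would split the indices between them; the notebook above covers the real setup.

```python
import numpy as np


class EpochLoader:
    """Toy loader: yields every sample exactly once per epoch, in random order."""

    def __init__(self, images, labels, batch_size, seed=0):
        self.images = images          # shape (n, c, x, y)
        self.labels = labels          # shape (n,)
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)
        self._reset()

    def _reset(self):
        # A fresh permutation per epoch gives sampling without replacement.
        self.order = self.rng.permutation(len(self.images))
        self.position = 0

    def generate_train_batch(self):
        if self.position >= len(self.order):
            self._reset()
            raise StopIteration  # signals the end of the epoch
        idx = self.order[self.position:self.position + self.batch_size]
        self.position += len(idx)
        # Return the batch as a dictionary with a 'data' entry.
        return {"data": self.images[idx], "label": self.labels[idx]}

    def __iter__(self):
        return self

    def __next__(self):
        return self.generate_train_batch()


images = np.random.rand(10, 1, 8, 8).astype(np.float32)
labels = np.arange(10)
loader = EpochLoader(images, labels, batch_size=4)

# One pass over the loader visits each of the 10 samples exactly once.
seen = np.concatenate([b["label"] for b in loader])
```

The last batch of an epoch is smaller when the dataset size is not a multiple of the batch size; whether to keep or drop it is a design choice this sketch leaves open.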

We also now have an extensive example for BraTS2017/2018 with both 2D and 3D DataLoader and augmentations: batchgenerators/examples/brats2017/

There are also CIFAR10/100 datasets and DataLoader available at batchgenerators/datasets/cifar.py

Data Structure

The data structure that is used internally (and with which you have to comply when implementing generate_train_batch) is kept simple as well: it is just a regular Python dictionary! We did this to allow maximum flexibility in the kind of data that is passed along through the pipeline. The dictionary must have a 'data' key:value pair. It can optionally hold a 'seg' key:value pair containing a segmentation. If a 'seg' key:value pair is present, all spatial transformations will also be applied to the segmentation! Apart from 'data' and 'seg', you are free to add whatever you want (your image classification/regression target, for example). All key:value pairs other than 'data' and 'seg' will be passed through the pipeline unmodified.

The 'data' value must have shape (b, c, x, y) for 2D or (b, c, x, y, z) for 3D! The 'seg' value must have the same layout: (b, c, x, y) for 2D or (b, c, x, y, z) for 3D. The channel dimension of 'seg' may be used to hold several segmentation maps. If you have only one segmentation, make sure it has shape (b, 1, x, y (, z)).
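
The shape rules above can be written down as a small check. This is a sketch only: check_batch is a hypothetical helper, not part of batchgenerators, and the requirement that 'seg' matches 'data' in batch size and spatial dimensions is an assumption (it follows from both being transformed together, but the text does not state it).

```python
import numpy as np


def check_batch(batch):
    """Validate a batch dictionary against the shape rules described above."""
    if "data" not in batch:
        raise KeyError("batch must contain a 'data' entry")
    data = batch["data"]
    if data.ndim not in (4, 5):
        raise ValueError(
            f"'data' must be (b, c, x, y) or (b, c, x, y, z), got {data.shape}"
        )
    if "seg" in batch:
        seg = batch["seg"]
        if seg.ndim != data.ndim:
            raise ValueError(
                f"'seg' needs a channel axis, e.g. (b, 1, x, y); got {seg.shape}"
            )
        # Assumption: batch size and spatial dimensions agree, channels may differ.
        if seg.shape[0] != data.shape[0] or seg.shape[2:] != data.shape[2:]:
            raise ValueError(f"'seg' {seg.shape} does not match 'data' {data.shape}")
    return batch


# A 3D batch: 2 samples, 4 image channels, 8 x 8 x 8 voxels.
data = np.random.rand(2, 4, 8, 8, 8).astype(np.float32)
# One segmentation map per sample, stored without a channel axis ...
seg = np.zeros((2, 8, 8, 8), dtype=np.int64)
# ... so insert the channel axis to obtain (b, 1, x, y, z).
batch = check_batch({"data": data, "seg": seg[:, None], "target": np.array([0, 1])})
```

Extra entries such as 'target' are left alone by the check, mirroring how the pipeline passes them through unmodified.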

How to install locally

Install batchgenerators

pip install --upgrade batchgenerators

Import as follows

from batchgenerators.transforms.color_transforms import ContrastAugmentationTransform

Windows Support is very experimental!

Batchgenerators makes heavy use of Python multiprocessing, and Python multiprocessing on Windows is different from Linux. To prevent the workers from freezing on Windows, you have to guard your code with if __name__ == '__main__' and use multiprocessing's freeze_support. The executed script may then look like this:

# some imports and functions here

def main():
    # do some stuff
    pass

if __name__ == '__main__':
    from multiprocessing import freeze_support
    freeze_support()
    main()

This is not required on Linux.

Release Notes

(only highlights, not an exhaustive list)

  • 0.23:

    • fixed the import mess. __init__.py files are now empty. This is a breaking change for some users! Please adapt your imports :-)
    • local_transforms are now a thing, check them out!
    • resize_segmentation now uses 'edge' mode and no longer takes a cval argument. Resizing segmentations with constant border values (previous default) can cause problems and should not be done.
  • 0.20.0:

    • fixed an issue with MultiThreadedAugmenter not terminating properly after KeyboardInterrupt
    • fixed an error with the number and order of samples being returned when pin_memory=True
    • improved performance by always hiding process-process communication bottleneck through threading
  • 0.19.5:

    • fixed OMP_NUM_THREADS issue by using threadpoolctl package; dropped python 2 support (threadpoolctl is not available for python 2)
  • 0.19:

    • There is now a complete example for BraTS2017/8 available for both 2D and 3D. Use this if you would like to get some insights on how I (Fabian) do my experiments
    • Windows is now supported! Thanks @justusschock for your support!
    • new, simple parametrization of elastic deformation. Use SpatialTransform_2!
    • CIFAR10/100 DataLoader are now available for your convenience
    • a bug in MultiThreadedAugmenter that could interfere with reproducibility is now fixed
  • 0.18:

    • all augmentations (there are some exceptions though) are implemented on a per-sample basis. This should make it easier to use the augmentations outside of the Transforms of batchgenerators
    • applicable Transforms now have a keyword p_per_sample with which the user can specify a probability with which this transform is applied to a sample. Before, this was handled by RndTransform and applied to the whole batch (so either all samples were augmented or none). Now this decision is made on a per-sample basis and increases variability by a lot.
    • following the previous point, RndTransform is now deprecated
    • AlternativeMultiThreadedAugmenter is now deprecated as well (no need to have this anymore)
    • pytorch users can now transform numpy arrays to pytorch tensors within batchgenerators (NumpyToTensor). For some reason, inter-process communication is faster with tensors (~factor 4), so this is recommended!
    • if numpy arrays were converted to pytorch tensors, MultiThreadedAugmenter can now pin the memory as well (pin_memory=True). This will happen in a background thread (inspired by pytorch DataLoader). Pinned memory can be copied to the GPU much faster. My (Fabian) classification experiment with Resnet50 got a speed boost of 12% from just that.
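
The difference between the old per-batch decision and the p_per_sample behaviour from the 0.18 notes can be sketched as follows. Both functions are hypothetical illustrations built on a simple mirroring operation; they are not the library's code.

```python
import numpy as np


def mirror_per_batch(data, p, rng):
    """Per-batch decision: one coin flip applies to every sample or to none."""
    if rng.uniform() < p:
        return data[:, :, ::-1].copy()
    return data


def mirror_per_sample(data, p_per_sample, rng):
    """Per-sample decision: each sample in the batch gets its own coin flip."""
    out = data.copy()
    for b in range(data.shape[0]):
        if rng.uniform() < p_per_sample:
            out[b] = data[b, :, ::-1]  # flip the x axis of this sample only
    return out


rng = np.random.default_rng(0)
data = np.arange(8 * 1 * 4 * 4, dtype=np.float32).reshape(8, 1, 4, 4)  # (b, c, x, y)

augmented = mirror_per_sample(data, p_per_sample=0.5, rng=rng)
# Which samples were flipped? With the per-batch variant this is all or none.
flipped = [not np.array_equal(augmented[b], data[b]) for b in range(len(data))]
```

With the per-sample variant a single batch can mix augmented and untouched samples, which is the added variability the notes refer to.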

batchgenerators is developed by the Division of Medical Image Computing of the German Cancer Research Center (DKFZ) and the Applied Computer Vision Lab (ACVL) of the Helmholtz Imaging Platform.
