mlr3pipelines

Package website: release | dev

Dataflow Programming for Machine Learning in R.

What is mlr3pipelines?

Watch our “WhyR 2020” Webinar Presentation on Youtube for an introduction! Find the slides here.

mlr3pipelines is a dataflow programming toolkit for machine learning in R utilising the mlr3 package. Machine learning workflows can be written as directed “Graphs” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the mlr3tuning package, it is even possible to simultaneously optimize parameters of multiple processing units.

In principle, mlr3pipelines is about defining individual data and model manipulation steps as “PipeOps”:

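library("mlr3")
library("mlr3pipelines")
pca        = po("pca")  # principal component analysis
filter     = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)  # keep the 50% of features with the highest variance
learner_po = po("learner", learner = lrn("classif.rpart"))  # wrap a decision tree learner as a PipeOp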

These PipeOps can then be combined to define machine learning pipelines. A pipeline can be wrapped in a GraphLearner, which behaves like any other Learner in mlr3.

graph = pca %>>% filter %>>% learner_po
glrn = GraphLearner$new(graph)

This learner can be used for resampling, benchmarking, and even tuning.

resample(tsk("iris"), glrn, rsmp("cv"))
#> <ResampleResult> of 10 iterations
#> * Task: iris
#> * Learner: pca.variance.classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations
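
A GraphLearner collects the hyperparameters of all its PipeOps in a single parameter set, each prefixed with the ID of the PipeOp it belongs to, which is what allows a tuner to address several processing units at once. The following is a minimal sketch of inspecting and setting such parameters, assuming mlr3, mlr3pipelines, and mlr3filters are installed; the values 0.25 and 0.05 are arbitrary illustrations, not recommendations.

```r
library("mlr3")
library("mlr3pipelines")

# Rebuild the pipeline from above so that this example is self-contained
graph = po("pca") %>>%
  po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# Hyperparameter IDs are prefixed with the ID of the PipeOp they belong to
param_ids = glrn$param_set$ids()
print(grep("filter.frac|rpart.cp", param_ids, value = TRUE))

# Set hyperparameters of two different PipeOps through the same parameter set
# (illustrative values only)
glrn$param_set$values$variance.filter.frac = 0.25
glrn$param_set$values$classif.rpart.cp = 0.05

# Train and predict like with any other mlr3 Learner
task = tsk("iris")
glrn$train(task)
prediction = glrn$predict(task)
print(prediction$score(msr("classif.acc")))
```

The same prefixed IDs are the names a search space would refer to when the GraphLearner is tuned with the mlr3tuning package.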

Feature Overview

Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing; currently supported features are:

  • Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
  • Task subsampling for speed and outcome class imbalance handling
  • mlr3 Learner operations for prediction and stacking
  • Simultaneous path branching (data going both ways)
  • Alternative path branching (data going one specific way, controlled by hyperparameters)
  • Ensemble methods and aggregation of predictions
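
The two kinds of branching listed above can be sketched as follows. This is an illustrative example rather than a canonical recipe: it assumes mlr3 and mlr3pipelines are installed, and the choice of PCA and a no-op ("nop") as the two paths is made up for demonstration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Simultaneous branching: the data flows through BOTH paths,
# and the resulting features are combined by "featureunion"
graph_union = gunion(list(po("pca"), po("nop"))) %>>% po("featureunion")
out_union = graph_union$train(task)[[1]]
print(out_union$feature_names)  # principal components plus original features

# Alternative branching: the data flows through ONE path only,
# selected by the hyperparameter "branch.selection"
paths = c("pca", "nop")
graph_branch = po("branch", options = paths) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", options = paths)

graph_branch$param_set$values$branch.selection = "nop"
out_nop = graph_branch$train(task)[[1]]

graph_branch$param_set$values$branch.selection = "pca"
out_pca = graph_branch$train(task)[[1]]
print(out_pca$feature_names)
```

Because "branch.selection" is an ordinary hyperparameter, the choice of path can itself be included in a tuning search space.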

Documentation

The easiest way to get started is reading some of the vignettes that are shipped with the package, which can also be viewed online.

Bugs, Questions, Feedback

mlr3pipelines is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).

Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.

Citing mlr3pipelines

If you use mlr3pipelines, please cite our JMLR article:

@Article{mlr3pipelines,
  title = {{mlr3pipelines} - Flexible Machine Learning Pipelines in R},
  author = {Martin Binder and Florian Pfisterer and Michel Lang and Lennart Schneider and Lars Kotthoff and Bernd Bischl},
  journal = {Journal of Machine Learning Research},
  year = {2021},
  volume = {22},
  number = {184},
  pages = {1-7},
  url = {https://jmlr.org/papers/v22/21-0281.html},
}

Similar Projects

A predecessor to this package is the mlrCPO package, which works with mlr 2.x. Other packages that provide, to varying degrees, preprocessing functionality or a domain-specific language for machine learning are the caret package, the related recipes project, and the dplyr package.