mlr3proba
Probabilistic Supervised Learning for mlr3.
What is mlr3proba?
mlr3proba is a machine learning toolkit for making probabilistic predictions within the mlr3 ecosystem. It currently supports the following tasks:
- Probabilistic supervised regression - Supervised regression with a predictive distribution as the return type.
- Predictive survival analysis - Survival analysis where individual predictive hazards can be queried. This is equivalent to probabilistic supervised regression with censored observations.
- Unconditional distribution estimation - Estimation where the distribution itself is returned. Sub-cases are density estimation and unconditional survival estimation.
Key features of mlr3proba are:
- A unified fit/predict model interface to any probabilistic predictive model (frequentist, Bayesian, or other)
- Pipeline/model composition
- Task reduction strategies
- Domain-agnostic evaluation workflows using task-specific algorithmic performance measures
mlr3proba makes use of the distr6 probability distribution interface as its probabilistic predictive return type.
Feature Overview
The current mlr3proba release focuses on survival analysis, and contains:
- Task frameworks for survival analysis (TaskSurv)
- A comprehensive selection of predictive survival learners (mostly via mlr3extralearners)
- A comprehensive selection of performance measures for predictive survival learners, with respect to prognostic index (continuous rank) prediction, and probabilistic (distribution) prediction
- PipeOps integrated with mlr3pipelines, for basic pipeline building, and reduction/composition strategies using linear predictors and baseline hazards.
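A minimal sketch of the basic survival workflow, assuming mlr3 and mlr3proba are installed and that the predefined task key "rats" and the learner key "surv.coxph" are available in your mlr3proba version. It trains a Cox model on a random 70% of the rows of a TaskSurv and predicts on the remaining rows; the resulting prediction carries a continuous ranking (crank) and a survival distribution represented as a distr6 object. The seed and split proportion are arbitrary choices for illustration.

```r
library(mlr3)
library(mlr3proba)

# Assumption: "rats" is a predefined TaskSurv (survival::rats data) in mlr3proba
task = tsk("rats")

# Assumption: "surv.coxph" is the Cox proportional hazards learner shipped with mlr3proba
learner = lrn("surv.coxph")

# Simple random 70/30 holdout split of the task's row ids
set.seed(1)
train_ids = sample(task$row_ids, size = round(0.7 * task$nrow))
test_ids = setdiff(task$row_ids, train_ids)

learner$train(task, row_ids = train_ids)
prediction = learner$predict(task, row_ids = test_ids)

# Continuous ranking prediction: one value per test observation
head(prediction$crank)

# Predicted survival distributions, returned as a distr6 object
prediction$distr
```

The manual `sample()` split is used instead of a resampling object only to keep the sketch short; for real benchmarks, mlr3's resampling and benchmarking tools are the usual choice.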
Roadmap
The vision of mlr3proba is to provide comprehensive machine learning functionality to the mlr3 ecosystem for continuous probabilistic return types.
The survival task and its features are considered maturing, and major changes are unlikely.
The density and probabilistic supervised regression tasks are currently in the early stages of development. Task frameworks have been drawn up, but may not be stable; learners need to be interfaced, and contributions are very welcome (see issues).
Installation
mlr3proba is not on CRAN and is unlikely to be re-uploaded (see here for reasons). As such, you must install it with one of the following methods:
Install from r-universe:
options(repos=c(
mlrorg = 'https://mlr-org.r-universe.dev',
raphaels1 = 'https://raphaels1.r-universe.dev',
CRAN = 'https://cloud.r-project.org'
))
install.packages("mlr3proba")
or
install.packages("mlr3proba", repos = "https://mlr-org.r-universe.dev")
Or for easier installation going forward:
- Run usethis::edit_r_profile(), then in the file that opens, add or edit options() so that it looks something like:
options(repos = c(
raphaels1 = "https://raphaels1.r-universe.dev",
mlrorg = "https://mlr-org.r-universe.dev",
CRAN = 'https://cloud.r-project.org'
))
- Save and close the file, then restart your R session
- Run install.packages("mlr3proba") as usual
Install from GitHub:
remotes::install_github("mlr-org/mlr3proba")
Learners
Core learners are implemented in mlr3proba, recommended common learners are implemented in mlr3learners, and many more are implemented in mlr3extralearners. Use the interactive search table to search for available survival learners and see the learner status page for their live status.
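To check which survival learners are registered in the current session, one option is to query the mlr3 learner dictionary directly. This is a sketch assuming survival learner keys use the "surv." prefix (as the measure IDs below do); loading mlr3learners or mlr3extralearners before running it should add further keys.

```r
library(mlr3)
library(mlr3proba)

# mlr_learners is the mlr3 dictionary of registered learners;
# keep only keys starting with "surv."
surv_keys = mlr_learners$keys("^surv\\.")
print(surv_keys)

# Construct one learner by key and inspect which prediction types it supports
km = lrn("surv.kaplan")
print(km$predict_types)
```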
Measures
For density estimation, only the log-loss is currently implemented. For survival analysis, see the full list here. Some commonly used measures are the following:
| ID | Measure | Package | Type |
|---|---|---|---|
| surv.dcalib | D-Calibration | mlr3proba | Calibration |
| surv.cindex | Concordance Index | mlr3proba | Discrimination |
| surv.uno_auc | Uno's AUC | survAUC | Discrimination |
| surv.graf | Integrated Brier Score | mlr3proba | Scoring Rule |
| surv.rcll | Right-Censored Log Loss | mlr3proba | Scoring Rule |
| surv.intlogloss | Integrated Log Loss | mlr3proba | Scoring Rule |
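A sketch of scoring one holdout prediction with several of the measures in the table, assuming the "rats" task and "surv.coxph" learner keys are available in your mlr3proba version. surv.uno_auc is left out because it relies on the optional survAUC package. Passing the task and training row ids to `$score()` should let measures that can use training data (for example, to estimate the censoring distribution for the integrated Brier score) do so; measures that do not need them ignore these arguments.

```r
library(mlr3)
library(mlr3proba)

# Assumption: "rats" task and "surv.coxph" learner are predefined in mlr3proba
task = tsk("rats")
learner = lrn("surv.coxph")

# Random 70/30 holdout split
set.seed(42)
train_ids = sample(task$row_ids, size = round(0.7 * task$nrow))
test_ids = setdiff(task$row_ids, train_ids)

learner$train(task, row_ids = train_ids)
prediction = learner$predict(task, row_ids = test_ids)

# Calibration, discrimination and scoring-rule measures from the table
measures = msrs(c("surv.cindex", "surv.graf", "surv.rcll", "surv.dcalib"))

# Returns a named numeric vector, one entry per measure id
scores = prediction$score(measures, task = task, train_set = train_ids)
print(scores)
```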
Bugs, Questions, Feedback
mlr3proba is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an "issue" about it on the GitHub page!
In case of problems / bugs, it is often helpful if you provide a "minimum working example" that showcases the behavior (but don't worry about this if the bug is obvious).
Similar Projects
Predecessors to this package are previous instances of survival modelling in mlr. The skpro package in the python/scikit-learn ecosystem follows a similar interface for probabilistic supervised learning and is an architectural predecessor. Several packages, such as rjags and stan, allow probabilistic predictive modelling through a general interface specific to Bayesian models. For the implementation of a few survival models and measures, a central package is survival. There does not appear to be a package that provides an architectural framework for distribution/density estimation; see this list for a review of density estimation packages in R.
Acknowledgements
Several people contributed to the building of mlr3proba. Firstly, thanks to Michel Lang for writing mlr3survival. Several learners and measures implemented in mlr3proba, as well as the prediction, task, and measure surv objects, were written initially in mlr3survival before being absorbed into mlr3proba. Secondly, thanks to Franz Kiraly for major contributions towards the design of the proba-specific parts of the package, including compositors and predict types, and for mathematical contributions towards the scoring rules implemented in the package. Finally, thanks to Bernd Bischl and the rest of the mlr core team for building mlr3 and for many conversations about the design of mlr3proba.
Citing mlr3proba
If you use mlr3proba, please cite our Bioinformatics article:
@Article{,
title = {mlr3proba: An R Package for Machine Learning in Survival Analysis},
author = {Raphael Sonabend and Franz J Király and Andreas Bender and Bernd Bischl and Michel Lang},
journal = {Bioinformatics},
month = {02},
year = {2021},
doi = {10.1093/bioinformatics/btab039},
issn = {1367-4803},
}