
LightGBMLSS - An extension of LightGBM to probabilistic forecasting

We propose a new framework of LightGBM that predicts the entire conditional distribution of a univariate response variable. In particular, LightGBMLSS models all moments of a parametric distribution, i.e., mean, location, scale and shape (LSS), instead of the conditional mean only. Choosing from a wide range of continuous, discrete, and mixed discrete-continuous distributions, modelling and predicting the entire conditional distribution greatly enhances the flexibility of LightGBM, as it allows creating probabilistic forecasts from which prediction intervals and quantiles of interest can be derived.
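For instance, once the distributional parameters have been predicted for an observation, prediction intervals and quantiles follow directly from the parametric distribution. A minimal sketch with a hypothetical predicted Gaussian (the parameter values are illustrative; this is not the LightGBMLSS API):

```python
from statistics import NormalDist

# Hypothetical predicted location and scale for one observation
mu, sigma = 10.0, 2.0

dist = NormalDist(mu, sigma)

# 90% prediction interval from the 5% and 95% quantiles
lower, upper = dist.inv_cdf(0.05), dist.inv_cdf(0.95)
```

Any quantile of interest can be read off the predicted distribution in the same way.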

Features

✅ Estimation of all distributional parameters.
✅ Normalizing Flows allow modelling of complex and multi-modal distributions.
✅ Automatic derivation of Gradients and Hessians of all distributional parameters using PyTorch.
✅ Automated hyper-parameter search, including pruning, is done via Optuna.
✅ The output of LightGBMLSS is explained using SHapley Additive exPlanations.
✅ LightGBMLSS provides full compatibility with all the features and functionality of LightGBM.
✅ LightGBMLSS is available in Python.

News

💥 [2023-07-20] Release of v0.3.0 introduces Normalizing Flows. See the release notes for an overview.
💥 [2023-06-22] Release of v0.2.2. See the release notes for an overview.
💥 [2023-06-15] LightGBMLSS now supports Zero-Inflated and Zero-Adjusted Distributions.
💥 [2023-05-26] Release of v0.2.1. See the release notes for an overview.
💥 [2023-05-23] Release of v0.2.0. See the release notes for an overview.
💥 [2022-01-05] LightGBMLSS now supports estimating the full predictive distribution via Expectile Regression.
💥 [2022-01-05] LightGBMLSS now supports automatic derivation of Gradients and Hessians.
💥 [2022-01-04] LightGBMLSS is initialized with suitable starting values to improve convergence of estimation.
💥 [2022-01-04] LightGBMLSS v0.1.0 is released!

Installation

To install LightGBMLSS, first run

```shell
pip install git+https://github.com/StatMixedML/LightGBMLSS.git
```

Then, to install the shap dependency, run

```shell
pip install git+https://github.com/dsgibbons/shap.git
```

Available Distributions

LightGBMLSS currently supports the following PyTorch distributions.

| Distribution | Usage | Type | Support | Number of Parameters |
| --- | --- | --- | --- | --- |
| Beta | `Beta()` | Continuous (Univariate) | $y \in (0, 1)$ | 2 |
| Cauchy | `Cauchy()` | Continuous (Univariate) | $y \in (-\infty, \infty)$ | 2 |
| Expectile | `Expectile()` | Continuous (Univariate) | $y \in (-\infty, \infty)$ | Number of expectiles |
| Gamma | `Gamma()` | Continuous (Univariate) | $y \in (0, \infty)$ | 2 |
| Gaussian | `Gaussian()` | Continuous (Univariate) | $y \in (-\infty, \infty)$ | 2 |
| Gumbel | `Gumbel()` | Continuous (Univariate) | $y \in (-\infty, \infty)$ | 2 |
| Laplace | `Laplace()` | Continuous (Univariate) | $y \in (-\infty, \infty)$ | 2 |
| LogNormal | `LogNormal()` | Continuous (Univariate) | $y \in (0, \infty)$ | 2 |
| Negative Binomial | `NegativeBinomial()` | Discrete Count (Univariate) | $y \in \{0, 1, 2, 3, \ldots\}$ | 2 |
| Poisson | `Poisson()` | Discrete Count (Univariate) | $y \in \{0, 1, 2, 3, \ldots\}$ | 1 |
| Spline Flow | `SplineFlow()` | Continuous & Discrete Count (Univariate) | $y \in (-\infty, \infty)$ <br> $y \in [0, \infty)$ <br> $y \in [0, 1]$ <br> $y \in \{0, 1, 2, 3, \ldots\}$ | 2 x count_bins + (count_bins - 1) (order = quadratic) <br> 3 x count_bins + (count_bins - 1) (order = linear) |
| Student-T | `StudentT()` | Continuous (Univariate) | $y \in (-\infty, \infty)$ | 3 |
| Weibull | `Weibull()` | Continuous (Univariate) | $y \in [0, \infty)$ | 2 |
| Zero-Adjusted Beta | `ZABeta()` | Discrete-Continuous (Univariate) | $y \in [0, 1)$ | 3 |
| Zero-Adjusted Gamma | `ZAGamma()` | Discrete-Continuous (Univariate) | $y \in [0, \infty)$ | 3 |
| Zero-Adjusted LogNormal | `ZALN()` | Discrete-Continuous (Univariate) | $y \in [0, \infty)$ | 3 |
| Zero-Inflated Negative Binomial | `ZINB()` | Discrete Count (Univariate) | $y \in \{0, 1, 2, 3, \ldots\}$ | 3 |
| Zero-Inflated Poisson | `ZIPoisson()` | Discrete Count (Univariate) | $y \in \{0, 1, 2, 3, \ldots\}$ | 2 |
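The Spline Flow parameter counts can be computed directly from the formulas listed above. A small illustrative helper (the function name is ours, not part of the library):

```python
def splineflow_param_count(count_bins: int, order: str) -> int:
    """Number of distributional parameters for a Spline Flow,
    following the formulas stated in the table above."""
    if order == "quadratic":
        return 2 * count_bins + (count_bins - 1)
    if order == "linear":
        return 3 * count_bins + (count_bins - 1)
    raise ValueError(f"unknown order: {order!r}")
```

For example, with `count_bins=8` this gives 23 parameters for a quadratic spline and 31 for a linear one.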

Some Notes

Stabilization

Since LightGBMLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important that these are comparable in magnitude across all distributional parameters. Because the ranges of the parameters can differ considerably, the estimation of Gradients and Hessians may become unstable, so that LightGBMLSS does not converge, or converges only slowly. To mitigate these effects, LightGBMLSS implements a stabilization of Gradients and Hessians.
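To illustrate the idea, one simple stabilization scheme rescales each parameter's Gradients by a robust measure of their magnitude, e.g. the median absolute value, so that updates for all parameters live on a comparable scale. This sketch is illustrative only and is not the library's exact stabilization:

```python
from statistics import median

def stabilize(values, eps=1e-12):
    # Rescale gradients (or Hessians) by their median absolute value
    # so that magnitudes are comparable across distributional parameters.
    scale = median(abs(v) for v in values)
    return [v / (scale + eps) for v in values]

grad_location = [0.02, -0.05, 0.03]   # hypothetical raw gradients, small scale
grad_scale = [200.0, -500.0, 300.0]   # hypothetical raw gradients, large scale
```

After stabilization, both gradient vectors have a median absolute value of 1, regardless of their original scale.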

For improved convergence, an alternative approach is to standardize the (continuous) response variable, e.g., by dividing it by 100 (y/100). This is especially valuable when the range of the response differs substantially from that of the Gradients and Hessians. In any case, both the built-in stabilization and response standardization should be evaluated carefully for the specific dataset at hand.
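As an illustration of the back-transformation this entails (all values and parameter names are hypothetical): if the response is divided by 100 before training, the location and scale of a predicted Gaussian must be multiplied by 100 to return to the original units.

```python
SCALE = 100.0

y = [250.0, 310.0, 275.0]
y_scaled = [v / SCALE for v in y]  # the model is trained on y_scaled

# hypothetical predicted Gaussian parameters on the scaled response
mu_scaled, sigma_scaled = 2.8, 0.3

# back-transform location and scale to the original units of y
mu, sigma = mu_scaled * SCALE, sigma_scaled * SCALE
```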

Runtime

Since LightGBMLSS is based on a one-vs.-all estimation strategy, where a separate tree is grown for each distributional parameter, it requires training [number of iterations] * [number of distributional parameters] trees. Hence, the runtime of LightGBMLSS is generally slightly higher for univariate distributions than that of LightGBM, which trains only [number of iterations] trees.
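For example, with 100 boosting iterations and a two-parameter distribution (illustrative numbers):

```python
num_iterations = 100
num_dist_params = 2  # e.g. location and scale of a Gaussian

trees_lightgbmlss = num_iterations * num_dist_params  # one tree per parameter per iteration
trees_lightgbm = num_iterations                       # one tree per iteration
```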

Feedback

We encourage you to provide feedback on how to enhance LightGBMLSS or request the implementation of additional distributions by opening a new discussion.

Reference Paper

März, A. and Kneib, T. (2022). Distributional Gradient Boosting Machines.
März, A. (2019). XGBoostLSS - An Extension of XGBoost to Probabilistic Forecasting.