Nixtla Machine Learning Forecast
Scalable machine learning for time series forecasting
mlforecast is a framework to perform time series forecasting using machine learning models, with the option to scale to massive amounts of data using remote clusters.
Install
PyPI
pip install mlforecast
If you want to perform distributed training, you can instead use pip install "mlforecast[distributed]", which will also install dask. Note that you'll also need to install either LightGBM or XGBoost.
conda-forge
conda install -c conda-forge mlforecast
Note that this installation comes with the required dependencies for the
local interface. If you want to perform distributed training, you must
install dask (conda install -c conda-forge dask) and either LightGBM or
XGBoost.
Quick Start
Minimal Example
import lightgbm as lgb
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

# df: long-format pandas DataFrame with unique_id, ds and y columns
mlf = MLForecast(
    models=[LinearRegression(), lgb.LGBMRegressor()],
    lags=[1, 12],
    freq='M',
)
mlf.fit(df)
mlf.predict(12)
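The minimal example assumes that df already exists. As a rough sketch, a compatible frame could be built as follows, assuming the column names unique_id, ds and y shown in the data setup section; the series identifiers and the synthetic values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Month-end timestamps, to match the monthly frequency used in the minimal example.
dates = pd.date_range("2020-01-31", periods=24, freq=pd.offsets.MonthEnd())

rng = np.random.default_rng(0)
frames = []
for uid in ["series_a", "series_b"]:  # hypothetical series identifiers
    frames.append(
        pd.DataFrame(
            {
                "unique_id": uid,  # scalar is broadcast to every row
                "ds": dates,
                # Synthetic target: a linear trend plus a little noise.
                "y": np.arange(len(dates)) + rng.normal(scale=0.5, size=len(dates)),
            }
        )
    )

# One row per (series, timestamp) pair, i.e. long format.
df = pd.concat(frames, ignore_index=True)
```

With 24 observations per series, a lag of 12 still leaves 12 usable training rows per series.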
Get Started with this quick guide.
Follow this end-to-end walkthrough for best practices.
Why?
Current Python alternatives for machine learning models are slow,
inaccurate and don't scale well. So we created a library that can be
used to forecast in production environments. MLForecast includes
efficient feature engineering to train any machine learning model that
exposes fit and predict methods (such as sklearn estimators) on millions
of time series.
Features
- Fastest implementations of feature engineering for time series forecasting in Python.
- Out-of-the-box compatibility with Spark, Dask, and Ray.
- Probabilistic Forecasting with Conformal Prediction.
- Support for exogenous variables and static covariates.
- Familiar sklearn syntax: .fit and .predict.
Missing something? Please open an issue or write us in
Examples and Guides
How to use
The following provides a very basic overview; for a more detailed description, see the documentation.
Data setup
Store your time series in a pandas dataframe in long format, that is, each row represents an observation for a specific series and timestamp.
from mlforecast.utils import generate_daily_series
series = generate_daily_series(
    n_series=20,
    max_length=100,
    n_static_features=1,
    static_as_categorical=False,
    with_trend=True,
)
series.head()
| | unique_id | ds | y | static_0 |
|---|---|---|---|---|
| 0 | id_00 | 2000-01-01 | 1.751917 | 72 |
| 1 | id_00 | 2000-01-02 | 9.196715 | 72 |
| 2 | id_00 | 2000-01-03 | 18.577788 | 72 |
| 3 | id_00 | 2000-01-04 | 24.520646 | 72 |
| 4 | id_00 | 2000-01-05 | 33.418028 | 72 |
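Data often arrives in wide format instead, with one column per series. The following sketch shows one way to reshape such a frame into the long layout above using pandas; the wide frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical wide frame: one row per timestamp, one column per series.
wide = pd.DataFrame(
    {
        "ds": pd.date_range("2000-01-01", periods=3, freq="D"),
        "id_00": [1.0, 2.0, 3.0],
        "id_01": [10.0, 20.0, 30.0],
    }
)

# Stack the series columns into (unique_id, y) pairs, then order by series and time.
long_df = (
    wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)[["unique_id", "ds", "y"]]
```

Sorting by series and timestamp is not strictly part of the reshape, but it makes the result easier to inspect.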
Models
Next define your models. If you want to use the local interface this can
be any regressor that follows the scikit-learn API. For distributed
training there are LGBMForecast and XGBForecast.
import lightgbm as lgb
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor
models = [
    lgb.LGBMRegressor(),
    xgb.XGBRegressor(),
    RandomForestRegressor(random_state=0),
]
Forecast object
Now instantiate an MLForecast object with the models and the features
that you want to use. The features can be lags, transformations on the
lags, and date features. The lag transformations are defined as numba
jitted functions that transform an array. If they have additional
arguments, you can either supply a tuple (transform_func, arg1, arg2, …)
or define new functions fixing the arguments. You can also define
differences to apply to the series before fitting; these are restored
when predicting.
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean

@njit
def rolling_mean_28(x):
    return rolling_mean(x, window_size=28)


fcst = MLForecast(
    models=models,
    freq='D',
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [rolling_mean_28],
    },
    date_features=['dayofweek'],
    target_transforms=[Differences([1])],
)
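To make the lag configuration more concrete, here is a rough pandas approximation of what the lags and lag transforms above describe. It is a sketch for intuition only, not the library's implementation: the helper add_lag_features is hypothetical, it ignores the Differences transform, and edge handling (how many observations a window needs before producing a value) is assumed to follow the pandas defaults.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: compute the configured lag features per series."""
    out = df.sort_values(["unique_id", "ds"]).copy()
    grouped = out.groupby("unique_id")["y"]
    # Plain lags: the value observed 7 and 14 steps earlier in the same series.
    out["lag7"] = grouped.shift(7)
    out["lag14"] = grouped.shift(14)
    # Expanding mean over the series shifted by one step (only past values are used).
    out["expanding_mean_lag1"] = grouped.transform(
        lambda s: s.shift(1).expanding().mean()
    )
    # 28-step rolling mean over the series shifted by seven steps.
    out["rolling_mean_28_lag7"] = grouped.transform(
        lambda s: s.shift(7).rolling(28).mean()
    )
    return out


# Toy series with y = 0, 1, 2, ... so the features are easy to check by hand.
demo = pd.DataFrame(
    {
        "unique_id": "id_00",
        "ds": pd.date_range("2000-01-01", periods=40, freq="D"),
        "y": [float(i) for i in range(40)],
    }
)
features = add_lag_features(demo)
```

The column names mirror the feature names that the fitted object reports, which makes it easier to map each entry of lags and lag_transforms to a column.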
Training
To compute the features and train the models, call fit on your Forecast
object.
fcst.fit(series)
MLForecast(models=[LGBMRegressor, XGBRegressor, RandomForestRegressor], freq=<Day>, lag_features=['lag7', 'lag14', 'expanding_mean_lag1', 'rolling_mean_28_lag7'], date_features=['dayofweek'], num_threads=1)
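Because the forecast object was configured with Differences([1]), the models are trained on first differences of the target and the predictions are mapped back to the original scale. The NumPy sketch below illustrates that round trip on a toy series; it is independent of the library's actual implementation, and the predicted differences are made-up numbers.

```python
import numpy as np

y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])

# First difference: the series a model would see during training.
diffed = np.diff(y)  # array([2., 3., 4., 5.])

# Suppose a model predicts the next three differences (made-up values).
predicted_diffs = np.array([6.0, 7.0, 8.0])

# Restore the original scale by accumulating the differences onto the last observation.
restored = y[-1] + np.cumsum(predicted_diffs)  # array([30., 37., 45.])
```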
Predicting
To get the forecasts for the next n days, call predict(n) on the
forecast object. This will automatically handle the updates required by
the features using a recursive strategy.
predictions = fcst.predict(14)
predictions
| | unique_id | ds | LGBMRegressor | XGBRegressor | RandomForestRegressor |
|---|---|---|---|---|---|
| 0 | id_00 | 2000-04-04 | 69.082830 | 67.761337 | 68.226556 |
| 1 | id_00 | 2000-04-05 | 75.706024 | 74.588699 | 75.484774 |
| 2 | id_00 | 2000-04-06 | 82.222473 | 81.058289 | 82.853684 |
| 3 | id_00 | 2000-04-07 | 89.577638 | 88.735947 | 90.351212 |
| 4 | id_00 | 2000-04-08 | 44.149095 | 44.981384 | 46.291173 |
| ... | ... | ... | ... | ... | ... |
| 275 | id_19 | 2000-03-23 | 30.151270 | 31.814825 | 32.592799 |
| 276 | id_19 | 2000-03-24 | 31.418104 | 32.653374 | 33.563294 |
| 277 | id_19 | 2000-03-25 | 32.843567 | 33.586033 | 34.530912 |
| 278 | id_19 | 2000-03-26 | 34.127210 | 34.541473 | 35.507559 |
| 279 | id_19 | 2000-03-27 | 34.329202 | 35.450943 | 36.425001 |

280 rows × 5 columns
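The recursive strategy mentioned above can be illustrated with a small pure-Python sketch: each prediction is appended to the history so that it can serve as a lag for the next step. The function recursive_forecast and the linear model coefficients are hypothetical and only meant to show the idea, not how the library implements it.

```python
def recursive_forecast(history, coefs, intercept, lags, horizon):
    """Forecast `horizon` steps, feeding each prediction back in as a new observation."""
    values = list(history)
    preds = []
    for _ in range(horizon):
        # Build the lag features from the most recent values, predictions included.
        features = [values[-lag] for lag in lags]
        next_value = intercept + sum(c * f for c, f in zip(coefs, features))
        preds.append(next_value)
        values.append(next_value)  # the prediction becomes the newest "observation"
    return preds


# Hypothetical fitted model on lag 1: y_t = 1 + 1.0 * y_{t-1}
one_lag = recursive_forecast([1.0, 2.0, 3.0], coefs=[1.0], intercept=1.0, lags=[1], horizon=3)

# Hypothetical fitted model on lags 1 and 2: y_t = 2 * y_{t-1} - y_{t-2}
two_lags = recursive_forecast([1.0, 2.0, 3.0], coefs=[2.0, -1.0], intercept=0.0, lags=[1, 2], horizon=2)
```

One consequence of this design is that errors can compound over the horizon, since later steps depend on earlier predictions rather than on observed values.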
Visualize results
import matplotlib.pyplot as plt
import pandas as pd

# Plot the history and forecasts of the first four series on a 2x2 grid.
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(12, 6), gridspec_kw=dict(hspace=0.3))
for i, (uid, axi) in enumerate(zip(series['unique_id'].unique(), ax.flat)):
    fltr = lambda df: df['unique_id'].eq(uid)
    pd.concat([series.loc[fltr, ['ds', 'y']], predictions.loc[fltr]]).set_index('ds').plot(ax=axi)
    axi.set(title=uid, xlabel=None)
    # Show the legend only on the right-hand column.
    if i % 2 == 0:
        axi.legend().remove()
    else:
        axi.legend(bbox_to_anchor=(1.01, 1.0))
# Assumes the figs/ directory already exists.
fig.savefig('figs/index.png', bbox_inches='tight')
plt.close()
Sample notebooks
How to contribute
See CONTRIBUTING.md.