  • Stars
    501
  • Rank 88,002 (Top 2 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 3 years ago
  • Updated about 1 year ago

Repository Details

Scalable machine 🤖 learning for time series forecasting.

Machine Learning 🤖 Forecast

Scalable machine learning for time series forecasting

mlforecast is a framework to perform time series forecasting using machine learning models, with the option to scale to massive amounts of data using remote clusters.

Install

PyPI

pip install mlforecast

If you want to perform distributed training, you can instead use pip install "mlforecast[distributed]", which will also install dask. Note that you'll also need to install either LightGBM or XGBoost.

conda-forge

conda install -c conda-forge mlforecast

Note that this installation comes with the required dependencies for the local interface. If you want to perform distributed training, you must install dask (conda install -c conda-forge dask) and either LightGBM or XGBoost.
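
To see which of the packages mentioned above can be imported in the current environment, a small check along these lines may help. It is only a sketch: it relies on the standard library's importlib.util.find_spec, and the helper name available_packages is made up for illustration.

```python
from importlib.util import find_spec


def available_packages(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


# Import names of the packages mentioned in the install notes.
status = available_packages(["mlforecast", "dask", "lightgbm", "xgboost"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

For the local interface only mlforecast needs to be present; the other three matter for distributed training as described above.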

Quick Start

Minimal Example

import lightgbm as lgb

from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

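mlf = MLForecast(
    models=[LinearRegression(), lgb.LGBMRegressor()],
    lags=[1, 12],
    freq='M',
)
# df is a long-format pandas dataframe (see "Data setup" below)
mlf.fit(df)
# forecast the next 12 periods for every series
mlf.predict(12)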

Get Started with this quick guide.

Follow this end-to-end walkthrough for best practices.

Why?

Current Python alternatives for machine learning models are slow, inaccurate and don't scale well. So we created a library that can be used to forecast in production environments. MLForecast includes efficient feature engineering to train any machine learning model (with fit and predict methods, as in sklearn) on millions of time series.

Features

  • Fastest implementations of feature engineering for time series forecasting in Python.
  • Out-of-the-box compatibility with Spark, Dask, and Ray.
  • Probabilistic Forecasting with Conformal Prediction.
  • Support for exogenous variables and static covariates.
  • Familiar sklearn syntax: .fit and .predict.
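
As a rough illustration of the idea behind conformal prediction intervals, the sketch below builds a symmetric interval from absolute errors on held-out data. It is a generic split-conformal recipe written in plain Python, not a description of how mlforecast computes its intervals; the function name conformal_interval and the toy numbers are assumptions made for this example.

```python
import math


def conformal_interval(cal_actuals, cal_preds, new_pred, level=0.9):
    """Symmetric split-conformal interval around a point forecast."""
    # Conformity scores: absolute errors on held-out calibration data.
    scores = sorted(abs(a - p) for a, p in zip(cal_actuals, cal_preds))
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1) * level).
    # Simplification: the rank is capped at n instead of returning an
    # infinite interval when there are too few calibration points.
    rank = min(n, math.ceil((n + 1) * level))
    q = scores[rank - 1]
    return new_pred - q, new_pred + q


# Toy calibration data (made up for illustration).
actuals = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0, 14.0]
preds = [10.5, 11.0, 11.0, 15.0, 12.5, 13.0, 16.0, 15.5, 14.0]
low, high = conformal_interval(actuals, preds, new_pred=20.0, level=0.8)
print(low, high)  # 18.0 22.0
```

The width of the interval depends only on past errors, which is why the approach works with any point forecaster.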

Missing something? Please open an issue or write to us on Slack.

Examples and Guides

📚 End to End Walkthrough: model training, evaluation and selection for multiple time series.

🔎 Probabilistic Forecasting: use Conformal Prediction to produce prediction intervals.

👩‍🔬 Cross Validation: robust evaluation of model performance.

🔌 Predict Demand Peaks: electricity load forecasting for detecting daily peaks and reducing electric bills.

📈 Transfer Learning: pretrain a model using a set of time series and then predict another one using that pretrained model.

🌡️ Distributed Training: use a Dask cluster to train models at scale.
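
For readers new to time series cross validation, the sketch below shows one common way to lay out validation windows: each window holds out h consecutive observations, and only earlier observations are used for training. The function name rolling_origin_splits and its parameters n_windows and h are chosen for this illustration and should not be read as the library's API.

```python
def rolling_origin_splits(n_obs, n_windows, h):
    """Return (train_indices, valid_indices) pairs, oldest window first."""
    splits = []
    for w in range(n_windows, 0, -1):
        # Number of observations available for training in this window.
        cutoff = n_obs - w * h
        if cutoff <= 0:
            raise ValueError("series too short for the requested windows")
        train = list(range(cutoff))
        valid = list(range(cutoff, cutoff + h))
        splits.append((train, valid))
    return splits


for train, valid in rolling_origin_splits(n_obs=10, n_windows=2, h=3):
    print("train:", train, "valid:", valid)
# train: [0, 1, 2, 3] valid: [4, 5, 6]
# train: [0, 1, 2, 3, 4, 5, 6] valid: [7, 8, 9]
```

Unlike shuffled k-fold splits, the validation indices always come after the training indices, so no future information leaks into training.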

How to use

The following provides a very basic overview; for a more detailed description, see the documentation.

Data setup

Store your time series in a pandas dataframe in long format, that is, each row represents an observation for a specific series and timestamp.

from mlforecast.utils import generate_daily_series

series = generate_daily_series(
    n_series=20,
    max_length=100,
    n_static_features=1,
    static_as_categorical=False,
    with_trend=True
)
series.head()
  unique_id          ds          y  static_0
0     id_00  2000-01-01   1.751917        72
1     id_00  2000-01-02   9.196715        72
2     id_00  2000-01-03  18.577788        72
3     id_00  2000-01-04  24.520646        72
4     id_00  2000-01-05  33.418028        72
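
If your data starts out with one list of values per series, reshaping it into long format could look roughly like the sketch below. It uses plain Python and assumes, purely for illustration, that every series is daily and starts on the same date; the helper name to_long and the toy values are made up.

```python
from datetime import date, timedelta


def to_long(wide, start):
    """Turn {series_id: [values...]} into one row per (series, timestamp)."""
    rows = []
    for uid, values in wide.items():
        for offset, y in enumerate(values):
            rows.append(
                {"unique_id": uid, "ds": start + timedelta(days=offset), "y": y}
            )
    return rows


# Two toy series of different lengths (values made up for illustration).
wide = {"id_00": [1.0, 2.0, 3.0], "id_01": [10.0, 20.0]}
rows = to_long(wide, start=date(2000, 1, 1))
for row in rows:
    print(row["unique_id"], row["ds"], row["y"])
```

Passing such a list of dictionaries to pandas.DataFrame gives a dataframe with the unique_id, ds and y columns shown in the output above.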

Models

Next, define your models. If you want to use the local interface, this can be any regressor that follows the scikit-learn API. For distributed training there are LGBMForecast and XGBForecast.

import lightgbm as lgb
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor

models = [
    lgb.LGBMRegressor(),
    xgb.XGBRegressor(),
    RandomForestRegressor(random_state=0),
]
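
To make "follows the scikit-learn API" more concrete, here is a toy regressor exposing the methods that API revolves around: fit, predict, get_params and set_params. It is a duck-typed sketch in plain Python with a made-up class name; whether MLForecast accepts it unchanged is not verified here, and inheriting from sklearn.base.BaseEstimator is the more conventional route.

```python
class MeanRegressor:
    """Toy regressor that always predicts the mean of the training target."""

    def fit(self, X, y):
        y = list(y)
        self.mean_ = sum(y) / len(y)
        return self  # scikit-learn estimators return self from fit

    def predict(self, X):
        # One prediction per row of X, regardless of the feature values.
        return [self.mean_] * len(X)

    def get_params(self, deep=True):
        # No hyperparameters; scikit-learn's clone() relies on this method.
        return {}

    def set_params(self, **params):
        return self


model = MeanRegressor().fit([[1], [2], [3]], [2.0, 4.0, 6.0])
print(model.predict([[9], [9]]))  # [4.0, 4.0]
```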

Forecast object

Now instantiate an MLForecast object with the models and the features that you want to use. The features can be lags, transformations on the lags, and date features. The lag transformations are defined as numba-jitted functions that transform an array. If they take additional arguments, you can either supply a tuple (transform_func, arg1, arg2, …) or define new functions that fix the arguments. You can also define differences to apply to the series before fitting; they are restored when predicting.

from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean


@njit
def rolling_mean_28(x):
    return rolling_mean(x, window_size=28)


fcst = MLForecast(
    models=models,
    freq='D',
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [rolling_mean_28]
    },
    date_features=['dayofweek'],
    target_transforms=[Differences([1])],
)
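
The following plain-Python sketch illustrates what a lag and a transformation on a lag mean for a single series. It is a conceptual illustration, not the numba implementation used above: missing values are represented with None here, and the helper names are defined locally for this example.

```python
def lag(values, k):
    """Shift the series k steps so position t holds the value from t - k."""
    n = len(values)
    if k >= n:
        return [None] * n
    return [None] * k + list(values[: n - k])


def expanding_mean(values):
    """Mean of all non-missing values seen so far; None until one is seen."""
    out, total, count = [], 0.0, 0
    for v in values:
        if v is not None:
            total += v
            count += 1
        out.append(total / count if count else None)
    return out


def rolling_mean(values, window_size):
    """Mean of the last window_size values; None until the window is full."""
    out = []
    for t in range(len(values)):
        window = values[max(0, t - window_size + 1): t + 1]
        if len(window) < window_size or any(v is None for v in window):
            out.append(None)
        else:
            out.append(sum(window) / window_size)
    return out


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(lag(y, 1))                   # [None, 1.0, 2.0, 3.0, 4.0, 5.0]
print(expanding_mean(lag(y, 1)))   # [None, 1.0, 1.5, 2.0, 2.5, 3.0]
print(rolling_mean(lag(y, 2), 3))  # [None, None, None, None, 2.0, 3.0]
```

Because each transformation is applied to an already lagged series, the feature at time t only uses values that were observed before t.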

Training

To compute the features and train the models call fit on your Forecast object.

fcst.fit(series)
MLForecast(models=[LGBMRegressor, XGBRegressor, RandomForestRegressor], freq=<Day>, lag_features=['lag7', 'lag14', 'expanding_mean_lag1', 'rolling_mean_28_lag7'], date_features=['dayofweek'], num_threads=1)
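
The differences configured earlier are applied to the target before the models are trained and undone when predicting. The plain-Python sketch below shows that round trip for a single series; the helper names difference and restore are made up for illustration and the numbers are toy values.

```python
def difference(values, d=1):
    """Return y[t] - y[t - d] for every t that has a value d steps back."""
    return [values[t] - values[t - d] for t in range(d, len(values))]


def restore(last_values, diff_forecasts, d=1):
    """Undo differencing on forecasts, given the observed history."""
    history = list(last_values[-d:])
    restored = []
    for delta in diff_forecasts:
        # The value d steps before the one being restored.
        value = history[-d] + delta
        restored.append(value)
        history.append(value)
    return restored


y = [3.0, 5.0, 8.0, 12.0]
print(difference(y))            # [2.0, 3.0, 4.0]
# Suppose a model trained on the differences predicts 5.0 and then 6.0.
print(restore(y, [5.0, 6.0]))   # [17.0, 23.0]
```

Models trained on differences see a series without the level or trend, which tree-based regressors in particular cannot extrapolate on their own.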

Predicting

To get the forecasts for the next n days call predict(n) on the forecast object. This will automatically handle the updates required by the features using a recursive strategy.

predictions = fcst.predict(14)
predictions
    unique_id          ds  LGBMRegressor  XGBRegressor  RandomForestRegressor
0       id_00  2000-04-04      69.082830     67.761337              68.226556
1       id_00  2000-04-05      75.706024     74.588699              75.484774
2       id_00  2000-04-06      82.222473     81.058289              82.853684
3       id_00  2000-04-07      89.577638     88.735947              90.351212
4       id_00  2000-04-08      44.149095     44.981384              46.291173
..        ...         ...            ...           ...                    ...
275     id_19  2000-03-23      30.151270     31.814825              32.592799
276     id_19  2000-03-24      31.418104     32.653374              33.563294
277     id_19  2000-03-25      32.843567     33.586033              34.530912
278     id_19  2000-03-26      34.127210     34.541473              35.507559
279     id_19  2000-03-27      34.329202     35.450943              36.425001

280 rows × 5 columns
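
The recursive strategy mentioned above can be pictured with the plain-Python sketch below: each prediction is appended to the history so that later steps can use it as a lag. This only illustrates the general idea, with a hand-written rule standing in for a trained model; the function names are made up and nothing here reflects the library's internals.

```python
def recursive_forecast(history, predict_one, lags, horizon):
    """Forecast horizon steps ahead, feeding each prediction back as input."""
    if len(history) < max(lags):
        raise ValueError("history is shorter than the largest lag")
    extended = list(history)
    forecasts = []
    for _ in range(horizon):
        # Lag k is the value k steps before the step being predicted.
        features = [extended[-k] for k in lags]
        y_hat = predict_one(features)
        forecasts.append(y_hat)
        extended.append(y_hat)  # later steps see this prediction as a lag
    return forecasts


def linear_extrapolation(features):
    """Stand-in for a trained model: continue the last observed step."""
    lag1, lag2 = features
    return 2 * lag1 - lag2


print(recursive_forecast([1.0, 2.0, 3.0], linear_extrapolation, lags=[1, 2], horizon=3))
# [4.0, 5.0, 6.0]
```

One consequence of this strategy is that errors can compound over the horizon, since later predictions depend on earlier ones.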

Visualize results

import matplotlib.pyplot as plt
import pandas as pd

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(12, 6), gridspec_kw=dict(hspace=0.3))
for i, (uid, axi) in enumerate(zip(series['unique_id'].unique(), ax.flat)):
    fltr = lambda df: df['unique_id'].eq(uid)
    pd.concat([series.loc[fltr, ['ds', 'y']], predictions.loc[fltr]]).set_index('ds').plot(ax=axi)
    axi.set(title=uid, xlabel=None)
    if i % 2 == 0:
        axi.legend().remove()
    else:
        axi.legend(bbox_to_anchor=(1.01, 1.0))
fig.savefig('figs/index.png', bbox_inches='tight')
plt.close()

Sample notebooks

How to contribute

See CONTRIBUTING.md.

More Repositories

1. statsforecast (Python, 3,846 stars)
   Lightning ⚡️ fast forecasting with statistical and econometric models.

2. neuralforecast (Python, 3,001 stars)
   Scalable and user friendly neural 🧠 forecasting algorithms.

3. nixtla (Jupyter Notebook, 2,208 stars)
   TimeGPT-1: production ready pre-trained Time Series Foundation Model for forecasting and anomaly detection. Generative pretrained transformer for time series trained on over 100B data points. It's capable of accurately predicting various domains such as retail, electricity, finance, and IoT with just a few lines of code 🚀.

4. hierarchicalforecast (Python, 568 stars)
   Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods.

5. tsfeatures (Python, 362 stars)
   Calculates various features from time series data. Python implementation of the R package tsfeatures.

6. Nixtla (Python, 253 stars)
   Automated time series processing and forecasting.

7. transfer-learning-time-series (Jupyter Notebook, 123 stars)
   Transfer 🤗 Learning for Time Series Forecasting

8. datasetsforecast (Jupyter Notebook, 38 stars)
   Datasets for time series forecasting

9. fpp3-python (Jupyter Notebook, 13 stars)
   Forecasting: principles and practice in python

10. timegpt-forecaster-streamlit (Python, 12 stars)
    TimeGPT forecaster example using streamlit

11. vantage (Python, 11 stars)
    Use TimeGPT to predict cloud costs and detect anomalies.

12. public-slides (Python, 6 stars)
    Nixtla Public Slides

13. nixtlats (6 stars)

14. popol-vuh (Python, 6 stars)
    Popol Vuh: Nixtla's operating system

15. utilsforecast (Python, 4 stars)

16. nixtlar (R, 3 stars)
    R SDK for TimeGPT

17. m4-forecasts (2 stars)
    ZIP version of M4 forecasts uploaded to https://github.com/Mcompetitions/M4-methods/tree/master/Point%20Forecasts.

18. m5-forecasts (2 stars)
    ZIP version of dataset and forecasts uploaded to https://drive.google.com/drive/folders/1D6EWdVSaOtrP1LEFh1REjI3vej6iUS_4.

19. nixtla-commons (CSS, 2 stars)
    Nixtla shared assets

20. blog (Jupyter Notebook, 1 star)

21. docs (MDX, 1 star)

22. how-to-contribute-nixtlaverse (1 star)
    Instruction to contribute to the Nixtla libraries