• Stars: 3,928
• Rank: 11,117 (Top 0.3%)
• Language: Python
• License: MIT License
• Created: over 4 years ago
• Updated: about 1 month ago

Repository Details

Time series forecasting with PyTorch

PyTorch Forecasting

Documentation | Tutorials | Release Notes

PyTorch Forecasting is a PyTorch-based package for forecasting time series with state-of-the-art network architectures. It provides a high-level API for training networks on pandas data frames and leverages PyTorch Lightning for scalable training on (multiple) GPUs and CPUs, and for automatic logging.


Our article on Towards Data Science introduces the package and provides background information.

PyTorch Forecasting aims to ease state-of-the-art timeseries forecasting with neural networks for real-world cases and research alike. The goal is to provide a high-level API with maximum flexibility for professionals and reasonable defaults for beginners. Specifically, the package provides

  • A timeseries dataset class which abstracts handling variable transformations, missing values, randomized subsampling, multiple history lengths, etc.
  • A base model class which provides basic training of timeseries models along with logging in tensorboard and generic visualizations such as actuals vs. predictions and dependency plots
  • Multiple neural network architectures for timeseries forecasting that have been enhanced for real-world deployment and come with in-built interpretation capabilities
  • Multi-horizon timeseries metrics
  • Hyperparameter tuning with optuna (see the sketch below)

The package is built on pytorch-lightning to allow training on CPUs and on single and multiple GPUs out-of-the-box.
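
For the optuna integration specifically, the package ships a tuning helper for the Temporal Fusion Transformer. The following is a minimal sketch, assuming train_dataloader and val_dataloader as created in the usage example below; the model_path name and the search ranges are illustrative.

# tune a Temporal Fusion Transformer with optuna using the bundled helper
from pytorch_forecasting.models.temporal_fusion_transformer.tuning import (
    optimize_hyperparameters,
)

study = optimize_hyperparameters(
    train_dataloader,
    val_dataloader,
    model_path="optuna_test",  # illustrative checkpoint directory
    n_trials=100,
    max_epochs=20,
    gradient_clip_val_range=(0.01, 1.0),
    hidden_size_range=(8, 128),
    learning_rate_range=(0.001, 0.1),
)
print(study.best_trial.params)  # best hyperparameters found by the study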

Installation

If you are working on Windows, you need to first install PyTorch with

pip install torch -f https://download.pytorch.org/whl/torch_stable.html

Otherwise, you can proceed with

pip install pytorch-forecasting

Alternatively, you can install the package via conda

conda install pytorch-forecasting "pytorch>=1.7" -c pytorch -c conda-forge

PyTorch Forecasting is now installed from the conda-forge channel while PyTorch is installed from the pytorch channel.

To use the MQF2 loss (multivariate quantile loss), also install the optional dependencies via pip install pytorch-forecasting[mqf2]
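
Once the extra is installed, the loss can be passed to a model like any other metric. A minimal sketch, assuming a prediction horizon of 6 steps; MQF2DistributionLoss is the metric behind this extra in recent releases.

# use the multivariate quantile loss as the training metric
from pytorch_forecasting.metrics import MQF2DistributionLoss

# prediction_length must match the dataset's forecasting horizon
loss = MQF2DistributionLoss(prediction_length=6)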

Documentation

Visit https://pytorch-forecasting.readthedocs.io to read the documentation with detailed tutorials.

Available models

The documentation provides a comparison of available models.
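
A few of the shipped architectures are importable from the top-level package, as sketched below; the list is non-exhaustive and the one-line characterizations are ours, so consult the comparison page for authoritative details.

# some of the forecasting architectures shipped with the package
from pytorch_forecasting import (
    TemporalFusionTransformer,  # attention-based, with built-in interpretation
    NBeats,                     # univariate forecasting without covariates
    DeepAR,                     # autoregressive RNN producing distribution forecasts
)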

To implement new models or other custom components, see the How to implement new models tutorial. It covers basic as well as advanced architectures.
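
As a taste of what that tutorial covers, a new architecture essentially subclasses BaseModel and implements forward. The following condensed sketch loosely follows the tutorial's first example; the class name and layer sizes are illustrative, and it assumes a single continuous input feature.

from torch import nn
from pytorch_forecasting.models import BaseModel


class FullyConnectedModel(BaseModel):
    def __init__(self, input_size: int, output_size: int, hidden_size: int, **kwargs):
        # saves constructor arguments to .hparams (required by the base class)
        self.save_hyperparameters()
        super().__init__(**kwargs)
        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x):
        # x["encoder_cont"] has shape (batch, encoder_length, n_features);
        # assuming a single continuous feature, drop the trailing feature axis
        network_input = x["encoder_cont"].squeeze(-1)
        prediction = self.network(network_input)
        # rescale predictions into the target space
        prediction = self.transform_output(prediction, target_scale=x["target_scale"])
        # wrap the raw tensor so downstream logging and metrics work
        return self.to_network_output(prediction=prediction)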

Usage example

Networks can be trained with the PyTorch Lightning Trainer on pandas DataFrames, which are first converted to a TimeSeriesDataSet.

# imports for training
import lightning.pytorch as pl
from lightning.pytorch.loggers import TensorBoardLogger
from lightning.pytorch.callbacks import EarlyStopping, LearningRateMonitor
# import dataset, network to train and metric to optimize
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer, QuantileLoss
from lightning.pytorch.tuner import Tuner

# load data: this is pandas dataframe with at least a column for
# * the target (what you want to predict)
# * the timeseries ID (which should be a unique string to identify each timeseries)
# * the time of the observation (which should be a monotonically increasing integer)
data = ...

# define the dataset, i.e. add metadata to pandas dataframe for the model to understand it
max_encoder_length = 36
max_prediction_length = 6
training_cutoff = "YYYY-MM-DD"  # day for cutoff

training = TimeSeriesDataSet(
    data[lambda x: x.date <= training_cutoff],
    time_idx= ...,  # column name of time of observation
    target= ...,  # column name of target to predict
    group_ids=[ ... ],  # column name(s) for timeseries IDs
    max_encoder_length=max_encoder_length,  # how much history to use
    max_prediction_length=max_prediction_length,  # how far to predict into future
    # covariates static for a timeseries ID
    static_categoricals=[ ... ],
    static_reals=[ ... ],
    # covariates known and unknown in the future to inform prediction
    time_varying_known_categoricals=[ ... ],
    time_varying_known_reals=[ ... ],
    time_varying_unknown_categoricals=[ ... ],
    time_varying_unknown_reals=[ ... ],
)

# create validation dataset using the same normalization techniques as for the training dataset
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training.index.time.max() + 1, stop_randomization=True)

# convert datasets to dataloaders for training
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=2)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=2)

# create PyTorch Lightning Trainer with early stopping
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=1, verbose=False, mode="min")
lr_logger = LearningRateMonitor()
trainer = pl.Trainer(
    max_epochs=100,
    accelerator="auto",  # run on CPU, if on multiple GPUs, use strategy="ddp"
    gradient_clip_val=0.1,
    limit_train_batches=30,  # 30 batches per epoch
    callbacks=[lr_logger, early_stop_callback],
    logger=TensorBoardLogger("lightning_logs")
)

# define network to train - the architecture is mostly inferred from the dataset, so that only a few hyperparameters have to be set by the user
tft = TemporalFusionTransformer.from_dataset(
    # dataset
    training,
    # architecture hyperparameters
    hidden_size=32,
    attention_head_size=1,
    dropout=0.1,
    hidden_continuous_size=16,
    # loss metric to optimize
    loss=QuantileLoss(),
    # logging frequency
    log_interval=2,
    # optimizer parameters
    learning_rate=0.03,
    reduce_on_plateau_patience=4
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

# find the optimal learning rate
res = Tuner(trainer).lr_find(
    tft, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader, early_stop_threshold=1000.0, max_lr=0.3,
)
# and plot the result - always visually confirm that the suggested learning rate makes sense
print(f"suggested learning rate: {res.suggestion()}")
fig = res.plot(show=True, suggest=True)
fig.show()

# fit the model on the data - redefine the model with the correct learning rate if necessary
trainer.fit(
    tft, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader,
)
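
After fitting, a common next step is to reload the best checkpoint and predict on the validation set. A minimal sketch, assuming Lightning's default ModelCheckpoint callback (added automatically since none is configured above):

# load the best model according to the validation loss and predict
best_model_path = trainer.checkpoint_callback.best_model_path
best_tft = TemporalFusionTransformer.load_from_checkpoint(best_model_path)

# predictions for the validation set, shape (n_samples, max_prediction_length)
predictions = best_tft.predict(val_dataloader)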

More Repositories

 1. sktime: A unified framework for machine learning with time series (Python, 7,833 stars)
 2. sktime-dl: DEPRECATED, now in sktime - companion package for deep learning based on TensorFlow (Python, 595 stars)
 3. skpro: A unified framework for tabular probabilistic regression, time-to-event prediction, and probability distributions in python (Python, 236 stars)
 4. sktime-tutorial-pydata-amsterdam-2020: Introduction to Machine Learning with Time Series at PyData Festival Amsterdam 2020 (Jupyter Notebook, 123 stars)
 5. pysf: Supervised forecasting of sequential data in Python. (Python, 55 stars)
 6. sktime-tutorial-pydata-global-2021: Introduction to sktime at the PyData Global 2021 (Jupyter Notebook, 54 stars)
 7. sktime-tutorial-pydata-berlin-2022 (Jupyter Notebook, 40 stars)
 8. sktime-tutorial-pydata-global-2022: sktime - python toolbox for time series: pipelines and transformers (Jupyter Notebook, 25 stars)
 9. mlaut (Python, 24 stars)
10. mentoring: sktime mentorship program (20 stars)
11. skbase: Base classes for creating scikit-learn-like parametric objects, and tools for working with them. (Python, 17 stars)
12. sktime-tutorial-pydata-global-2023 (Jupyter Notebook, 17 stars)
13. sktime-tutorial-europython-2023 (Jupyter Notebook, 14 stars)
14. sktime-workshops: sktime workshops & tutorials (Jupyter Notebook, 13 stars)
15. sktime-workshop-pydata-london-2022: PyData London 2022 sktime workshop (Jupyter Notebook, 11 stars)
16. sktime-tutorial-pydata-london-2023 (Jupyter Notebook, 9 stars)
17. distance-based-time-series-clustering (Jupyter Notebook, 9 stars)
18. sktime-neuro: time series machine learning for neurological data (Python, 8 stars)
19. community-org: Community organisation for sktime (8 stars)
20. enhancement-proposals: sktime enhancement proposals (Jupyter Notebook, 7 stars)
21. sktime-tutorial-ODSC-Europe-2023 (Jupyter Notebook, 6 stars)
22. sktime-tutorial-ODSC-Europe-2024 (Jupyter Notebook, 5 stars)
23. pcit (Jupyter Notebook, 4 stars)
24. sktime-tutorial-pydata-Amsterdam-2023 (Jupyter Notebook, 4 stars)
25. sktime-workshop-pyconPL2024 (Jupyter Notebook, 4 stars)
26. sktime-presentation-pydata-nyc-2023 (HTML, 2 stars)
27. sktime-tutorial-pydata-seattle-2023: skbase - a workbench for creating scikit-learn like parametric objects and libraries (Jupyter Notebook, 2 stars)
28. sktime-datasets (2 stars)
29. sktime-workshop-pydata-prague-2023 (Jupyter Notebook, 2 stars)
30. sktime-workshop-scipy-2024 (Jupyter Notebook, 1 star)
31. data-container: Design and implementation of time series data container (1 star)
32. sktime-tutorial-euroscipy2024: sktime tutorial at EuroSciPy 2024 (Jupyter Notebook, 1 star)