  • Stars
    683
  • Rank 66,158 (Top 2 %)
  • Language
    Python
  • License
    BSD 3-Clause "New...
  • Created over 2 years ago
  • Updated 6 months ago


Repository Details

A Python toolbox/library for reality-centric machine/deep learning and data mining on partially-observed time series with PyTorch. It includes SOTA neural network models for the scientific analysis tasks of imputation, classification, clustering, forecasting, and anomaly detection on incomplete (irregularly-sampled) multivariate time series with NaN missing values.

Welcome to PyPOTS

A Python Toolbox for Data Mining on Partially-Observed Time Series


⦿ Motivation: Missing values are common in real-world time series for many reasons, such as failed collection sensors, communication errors, and unexpected malfunctions. This makes partially-observed time series (POTS) a pervasive problem in open-world modeling and hinders advanced data analysis. Although the problem is important, data mining on POTS still lacks a dedicated toolkit. PyPOTS was created to fill this gap.

⦿ Mission: PyPOTS (pronounced "Pie Pots") aims to be a handy toolbox that makes data mining on POTS easy rather than tedious, so that engineers and researchers can focus on the core problems at hand instead of on how to handle the missing parts of their data. PyPOTS will keep integrating both classical and the latest state-of-the-art data mining algorithms for partially-observed multivariate time series. Besides the algorithms themselves, PyPOTS will provide unified APIs across algorithms, together with detailed documentation and interactive examples as tutorials.

🤗 Please star this repo to help others notice PyPOTS if you think it is a useful toolkit. Please properly cite PyPOTS in your publications if it helps with your research. This really means a lot to our open-source research. Thank you!


To make various open-source time-series datasets readily available to our users, PyPOTS is supported by its subproject TSDB (Time-Series Data Base), a toolbox that makes loading time-series datasets easy.

Visit TSDB to learn more about this handy tool 🛠! It now supports a total of 119 open-source datasets.

The rest of this readme file is organized as follows: ❖ Installation, ❖ Usage, ❖ Available Algorithms, ❖ Citing PyPOTS, ❖ Contribution, ❖ Community.

❖ Installation

See the installation instructions in the PyPOTS documentation for more detailed guidance.

PyPOTS is available on both PyPI and Anaconda. You can install PyPOTS as shown below:

# by pip
pip install pypots            # the first time installation
pip install pypots --upgrade  # update pypots to the latest version

# by conda
conda install -c conda-forge pypots  # the first time installation
conda update  -c conda-forge pypots  # update pypots to the latest version

Alternatively, you can install from the latest source code, which has the newest features but may not have been officially released yet:

pip install https://github.com/WenjieDu/PyPOTS/archive/main.zip
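After any of the installation routes above, it can be handy to confirm which version ended up in the active environment. The snippet below is a minimal sketch that relies only on the Python standard library; the helper name `installed_version` is hypothetical and is not part of PyPOTS.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pypots")
    print(f"pypots version: {found}" if found else "pypots is not installed")
```

Because the check reads package metadata rather than importing the package, it does not load PyTorch and stays fast even in large environments.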

❖ Usage


PyPOTS tutorials have been released. Considering the future workload, I have moved the tutorials into a separate repo, BrewPOTS. Take a look at it now, and learn how to brew your POTS datasets!

A simple quick-start tutorial notebook is also available on Google Colab. If you have further questions, please refer to the PyPOTS documentation at docs.pypots.com. You can also raise an issue or ask in our community.

Below is a usage example of imputing missing values in time series with PyPOTS. It applies SAITS to PhysioNet2012 for imputation:
import numpy as np
from sklearn.preprocessing import StandardScaler
from pypots.data import load_specific_dataset, mcar, masked_fill
from pypots.imputation import SAITS
from pypots.utils.metrics import cal_mae

# Data preprocessing. Tedious, but PyPOTS can help.
data = load_specific_dataset('physionet_2012')  # PyPOTS will automatically download and extract it.
X = data['X']
num_samples = len(X['RecordID'].unique())
X = X.drop(['RecordID', 'Time'], axis=1)
X = StandardScaler().fit_transform(X.to_numpy())
X = X.reshape(num_samples, 48, -1)
X_intact, X, missing_mask, indicating_mask = mcar(X, 0.1)  # hold out 10% observed values as ground truth
X = masked_fill(X, 1 - missing_mask, np.nan)
dataset = {"X": X}
print(dataset["X"].shape)  # (11988, 48, 37), 11988 samples, 48 time steps, 37 features

# Model training. This is PyPOTS showtime.
saits = SAITS(
    n_steps=48, n_features=37, n_layers=2, d_model=256, d_inner=128,
    n_heads=4, d_k=64, d_v=64, dropout=0.1, epochs=10,
)
saits.fit(dataset)  # train the model. Here I use the whole dataset as the training set, because ground truth is not visible to the model.
imputation = saits.impute(dataset)  # impute the originally-missing values and artificially-missing values
mae = cal_mae(imputation, X_intact, indicating_mask)  # calculate mean absolute error on the ground truth (artificially-missing values)
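The evaluation logic in the example above (hold out a fraction of the observed values, impute, then score only on the held-out positions) can be illustrated without downloading any data. The following is a minimal NumPy sketch on synthetic data. The functions `hold_out_observed` and `masked_mae` are hypothetical stand-ins written for this illustration; they are not the PyPOTS implementations of `mcar` or `cal_mae`, and the zero-fill "imputation" only takes the place of a trained model.

```python
import numpy as np


def hold_out_observed(X, rate, rng):
    """Hide a random fraction `rate` of the observed (non-NaN) entries of X.

    Returns (X_intact, X_corrupted, indicating_mask), where indicating_mask is 1
    exactly at the entries that were observed and then artificially removed.
    """
    X_intact = X.copy()
    observed = ~np.isnan(X)
    # Draw one uniform number per entry; observed entries below `rate` are held out.
    held_out = observed & (rng.random(X.shape) < rate)
    X_corrupted = X.copy()
    X_corrupted[held_out] = np.nan
    return X_intact, X_corrupted, held_out.astype(np.float64)


def masked_mae(prediction, target, mask):
    """Mean absolute error restricted to entries where mask == 1."""
    # nan_to_num keeps NaNs at excluded positions from poisoning the sum.
    diff = np.abs(np.nan_to_num(prediction) - np.nan_to_num(target))
    return float(np.sum(diff * mask) / (np.sum(mask) + 1e-9))


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 48, 37))       # (samples, time steps, features)
X[rng.random(X.shape) < 0.2] = np.nan  # simulate originally-missing values

X_intact, X_corrupted, indicating_mask = hold_out_observed(X, 0.1, rng)

# Stand-in "imputation": fill every NaN with 0, roughly the mean of this synthetic data.
imputation = np.nan_to_num(X_corrupted, nan=0.0)
mae = masked_mae(imputation, X_intact, indicating_mask)
print(f"held-out entries: {int(indicating_mask.sum())}, MAE: {mae:.4f}")
```

Scoring only where `indicating_mask` is 1 matters because the originally-missing values have no ground truth; including them would make the error undefined or misleading.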

❖ Available Algorithms

PyPOTS supports imputation, classification, clustering, and forecasting tasks on multivariate time series with missing values. The currently available algorithms for these four tasks are cataloged in the following table, which has four partitions. The paper references are all listed at the bottom of this readme file. Please refer to them if you want more details.

Imputation 🚥 🚥 🚥
Type | Abbr. | Full name of the algorithm/model/paper | Year
Neural Net | SAITS | Self-Attention-based Imputation for Time Series [1] | 2023
Neural Net | Transformer | Attention is All you Need [2]; Self-Attention-based Imputation for Time Series [1]. Note: proposed in [2], and re-implemented as an imputation model in [1]. | 2017
Neural Net | BRITS | Bidirectional Recurrent Imputation for Time Series [3] | 2018
Neural Net | M-RNN | Multi-directional Recurrent Neural Network [4] | 2019
Naive | LOCF | Last Observation Carried Forward | -
Classification 🚥 🚥 🚥
Type | Abbr. | Full name of the algorithm/model/paper | Year
Neural Net | BRITS | Bidirectional Recurrent Imputation for Time Series [3] | 2018
Neural Net | GRU-D | Recurrent Neural Networks for Multivariate Time Series with Missing Values [5] | 2018
Neural Net | Raindrop | Graph-Guided Network for Irregularly Sampled Multivariate Time Series [6] | 2022
Clustering 🚥 🚥 🚥
Type | Abbr. | Full name of the algorithm/model/paper | Year
Neural Net | CRLI | Clustering Representation Learning on Incomplete time-series data [7] | 2021
Neural Net | VaDER | Variational Deep Embedding with Recurrence [8] | 2019
Forecasting 🚥 🚥 🚥
Type | Abbr. | Full name of the algorithm/model/paper | Year
Probabilistic | BTTF | Bayesian Temporal Tensor Factorization [9] | 2021
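Of the algorithms listed, LOCF (Last Observation Carried Forward) is simple enough to show in a few lines. The sketch below is an independent NumPy illustration of the general technique, not the PyPOTS implementation. The function name `locf` and the `fill_leading` parameter are assumptions made here, in particular the choice to fill values that are missing before the first observation with a constant.

```python
import numpy as np


def locf(X, fill_leading=0.0):
    """Impute NaNs in X of shape (samples, steps, features) by carrying the last
    observed value forward along the time axis. Values missing before the first
    observation of a series are set to `fill_leading` (an assumed default)."""
    X = np.asarray(X, dtype=float)
    n_steps = X.shape[1]
    observed = ~np.isnan(X)
    # For each position, the index of the most recent observed step (0 if none yet).
    step_idx = np.arange(n_steps).reshape(1, n_steps, 1)
    last_seen = np.maximum.accumulate(np.where(observed, step_idx, 0), axis=1)
    filled = np.take_along_axis(X, last_seen, axis=1)
    # Positions before the first observation still hold NaN; replace them.
    return np.where(np.isnan(filled), fill_leading, filled)


series = np.array([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan]).reshape(1, 6, 1)
print(locf(series).ravel())  # values: 0, 1, 1, 1, 4, 4
```

The running maximum over step indices avoids a Python loop over time steps, so the same function handles a whole batch of multivariate series at once.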

❖ Citing PyPOTS

[Updates in Jun 2023] 🎉 A short version of the PyPOTS paper has been accepted by the 9th SIGKDD International Workshop on Mining and Learning from Time Series (MiLeTS'23). In addition, PyPOTS has been included as a PyTorch Ecosystem project.

The paper introducing PyPOTS is available on arXiv (https://arxiv.org/abs/2305.18811), and we are seeking to publish it in prestigious academic venues, e.g. JMLR (track for Machine Learning Open Source Software). If you use PyPOTS in your work, please cite it as below and 🌟 star this repository to help others notice this library. 🤗

@article{du2023PyPOTS,
title={{PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series}},
author={Wenjie Du},
year={2023},
eprint={2305.18811},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2305.18811},
doi={10.48550/arXiv.2305.18811},
}

or

Wenjie Du. (2023). PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series. arXiv, abs/2305.18811. https://doi.org/10.48550/arXiv.2305.18811

❖ Contribution

You're very welcome to contribute to this exciting project!

By committing your code, you'll

  1. make your well-established model available out of the box for PyPOTS users to run, and help your work gain more exposure and impact. Take a look at our inclusion criteria. You can use the template folder in each task package (e.g. pypots/imputation/template) to get started quickly;
  2. be listed as one of the PyPOTS contributors;
  3. get mentioned in our release notes.

You can also contribute to PyPOTS simply by starring 🌟 this repo to help more people notice it. Your star is your recognition of PyPOTS, and it matters!

👏 We're proud to have more and more awesome users, as well as more bright stars.

❖ Community

We care about feedback from our users, so we're building the PyPOTS community on

  • Slack. General discussion, Q&A, and our development team are here;
  • LinkedIn. Official announcements and news are here;
  • WeChat (微信公众号). We also run a group chat on WeChat, and you can get the QR code from the official account after following it;

If you have any suggestions, want to contribute ideas, or want to share time-series related papers, join us and let us know. The PyPOTS community is open, transparent, and friendly. Let's work together to build and improve PyPOTS!


Footnotes

  1. Du, W., Cote, D., & Liu, Y. (2023). SAITS: Self-Attention-based Imputation for Time Series. Expert Systems with Applications.

  2. Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., & Polosukhin, I. (2017). Attention is All you Need. NeurIPS 2017.

  3. Cao, W., Wang, D., Li, J., Zhou, H., Li, L., & Li, Y. (2018). BRITS: Bidirectional Recurrent Imputation for Time Series. NeurIPS 2018.

  4. Yoon, J., Zame, W. R., & van der Schaar, M. (2019). Estimating Missing Data in Temporal Data Streams Using Multi-Directional Recurrent Neural Networks. IEEE Transactions on Biomedical Engineering.

  5. Che, Z., Purushotham, S., Cho, K., Sontag, D.A., & Liu, Y. (2018). Recurrent Neural Networks for Multivariate Time Series with Missing Values. Scientific Reports.

  6. Zhang, X., Zeman, M., Tsiligkaridis, T., & Zitnik, M. (2022). Graph-Guided Network for Irregularly Sampled Multivariate Time Series. ICLR 2022.

  7. Ma, Q., Chen, C., Li, S., & Cottrell, G. W. (2021). Learning Representations for Incomplete Time Series Clustering. AAAI 2021.

  8. Jong, J.D., Emon, M.A., Wu, P., Karki, R., Sood, M., Godard, P., Ahmad, A., Vrooman, H.A., Hofmann-Apitius, M., & Fröhlich, H. (2019). Deep learning for clustering of multivariate clinical patient trajectories with missing values. GigaScience.

  9. Chen, X., & Sun, L. (2021). Bayesian Temporal Factorization for Multidimensional Time Series Prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence.
