Tensor decomposition for machine learning (w/ Python implementation)

Tensor Learning (张量学习)

MIT License • Python 3.7

Made by Xinyu Chen • 🌐 https://twitter.com/chenxy346

Python code for tensor factorization, tensor completion, and tensor regression techniques, with the following real-world applications:

  • geotensor | Image inpainting
  • transdim | Spatiotemporal traffic data imputation and prediction
  • Recommender systems
  • mats | Multivariate time series imputation and forecasting

In a hurry? Please check out our contents as follows.

Our Research

We conduct extensive experiments on the following real-world data sets:

  • Middle-scale data sets:

    • PeMS (P) registers traffic speed time series from 228 sensors over 44 days with 288 time points per day (i.e., 5-min frequency). The tensor size is 228 x 288 x 44.
    • Guangzhou (G) contains traffic speed time series from 214 road segments in Guangzhou, China over 61 days with 144 time points per day (i.e., 10-min frequency). The tensor size is 214 x 144 x 61.
    • Electricity (E) records hourly electricity consumption transactions of 370 clients from 2011 to 2014. We use a subset of the last five weeks of 321 clients in our experiments. The tensor size is 321 x 24 x 35.
  • The large-scale PeMS traffic speed data set registers traffic speed time series from 11160 sensors over 4/8/12 weeks (for PeMS-4W/PeMS-8W/PeMS-12W) with 288 time points per day (i.e., 5-min frequency) in California, USA. You can download this data set and place it in the ../datasets folder.

    • Data size:
      • PeMS-4W: 11160 x 288 x 28 (contains about 90 million observations).
      • PeMS-8W: 11160 x 288 x 56 (contains about 180 million observations).
    • Data path example: ../datasets/California-data-set/pems-4w.csv.
    • Open data in Python with Pandas:
import pandas as pd

# The CSV has no header row, so header=None keeps the first line as data.
data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header=None)
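The tensor sizes listed above follow the pattern (series x time points per day x days). As a minimal sketch, the helper below folds a two-dimensional matrix into that shape. The function name `matrix_to_tensor` is hypothetical. The layout it relies on is also an assumption, not something the repository confirms: one row per sensor, with columns ordered day by day.

```python
import numpy as np


def matrix_to_tensor(mat, points_per_day):
    """Fold a (series x time steps) matrix into a (series x time of day x day) tensor."""
    mat = np.asarray(mat)
    n_series, n_steps = mat.shape
    if n_steps % points_per_day != 0:
        raise ValueError("number of columns must be a multiple of points_per_day")
    n_days = n_steps // points_per_day
    # Assumption: columns are ordered day by day, so first split them into
    # (day, time of day) and then swap the last two axes.
    return mat.reshape(n_series, n_days, points_per_day).transpose(0, 2, 1)


# Tiny synthetic stand-in: 2 series, 2 days, 3 time points per day.
toy = np.arange(2 * 6).reshape(2, 6)
tensor = matrix_to_tensor(toy, points_per_day=3)
print(tensor.shape)  # (2, 3, 2)
```

If the layout assumption holds for pems-4w.csv, `matrix_to_tensor(data.values, 288)` would give a tensor of size 11160 x 288 x 28.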

mats

mats is a project in the tensor learning repository, and it aims to develop machine learning models for multivariate time series forecasting. In this project, we propose the following low-rank tensor learning models:

We write the Python code in Jupyter notebooks and place them in the ../mats folder. To test the code, run the notebooks in that folder. Each notebook is independent of the others, so you can run any of them directly.

The baseline models include:

We write the Python code in Jupyter notebooks and place them in the ../baselines folder. To test the code, run the notebooks in that folder. Notebooks that reproduce an algorithm on the large-scale data sets are named Large-Scale-xx.

📖 Reproducing Literature in Python

We reproduce some tensor learning experiments from the previous literature.

| Year | Title | PDF | Authors' Code | Our Code | Status |
|------|-------|-----|---------------|----------|--------|
| 2015 | Accelerated Online Low-Rank Tensor Learning for Multivariate Spatio-Temporal Streams | ICML 2015 | Matlab code | Python code | Under development |
| 2016 | Scalable and Sound Low-Rank Tensor Learning | AISTATS 2016 | - | xx | Under development |

📖 Tutorial

We summarize some preliminaries that help in understanding tensor learning. They are given as tutorials:

  • Foundations of Python Numpy Programming

  • Foundations of Tensor Computations

    • Kronecker product
  • Singular Value Decomposition (SVD)
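As a quick taste of two of the topics listed above, the sketch below computes a Kronecker product and a truncated SVD with NumPy. It is an illustrative example, not taken from the tutorials themselves. The helper name `truncated_svd` is hypothetical.

```python
import numpy as np

# Kronecker product: each entry A[i, j] is replaced by the block A[i, j] * B.
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
K = np.kron(A, B)
print(K.shape)  # (4, 4)


def truncated_svd(X, rank):
    """Return the best rank-`rank` approximation of X in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the leading singular triplets.
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]


# A rank-1 matrix is recovered exactly by its rank-1 truncated SVD.
X = np.outer([1.0, 2.0, 3.0], [4.0, 5.0])
X_hat = truncated_svd(X, rank=1)
print(np.allclose(X, X_hat))  # True
```

Truncating the SVD is the matrix analogue of the low-rank structure that tensor factorization and completion models exploit.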

If you find this code useful, please star (★) this repository.

Helpful Material

We believe these materials will be a valuable resource for readers in further study or advanced research.

  • Vladimir Britanak, Patrick C. Yip, K.R. Rao (2006). Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations. Academic Press. [About the book]

  • Ruye Wang (2010). Introduction to Orthogonal Transforms with Applications in Data Processing and Analysis. Cambridge University Press. [PDF]

  • J. Nathan Kutz, Steven L. Brunton, Bingni Brunton, Joshua L. Proctor (2016). Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM. [About the book]

  • Yimin Wei, Weiyang Ding (2016). Theory and Computation of Tensors: Multi-Dimensional Arrays. Academic Press.

  • Steven L. Brunton, J. Nathan Kutz (2019). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press. [PDF] [data & code]

Quick Run

  • If you want to run the code, please
    • download (or clone) this repository,
    • open the .ipynb file using Jupyter notebook,
    • and run the code.

Citing

This repository accompanies the following paper. Please cite it if it helps your research.

  • Xinyu Chen, Lijun Sun (2020). Low-rank autoregressive tensor completion for multivariate time series forecasting. arXiv: 2006.10436. [preprint] [data & Python code]

Acknowledgements

This research is supported by the Institute for Data Valorization (IVADO).

License

This work is released under the MIT license.
