hctsa: highly comparative time-series analysis
hctsa is a Matlab software package for running highly comparative time-series analysis. It extracts thousands of time-series features from a collection of univariate time series and includes a range of tools for visualizing and analyzing the resulting time-series feature matrix, including:
- Normalizing and clustering time-series data;
- Producing low-dimensional representations of time-series data;
- Identifying and interpreting discriminating features between different classes of time series; and
- Fitting and evaluating multivariate classification models.
Feel free to email me for advice on applications of hctsa.
Installation
For users familiar with git (recommended), please make a fork of the repo and then clone it to your local machine.
To update, after setting an upstream remote (git remote add upstream https://github.com/benfulcher/hctsa.git), you can use git pull upstream main.
To obtain the latest toolboxes (such as the optimized catch22 feature set), you should then run git submodule update --init.
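The git steps above can be sketched as a pair of small shell functions. This is only an illustration, not part of hctsa: the function names and the username argument are hypothetical, and the sketch assumes the HTTPS form of the repository URL and that your fork keeps the name hctsa. Nothing runs until you call a function.

```shell
#!/bin/sh
# Sketch of the fork/clone/update workflow described above.
# The function names are hypothetical helpers, not part of hctsa.

# Assumption: the HTTPS form of the upstream repository URL.
UPSTREAM_URL="https://github.com/benfulcher/hctsa.git"

# Clone your fork and register the original repository as "upstream".
# $1: your GitHub username (assumes your fork is also called "hctsa").
hctsa_clone_fork() {
    github_user="$1"
    git clone "https://github.com/${github_user}/hctsa.git" &&
    cd hctsa &&
    git remote add upstream "$UPSTREAM_URL"
}

# Pull the latest changes from upstream, then fetch the bundled
# toolboxes that are tracked as git submodules (such as catch22).
# Run this from inside your local hctsa directory.
hctsa_update() {
    git pull upstream main &&
    git submodule update --init
}
```

After sourcing the file, you would call hctsa_clone_fork with your own username once, and hctsa_update whenever you want to sync with the main repository.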
Users unfamiliar with git can instead download the repository by clicking the green "Code" button then "Download ZIP".
Once downloaded, you can install hctsa by running the install.m script (see docs for details).
Documentation and Wiki
Comprehensive documentation for hctsa, from getting started through to more advanced analyses, is on GitBook.
There is also a lot of additional information on the wiki, including:
- Information about alternative feature sets (including the much faster catch22), and information about other time-series packages available in R, Python, and Julia.
- The accompanying time-series data archive for this project, CompEngine.
- Downloadable hctsa feature matrices from time-series datasets with example workflows.
- Resources for distributing an hctsa computation on a computing cluster.
- A list of publications that have used hctsa to address different research questions.
- Frequently asked questions about hctsa and related feature-based time-series analyses.
Acknowledgement
If you use this software, please read and cite these open-access articles:
- B.D. Fulcher and N.S. Jones. hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems: 5, 527 (2017).
- B.D. Fulcher, M.A. Little, N.S. Jones. Highly comparative time-series analysis: the empirical structure of time series and their methods. J. Roy. Soc. Interface: 10, 83 (2013).
Feedback by email, GitHub issues, or pull requests is much appreciated.
For commercial use of hctsa, including licensing and consulting, contact Engine Analytics.
Licenses
Internal licenses
There are two licenses applied to the core parts of the repository:
- The framework for running hctsa analyses and visualizations is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. A license for commercial use is available from Engine Analytics.
- Code for computing features from time-series data is licensed under the GNU General Public License version 3.
A range of external code packages is provided in the Toolboxes directory of the repository, and each has its own associated license (as outlined below).
External packages and dependencies
Many features in hctsa rely on external packages and Matlab toolboxes. If some of them are unavailable, hctsa can still be used, but only a reduced set of time-series features will be computed.
hctsa uses the following Matlab Add-On Toolboxes: Statistics and Machine Learning, Signal Processing, Curve Fitting, System Identification, Wavelet, and Econometrics.
The following external time-series analysis code packages are provided with the software (in the Toolboxes directory), and are used by our main feature-extraction algorithms to compute meaningful structural features from time series:
- TISEAN package for nonlinear time-series analysis, version 3.0.1 (GPL license).
- TSTOOL package for nonlinear time-series analysis, version 1.2 (GPL license).
- Joseph T. Lizier's Java Information Dynamics Toolkit (JIDT) for studying information-theoretic measures of computation in complex systems, version 1.3 (GPL license).
- Time-series analysis code developed by Michael Small (unlicensed).
- Max Little's time-series analysis code (GPL license).
- Sample Entropy code from Physionet (GPL license).
- ARFIT Toolbox for AR model estimation (unlicensed).
- gpml Toolbox for Gaussian Process regression model estimation, version 3.5 (FreeBSD license).
- Danilo P. Mandic's delay vector variance code (GPL license).
- Cross Recurrence Plot Toolbox (GPL license).
- Zoubin Ghahramani's Hidden Markov Model (HMM) code (MIT license).
- Danny Kaplan's code for embedding statistics (GPL license).
- Two-dimensional histogram code from Matlab Central (BSD license).
- Various histogram and entropy code by Rudy Moddemeijer (unlicensed).
Acknowledgements
Many thanks go to Romesh Abeysuriya for helping with the MySQL database set-up and install scripts, and to Santi Villalba for lots of helpful feedback and advice on the software.