  • Stars: 102
  • Rank: 335,584 (Top 7%)
  • Language: Jupyter Notebook
  • License: Other
  • Created: over 6 years ago
  • Updated: over 1 year ago

Repository Details

Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.

GENDIS

GENetic DIscovery of Shapelets

In the time series classification domain, shapelets are small subseries that are discriminative for a certain class. It has been shown that by projecting the original dataset to a distance space, where each axis corresponds to the distance to a certain shapelet, classifiers are able to achieve state-of-the-art results on a plethora of datasets.
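To make this distance-space idea concrete, here is a minimal NumPy sketch (illustrative only, not part of the GENDIS code): the distance between a time series and a shapelet is the minimum Euclidean distance over all equally long sliding windows of the series, and stacking these distances for several shapelets yields the new feature representation.

import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between `shapelet` and all
    equally long sliding windows of `series`."""
    l = len(shapelet)
    windows = np.array([series[i:i + l] for i in range(len(series) - l + 1)])
    return np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min()

# Project a toy dataset onto two (random) shapelets: each row of the
# result is a point in the 2-dimensional distance space.
X = np.random.randn(5, 30)                       # 5 series of length 30
shapelets = [np.random.randn(8), np.random.randn(12)]
dist_space = np.array([[shapelet_distance(ts, s) for s in shapelets]
                       for ts in X])
print(dist_space.shape)                          # (5, 2)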

This repository contains an implementation of GENDIS, an algorithm that searches for a set of shapelets in a genetic fashion. The algorithm is insensitive to its parameters (such as population size and crossover and mutation probabilities) and can quickly extract a small set of shapelets that achieves predictive performance similar to, or better than, that of other shapelet techniques.

Installation

We currently support Python 3.5 and Python 3.6. For installation, there are two alternatives:

  1. Clone the repository https://github.com/IBCNServices/GENDIS.git and run (python3 -m) pip install -r requirements.txt
  2. GENDIS is hosted on PyPI. You can simply run (python3 -m) pip install gendis to add gendis to your dist-packages (so you can use it from anywhere).

Make sure NumPy and Cython are already installed (pip install numpy and pip install Cython), since they are required by the setup script.
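A quick way to verify that the installation succeeded is to import the extractor class that is used throughout the tutorial below:

# If this import succeeds, gendis and its dependencies are installed correctly
from gendis.genetic import GeneticExtractor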

Tutorial & Example

1. Loading & preprocessing the datasets

As a first step, we need to construct at least a matrix of time series (X_train) and a vector of labels (y_train). Additionally, test data can be loaded as well in order to evaluate the pipeline at the end.

import pandas as pd
# Read in the datafiles
train_df = pd.read_csv(<DATA_FILE>)
test_df = pd.read_csv(<DATA_FILE>)
# Split into feature matrices and label vectors
X_train = train_df.drop('target', axis=1)
y_train = train_df['target']
X_test = test_df.drop('target', axis=1)
y_test = test_df['target']

2. Creating a GeneticExtractor object

Construct the GeneticExtractor object. For a list of all possible parameters and their descriptions, please refer to the documentation in the code.

from gendis.genetic import GeneticExtractor
genetic_extractor = GeneticExtractor(population_size=50, iterations=25, verbose=True, 
                                     mutation_prob=0.3, crossover_prob=0.3, 
                                     wait=10, max_len=len(X_train) // 2)
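Since that parameter documentation lives in the class docstring, it can also be inspected directly from an interactive session:

# Print the GeneticExtractor docstring, including the parameter descriptions
help(GeneticExtractor)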

3. Fit the GeneticExtractor and construct distance matrix

# Search for discriminative shapelets on the training data
genetic_extractor.fit(X_train, y_train)
# Project the series onto the distance space spanned by the discovered shapelets
distances_train = genetic_extractor.transform(X_train)
distances_test = genetic_extractor.transform(X_test)

4. Fit ML classifier on constructed distance matrix

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train a classifier on the shapelet distance features
lr = LogisticRegression()
lr.fit(distances_train, y_train)

# Evaluate on the distance representation of the test set
print('Accuracy = {}'.format(accuracy_score(y_test, lr.predict(distances_test))))
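Because the GeneticExtractor follows the sklearn fit/transform API, steps 2 to 4 can also be chained in a single sklearn Pipeline. The following is a sketch under that assumption, reusing the parameters from step 2:

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ('shapelets', GeneticExtractor(population_size=50, iterations=25,
                                   mutation_prob=0.3, crossover_prob=0.3, wait=10)),
    ('classifier', LogisticRegression())
])
pipeline.fit(X_train, y_train)
print('Accuracy = {}'.format(pipeline.score(X_test, y_test)))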

Example notebook

A simple example is provided in this notebook.

Data

All datasets in this repository are downloaded from timeseriesclassification.com. Please reference them appropriately when using any of the datasets.

Paper experiments

In order to reproduce the results from the corresponding paper, please check out this directory.

Tests

We provide a few doctests and unit tests. To run the doctests: python3 -m doctest -v <FILE>, where <FILE> is the Python file you want to run the doctests from. To run unit tests: nose2 -v
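For reference, a doctest is simply an interactive example embedded in a docstring that python3 -m doctest executes and verifies. A minimal, self-contained illustration (not taken from the GENDIS codebase) looks like this:

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length sequences.

    >>> squared_euclidean([0, 0], [3, 4])
    25
    """
    return sum((x - y) ** 2 for x, y in zip(a, b))

if __name__ == '__main__':
    import doctest
    doctest.testmod(verbose=True)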

Contributing, Citing and Contact

If you have any questions, are experiencing bugs in the GENDIS implementation, or would like to contribute, please feel free to create an issue or pull request in this repository, or contact me at gilles(dot)vandewiele(at)ugent(dot)be.

If you use GENDIS in your work, please use the following citation:

@article{vandewiele2021gendis,
  title={GENDIS: Genetic Discovery of Shapelets},
  author={Vandewiele, Gilles and Ongenae, Femke and De Turck, Filip},
  journal={Sensors},
  volume={21},
  number={4},
  pages={1059},
  year={2021},
  publisher={Multidisciplinary Digital Publishing Institute}
}

More Repositories

  1. GENESIM (Scilab, 79 stars): [DEPRECATED] An innovative technique that constructs an ensemble of decision trees and converts this ensemble into a single, interpretable decision tree with an enhanced predictive performance
  2. easy-openvpn-server (Python, 41 stars): Plug-and-play OpenVPN server which generates server and client config files for you.
  3. wasm-operator (Rust, 27 stars): Run k8s operators in wasm to reduce their overhead
  4. INK (Python, 16 stars): Instance Neighbouring by using Knowledge
  5. MINDWALC (Python, 12 stars): Code & experiments for MINDWALC: Mining Interpretable, Discriminative Walks for Classification of Nodes in a Graph
  6. Folio-Ontology (HTML, 7 stars)
  7. MCAppAnalysis (Jupyter Notebook, 6 stars): Code to reproduce the experiments and the proposed visualization from 'Data mining in the development of mHealth apps: assessing in-app navigation through Markov Chain analysis'
  8. tengu-charms (Python, 5 stars): ARCHIVED. SEE https://github.com/tengu-team/tengu-charms
  9. skMAPLE (Python, 5 stars): An implementation (sklearn API) of Model Agnostic Supervised Local Explanation (MAPLE) by Plumb et al. and reproduction of "accuracy" experiments.
  10. StardogStreamReasoning (Python, 5 stars): Example code to perform Stardog stream reasoning
  11. HeadacheDSS (Python, 5 stars): Repository containing all code and data required to reproduce the experiments of 'A decision support system to follow up and diagnose chronic primary headache patients using semantically enriched data'
  12. CSV2KG (Python, 4 stars): Converting tabular data into semantic knowledge
  13. TPEHGDB-Experiments (Python, 4 stars): Experiments conducted on the TPEHGDB dataset to reproduce the reported results from "A critical look at studies applying over-sampling on the TPEHGDB dataset"
  14. Magic (Python, 3 stars): Mining an Augmented Graph using INK, starting from a CSV
  15. Accio-Ontology (Web Ontology Language, 2 stars)
  16. cascading-reasoning-framework (2 stars): Repository on the Cascading Reasoning Framework
  17. cyclists-monitoring (Shell, 2 stars): Code and data related to the research on how to give personalized real-time feedback to amateur cyclists on low-end devices, using Semantic Web technologies.
  18. FuzzyConstraints (Python, 1 star): Python implementation of negative sampling strategies powered by fuzzy constraints.
  19. memleak (C++, 1 star): A C++ memory leak checker. Original: http://wyw.dcweb.cn/leakage.htm
  20. DIVIDE (Java, 1 star): DIVIDE - Adaptive Context-Aware Query Derivation for IoT Data Streams
  21. RPiaaS (JavaScript, 1 star)
  22. che-charmbox (Python, 1 star): A Docker container based on the Charmbox that can be used as an Eclipse Che workspace.
  23. leaker (C, 1 star): A cross-platform C memory leak checker. Original: http://left404.com/programming/leaker/
  24. kube-rs (Rust, 1 star): kube-rs fork with patches to make it compatible with wasm-operator
  25. Last-Post-Dataset (1 star): The Last Post Thermal Dataset
  26. orcon (Go, 1 star): a Kubernetes relationship orchestrator
  27. reactive-pattern-results (Jupyter Notebook, 1 star): Code and results of paper about the reactive pattern: "Beyond Generic Lifecycles: Reusable Modeling of Custom-Fit Management Workflows for Cloud Applications" - IEEE CLOUD 2018