
TODS: An Automated Time-series Outlier Detection System


Documentation is also available in Chinese.

TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. It provides exhaustive modules for building machine learning-based outlier detection systems: data processing, time-series processing, feature analysis (extraction), detection algorithms, and a reinforcement module. Together, these modules cover general-purpose data preprocessing, time-series smoothing and transformation, feature extraction from the time and frequency domains, a variety of detection algorithms, and the involvement of human expertise to calibrate the system. TODS supports three common outlier detection scenarios on time-series data: point-wise detection (individual time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and it provides a wide range of algorithms for each. This package is developed by DATA Lab @ Rice University.
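The three scenarios differ in what is being scored: a single time point, a subsequence, or an entire series. The sketch below illustrates that distinction with deliberately simple scoring rules in NumPy. It is not TODS code: the function names and the rules (absolute z-score for points, distance to the nearest non-overlapping subsequence for patterns, mean distance to the other series for systems) are our own assumptions, and TODS provides its own algorithms for each scenario.

```python
import numpy as np


def point_wise_scores(x):
    """Point-wise: score each time point by its absolute z-score."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros_like(x)
    return np.abs(x - x.mean()) / std


def pattern_wise_scores(x, window):
    """Pattern-wise: score each length-`window` subsequence by the distance
    to its nearest non-overlapping subsequence (a brute-force discord search)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - window + 1
    subs = np.array([x[i:i + window] for i in range(n)])
    scores = np.full(n, np.inf)
    for i in range(n):
        for j in range(n):
            if abs(i - j) >= window:  # skip trivial, overlapping matches
                d = np.linalg.norm(subs[i] - subs[j])
                scores[i] = min(scores[i], d)
    return scores


def system_wise_scores(series_list):
    """System-wise: score each whole series by its mean distance to the others."""
    X = np.asarray(series_list, dtype=float)
    k = len(X)
    return np.array([
        np.mean([np.linalg.norm(X[i] - X[j]) for j in range(k) if j != i])
        for i in range(k)
    ])
```

On a sine wave with a single spike, `point_wise_scores` peaks at the spike. Flattening one window of the same wave to zero makes `pattern_wise_scores` peak at that window, even though no individual value there lies outside the normal range, which is exactly the case point-wise scoring misses.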

Key features of TODS:

  • Full Stack Machine Learning System, which supports exhaustive components, from preprocessing and feature extraction to detection algorithms, as well as a human-in-the-loop interface.

  • Wide Range of Algorithms, including all of the point-wise detection algorithms supported by PyOD, state-of-the-art pattern-wise (collective) detection algorithms such as DeepLog and Telemanom, and various ensemble algorithms for performing system-wise detection.

  • Automated Machine Learning, which aims to provide a knowledge-free process that constructs an optimal pipeline for the given data by automatically searching for the best combination of the existing modules.
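To make the full-stack idea concrete, the following sketch chains one toy component from each stage: moving-average smoothing (time-series processing), sliding-window features from the time and frequency domains (feature analysis), and a z-score detector (detection). None of these functions are TODS primitives; the stage functions, the window length of 16, and the feature choices are illustrative assumptions.

```python
import numpy as np


def moving_average(x, k=3):
    """Time-series processing: centred moving average, same length as the input."""
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")


def window_features(x, window=16):
    """Feature analysis: per-window time-domain features (mean, std) and a
    frequency-domain feature (largest non-DC FFT magnitude)."""
    feats = []
    for start in range(len(x) - window + 1):
        w = x[start:start + window]
        spectrum = np.abs(np.fft.rfft(w - w.mean()))
        feats.append([w.mean(), w.std(), spectrum[1:].max()])
    return np.array(feats)


def zscore_detector(features):
    """Detection: score each window by the largest absolute z-score of its features."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    return np.abs((features - mu) / sigma).max(axis=1)


def run_pipeline(x):
    """Chain the three stages; returns one outlier score per window."""
    x = np.asarray(x, dtype=float)
    return zscore_detector(window_features(moving_average(x)))
```

Each stage only consumes the previous stage's output, which is what lets an automated searcher swap one component (for example, a different smoother or detector) without touching the others.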


Cite this Work

If you find this work useful, please cite:

@article{Lai_Zha_Wang_Xu_Zhao_Kumar_Chen_Zumkhawaka_Wan_Martinez_Hu_2021, 
	title={TODS: An Automated Time Series Outlier Detection System}, 
	volume={35}, 
	number={18}, 
	journal={Proceedings of the AAAI Conference on Artificial Intelligence}, 
	author={Lai, Kwei-Herng and Zha, Daochen and Wang, Guanchu and Xu, Junjie and Zhao, Yue and Kumar, Devesh and Chen, Yile and Zumkhawaka, Purav and Wan, Minyang and Martinez, Diego and Hu, Xia}, 
	year={2021}, month={May}, 
	pages={16060-16062} 
}

Installation

This package works with Python 3.7+ and pip 19+. You need to have the following packages installed on the system (for Debian/Ubuntu):

sudo apt-get install libssl-dev libcurl4-openssl-dev libyaml-dev build-essential libopenblas-dev libcap-dev ffmpeg

Clone the repository (if you are in China and GitHub is slow, you can use the mirror on Gitee):

git clone https://github.com/datamllab/tods.git

Install locally with pip:

cd tods
pip install -e .

Examples

Examples are available in /examples. For basic usage, you can evaluate a pipeline on a given dataset. The following example loads the default pipeline and evaluates it on a subset of the Yahoo dataset.

import pandas as pd

from tods import schemas as schemas_utils
from tods import generate_dataset, evaluate_pipeline

table_path = 'datasets/anomaly/raw_data/yahoo_sub_5.csv'
target_index = 6 # 0-based index of the label (target) column
metric = 'F1_MACRO' # macro-averaged F1 over labels 0 and 1

# Read data and generate dataset
df = pd.read_csv(table_path)
dataset = generate_dataset(df, target_index)

# Load the default pipeline
pipeline = schemas_utils.load_default_pipeline()

# Run the pipeline
pipeline_result = evaluate_pipeline(dataset, pipeline, metric)
print(pipeline_result)
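The `F1_MACRO` metric computes F1 separately for each label and averages the results without weighting, so the rare outlier class (label 1) counts as much as the normal class (label 0). The plain-Python sketch below shows that computation. We assume TODS follows this standard definition (the same one as scikit-learn's `f1_score(..., average='macro')`); the `f1_macro` helper itself is ours, not part of TODS.

```python
def f1_macro(y_true, y_pred, labels=(0, 1)):
    """Macro F1: compute F1 for each label separately, then take the unweighted mean."""
    f1s = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)


# Toy labels: 8 normal points (0) and 2 outliers (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(f1_macro(y_true, y_pred))    # 0.6875
print(f1_macro(y_true, [0] * 10))  # roughly 0.44, although accuracy is 0.8
```

A detector that labels everything as normal reaches 0.8 accuracy on this toy data but only about 0.44 macro F1, which is why a macro-averaged metric is a sensible default for imbalanced outlier labels.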

We also provide AutoML support to help you automatically find a good pipeline for your data.

import pandas as pd

from axolotl.backend.simple import SimpleRunner

from tods import generate_dataset, generate_problem
from tods.searcher import BruteForceSearch

# Configuration
table_path = 'datasets/yahoo_sub_5.csv'
target_index = 6 # 0-based index of the label (target) column
time_limit = 30 # search time budget, in seconds
metric = 'F1_MACRO' # macro-averaged F1 over labels 0 and 1

# Read data and generate dataset and problem
df = pd.read_csv(table_path)
dataset = generate_dataset(df, target_index=target_index)
problem_description = generate_problem(dataset, metric)

# Start backend
backend = SimpleRunner(random_seed=0)

# Start search algorithm
search = BruteForceSearch(problem_description=problem_description,
                          backend=backend)

# Find the best pipeline
best_runtime, best_pipeline_result = search.search_fit(input_data=[dataset], time_limit=time_limit)
best_pipeline = best_runtime.pipeline
best_output = best_pipeline_result.output

# Evaluate the best pipeline
best_scores = search.evaluate(best_pipeline).scores
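Conceptually, a brute-force searcher enumerates candidate pipelines, scores each one, and keeps the best until the time budget runs out. The sketch below illustrates only that idea, under the simplifying assumption that a pipeline is one option per stage. We have not reproduced how TODS's `BruteForceSearch` actually builds or schedules candidates, and the stage names and the toy `evaluate` function are hypothetical.

```python
import itertools
import time


def brute_force_search(stage_options, evaluate, time_limit):
    """Try every combination of one option per stage, keep the highest score,
    and stop starting new candidates once `time_limit` seconds have elapsed."""
    deadline = time.monotonic() + time_limit
    best_pipeline, best_score = None, float("-inf")
    for candidate in itertools.product(*stage_options):
        if time.monotonic() > deadline:
            break  # budget exhausted: return the best candidate found so far
        score = evaluate(candidate)
        if score > best_score:
            best_pipeline, best_score = candidate, score
    return best_pipeline, best_score


# Hypothetical stages and a toy evaluator standing in for "fit and score a pipeline"
stages = [["none", "moving_average"], ["raw", "fft"], ["zscore", "iforest"]]
toy_scores = {("moving_average", "fft", "iforest"): 0.9}
best, score = brute_force_search(stages, lambda c: toy_scores.get(c, 0.5), time_limit=30)
print(best, score)
```

Because the deadline is checked only before each evaluation, a single slow candidate can still overrun the budget; a real searcher would also need a way to interrupt long-running evaluations.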

Acknowledgement

We gratefully acknowledge the Data Driven Discovery of Models (D3M) program of the Defense Advanced Research Projects Agency (DARPA).
