WWW 2018: Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications

DONUT

Donut is an anomaly detection algorithm for seasonal KPIs.

Citation

@inproceedings{donut,
  title={Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications},
  author={Xu, Haowen and Chen, Wenxiao and Zhao, Nengwen and Li, Zeyan and Bu, Jiahao and Li, Zhihan and Liu, Ying and Zhao, Youjian and Pei, Dan and Feng, Yang and others},
  booktitle={Proceedings of the 2018 World Wide Web Conference on World Wide Web},
  pages={187--196},
  year={2018},
  organization={International World Wide Web Conferences Steering Committee}
}

Dependencies

TensorFlow >= 1.5

Installation

Check out this repository and execute:

pip install git+https://github.com/thu-ml/zhusuan.git
pip install git+https://github.com/haowen-xu/tfsnippet.git@v0.1.2
pip install .

This will first install ZhuSuan and TFSnippet, the two major dependencies of Donut, then install the Donut package itself.

API Usage

To prepare the data:

import numpy as np
from donut import complete_timestamp, standardize_kpi

# Read the raw data.
timestamp, values, labels = ...
# If there are no labels, use all zeros instead:
# labels = np.zeros_like(values, dtype=np.int32)

# Complete the timestamp, and obtain the missing point indicators.
timestamp, missing, (values, labels) = \
    complete_timestamp(timestamp, (values, labels))

# Split the training and testing data.
test_portion = 0.3
test_n = int(len(values) * test_portion)
train_values, test_values = values[:-test_n], values[-test_n:]
train_labels, test_labels = labels[:-test_n], labels[-test_n:]
train_missing, test_missing = missing[:-test_n], missing[-test_n:]

# Standardize the training and testing data.
train_values, mean, std = standardize_kpi(
    train_values, excludes=np.logical_or(train_labels, train_missing))
test_values, _, _ = standardize_kpi(test_values, mean=mean, std=std)
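
For readers who want a feel for what the two helpers do conceptually, the following is a minimal NumPy sketch. It is not Donut's implementation: `fill_missing_points` and `standardize_excluding` are hypothetical names, and the sketch assumes sorted, unique timestamps whose gaps are multiples of the smallest interval, with missing points filled by zeros.

```python
import numpy as np


def fill_missing_points(timestamp, values):
    """Hypothetical helper: put `values` on a regular time grid.

    Assumes `timestamp` is sorted, unique, and that every gap is a
    multiple of the smallest interval.  Filled points get the value 0
    and are marked with 1 in the returned `missing` array.
    """
    timestamp = np.asarray(timestamp, dtype=np.int64)
    values = np.asarray(values, dtype=np.float64)
    interval = np.min(np.diff(timestamp))
    full_ts = np.arange(timestamp[0], timestamp[-1] + interval, interval)
    # Position of every observed point on the regular grid.
    idx = (timestamp - timestamp[0]) // interval
    missing = np.ones(len(full_ts), dtype=np.int32)
    missing[idx] = 0
    full_values = np.zeros(len(full_ts), dtype=np.float64)
    full_values[idx] = values
    return full_ts, missing, full_values


def standardize_excluding(values, excludes):
    """Hypothetical helper: z-score using only the non-excluded points."""
    values = np.asarray(values, dtype=np.float64)
    keep = ~np.asarray(excludes, dtype=bool)
    mean, std = values[keep].mean(), values[keep].std()
    return (values - mean) / std, mean, std


# One point (timestamp 120) is absent from the raw data.
ts, missing, vals = fill_missing_points([0, 60, 180, 240], [1.0, 3.0, 5.0, 7.0])
scaled, mean, std = standardize_excluding(vals, excludes=missing)
```

Excluding the filled point keeps the artificial zero from skewing the mean and standard deviation, which is the same reason the snippet above passes `excludes` to `standardize_kpi`.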

To construct a Donut model:

import tensorflow as tf
from donut import Donut
from tensorflow import keras as K
from tfsnippet.modules import Sequential

# Build the entire model within the `model_vs` scope, so that the
# scope holds exactly the variables of `model`, including the
# variables created by Keras layers.
with tf.variable_scope('model') as model_vs:
    model = Donut(
        h_for_p_x=Sequential([
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        h_for_q_z=Sequential([
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(100, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        x_dims=120,
        z_dims=5,
    )
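
In Donut, each input `x` is a sliding window of `x_dims` consecutive KPI points, so `x_dims=120` above means windows of 120 points. A rough NumPy sketch of that windowing follows; `sliding_windows` is a hypothetical helper written for illustration, not a function from Donut or TFSnippet.

```python
import numpy as np


def sliding_windows(values, window_size):
    """Hypothetical helper: stack every run of `window_size` points."""
    values = np.asarray(values)
    if len(values) < window_size:
        raise ValueError("series is shorter than the window")
    n_windows = len(values) - window_size + 1
    # Row i of `offsets` indexes values[i : i + window_size].
    offsets = np.arange(n_windows)[:, None] + np.arange(window_size)[None, :]
    return values[offsets]


# A series of 6 points with a window of 4 yields 6 - 4 + 1 = 3 windows.
windows = sliding_windows(np.arange(6, dtype=np.float32), window_size=4)
```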

To train the Donut model and use the trained model for prediction:

from donut import DonutTrainer, DonutPredictor

trainer = DonutTrainer(model=model, model_vs=model_vs)
predictor = DonutPredictor(model)

with tf.Session().as_default():
    trainer.fit(train_values, train_labels, train_missing, mean, std)
    test_score = predictor.get_score(test_values, test_missing)
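
The sketch below shows one way to turn scores into anomaly flags. It assumes, following Donut's docstrings, that `get_score` returns one reconstruction probability per sliding window (so the result is `x_dims - 1` points shorter than the input) and that a lower score means a more likely anomaly. `flag_anomalies`, the score values, and the threshold are all made up for illustration.

```python
import numpy as np


def flag_anomalies(scores, labels, x_dims, threshold):
    """Hypothetical helper: align labels with scores, then threshold."""
    scores = np.asarray(scores, dtype=np.float64)
    # The first `x_dims - 1` points have no complete window, hence no score.
    aligned_labels = np.asarray(labels)[x_dims - 1:]
    if len(aligned_labels) != len(scores):
        raise ValueError("scores and labels do not line up")
    # A low reconstruction probability suggests an anomaly.
    predicted = (scores < threshold).astype(np.int32)
    return predicted, aligned_labels


labels = np.array([0, 0, 0, 0, 1, 0])
scores = np.array([-1.2, -9.5, -0.8])  # made-up values, one per window
predicted, aligned = flag_anomalies(scores, labels, x_dims=4, threshold=-5.0)
```

In practice the threshold has to be chosen per KPI, for example by sweeping it against labeled data; the value used here carries no meaning.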

To save and restore a trained model:

from tfsnippet.utils import get_variables_as_dict, VariableSaver

with tf.Session().as_default():
    # Train the model.
    ...

    # Get the model variables only after a `predictor` or a `trainer`
    # has been created.  A :class:`Donut` instance does not build its
    # graph until :meth:`Donut.get_score` or
    # :meth:`Donut.get_training_loss` is called, which is done by
    # the `predictor` or the `trainer`.
    var_dict = get_variables_as_dict(model_vs)

    # Save the variables to `save_dir`.
    saver = VariableSaver(var_dict, save_dir)
    saver.save()

with tf.Session().as_default():
    # Restore variables from `save_dir`.
    saver = VariableSaver(get_variables_as_dict(model_vs), save_dir)
    saver.restore()

If you need more advanced outputs from the model, you may derive the outputs by using model.vae directly, for example:

from donut import iterative_masked_reconstruct

# Obtain the reconstructed `x`, with MCMC missing data imputation.
# See also:
#   :meth:`donut.Donut.get_score`
#   :func:`donut.iterative_masked_reconstruct`
#   :meth:`tfsnippet.modules.VAE.reconstruct`
input_x = ...  # 2-D `float32` :class:`tf.Tensor`, input `x` windows
input_y = ...  # 2-D `int32` :class:`tf.Tensor`, missing point indicators
               # for the `x` windows
x = model.vae.reconstruct(
    iterative_masked_reconstruct(
        reconstruct=model.vae.reconstruct,
        x=input_x,
        mask=input_y,
        iter_count=mcmc_iteration,
        back_prop=False
    )
)
# `x` is a :class:`tfsnippet.stochastic.StochasticTensor`, from which
# you may derive many useful outputs, for example:
x.tensor  # the `x` samples
x.log_prob(group_ndims=0)  # element-wise log p(x|z) of sampled x
x.distribution.log_prob(input_x)  # the reconstruction probability
x.distribution.mean, x.distribution.std  # mean and std of p(x|z)
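
To make the last two outputs concrete: assuming `p(x|z)` is a diagonal Gaussian, as in the Donut paper, the element-wise log-density can be computed from the mean and std as sketched below in plain NumPy. `gaussian_log_prob` is a hypothetical helper, not part of TFSnippet, and the numbers are illustrative only.

```python
import numpy as np


def gaussian_log_prob(x, mean, std):
    """Element-wise log N(x; mean, std**2)."""
    x, mean, std = (np.asarray(a, dtype=np.float64) for a in (x, mean, std))
    return (-0.5 * np.log(2.0 * np.pi)
            - np.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


window = np.array([0.0, 1.0, 4.0])
log_p = gaussian_log_prob(window, mean=np.zeros(3), std=np.ones(3))
# Under a diagonal Gaussian, summing over the window dimension gives
# the log-density of the whole window.
window_log_p = log_p.sum()
```

The point furthest from the mean (here the last one) receives the lowest log-density, which is why a low reconstruction probability is read as a sign of an anomaly.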
