Algorithms for outlier, adversarial and drift detection


Alibi Detect is an open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. Both TensorFlow and PyTorch backends are supported for drift detection.

For more background on the importance of monitoring outliers and distributions in a production setting, check out this talk from the Challenges in Deploying and Monitoring Machine Learning Systems ICML 2020 workshop, based on the paper Monitoring and explainability of models in production and referencing Alibi Detect.

For a thorough introduction to drift detection, check out Protecting Your Machine Learning Against Drift: An Introduction. The talk covers what drift is and why it pays to detect it, the different types of drift, how it can be detected in a principled manner and also describes the anatomy of a drift detector.

Installation and Usage

The package, alibi-detect, can be installed from:

  • PyPI or GitHub source (with pip)
  • Anaconda (with conda/mamba)

With pip

  • alibi-detect can be installed from PyPI:

    pip install alibi-detect
  • Alternatively, the development version can be installed:

    pip install git+https://github.com/SeldonIO/alibi-detect.git
  • To install with the TensorFlow backend:

    pip install alibi-detect[tensorflow]
  • To install with the PyTorch backend:

    pip install alibi-detect[torch]
  • To install with the KeOps backend:

    pip install alibi-detect[keops]
  • To use the Prophet time series outlier detector:

    pip install alibi-detect[prophet]

With conda

To install from conda-forge it is recommended to use mamba, which can be installed to the base conda environment with:

conda install mamba -n base -c conda-forge

To install alibi-detect:

mamba install -c conda-forge alibi-detect

Usage

We will use the VAE outlier detector to illustrate the API.

from alibi_detect.od import OutlierVAE
from alibi_detect.saving import save_detector, load_detector

# initialize and fit detector
od = OutlierVAE(threshold=0.1, encoder_net=encoder_net, decoder_net=decoder_net, latent_dim=1024)
od.fit(x_train)

# make predictions
preds = od.predict(x_test)

# save and load detectors
filepath = './my_detector/'
save_detector(od, filepath)
od = load_detector(filepath)

The predictions are returned in a dictionary with the keys meta and data. meta contains the detector's metadata, while data is itself a dictionary with the actual predictions. It contains the outlier, adversarial or drift scores and thresholds, as well as the predictions themselves, e.g. whether instances are outliers or not. The exact details can vary slightly from method to method, so we encourage the reader to become familiar with the types of algorithms supported.
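
A minimal sketch of reading such a prediction dictionary. The dictionary below is hand-written stand-in data rather than real detector output, and the key names inside data (instance_score, is_outlier) as well as the helper flagged_indices are assumptions for illustration; check the documentation of the detector you use for its exact keys.

```python
# Hand-written stand-in for the dictionary returned by od.predict(...).
# The keys inside "data" are assumptions for illustration only.
preds = {
    "meta": {"name": "OutlierVAE", "detector_type": "outlier"},
    "data": {
        "instance_score": [0.02, 0.35, 0.08, 0.41],
        "is_outlier": [0, 1, 0, 1],
    },
}


def flagged_indices(preds, key="is_outlier"):
    """Return the positions of instances flagged by the detector."""
    return [i for i, flag in enumerate(preds["data"][key]) if flag == 1]


outliers = flagged_indices(preds)
print(preds["meta"]["name"], "flagged instances:", outliers)
```

The same pattern should carry over to adversarial and drift detectors, with a different prediction key under data.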

Supported Algorithms

The following tables show the advised use cases for each algorithm. The column Feature Level indicates whether the detection can be done at the feature level, e.g. per pixel for an image. Check the algorithm reference list for more information with links to the documentation and original papers as well as examples for each of the detectors.
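
To make the Feature Level distinction concrete, here is a hedged sketch in plain Python: a feature-level detector yields one score per feature (e.g. per pixel), which can then be aggregated into one score per instance. The squared reconstruction error, the mean aggregation, the threshold and all names are assumptions for illustration, not the library's implementation.

```python
def feature_scores(x, x_recon):
    """Squared error per feature between an instance and its reconstruction."""
    return [(a - b) ** 2 for a, b in zip(x, x_recon)]


def instance_score(scores):
    """Aggregate feature-level scores into one instance-level score (mean)."""
    return sum(scores) / len(scores)


x = [1.0, 2.0, 3.0, 4.0]        # original instance (hypothetical values)
x_recon = [1.0, 2.0, 3.0, 6.0]  # reconstruction (hypothetical values)
threshold = 0.5                 # hypothetical threshold

f_scores = feature_scores(x, x_recon)  # one score per feature
i_score = instance_score(f_scores)     # one score for the whole instance

# Feature level: which individual features look anomalous?
flagged_features = [i for i, s in enumerate(f_scores) if s > threshold]
# Instance level: is the instance as a whole anomalous?
is_outlier = i_score > threshold

print(f_scores, i_score, flagged_features, is_outlier)
```

Here only the last feature deviates, so the feature-level view points at index 3 while the instance-level view flags the instance as a whole.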

Outlier Detection

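| Detector             | Tabular | Image | Time Series | Text | Categorical Features | Online | Feature Level |
|----------------------|---------|-------|-------------|------|----------------------|--------|---------------|
| Isolation Forest     | ✔       |       |             |      | ✔                    |        |               |
| Mahalanobis Distance | ✔       |       |             |      | ✔                    | ✔      |               |
| AE                   | ✔       | ✔     |             |      |                      |        | ✔             |
| VAE                  | ✔       | ✔     |             |      |                      |        | ✔             |
| AEGMM                | ✔       | ✔     |             |      |                      |        |               |
| VAEGMM               | ✔       | ✔     |             |      |                      |        |               |
| Likelihood Ratios    | ✔       | ✔     | ✔           |      | ✔                    |        | ✔             |
| Prophet              |         |       | ✔           |      |                      |        |               |
| Spectral Residual    |         |       | ✔           |      |                      | ✔      | ✔             |
| Seq2Seq              |         |       | ✔           |      |                      |        | ✔             |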

Adversarial Detection

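| Detector           | Tabular | Image | Time Series | Text | Categorical Features | Online | Feature Level |
|--------------------|---------|-------|-------------|------|----------------------|--------|---------------|
| Adversarial AE     | ✔       | ✔     |             |      |                      |        |               |
| Model distillation | ✔       | ✔     | ✔           | ✔    | ✔                    |        |               |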

Drift Detection

| Detector                         | Tabular | Image | Time Series | Text | Categorical Features | Online | Feature Level |
|----------------------------------|---------|-------|-------------|------|----------------------|--------|---------------|
| Kolmogorov-Smirnov               | ✔       | ✔     |             | ✔    | ✔                    |        | ✔             |
| Cramér-von Mises                 | ✔       | ✔     |             |      |                      | ✔      | ✔             |
| Fisher's Exact Test              | ✔       |       |             |      | ✔                    | ✔      | ✔             |
| Maximum Mean Discrepancy (MMD)   | ✔       | ✔     |             | ✔    | ✔                    | ✔      |               |
| Learned Kernel MMD               | ✔       | ✔     |             | ✔    | ✔                    |        |               |
| Context-aware MMD                | ✔       | ✔     | ✔           | ✔    | ✔                    |        |               |
| Least-Squares Density Difference | ✔       | ✔     |             | ✔    | ✔                    | ✔      |               |
| Chi-Squared                      | ✔       |       |             |      | ✔                    |        | ✔             |
| Mixed-type tabular data          | ✔       |       |             |      | ✔                    |        | ✔             |
| Classifier                       | ✔       | ✔     | ✔           | ✔    | ✔                    |        |               |
| Spot-the-diff                    | ✔       | ✔     | ✔           | ✔    | ✔                    |        | ✔             |
| Classifier Uncertainty           | ✔       | ✔     | ✔           | ✔    | ✔                    |        |               |
| Regressor Uncertainty            | ✔       | ✔     | ✔           | ✔    | ✔                    |        |               |

TensorFlow and PyTorch support

The drift detectors support TensorFlow, PyTorch and (where applicable) KeOps backends. However, Alibi Detect does not install these by default. See the installation options for more details.

from alibi_detect.cd import MMDDrift

cd = MMDDrift(x_ref, backend='tensorflow', p_val=.05)
preds = cd.predict(x)

The same detector in PyTorch:

cd = MMDDrift(x_ref, backend='pytorch', p_val=.05)
preds = cd.predict(x)

Or in KeOps:

cd = MMDDrift(x_ref, backend='keops', p_val=.05)
preds = cd.predict(x)
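
Because the backends are optional installs, it can help to check which one is available before constructing a detector. The following is a minimal sketch using only the standard library; the helper names (is_installed, pick_backend) are hypothetical, and the mapping from backend name to importable module name is an assumption.

```python
from importlib.util import find_spec

# Maps the backend names used above to the Python module each one needs.
# The module names are assumptions about how the backends are imported.
BACKEND_MODULES = {
    "tensorflow": "tensorflow",
    "pytorch": "torch",
    "keops": "pykeops",
}


def is_installed(module_name):
    """Return True if the module can be found without importing it."""
    return find_spec(module_name) is not None


def pick_backend(preferred=("tensorflow", "pytorch", "keops"), installed=is_installed):
    """Return the first preferred backend whose module is available, else None."""
    for backend in preferred:
        if installed(BACKEND_MODULES[backend]):
            return backend
    return None


backend = pick_backend()
print("Selected backend:", backend)
```

If a backend is found, it could then be passed on as MMDDrift(x_ref, backend=backend, p_val=.05).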

Built-in preprocessing steps

Alibi Detect also comes with various preprocessing steps, such as randomly initialized encoders, pretrained text embeddings from the transformers library on which to detect drift, and extraction of hidden layers from machine learning models. This makes it possible to detect different types of drift, such as covariate and predicted distribution shift. The preprocessing steps are again supported in TensorFlow and PyTorch.

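from functools import partial

from alibi_detect.cd import MMDDrift
from alibi_detect.cd.tensorflow import HiddenOutput, preprocess_drift

model = ...  # placeholder: your TensorFlow model (tf.keras.Model or tf.keras.Sequential)
# use the output of the model's last layer as the features to detect drift on
preprocess_fn = partial(preprocess_drift, model=HiddenOutput(model, layer=-1), batch_size=128)
cd = MMDDrift(x_ref, backend='tensorflow', p_val=.05, preprocess_fn=preprocess_fn)
preds = cd.predict(x)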

Check the example notebooks (e.g. CIFAR10, movie reviews) for more details.
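
The preprocessing snippet above relies on functools.partial to fix the model and batch size ahead of time, so that the resulting function only needs the data as its argument. A hedged, standard-library-only sketch of that pattern follows; preprocess_stub and double are hypothetical stand-ins, not part of Alibi Detect.

```python
from functools import partial


def preprocess_stub(x, model, batch_size=32):
    """Hypothetical stand-in for a preprocessing function: apply model per batch."""
    out = []
    for start in range(0, len(x), batch_size):
        batch = x[start:start + batch_size]
        out.extend(model(batch))
    return out


def double(batch):
    """Hypothetical 'model' that maps a batch of values to features."""
    return [2 * v for v in batch]


# Bind model and batch_size once; callers then only pass the data.
preprocess_fn = partial(preprocess_stub, model=double, batch_size=2)

features = preprocess_fn([1, 2, 3, 4, 5])
print(features)  # [2, 4, 6, 8, 10]
```

Binding the arguments up front keeps the detector's interface simple: it only needs a callable that takes the data.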

Datasets

The package also contains functionality in alibi_detect.datasets to easily fetch a number of datasets for different modalities. For each dataset either the data and labels or a Bunch object with the data, labels and optional metadata are returned. Example:

from alibi_detect.datasets import fetch_ecg

(X_train, y_train), (X_test, y_test) = fetch_ecg(return_X_y=True)

Sequential Data and Time Series

  • Genome Dataset: fetch_genome

    • Bacteria genomics dataset for out-of-distribution detection, released as part of Likelihood Ratios for Out-of-Distribution Detection. From the original TL;DR: The dataset contains genomic sequences of 250 base pairs from 10 in-distribution bacteria classes for training, 60 OOD bacteria classes for validation, and another 60 different OOD bacteria classes for test. There are respectively 1, 7 and again 7 million sequences in the training, validation and test sets. For detailed info on the dataset check the README.
    from alibi_detect.datasets import fetch_genome
    
    (X_train, y_train), (X_val, y_val), (X_test, y_test) = fetch_genome(return_X_y=True)
  • ECG 5000: fetch_ecg

    • 5000 ECGs, originally obtained from Physionet.
  • NAB: fetch_nab

    • Any univariate time series in a DataFrame from the Numenta Anomaly Benchmark. A list with the available time series can be retrieved using alibi_detect.datasets.get_list_nab().

Images

  • CIFAR-10-C: fetch_cifar10c

    • CIFAR-10-C (Hendrycks & Dietterich, 2019) contains the test set of CIFAR-10, but corrupted and perturbed by various types of noise, blur, brightness etc. at different levels of severity, leading to a gradual decline in the performance of a classification model trained on CIFAR-10. fetch_cifar10c allows you to pick any severity level or corruption type. The list of available corruption types can be retrieved with alibi_detect.datasets.corruption_types_cifar10c(). The dataset can be used in research on robustness and drift. The original data can be found here. Example:
    from alibi_detect.datasets import fetch_cifar10c
    
    corruption = ['gaussian_noise', 'motion_blur', 'brightness', 'pixelate']
    X, y = fetch_cifar10c(corruption=corruption, severity=5, return_X_y=True)
  • Adversarial CIFAR-10: fetch_attack

    • Load adversarial instances on a ResNet-56 classifier trained on CIFAR-10. Available attacks: Carlini-Wagner ('cw') and SLIDE ('slide'). Example:
    from alibi_detect.datasets import fetch_attack
    
    (X_train, y_train), (X_test, y_test) = fetch_attack('cifar10', 'resnet56', 'cw', return_X_y=True)

Tabular

  • KDD Cup '99: fetch_kdd
    • Dataset with different types of computer network intrusions. fetch_kdd allows you to select a subset of network intrusions as targets or pick only specified features. The original data can be found here.

Models

Models and/or building blocks that can be useful outside of outlier, adversarial or drift detection can be found under alibi_detect.models. Main implementations:

  • PixelCNN++: alibi_detect.models.pixelcnn.PixelCNN

  • Variational Autoencoder: alibi_detect.models.autoencoder.VAE

  • Sequence-to-sequence model: alibi_detect.models.autoencoder.Seq2Seq

  • ResNet: alibi_detect.models.resnet

    • Pre-trained ResNet-20/32/44 models on CIFAR-10 can be found on our Google Cloud Bucket and can be fetched as follows:
    from alibi_detect.utils.fetching import fetch_tf_model
    
    model = fetch_tf_model('cifar10', 'resnet32')

Integrations

Alibi-detect is integrated in the open source machine learning model deployment platform Seldon Core and model serving framework KFServing.

Citations

If you use alibi-detect in your research, please consider citing it.

BibTeX entry:

@software{alibi-detect,
  title = {Alibi Detect: Algorithms for outlier, adversarial and drift detection},
  author = {Van Looveren, Arnaud and Klaise, Janis and Vacanti, Giovanni and Cobb, Oliver and Scillitoe, Ashley and Samoilescu, Robert and Athorne, Alex},
  url = {https://github.com/SeldonIO/alibi-detect},
  version = {0.11.4},
  date = {2023-07-07},
  year = {2019}
}
