• Stars
    star
    337
  • Rank 124,905 (Top 3 %)
  • Language
    R
  • Created over 6 years ago
  • Updated 10 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Tidy anomaly detection

Anomalize is being Superceded by Timetk:

anomalize

R-CMD-check Lifecycle Status Coverage status CRAN_Status_Badge

The anomalize package functionality has been superceded by timetk. We suggest you begin to use the timetk::anomalize() to benefit from enhanced functionality to get improvements going forward. Learn more about Anomaly Detection with timetk here.

The original anomalize package functionality will be maintained for previous code bases that use the legacy functionality.

To prevent the new timetk functionality from conflicting with old anomalize code, use these lines:

library(anomalize)

anomalize <- anomalize::anomalize
plot_anomalies <- anomalize::plot_anomalies

Tidy anomaly detection

anomalize enables a tidy workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it’s quite simple to decompose time series, detect anomalies, and create bands separating the “normal” data from the anomalous data.

Anomalize In 2 Minutes (YouTube)

Anomalize

Check out our entire Software Intro Series on YouTube!

Installation

You can install the development version with devtools or the most recent CRAN version with install.packages():

# devtools::install_github("business-science/anomalize")
install.packages("anomalize")

How It Works

anomalize has three main functions:

  • time_decompose(): Separates the time series into seasonal, trend, and remainder components
  • anomalize(): Applies anomaly detection methods to the remainder component.
  • time_recompose(): Calculates limits that separate the “normal” data from the anomalies!

Getting Started

Load the anomalize package. Usually, you will also load the tidyverse as well!

library(anomalize)
library(tidyverse)
# NOTE: timetk now has anomaly detection built in, which 
#  will get the new functionality going forward.
#  Use this script to prevent overwriting legacy anomalize:

anomalize <- anomalize::anomalize
plot_anomalies <- anomalize::plot_anomalies

Next, let’s get some data. anomalize ships with a data set called tidyverse_cran_downloads that contains the daily CRAN download counts for 15 “tidy” packages from 2017-01-01 to 2018-03-01.

Suppose we want to determine which daily download “counts” are anomalous. It’s as easy as using the three main functions (time_decompose(), anomalize(), and time_recompose()) along with a visualization function, plot_anomalies().

tidyverse_cran_downloads %>%
    # Data Manipulation / Anomaly Detection
    time_decompose(count, method = "stl") %>%
    anomalize(remainder, method = "iqr") %>%
    time_recompose() %>%
    # Anomaly Visualization
    plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.25) +
    ggplot2::labs(title = "Tidyverse Anomalies", subtitle = "STL + IQR Methods") 

Check out the anomalize Quick Start Guide.

Reducing Forecast Error by 32%

Yes! Anomalize has a new function, clean_anomalies(), that can be used to repair time series prior to forecasting. We have a brand new vignette - Reduce Forecast Error (by 32%) with Cleaned Anomalies.

tidyverse_cran_downloads %>%
    dplyr::filter(package == "lubridate") %>%
    dplyr::ungroup() %>%
    time_decompose(count) %>%
    anomalize(remainder) %>%
  
    # New function that cleans & repairs anomalies!
    clean_anomalies() %>%
  
    dplyr::select(date, anomaly, observed, observed_cleaned) %>%
    dplyr::filter(anomaly == "Yes")
#> # A time tibble: 19 Ă— 4
#> # Index:         date
#>    date       anomaly  observed observed_cleaned
#>    <date>     <chr>       <dbl>            <dbl>
#>  1 2017-01-12 Yes     -1.14e-13            3522.
#>  2 2017-04-19 Yes      8.55e+ 3            5202.
#>  3 2017-09-01 Yes      3.98e-13            4137.
#>  4 2017-09-07 Yes      9.49e+ 3            4871.
#>  5 2017-10-30 Yes      1.20e+ 4            6413.
#>  6 2017-11-13 Yes      1.03e+ 4            6641.
#>  7 2017-11-14 Yes      1.15e+ 4            7250.
#>  8 2017-12-04 Yes      1.03e+ 4            6519.
#>  9 2017-12-05 Yes      1.06e+ 4            7099.
#> 10 2017-12-27 Yes      3.69e+ 3            7073.
#> 11 2018-01-01 Yes      1.87e+ 3            6418.
#> 12 2018-01-05 Yes     -5.68e-14            6293.
#> 13 2018-01-13 Yes      7.64e+ 3            4141.
#> 14 2018-02-07 Yes      1.19e+ 4            8539.
#> 15 2018-02-08 Yes      1.17e+ 4            8237.
#> 16 2018-02-09 Yes     -5.68e-14            7780.
#> 17 2018-02-10 Yes      0                   5478.
#> 18 2018-02-23 Yes     -5.68e-14            8519.
#> 19 2018-02-24 Yes      0                   6218.

But Wait, There’s More!

There are a several extra capabilities:

  • plot_anomaly_decomposition() for visualizing the inner workings of how algorithm detects anomalies in the “remainder”.
tidyverse_cran_downloads %>%
    dplyr::filter(package == "lubridate") %>%
    dplyr::ungroup() %>%
    time_decompose(count) %>%
    anomalize(remainder) %>%
    plot_anomaly_decomposition() +
    ggplot2::labs(title = "Decomposition of Anomalized Lubridate Downloads")

For more information on the anomalize methods and the inner workings, please see “Anomalize Methods” Vignette.

References

Several other packages were instrumental in developing anomaly detection methods used in anomalize:

  • Twitter’s AnomalyDetection, which implements decomposition using median spans and the Generalized Extreme Studentized Deviation (GESD) test for anomalies.
  • forecast::tsoutliers() function, which implements the IQR method.

Interested in Learning Anomaly Detection?

Business Science offers two 1-hour courses on Anomaly Detection:

More Repositories

1

free_r_tips

Free R-Tips is a FREE Newsletter provided by Business Science. It comes with bite-sized code tutorials every week.
HTML
1,267
star
2

tidyquant

Bringing financial analysis to the tidyverse
R
850
star
3

pytimetk

Time series easier, faster, more fun. Pytimetk.
Python
673
star
4

timetk

Time series analysis in the `tidyverse`
R
608
star
5

modeltime

Modeltime unlocks time series forecast models and machine learning in one framework
R
515
star
6

tibbletime

Time-aware tibbles
R
179
star
7

presentations

A central repository of Business Science presentations
HTML
165
star
8

sweep

Extending broom for time series forecasting
R
155
star
9

correlationfunnel

Speed Up Exploratory Data Analysis (EDA)
R
132
star
10

cheatsheets

101
star
11

free_python_tips

HTML
79
star
12

modeltime.ensemble

Time Series Ensemble Forecasting
R
73
star
13

alphavantager

A lightweight R interface to the Alpha Vantage API
R
69
star
14

riingo

An R interface to the Tiingo stock price API
R
51
star
15

10_python_r_business_problems

Python
48
star
16

modeltime.h2o

Forecasting with H2O AutoML. Use the H2O Automatic Machine Learning algorithm as a backend for Modeltime Time Series Forecasting.
R
40
star
17

modeltime.gluonts

GluonTS Deep Learning with Modeltime
R
39
star
18

portfoliodown

An R package for creating professional data science portfolios
CSS
37
star
19

gpu_accelerated_forecasting_modeltime_gluonts

GPU-Accelerated Deep Learning for Time Series using Modeltime GluonTS (Learning Lab 53). Event sponsors: Saturn Cloud, NVIDIA, & Business Science.
HTML
22
star
20

modeltime.resample

Resampling Tools for Time Series Forecasting with Modeltime
R
19
star
21

reports

A central repository of Business Science technical reports
17
star
22

workshop_2018_dsgo

DataScienceGO 2018 - Machine Learning Workshop
R
13
star
23

lab_59_cust_lifetime_py

Learning Lab 59: Customer Lifetime Value Python
Python
12
star
24

pymodeltime

Pymodeltime offers a unified framework tailored to address a broad spectrum of requirements, including time series forecasting and various machine learning models.
Python
12
star
25

awesome-generative-ai

A curated list of resources for building and deploying generative AI specifically focusing on helping you become a GenAI developer with LLMs
12
star
26

shinyauth

Dockerfile
Dockerfile
10
star
27

gammodels

The parsnip backend for GAM Models.
R
7
star
28

modeltime_h2o_workshop

R
6
star
29

webinar_introducing_pytimetk

Jupyter Notebook
5
star
30

lab_50_lightgbm

Learning Lab 50: Hierarchical Forecasting
R
5
star
31

free_ai_training

Free training to get you started learning AI for data science today!
Jupyter Notebook
5
star
32

workshop_timetk_data_viz

R
3
star
33

bsu-dev

Code for development of Business Science University courses.
3
star
34

lab_63_nested_modeltime

R
1
star
35

courseinfo

Course information, curriculum, and brochures
1
star