Wōtan...
...offers free and open source algorithms to automagically remove trends from time-series data.
In Germanic mythology, Odin (/ˈoːðinː/ Old High German: Wōtan) is a widely revered god. He gave one of his eyes to Mimir in return for wisdom. Thus, in order to achieve a goal, one sometimes has to turn a blind eye. In Richard Wagner's "Der Ring des Nibelungen", Wotan is the King of the Gods (god of light, air, and wind) and a bass-baritone. According to Wagner, he is the "pinnacle of intelligence".
Example usage
from wotan import flatten
flatten_lc, trend_lc = flatten(time, flux, window_length=0.5, method='biweight', return_trend=True)
For more details, have a look at the interactive playground and the documentation. Examples and tutorials are also available, such as the 📑 Example: Basic wotan functionality.
Available detrending algorithms
- Time-windowed sliders with location estimates (📑 Example: Comparison of sliders):
  - biweight: Robust M-estimator using Tukey's biweight (📑 Example)
  - huber: Robust M-estimator from Huber (1981) (iterative)
  - huber_psi: Robust M-estimator based on Huber's ψ (one-step)
  - hampel: Robust M-estimator based on Hampel (1972), 3-part descending, known as (a,b,c), 17A, 25A
  - andrewsinewave: Robust M-estimator using Andrews' sine wave
  - welsch: Robust M-estimator from Welsch-Leclerc
  - ramsay: Robust M-estimator from Ramsay (1977), known as Ramsay's Ea
  - tau: Robust τ estimator from Yohai & Zamar (1986)
  - hodges: Rank-based robust R-estimator Hodges-Lehmann-Sen
  - median: The most robust (but least efficient)
  - medfilt: A cadence-based median filter (not time-windowed) for comparison
  - mean: The least robust (but most efficient for white noise)
  - trim_mean: Trimmed mean (outliers are removed)
  - winsorize: Trimmed mean (outliers are winsorized to a specified percentile)
  - hampelfilt: Trimmed mean (outliers are replaced with the median)
- Splines (📑 Example):
  - rspline: Spline with iterative sigma-clipping
  - hspline: Spline with a robust Huber estimator (Huber 1981)
  - pspline: Penalized spline to automatically select the knot distance (Eilers 1996), with iterative sigma-clipping
- Polynomials and sines (📑 Example):
  - cofiam: Cosine Filtering with Autocorrelation Minimization (Kipping et al. 2013)
  - cosine: Sum of sines and cosines, with option for iterative sigma-clipping
  - savgol: Sliding segments are fit with polynomials (Savitzky & Golay 1964), cadence-based
- Regressions (📑 Example):
  - lowess: Locally weighted scatterplot smoothing (Cleveland 1979)
  - supersmoother: Friedman's (1984) Super-Smoother, a local linear regression with adaptive bandwidth
  - Fitting a model that is a sum of Gaussian bases (📑 Example):
    - ridge: Ridge regression (L2 regularization, also known as Tikhonov regularization)
    - lasso: LASSO regression (L1 regularization; Least Absolute Shrinkage and Selection Operator, Tibshirani 1996)
    - elasticnet: Linear regression model with 50% L1 and 50% L2 norm regularization
- gp: Gaussian Processes offering the following kernels (📑 Example: GP Standard vs. robust):
  - squared_exp: Squared-exponential kernel, with option for iterative sigma-clipping
  - matern: Matern 3/2 kernel, with option for iterative sigma-clipping
  - periodic: Periodic kernel informed by a user-specified period (📑 Example)
  - periodic_auto: Periodic kernel informed by a Lomb-Scargle periodogram pre-search
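To make the idea of a time-windowed slider concrete, here is a minimal, dependency-free sketch. It is not wotan's implementation: the helper names `sliding_location` and `flatten_sketch` are hypothetical, the window is assumed to contain all points within half a `window_length` on either side of each point, and dividing the flux by the trend is an assumption of this sketch. Any location estimator (here the median) can be plugged in.

```python
from statistics import median


def sliding_location(time, flux, window_length, estimator=median):
    """Estimate the trend at each point from all points within
    +/- window_length / 2 in time (not in number of cadences)."""
    half = window_length / 2.0
    trend = []
    for t in time:
        window = [f for ti, f in zip(time, flux) if abs(ti - t) <= half]
        trend.append(estimator(window))
    return trend


def flatten_sketch(time, flux, window_length, estimator=median):
    """Return (flattened flux, trend); flattening by division is assumed."""
    trend = sliding_location(time, flux, window_length, estimator)
    flat = [f / tr for f, tr in zip(flux, trend)]
    return flat, trend


# Synthetic light curve: a slow linear trend plus one transit-like dip
time = [0.1 * i for i in range(50)]
flux = [1.0 + 0.01 * t for t in time]
flux[25] -= 0.05

flat, trend = flatten_sketch(time, flux, window_length=1.0)
```

Because the window is defined in time, unevenly sampled data still get a window of constant duration, which is the difference from cadence-based filters such as medfilt and savgol.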
Available features
- window_length: The length of the filter window in units of time (usually days).
- break_tolerance: If there are large gaps in time, especially with corresponding flux level offsets, the detrending is much improved when splitting the data into several sub-lightcurves and applying the filter to each individually. Comes with an empirical default and is fully adjustable.
- edge_cutoff: Trends near edges are less robust. Depending on the data, it may be beneficial to remove edges.
- cval: Tuning parameter for the robust estimators (see documentation)
- return_trend: If True, the method will return a tuple of two elements (flattened_flux, trend_flux) where trend_flux is the removed trend. Otherwise, it will only return flattened_flux.
- transit_mask: Mask known transits during detrending (📑 Example)
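The splitting and edge handling described for break_tolerance and edge_cutoff can be sketched as follows. This is an illustration under stated assumptions, not wotan's code: `split_at_gaps` and `edge_mask` are hypothetical helpers, a gap is assumed to be any time step larger than `break_tolerance`, and edges are assumed to be trimmed at both ends of every segment.

```python
def split_at_gaps(time, break_tolerance):
    """Return (start, stop) index pairs, one per gap-free segment."""
    segments = []
    start = 0
    for i in range(1, len(time)):
        if time[i] - time[i - 1] > break_tolerance:
            segments.append((start, i))
            start = i
    segments.append((start, len(time)))
    return segments


def edge_mask(time, segments, edge_cutoff):
    """True for points at least edge_cutoff away from both segment ends."""
    keep = [False] * len(time)
    for start, stop in segments:
        if stop <= start:
            continue  # empty segment, nothing to mark
        t_first, t_last = time[start], time[stop - 1]
        for i in range(start, stop):
            keep[i] = (time[i] - t_first >= edge_cutoff
                       and t_last - time[i] >= edge_cutoff)
    return keep


time = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2]
segments = split_at_gaps(time, break_tolerance=0.5)
print(segments)  # [(0, 4), (4, 7)]
keep = edge_mask(time, segments, edge_cutoff=0.05)
```

Each segment would then be detrended on its own, so that a flux offset across the gap cannot leak into the trend estimate on either side.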
What method to choose?
It depends on your data and what you would like to achieve (relevant xkcd). If possible, try it out! Use wotan with a selection of methods, iterate over their parameter space, and choose what gives the best results for your research.
If that is too much effort, you should first examine your data.
- Is it mostly white (Gaussian) noise? Use a time-windowed sliding mean. This is the most efficient method for white noise.
- With prominent outliers (such as transits or flares), use a robust time-windowed method such as the biweight. This is usually superior to the median or trimmed methods.
- Are there (semi-) periodic trends? In addition to a time-windowed biweight, try a spline-based method. Experimenting with periodic GPs is worthwhile.
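The trade-off between the mean and the robust estimators can be seen on a single window of data. The sketch below uses only the Python standard library; the `trimmed_mean` helper and the sample values are made up for illustration and are not taken from wotan.

```python
from statistics import mean, median


def trimmed_mean(values, proportiontocut=0.1):
    """Mean after dropping the given fraction of points at each end."""
    data = sorted(values)
    k = int(len(data) * proportiontocut)
    core = data[k:len(data) - k] if k > 0 else data
    return mean(core)


# One filter window: 18 in-trend points plus two transit-like outliers
window = [1.0] * 18 + [0.95, 0.94]

print("mean:        ", mean(window))          # pulled down by the outliers
print("median:      ", median(window))        # unaffected by the outliers
print("trimmed mean:", trimmed_mean(window))  # outliers removed before averaging
```

With pure white noise and no outliers, the mean uses every point and is the most efficient choice; once transits or flares enter the window, it is biased toward them, which is why a robust estimator is preferred there.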
Installation
To install the released version, type
$ pip install wotan
which automatically installs numpy, numba and scipy if not present. Depending on the algorithm, additional dependencies exist:
- huber, ramsay, and hampel depend on statsmodels
- hspline and gp depend on sklearn (installed from PyPI as scikit-learn)
- pspline depends on pygam
- supersmoother depends on supersmoother
To install all additional dependencies, type
$ pip install statsmodels scikit-learn supersmoother pygam
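If you are unsure which optional packages are present, a small standard-library check like the following may help. The mapping mirrors the dependency list above; the names `OPTIONAL_DEPENDENCIES` and `missing_dependency` are hypothetical and not part of wotan.

```python
from importlib.util import find_spec

# Method name -> importable module it needs (taken from the list above)
OPTIONAL_DEPENDENCIES = {
    "huber": "statsmodels",
    "ramsay": "statsmodels",
    "hampel": "statsmodels",
    "hspline": "sklearn",
    "gp": "sklearn",
    "pspline": "pygam",
    "supersmoother": "supersmoother",
}


def missing_dependency(method):
    """Return the module a method needs if it cannot be imported, else None."""
    module = OPTIONAL_DEPENDENCIES.get(method)
    if module is None:
        return None  # method only needs the core dependencies
    return module if find_spec(module) is None else None


for method in sorted(OPTIONAL_DEPENDENCIES):
    missing = missing_dependency(method)
    status = "ok" if missing is None else "needs " + missing
    print(method, "->", status)
```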
Originality
Like all scientific work, wōtan stands on the shoulders of giants. In particular, many detrending methods are wrapped from existing packages. Original contributions include:
- A time-windowed detrending master module with edge treatments and segmentation options
- Robust location estimates using Newton-Raphson iteration to a precision threshold for Tukey's biweight, Andrews' sine wave, and the Welsch-Leclerc estimator. This is probably a "first", and it reduces jitter in the location estimate by ~10 ppm
- Robustified (iterative sigma-clipping) penalized splines for automatic knot distance determination and outlier resistance
- Robustified (iterative sigma-clipping) Gaussian processes
- GP with a periodic kernel informed by a Lomb-Scargle periodogram pre-search
- Bringing together many methods in one place in a common interface, with sensible defaults
- Providing documentation, tutorials, and a paper which compares and benchmarks the methods
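As an illustration of iterating a robust location estimate to a precision threshold, here is a minimal sketch of Newton-Raphson steps for Tukey's biweight. It is written from the textbook definition of the biweight ψ function and is not wotan's implementation: the function name, the tuning value `cval=5.0`, the tolerance, and the choice to hold the scale fixed at the initial median absolute deviation are all assumptions of this sketch.

```python
from statistics import median


def biweight_location(values, cval=5.0, tol=1e-10, max_iter=50):
    """Tukey's biweight location, refined by Newton-Raphson steps."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return center  # no spread to scale by, keep the median
    scale = cval * mad  # held fixed during the iteration (assumption)
    for _ in range(max_iter):
        psi_sum = 0.0
        dpsi_sum = 0.0
        for v in values:
            u = (v - center) / scale
            if abs(u) < 1.0:  # points beyond the cutoff get zero weight
                w = 1.0 - u * u
                psi_sum += u * w * w                 # psi(u) = u (1 - u^2)^2
                dpsi_sum += w * (1.0 - 5.0 * u * u)  # psi'(u)
        if dpsi_sum <= 0.0:
            break  # Newton step undefined, keep the current estimate
        step = scale * psi_sum / dpsi_sum
        center += step
        if abs(step) < tol:
            break
    return center


values = [1.0, 1.001, 0.999, 1.002, 0.998, 1.0005, 0.9995, 0.9]
print(biweight_location(values))
```

In this example the outlier at 0.9 lies far beyond the cutoff and receives zero weight, so the estimate settles near the centre of the remaining points, whereas a plain mean would be pulled toward the outlier.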
Attribution
Please cite Hippke et al. (2019, AJ, 158, 143) if you find this code useful in your research. The BibTeX entry for the paper is:
@ARTICLE{2019AJ....158..143H,
author = {{Hippke}, Michael and {David}, Trevor J. and {Mulders}, Gijs D. and
{Heller}, Ren{\'e}},
title = "{W{\={o}}tan: Comprehensive Time-series Detrending in Python}",
journal = {\aj},
keywords = {eclipses, methods: data analysis, methods: statistical, planetary systems, planets and satellites: detection, Astrophysics - Earth and Planetary Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics},
year = "2019",
month = "Oct",
volume = {158},
number = {4},
eid = {143},
pages = {143},
doi = {10.3847/1538-3881/ab3984},
archivePrefix = {arXiv},
eprint = {1906.00966},
primaryClass = {astro-ph.EP},
adsurl = {https://ui.adsabs.harvard.edu/abs/2019AJ....158..143H},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}