Proper scoring rules in Python

properscoring

Proper scoring rules for evaluating probabilistic forecasts in Python. Evaluation methods that are "strictly proper" cannot be artificially improved through hedging, which makes them fair methods for assessing the accuracy of probabilistic forecasts. These methods are useful for evaluating machine learning or statistical models that produce probabilities instead of point estimates. In particular, these rules are often used for evaluating weather forecasts.
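
To make the "cannot be improved through hedging" property concrete, here is a minimal sketch using the Brier score for a single binary event. The function name expected_brier and the value true_p = 0.7 are illustrative assumptions, not part of properscoring. The expected score is smallest when the reported probability equals the true one:

```python
def expected_brier(reported_p, true_p):
    """Expected Brier score when the event truly occurs with probability true_p."""
    # The outcome is 1 with probability true_p and 0 otherwise.
    return true_p * (reported_p - 1.0) ** 2 + (1.0 - true_p) * reported_p ** 2


true_p = 0.7  # assumed true event probability, for illustration only
candidates = [i / 100 for i in range(101)]  # reported probabilities 0.00 .. 1.00
best = min(candidates, key=lambda p: expected_brier(p, true_p))
print(best)
```

Reporting anything other than the true probability, in either direction, raises the expected score, so a forecaster gains nothing by shading the forecast.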

properscoring runs on both Python 2 and 3. It requires NumPy (1.8 or later) and SciPy (any recent version should be fine). Numba is optional, but highly encouraged: it enables significant speedups (e.g., 20x faster) for crps_ensemble and threshold_brier_score.

To install, use pip: pip install properscoring.

Example: five ways to calculate CRPS

This library focuses on the closely related Continuous Ranked Probability Score (CRPS) and Brier Score. We like these scores because they are both interpretable (e.g., CRPS is a generalization of mean absolute error) and easily calculated from a finite number of samples of a probability distribution.
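
As a rough illustration of both claims, the sketch below computes CRPS for the empirical distribution of a sample using the standard pairwise-difference identity (mean absolute error minus half the mean absolute difference between members). The helper crps_from_samples is a hypothetical name written for this example and is not the library's implementation. With a single-member "ensemble" the spread term vanishes and the score is just the absolute error:

```python
import numpy as np


def crps_from_samples(obs, samples):
    """CRPS of the empirical distribution of `samples` against a scalar `obs`."""
    samples = np.asarray(samples, dtype=float)
    # Mean absolute error of the members against the observation.
    mae_term = np.mean(np.abs(samples - obs))
    # Mean absolute difference over all pairs of members (the spread term).
    spread_term = np.mean(np.abs(samples[:, None] - samples[None, :]))
    return mae_term - 0.5 * spread_term


# A point forecast of 0.0 scored against an observation of 1.5.
point_forecast_score = crps_from_samples(1.5, [0.0])

# A 1000-member sample from a standard normal scored against 0.0.
sample_score = crps_from_samples(0.0, np.random.RandomState(0).randn(1000))
print(point_forecast_score, sample_score)
```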

We will illustrate how to calculate CRPS against a forecast given by a Gaussian random variable. To begin, import properscoring:

import numpy as np
import properscoring as ps
from scipy.stats import norm

Exact calculation using crps_gaussian (this is the fastest method):

>>> ps.crps_gaussian(0, mu=0, sig=1)
0.23369497725510913
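
For reference, the well-known closed form for the CRPS of a normal forecast can be written with only the standard library. This is a sketch of the textbook formula under the hypothetical name crps_normal, not a copy of the library's code; it reproduces the value above:

```python
import math


def crps_normal(obs, mu=0.0, sig=1.0):
    """Closed-form CRPS of a N(mu, sig**2) forecast against `obs`."""
    z = (obs - mu) / sig
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sig * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


score_at_mean = crps_normal(0.0)
print(score_at_mean)
```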

Numerical integration with crps_quadrature:

>>> ps.crps_quadrature(0, norm)
array(0.23369497725510724)
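
The quantity being integrated is the squared difference between the forecast CDF and the step function at the observation. The sketch below approximates that integral with a plain midpoint rule; the function names, the integration bounds of -10 to 10 and the number of cells are assumptions chosen for this example, and the library may use a different quadrature scheme:

```python
import math


def norm_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def crps_by_integration(obs, cdf, lo=-10.0, hi=10.0, n=20000):
    """Midpoint-rule approximation of the integral of (cdf(t) - 1[t >= obs])**2."""
    dt = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * dt  # midpoint of the i-th cell
        step = 1.0 if t >= obs else 0.0
        total += (cdf(t) - step) ** 2
    return total * dt


approx = crps_by_integration(0.0, norm_cdf)
print(approx)
```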

From a finite sample with crps_ensemble:

>>> ensemble = np.random.RandomState(0).randn(1000)
>>> ps.crps_ensemble(0, ensemble)
0.2297109370729622

Weighted by PDF values with crps_ensemble:

>>> x = np.linspace(-5, 5, num=1000)
>>> ps.crps_ensemble(0, x, weights=norm.pdf(x))
0.23370047937569616
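
One way to read the weighted call: the grid points act as ensemble members and the PDF values as their relative probabilities. A minimal sketch of the weighted pairwise-difference formula follows, assuming the weights are normalized to sum to one. The name weighted_crps is hypothetical and the code is illustrative rather than the library's algorithm:

```python
import numpy as np


def weighted_crps(obs, points, weights):
    """CRPS of a discrete distribution on `points` with probabilities `weights`."""
    points = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights form a probability distribution
    mae_term = np.sum(w * np.abs(points - obs))
    pair_dist = np.abs(points[:, None] - points[None, :])
    spread_term = np.sum(w[:, None] * w[None, :] * pair_dist)
    return mae_term - 0.5 * spread_term


x = np.linspace(-5, 5, num=1000)
density = np.exp(-0.5 * x ** 2)  # unnormalized standard normal PDF
score = weighted_crps(0.0, x, density)
print(score)
```

Because the weights are normalized inside the function, the constant factor of the normal PDF does not matter here.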

Based on the threshold decomposition of CRPS with threshold_brier_score:

>>> threshold_scores = ps.threshold_brier_score(0, ensemble, threshold=x)
>>> (x[1] - x[0]) * threshold_scores.sum(axis=-1)
0.22973090090090081
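
The decomposition states that integrating the Brier score of the event "value exceeds t" over all thresholds t gives the CRPS. The NumPy-only sketch below checks this on a small ensemble; the ensemble size, the threshold grid and the variable names are assumptions for this example, and the Riemann sum is only accurate to roughly the grid spacing:

```python
import numpy as np

ensemble = np.random.RandomState(0).randn(200)
obs = 0.0
thresholds = np.linspace(-6, 6, num=2401)

# Forecast probability of exceeding each threshold, estimated from the ensemble.
prob_exceed = (ensemble[:, None] > thresholds[None, :]).mean(axis=0)
# Binary outcome: did the observation exceed each threshold?
obs_exceed = (obs > thresholds).astype(float)
# Brier score at each threshold.
brier_by_threshold = (prob_exceed - obs_exceed) ** 2

# A Riemann sum over thresholds approximates the CRPS integral.
dt = thresholds[1] - thresholds[0]
crps_from_brier = dt * brier_by_threshold.sum()

# Reference value from the pairwise-difference form of the ensemble CRPS.
crps_direct = (np.mean(np.abs(ensemble - obs))
               - 0.5 * np.mean(np.abs(ensemble[:, None] - ensemble[None, :])))
print(crps_from_brier, crps_direct)
```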

In this example, we scored only a single observation/forecast pair. To reliably evaluate a forecast model, you need to average these scores across many observations. Fortunately, all scoring rules in properscoring accept observations as multi-dimensional arrays and return an array of scores:

>>> ps.crps_gaussian([-2, -1, 0, 1, 2], mu=0, sig=1)
array([ 1.45279182,  0.60244136,  0.23369498,  0.60244136,  1.45279182])

Once you calculate an average score, it is often useful to normalize it relative to a baseline forecast to calculate a so-called "skill score", defined such that 0 indicates no improvement over the baseline and 1 indicates a perfect forecast. For example, suppose that our baseline forecast is to always predict 0:

>>> obs = [-2, -1, 0, 1, 2]
>>> baseline_score = ps.crps_ensemble(obs, [0, 0, 0, 0, 0]).mean()
>>> forecast_score = ps.crps_gaussian(obs, mu=0, sig=1).mean()
>>> skill = (baseline_score - forecast_score) / baseline_score
>>> skill
0.27597311068630859

On these five observations, the standard normal forecast scored 28% better than the baseline.
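
The same arithmetic can be followed by hand without the library. In this sketch the per-observation forecast scores are copied from the crps_gaussian output shown earlier, and the baseline uses the fact that CRPS for a deterministic forecast reduces to absolute error. The helper skill_score is a hypothetical name for this example:

```python
def skill_score(baseline_score, forecast_score):
    """0 means no improvement over the baseline, 1 means a perfect forecast."""
    return (baseline_score - forecast_score) / baseline_score


obs = [-2, -1, 0, 1, 2]
# Per-observation CRPS of the standard normal forecast (values from the text above).
forecast_scores = [1.45279182, 0.60244136, 0.23369498, 0.60244136, 1.45279182]
# A deterministic forecast of 0 scores the absolute error |obs - 0|.
baseline_scores = [abs(o - 0) for o in obs]

forecast_mean = sum(forecast_scores) / len(forecast_scores)
baseline_mean = sum(baseline_scores) / len(baseline_scores)
skill = skill_score(baseline_mean, forecast_mean)
print(skill)
```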

API

properscoring contains optimized and extensively tested routines for scoring probability forecasts. These functions currently fall into two categories:

  • Continuous Ranked Probability Score (CRPS):
    • for an ensemble forecast: crps_ensemble
    • for a Gaussian distribution: crps_gaussian
    • for an arbitrary cumulative distribution function: crps_quadrature
  • Brier score:
    • for binary probability forecasts: brier_score
    • for threshold exceedances with an ensemble forecast: threshold_brier_score

All functions are robust to missing values represented by the floating point value NaN.
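
As a small illustration of the binary Brier score and of one way to aggregate scores when some inputs are missing, the sketch below uses plain NumPy. The function brier is a hypothetical stand-in written for this example, and averaging with np.nanmean is an assumption about how a user might summarize results, not a description of the library's internals:

```python
import numpy as np


def brier(outcomes, probabilities):
    """Elementwise Brier score: squared error between probability and 0/1 outcome."""
    outcomes = np.asarray(outcomes, dtype=float)
    probabilities = np.asarray(probabilities, dtype=float)
    return (probabilities - outcomes) ** 2


# The second forecast is missing; its score comes out as NaN.
scores = brier([1, 0, 0], [0.9, np.nan, 0.2])
# Average only over the pairs that have a valid score.
mean_score = np.nanmean(scores)
print(scores, mean_score)
```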

History

This library was written by researchers at The Climate Corporation. The original authors include Leon Barrett, Stephan Hoyer, Alex Kleeman and Drew O'Kane.

License

Copyright 2015 The Climate Corporation

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contributions

Outside contributions (bug fixes or new features related to proper scoring rules) would be very welcome! Please open a GitHub issue to discuss your plans.
