• Stars: 162
• Rank: 232,284 (Top 5%)
• Language: Python
• License: MIT License
• Created almost 6 years ago
• Updated over 2 years ago

Repository Details

A Python package for building Bayesian models with TensorFlow or PyTorch

ProbFlow

ProbFlow is a Python package for building probabilistic Bayesian models with TensorFlow 2.0 or PyTorch, performing stochastic variational inference with those models, and evaluating the models' inferences. It provides both high-level modules for building Bayesian neural networks and low-level parameters and distributions for constructing custom Bayesian models.

It's very much still a work in progress.

Getting Started

ProbFlow lets you build, fit, and evaluate custom Bayesian models (or ready-made ones!) quickly and with less pain. Models run on top of either TensorFlow 2.0 with TensorFlow Probability, or PyTorch.

With ProbFlow, the core building blocks of a Bayesian model are parameters and probability distributions (and, of course, the input data). Parameters define how the independent variables (the features) predict the probability distribution of the dependent variables (the target).

For example, a simple Bayesian linear regression

https://raw.githubusercontent.com/brendanhasz/probflow/master/docs/img/regression_equation.svg?sanitize=true

can be built by creating a ProbFlow Model. This is just a class which inherits from pf.Model (or from pf.ContinuousModel or pf.CategoricalModel, depending on the target type). The __init__ method sets up the parameters, and the __call__ method performs a forward pass of the model, returning the predicted probability distribution of the target:

import probflow as pf
import tensorflow as tf

class LinearRegression(pf.ContinuousModel):

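    def __init__(self):
        # Parameters to infer: slope, intercept, and noise scale
        self.weight = pf.Parameter(name='weight')
        self.bias = pf.Parameter(name='bias')
        self.std = pf.ScaleParameter(name='sigma')

    def __call__(self, x):
        # Forward pass: predicted distribution of the target given x
        return pf.Normal(x*self.weight()+self.bias(), self.std())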

model = LinearRegression()

Then, the model can be fit using stochastic variational inference, in one line:

# x and y are NumPy arrays or pandas DataFrame/Series
model.fit(x, y)
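
If you have no data at hand, the snippet above can be tried on synthetic data. The sketch below is only an illustration: the function name, the true weight, bias, and noise level are made-up values, and casting to float32 is an assumption made because that is the default floating-point type in TensorFlow and PyTorch.

```python
import numpy as np


def make_linear_data(n=1000, weight=2.0, bias=-1.0, noise_std=0.5, seed=0):
    """Draw n points from y = weight * x + bias + Gaussian noise.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 1)).astype("float32")
    noise = rng.normal(scale=noise_std, size=(n, 1)).astype("float32")
    y = weight * x + bias + noise
    return x, y


# Arrays with one feature column and one target column
x, y = make_linear_data()
```

A model fit on these arrays should recover a weight near 2 and a bias near -1, which gives a quick sanity check on the fit.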

You can generate predictions for new data:

# x_test is a NumPy array or pandas DataFrame
>>> model.predict(x_test)
[0.983]

Compute probabilistic predictions for new data, with 95% predictive intervals:

model.pred_dist_plot(x_test, ci=0.95)

https://raw.githubusercontent.com/brendanhasz/probflow/master/docs/img/pred_dist_light.svg?sanitize=true

Evaluate your model's performance using metrics:

>>> model.metric('mse', x_test, y_test)
0.217
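
For reference, 'mse' stands for mean squared error: the average squared difference between predictions and true values. The sketch below computes it by hand with NumPy. It illustrates the standard definition only and is not ProbFlow's own implementation; the function name is hypothetical.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same number of elements")
    return float(np.mean((y_true - y_pred) ** 2))


# Toy values, not output from a real model
example_mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```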

Inspect the posterior distributions of your fit model's parameters, with 95% credible intervals:

model.posterior_plot(ci=0.95)

https://raw.githubusercontent.com/brendanhasz/probflow/master/docs/img/posteriors_light.svg?sanitize=true
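
To make the interval concrete, here is a minimal sketch of how a central 95% interval can be computed from posterior samples with NumPy. It assumes that ci refers to a central (equal-tailed) interval, which the text above does not state explicitly; the function name and the stand-in samples are hypothetical.

```python
import numpy as np


def central_interval(samples, ci=0.95):
    """Return (lower, upper) bounds of the central `ci` interval of the samples."""
    if not 0.0 < ci < 1.0:
        raise ValueError("ci must be strictly between 0 and 1")
    tail = 100.0 * (1.0 - ci) / 2.0  # percentage left out in each tail
    lower, upper = np.percentile(np.asarray(samples, dtype=float), [tail, 100.0 - tail])
    return float(lower), float(upper)


# Stand-in for samples drawn from a parameter's posterior
rng = np.random.default_rng(0)
posterior_samples = rng.normal(loc=2.0, scale=0.1, size=10_000)
lower, upper = central_interval(posterior_samples, ci=0.95)
```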

Investigate how well your model is capturing uncertainty by examining how accurate its predictive intervals are:

>>> model.pred_dist_coverage(ci=0.95)
0.903
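
Coverage here means the fraction of true target values that fall inside the model's predictive interval; for a well-calibrated model it should be close to the nominal level (0.95 in this case). The sketch below computes that quantity from predictive samples with NumPy. It is an illustration of the idea under the assumption of central intervals, not ProbFlow's internal code, and the function name is hypothetical.

```python
import numpy as np


def interval_coverage(y_true, pred_samples, ci=0.95):
    """Fraction of true values inside the central `ci` predictive interval.

    pred_samples has shape (n_samples, n_points): one column of predictive
    samples per data point.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    pred_samples = np.asarray(pred_samples, dtype=float)
    tail = 100.0 * (1.0 - ci) / 2.0
    lower = np.percentile(pred_samples, tail, axis=0)
    upper = np.percentile(pred_samples, 100.0 - tail, axis=0)
    inside = (y_true >= lower) & (y_true <= upper)
    return float(np.mean(inside))


# Toy check: targets and predictive samples share the same distribution,
# so the coverage should land near the nominal 0.95
rng = np.random.default_rng(0)
toy_targets = rng.normal(size=2000)
toy_pred_samples = rng.normal(size=(1000, 2000))
toy_coverage = interval_coverage(toy_targets, toy_pred_samples, ci=0.95)
```

A coverage well below the nominal level suggests the predictive intervals are too narrow (overconfident), and a value well above it suggests they are too wide.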

and diagnose where your model is having problems capturing uncertainty:

model.coverage_by(ci=0.95)

https://raw.githubusercontent.com/brendanhasz/probflow/master/docs/img/coverage_light.svg?sanitize=true
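
One way to read such a diagnostic is as coverage computed separately within bins of a feature. The following sketch shows that idea with NumPy, assuming you already have a boolean array marking which targets fell inside their predictive interval. The function name, the equal-width binning, and the default bin count are assumptions for illustration, not a description of how ProbFlow computes its plot.

```python
import numpy as np


def coverage_by_bins(x, inside, n_bins=5):
    """Coverage within equal-width bins of a single feature.

    x: feature values, one per data point.
    inside: booleans, True where the target fell inside its interval.
    Returns (bin_edges, coverage_per_bin); empty bins get NaN.
    """
    x = np.asarray(x, dtype=float).ravel()
    inside = np.asarray(inside, dtype=bool).ravel()
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Interior edges map each point to a bin index in [0, n_bins - 1]
    bin_index = np.digitize(x, edges[1:-1])
    coverage = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = bin_index == b
        if mask.any():
            coverage[b] = inside[mask].mean()
    return edges, coverage


# Toy values: the second half of the feature range is covered less often
edges, coverage = coverage_by_bins([0.0, 0.1, 0.9, 1.0], [True, True, False, True], n_bins=2)
```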

ProbFlow also provides more complex modules, such as those required for building Bayesian neural networks. Also, you can mix ProbFlow with TensorFlow (or PyTorch!) code. For example, even a somewhat complex multi-layer Bayesian neural network like this:

https://raw.githubusercontent.com/brendanhasz/probflow/master/docs/img/dual_headed_net_light.svg?sanitize=true

can be built and fit with ProbFlow in only a few lines:

class DensityNetwork(pf.ContinuousModel):

    def __init__(self, units, head_units):
        self.core = pf.DenseNetwork(units)
        self.mean = pf.DenseNetwork(head_units)
        self.std  = pf.DenseNetwork(head_units)

    def __call__(self, x):
        z = tf.nn.relu(self.core(x))
        return pf.Normal(self.mean(z), tf.exp(self.std(z)))

# Create the model
model = DensityNetwork([x.shape[1], 256, 128], [128, 64, 32, 1])

# Fit it!
model.fit(x, y)
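
In the network above, the raw output of the std head can be any real number, so it is passed through exp to obtain a strictly positive standard deviation. The sketch below shows the same idea with NumPy only, evaluating a Normal log-density from an unconstrained scale value. The function name is hypothetical, and this is not how ProbFlow evaluates its distributions internally.

```python
import numpy as np


def normal_log_prob(y, mean, raw_scale):
    """Log-density of Normal(mean, exp(raw_scale)) evaluated at y.

    raw_scale is unconstrained; exp maps it to a strictly positive std.
    """
    y = np.asarray(y, dtype=float)
    mean = np.asarray(mean, dtype=float)
    std = np.exp(np.asarray(raw_scale, dtype=float))
    z = (y - mean) / std
    return -0.5 * z ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


# Even a very negative raw output yields a valid (positive) std
log_p = normal_log_prob(y=0.3, mean=0.0, raw_scale=-2.0)
```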

For convenience, ProbFlow also includes several pre-built models for standard tasks (such as linear regressions, logistic regressions, and multi-layer dense neural networks). For example, the above linear regression example could have been done with much less work by using ProbFlow's ready-made LinearRegression model:

model = pf.LinearRegression(x.shape[1])
model.fit(x, y)

And a multi-layer Bayesian neural net can be made easily using ProbFlow's ready-made DenseRegression model:

model = pf.DenseRegression([x.shape[1], 128, 64, 1])
model.fit(x, y)

Using parameters and distributions as simple building blocks, ProbFlow allows for the painless creation of more complicated Bayesian models like generalized linear models, deep time-to-event models, neural matrix factorization models, and Gaussian mixture models. You can even mix probabilistic and non-probabilistic models! Take a look at the examples and the user guide for more!

Installation

If you already have your desired backend installed (i.e. TensorFlow/TFP or PyTorch), then you can just do:

pip install probflow

Or, to install both ProbFlow and the CPU version of TensorFlow + TensorFlow Probability,

pip install probflow[tensorflow]

Or, to install ProbFlow and the GPU version of TensorFlow + TensorFlow Probability,

pip install probflow[tensorflow_gpu]

Or, to install ProbFlow and PyTorch,

pip install probflow[pytorch]
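
Before choosing among the commands above, it can help to check which backend is already present in your environment. A minimal sketch using only the standard library follows; the function name is hypothetical, and it only tests whether the tensorflow and torch modules can be found, not whether their versions are compatible with ProbFlow.

```python
import importlib.util


def installed_backends():
    """Return the names of backends whose top-level module can be found."""
    candidates = {"tensorflow": "tensorflow", "pytorch": "torch"}
    return [
        name
        for name, module in candidates.items()
        if importlib.util.find_spec(module) is not None
    ]


# Prints a list such as ['tensorflow'], ['pytorch'], both, or an empty list
print(installed_backends())
```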

Support

Post bug reports, feature requests, and tutorial requests in GitHub issues.

Contributing

Pull requests are totally welcome! Any contribution would be appreciated, from things as minor as pointing out typos to things as major as writing new applications and distributions.

Why the name, ProbFlow?

Because it's a package for probabilistic modeling, and it was built on TensorFlow. ¯\_(ツ)_/¯

More Repositories

1. tfp-taxi (Jupyter Notebook, 14 stars): Taxi fare prediction using TensorFlow Probability
2. target-encoding (Jupyter Notebook, 13 stars): Comparison between label, one-hot, target, and cross-fold target encoding
3. svi-gaussian-mixture-model (Jupyter Notebook, 13 stars): Notebook fitting a Bayesian Gaussian mixture model via stochastic variational inference w/ TensorFlow 2.0
4. dagio (Python, 10 stars): A Python package for running directed acyclic graphs of asynchronous I/O operations
5. brendanhasz.github.io (HTML, 5 stars): Personal site and blog
6. github-io-blog-template (CSS, 5 stars): A template repo for a personal github.io blog site
7. hmm-vs-gp (Stan, 5 stars): Model comparison between a Bayesian hidden Markov model and a Gaussian process
8. dsutils (Jupyter Notebook, 3 stars): Some basic utility functions for data science and data analysis
9. pipedown (Python, 3 stars): A data science pipelining framework for Python 🤫
10. loyalty-prediction (Jupyter Notebook, 3 stars): Jupyter notebooks for Kaggle's Elo Merchant Category Recommendation
11. matlab-uncertainty-viz (MATLAB, 2 stars): Matlab functions for visualizing uncertainty
12. assoc-neural-oscs (C, 2 stars): Simulations of networks of neural oscillators which can learn associations between themselves
13. tfp-bayesian-regression (Jupyter Notebook, 2 stars)
14. probflow-v2 (Python, 2 stars): A Python package for building Bayesian models with TensorFlow or PyTorch (v2.0)
15. embedding-regression (Jupyter Notebook, 1 star): Bayesian linear regression w/ embeddings using TensorFlow Probability
16. police-shootings-eda (Jupyter Notebook, 1 star): Exploratory data analysis of the Washington Post's fatal police shooting dataset
17. box-office-prediction (Jupyter Notebook, 1 star): Code for Kaggle's TMDB Box Office Prediction competition
18. home-credit-group (Jupyter Notebook, 1 star): Code for the Kaggle Home Credit Group loan risk prediction competition
19. nice-ride (Jupyter Notebook, 1 star): Nice Ride MN bikesharing system demand analysis
20. career-village (Jupyter Notebook, 1 star): Jupyter notebooks for the career village Kaggle competition
21. bayesian-correlation (Jupyter Notebook, 1 star): Multilevel Bayesian correlation using Stan and PyStan
22. decision-tree (Python, 1 star): Simple implementation of a decision tree in Python
23. nice-ride-dashboard (Jupyter Notebook, 1 star): Dashboard displaying actual and predicted numbers of bikes available at Nice Ride stations