• This repository has been archived on 11/Apr/2024
  • Stars
    star
    250
  • Rank 162,397 (Top 4 %)
  • Language
    HTML
  • Created almost 8 years ago
  • Updated about 4 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Materials from Stan conferences
Stan Logo

Materials from StanCon

This repository contains materials from StanCon and links to StanCon content hosted elsewhere. Unless otherwise noted, the text in this repository is distributed under the CC BY 4.0 License and code is distributed under the New BSD License. Copyright to the authors.

Contents:

2020 Virtual Conference

2019 Cambridge (UK)

2018 Helsinki

2018 Asilomar

2017 NYC


StanCon 2018 | August 29-31, Aalto University, Helsinki

StanCon’s version of conference proceedings is a collection of contributed talks based on interactive notebooks. Every submission is peer reviewed by at least two reviewers. The reviewers are members of the Stan Conference Organizing Committee and the Stan Developmemt Team. This repository contains all of the accepted notebooks as well as any supplementary materials required for building the notebooks. The slides presented at the conference are also included.

2018 Helsinki Peer reviewed contributed talks

Solving ODEs in the wild: Scalable pharmacometrics with Stan

  • Authors: Sebastian Weber (Novartis)

Pharmacometric modeling involves nonlinear hierarchical models, which are most naturally expressed as forced ordinary differential equations (ODEs). These class of models lead to a number of challenges which complicate a practical modeling work-flow in Stan mostly due to long model execution times. This contribution demonstrates at the example of the drug Warfarin how forced ODEs can be written efficiently in Stan leading to a doubling in model evaluation speed for the presented example. Finally, it is demonstrated how the new map_rect facility in Stan can be used to make models scalable to large data sets leading to substantial speedups in model evaluation time and most importantly this enables to scale Stan's performance as needed.

DOI

Links:


Analysis of repeated measures data in RStan

  • Authors: Marco Munda (Pharmalex)

We illustrate the analysis of repeated measures data in the Bayesian framework using RStan. In addition to the modelling itself, we further show how to make inference on the primary effect based on a probability of success, and how to predict the longitudinal profile of a future patient, two difficult (if not impossible) tasks from a frequentist perspective.

DOI

Links:


Define custom response distributions with brms

  • Authors: Paul Bürkner (University of Münster)

The brms package (Bürkner, 2017a, 2017b) implements Bayesian regression models using the probabilistic programming language Stan (Carpenter et al., 2016) behind the scenes. It has grown to be one of the most flexible R packages when it comes to multilevel regression modelling. A wide range of response distributions are supported, allowing users to fit -- among others -- linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Predictor terms can be specified with a simple yet powerful formula syntax. Thanks to Stan, even very complex models tend to converge well in a reasonable amount of time. While brms comes with a lot of built-in response distributions (usually called families in R), there is still a long list of distributions which are not natively supported. The present notebook will explain how to specify such custom families in brms. By doing that, users can benefit from the modeling flexibility and post-processing options of brms even when applying self-defined response distributions.

DOI

Links:


Are shots predictive of soccer results?

  • Authors: Leonardo Egidi, Francesco Pauli, Nicola Torelli (University of Trieste)

Predicting the outcome of a soccer match is the subject of much debate, and several models based on different assumptions have been proposed for modeling the numbers of goals scored by two competing teams. In this case study we adopt a different perspective and propose a Bayesian hierarchical model consisting of three nested multiple outcomes: number of scores, number of shots on target and number of total shots. We model the number of scores and the number of shots on target with two binomial distributions respectively, whereas the total shots follow a negative binomial distributon. Our dataset consists of nine seasons of the English Premier League (EPL): eight seasons---from 2008/2009 to 2015/2016, 3040 matches---represent the train set, whereas the nineth season, 2016/2017 (with the remaining 380 matches), is our test set, used for out-of sample prediction.

DOI

Links:


Flexible models of holiday lift

  • Authors: Alex Braylan, Dan Marthaler (Revionics)

We develop a flexible, portable, and interpretable Bayesian model of cyclical holiday effects on time series. Our model uses five parameters for each possible holiday that capture the general shape, magnitude, and peak location offset of each holiday effect. Choice of priors prevents the model from overfitting while still achieving considerable flexibility. We experiment on simulated and real data from Google Trends and demonstrate the model's performance on held-out data.

DOI

Links:


Artificial turf advantage and predictive accuracy in Dutch football

  • Authors: Gertjan Verhoeven

This submission uses Stan to learn about the so-called artificial turf advantage in Dutch football. Two model variants are used to model match outcomes. One of the models is the model from Milad Kharratzadeh presented at Stancon 2017, the other is new to Stan (a dynamic Skellam model). I use out-of-sample forecasts together with the Ranked Probability Score (a proper scoring rule) to learn whether including the artificial turf advantage increases predictive accuracy. Bookmakers odds are used as a benchmark to check the quality of our forecasts.

DOI

Links:


ODE model of gene regulation

  • Authors: Martin Modrák (Czech Academy of Sciences)

In this notebook we fit time series of gene expression data with a non-linear ODE-based model. Splines are used to model noisily observed regulator expression. The ODE is not solved explicitly, it is instead transformed to a definite integral and integrated via the trapezoid rule. Some interesting reparametrizations are introduced to make the model well identified.

DOI

Links:


Predicting New York City school enrollment

  • Authors: Jonathan Auerbach, Timothy Jones, Robin Winstanley (Columbia University)

We propose a Bayesian hierarchical Age-Period-Cohort model to predict elementary school enrollment in New York City. We demonstrate this model using student enrollment data for grades K-5 in each Census Tract of Brooklyn's 20th School District over the 2001-02 to 2010-11 school years. Specifically, our model disaggregates enrollment into grade (age), year (period), and cohort effects so that each can be interpreted and extrapolated over the 2011-12 to 2017-18 school years. We find this approach ideal for incorporating spatial information indicative of the socioeconomic forces that determine school enrollment in New York City.

DOI

Links:


Analyzing brain taxonomy trees

  • Authors: Chris Hammill, Jason Lerch

This talk and notebook introduce the idea of analyzing brain anatomy as a taxonomy of structures, defined by containment, using Stan. This taxonomy imposes dependence between model coefficients for structures and their enclosing structure. The slides and notebook compare this approach to other hierarchical modelling strategies using the mouse brain for illustration.

DOI

Links:


Getting more out of Stan: some ideas from the Haskell bindings

  • Authors: Thomas A. Nielsen (Tweag I/O), Dominic Steinitz (Tweag I/O), Henrik Nilsson (University of Nottingham)

We present draft bindings to Stan in Haskell, a purely functional programming language. Unlike in most bindings, our models are encoded as a data type the host language. We show how this can be used to widen the range of computations that can be done based on the Stan model definition. For instance, predictions, posterior predictive checks and residual calculations can be done based on a single model definition.

DOI

Links:


Dose-finding clinical trial designs in Stan with trialr

  • Authors: Kristian Brock (University of Birmingham)

My notebook illustrates two different methods for conducting dose-finding clinical trials. The first, CRM, escalates dose according to toxicity outcomes only, assuming implicitly that higher doses are more likely to benefit the patient. The second, EffTox, escalates dose according to efficacy and toxicity outcomes, thus addressing the potential that higher may not always mean better. These models are implemented in the trialr R-package using Stan.

DOI

Links:


"The implementation of a model of choice: the (truncated) linear ballistic accumulator.

  • Authors: Bruno Nicenboim (University of Potsdam)

It is very common in cognitive science and psychology to use experimental tasks that involve making a fast choice among a restricted number of alternatives. In this notebook, I focus on one influential and relatively simple model that belongs to the class of sequential-sampling models: the linear ballistic accumulator with a drift rate drawn from a normal distribution (restricted to positive values) (S. D. Brown and Heathcote 2008; Heathcote and Love 2012). First, I discuss the motivation for fitting this model using the Stroop task (Stroop 1935) as a case study. Then, I discuss the challenges in the implementation of the model in (R)Stan (Stan Development Team 2017), which might also apply to other hierarchical models with complex likelihood functions. Finally, I show some results that exemplify how the linear ballistic accumulator can be used for examining individual differences.

DOI

Links:


Hierarchical Ornstein-Uhlenbeck type t-processes

  • Authors: Ville Laitinen, Leo Lahti (University of Turku)

This work investigates probabilistic time series models that are motivated by applications in statistical ecology. In particular, we investigate variants of the mean-reverting and stochastic Ornstein-Uhlenbeck (OU) process. We provide a hierarchical extension for joint analysis of multiple (short) time series, validate the model, and analyze its performance with simulations. The works extends the recent Stan implementation of the OU process (A 2018), where parameter estimates of a Student-t type OU process are obtained based on a single (long) time series. We have added a level of hierarchy, which allows joint inference of the model parameters across multiple time series.

DOI

Links:


GPU optimized math routines in the Stan math library

  • Rok ÄŒeÅ¡novar, Davor Sluga, Jure DemÅ¡ar, Steve Bronder, Erik Å trumbelj

The Stan Math library's Hamilton Monte Carlo (HMC) sampler has computationally expensive draws while usually searching the target distribution more efficiently than alternative MCMC methods with fewer iterations. The bottleneck within draws makes Stan a prime candidate for GPU optimizations within samples. This project implements GPU optimizations for the Cholesky decomposition and it's derivative in the Stan Math library [@stanmath2015]. This work is the first known open source implementation of the Cholesky decomposition with a GPU in an HMC setting. Furthermore, the GPU kernels use OpenCL which allows the use of these methods across any brand of GPU. While results show that GPU optimizations are not optimal for small $N\times M$ matrices, large matrices can see speedups of 7.8x while retaining the same precision as models run purely on a CPU.

DOI

Links:


Relating Disparate Measures of Coagulapathy Using Unorthodox Data: A Hybrid Mechanistic-Statistical Approach

  • Authors: Arya A. Pourzanjani, Tie Bo Wu, Benjamin B. Bales, Linda R. Petzold (University of California, Santa Barbara)

DOI

Links:


Modeling the Effects of Nutrition with Mixed-Effect Bayesian Network

  • Authors: Jari Turkia (University of Eastern Finland)

This work proposes Mixed-Effect Bayesian Network (MEBN) as a method for modeling the effects of nutrition. It allows identifying both typical and personal correlations between nutrients and their bodily responses. Predicting a personal network of nutritional reactions would allow interesting applications at personal diets and in understanding this complex system. Brief theory of MEBN is first given, followed by the implementation in R and Stan. A real life dataset from a nutritional study (Sysdimet) is then analyzed with this method and the results are visualized with a responsive JavaScript-visualization.

DOI

Links:


Using counterfactual queries to improve models for decision-support

  • Authors: Sundin, Iiris(1) Peter Schulam(2) Eero Siivola(1) Aki Vehtari(1) Suchi Saria(2) Samuel Kaski(1) 1 Department of Computer Science, Aalto University, Espoo, Finland 2 Department of Computer Science, Johns Hopkins University

In this extended abstract, we generalize active learning to tasks where a human has to choose which action a to take for a target after observing its covariates x ̃ and predicted outcomes p( ̃y|x, a ̃ ). An example case is personalized medicine and the decision of which treatment to give to a pa- tient. We show that standard active learning, which is not aware of the final task, would be very inefficient, and we introduce a new problem of decision-making-aware active learning. We for- mulate the problem as finding the query with the highest information gain for the specific decision- making task, assuming a rational decision-maker. The problem can be solved particularly efficiently assuming an expert able to answer queries about counterfactuals. We demonstrate the effective- ness of the proposed method in a binary outcome decision-making task using simulated data, and in a continuous-valued outcome task on the medical dataset IHDP with synthetic treatment outcomes. The outcomes are predicted using Gaussian processes.

DOI

Links:

2018 Helsinki Invited talks

Hierarchical modelling of galaxy clusters for cosmology

  • Presenter: Maggie Lieu (European Space Agency)
  • Video
  • Slides

Bad data, big models & statistical methods for studying evolution

  • Presenter: Richard McElreath (Max Planck Institute for Evolutionary Anthropology)
  • Video
  • Slides

Identifying the effect of public holidays on daily demand for gas

  • Presenter: Sarah Heaps (Newcastle University)
  • Video
  • Slides

Esther Williams in the Harold Holt Memorial Swimming Pool

  • Presenter: Daniel Simpson (University of Toronto)
  • Video
  • Slides

2018 Helsinki tutorials

Basics of Bayesian inference and Stan

Hierarchical models

Stan C++ development: adding a new function to Stan

Ordinary differential equation (ODE) models in Stan

  • Instructor: Daniel Lee
  • Video

Productization of Stan

Instructor: Eric Novik Productization panel discussion: Markus Ojala (Smartly), Tom Nielsen (Tweag.io), Anna Kircher (Lendable), Eric Novik (Generable)

Model assessment and selection


StanCon 2018 | January 10-12, Asilomar, California

StanCon’s version of conference proceedings is a collection of contributed talks based on interactive notebooks. Every submission is peer reviewed by at least two reviewers. The reviewers are members of the Stan Conference Organizing Committee and the Stan Developmemt Team. This repository contains all of the accepted notebooks as well as any supplementary materials required for building the notebooks. The slides presented at the conference are also included.

2018 Peer reviewed contributed talks

Does the New York City Police Department rely on quotas?

  • Authors: Jonathan Auerbach (Columbia University)

This submission investigates whether the New York City Police Department (NYPD) uses productivity targets or quotas to manage officers in contravention of New York State Law. The analysis is presented in three parts. First, the NYPD's employee evaluation system is introduced, and the criticism that it constitutes a quota is summarized. Secondly, a publically available dataset of traffic tickets issued by NYPD officers in 2014 and 2015 is described. Finally, a generative model to describe how officers write traffic tickets is proposed. The fitted model is consistent with the criticism that police officers substantially alter their ticket writing to coincide with departmental targets. The submission concludes by discussing the implication of these findings and offering directions for further research.

DOI

Links:


Diagnosing Alzheimer’s the Bayesian way

  • Authors: Arya A. Pourzanjani, Benjamin B. Bales, Linda R. Petzold, Michael Harrington (UC Santa Barbara)

Alzheimer's Disease is one the most debilitating diseases, but how do we diagnose it accurately? Researchers have been trying to answer this question by building generative models to describe how patient biomarkers, such as MRI scans, psychological tests, and lab tests relate over time to the underlying brain deterioration that's present in Alzheimer's Disease. In this notebook we show how we translated these models to the Bayesian framework in Stan and how this allowed for several model improvements that can ultimately improve our understanding of Alzheimer's and help physicians in diagnosis. In particular, we describe how we hierarchically model patient disease trajectories to obtain stable estimates for patients who lack data. We describe how fitting in Stan yields uncertainties on these disease trajectories, and why that is important for weighing the pros and cons of risky treatment. Lastly, we describe a new method for Bayesian modeling of these monotonic disease trajectories in Stan using I-Splines.

DOI

Links:


Joint longitudinal and time-to-event models via Stan

  • Authors: Sam Brilleman, Michael Crowther, Margarita Moreno-Betancur, Jacqueline Buros Novik, Rory Wolfe (Monash University, Columbia University)

The joint modelling of longitudinal and time-to-event data has received much attention in the biostatistical literature in recent years. In this notebook (and talk), we describe the implementation of a shared parameter joint model for longitudinal and time-to-event data in Stan. The methods described in the notebook are a simplified version of those underpinning the stan_jm modeling function that has recently been contributed to the rstanarm R package.

DOI

Links:


A tutorial on Hidden Markov Models using Stan

  • Authors: Luis Damiano, Brian Peterson, Michael Weylandt

We implement a standard Hidden Markov Model (HMM) and the Input-Output Hidden Markov Model for unsupervised learning of time series dynamics in Stan. We begin by reviewing three commonly-used algorithms for inference and parameter estimation, as well as a number of computational techniques and modeling strategies that make full Bayesian inference practical. For both models, we demonstrate the effectiveness of our proposed approach in simulations. Finally, we give an example of embedding a HMM within a larger model using an example from the econometrics literature.

DOI

Links:


Student Ornstein-Uhlenbeck models served three ways (with applications for population dynamics data)

  • Authors: Aaron Goodman (Stanford University)

Ornstein-Uhlenbeck (OU) processes are a mean reverting process and is used to model dynamics in biology, physics, and finance. I fit an extension of the OU process that is driven by a Lévy process with Student's t-marginals rather than Brownian motion with Gaussian marginals, which allows for heavy-tailed increments. I implement four formulations of the Student-t OU-type model in Stan and compare the sampling performance on both real and simulated population dynamic data.

DOI

Links:


SlicStan: a blockless Stan-like language

  • Authors: Maria I. Gorinova, Andrew D. Gordon, Charles Sutton (University of Edinburgh)

We present SlicStan — a probabilistic programming language that compiles to Stan and uses static analysis techniques to allow for more abstract and flexible models. SlicStan is novel in two ways: (1) it allows variable declarations and statements to be automatically shredded into different components needed for efficient Hamiltonian Monte Carlo inference, and (2) it introduces more flexible user-defined functions that allow for new model parameters to be declared as local variables. This work demonstrates that efficient automatic inference can be the result of the machine learning and programming languages communities joint efforts.

DOI

Links:


idealstan: an R package for ideal point modeling with Stan

  • Authors: Robert Kubinec (University of Virginia)

Item-response theory (IRT) ideal-point scaling/dimension reduction methods that incorporate additional response categories and missing/censored values, including absences and abstentions, for roll call voting data (or any other kind of binary or ordinal item-response theory data). Full and approximate Bayesian inference is done via Stan.

DOI

Links:


Computing steady states with Stan’s nonlinear algebraic solver

  • Authors: Charles C. Margossian (Metrum, Columbia University)

Stan’s numerical algebraic solver can be used to solve systems of nonlinear algebraic equations with no closed form solutions. One of its key applications in scientific and engineering fields is the computation of equilibrium states (equivalently steady states). This case study illustrates the use of the algebraic solver by applying it to a problem in pharmacometrics. In particular, I show the algebraic system we solve can be quite complex and embed, for instance, numerical solutions to ordinary differential equations. The code in R and Stan are provided, and a Bayesian model is fitted to simulated data.

DOI

Links:


Bayesian estimation of mechanical elastic constants

  • Authors: Ben Bales, Brent Goodlet, Tresa Pollock, Linda Petzold (UC Santa Barbara)

This outlines a Bayesian approach to resonance ultrasound spectroscopy (RUS), a technique for estimating elastic constants of a material from a sample's measured resonance modes. The notebook includes an example of how to take advantage of custom automatic differentiation in specialized Stan models (either for numerical or efficiency reasons).

DOI

Links:


Aggregate random coefficients logit — a generative approach

  • Authors: Jim Savage (Lendable Marketplace), Shoshana Vasserman (Harvard University).

This notebook illustrates how to fit aggregate random coefficient logit models in Stan, using Bayesian techniques. It’s far easier to learn and implement than the standard BLP algorithm, and has the benefits of being robust to mismeasurement of market shares, and giving limited-sample posterior uncertainty of all parameters (and demand shocks). This comes at the cost of modeling firms’ price-setting process, including how unobserved product-market demand shocks affect prices.

DOI

Links:


The threshold test: Testing for racial bias in vehicle searches by police

  • Authors: Camelia Simoiu, Sam Corbett-Davies, Sharad Goel, Emma Pierson (Stanford University)

We develop a new statistical test to detect bias in decision making — the threshold test—that mitigates the problem of infra-marginality by jointly estimating decision thresholds and risk distributions.

DOI

Links:


Assessing the safety of Rosiglitazone for the treatment of type II diabetes

  • Authors: Konstantinos Vamvourellis, K. Kalogeropoulos, L. Phillips (London School of Economics and Political Science)

A Bayesian paradigm for making drug approval decisions. Case study in the treatment of Diabetes (Type 2).

DOI

Links:


Causal inference with the g-formula in Stan

  • Authors: Leah Comment (Harvard University)

The potential outcomes framework often uses one or more parametric outcome models to learn about underlying causal processes. In Stan, parameter estimation using observed data takes place in the model block, while simulation-based estimation of causal parameters using the g-formula can be done separately with generated quantities. Bayesian estimation allows for data-driven sensitivity analysis regarding the assumption of no unmeasured confounding. This presentation shows some simple causal models, then outlines a basic sensitivity analysis using prior information derived from an external data source.

DOI

Links:


Bayesian estimation of ETAS models with Rstan

  • Authors: Fausto Fabian Crespo Fernandez (Universidad San Francisco de Quito)

Earthquake modeling with Stan. Applied to seismic recurrence in Ecuador in 2016.

DOI

Links:


2018 Invited talks

Predictive information criteria in hierarchical Bayesian models for clustered data

ScalaStan

Stan applications in physics: Testing quantum mechanics and modeling neutrino masses

Forecasting at scale: How and why we developed Prophet for forecasting at Facebook

Stan applications in human genetics: Prioritizing genetic mutations that protect individuals from human disease

Statistics using geometry to show uncertainties and integrate graph information

A brief history of Stan

Model assessment, model selection and inference after model selection

Spatial models in Stan: intrinsic auto-regressive models for areal data

Some problems I'd like to solve in Stan, and what we'll need to do to get there

  • Presenter: Andrew Gelman (Columbia University)
  • Video

StanCon 2017 | January 21, Columbia University, New York

StanCon’s version of conference proceedings is a collection of contributed talks based on interactive notebooks. Every submission is peer reviewed by at least two reviewers. The reviewers are members of the Stan Conference Organizing Committee and the Stan Developmemt Team. This repository contains all of the accepted notebooks as well as any supplementary materials required for building the notebooks. The slides presented at the conference are also included.

2017 Peer reviewed contributed talks

Twelve Cities: Does lowering speed limits save pedestrian lives?

  • Authors: Jonathan Auerbach, Rob Trangucci (Columbia University)

We investigate whether American cities can expect to achieve a meaningful reduction in pedestrian deaths by lowering the posted speed limit. We find some evidence that a lower speed limit does in fact reduce fatality rates, and our estimated causal effect is similar to the traditional before-after analysis espoused by policy analysts. Nevertheless, we conclude that adjusting the posted speed limit in urban environments does not correspond with a reliable reduction in pedestrian fatalities.

DOI

Links:


Hierarchical Bayesian Modeling of the English Premier League

  • Authors: Milad Kharratzadeh (Columbia University)

In this case study, we provide a hierarchical Bayesian model for the English Premier League in the season of 2015/2016. The league consists of 20 teams and each two teams play two games with each other (home and away games). So, in total, there are 38 weeks, and 380 games. We model the score difference (home team goals − away team goals) in each match. The main parameters of the model are the teams’ abilities which is assumed to vary over the course of the 38 weeks. The initial abilities are determined by performance in the previous season plus some variation.

DOI

Links:


Advertising Attribution Modeling in the Movie Industry

  • Authors: Victor Lei, Nathan Sanders, Abigail Dawson (Legendary Entertainment)

We present a Bayesian method for inferring advertising platform effectiveness as applied to the movie industry, and show some possibilities for drawing inferences by analyzing model parameters at different levels of the hierarchy. In addition, we show some common ways to check model efficacy, and possibilities for comparing between different models.

DOI

Links:


hBayesDM: Hierarchical Bayesian modeling of decision-making tasks

  • Authors: Woo-Young Ahn, Nate Haines, Lei Zhang (Ohio State University)

hBayesDM (hierarchical Bayesian modeling of Decision-Making tasks) is a user-friendly R package that offers hierarchical Bayesian analysis of various computational models on an array of decision-making tasks.

DOI

Links:


Differential Equation Based Models in Stan

  • Authors: Charles Margossian, Bill Gillespie (Metrum Research Group)

Differential equations can help us model sophisticated processes in biology, physics, and many other fields. Over the past year, the Stan team has developed many tools to tackle models based on differential equations.

DOI

Links:


How to Test IRT Models Using Simulated Data

  • Authors: Teddy Groves (Football Radar)

This notebook explains how to code some IRT models using Stan and test whether they can recover input parameters when given simulated data.

DOI

Links:


Models of Retrieval in Sentence Comprehension

  • Authors: Bruno Nicenboim, Shravan Vasishth (University of Potsdam)

This work presents an evaluation of two well-known models of retrieval processes in sentence comprehension, the activation-based model and the direct-access model. We implemented these models in a Bayesian hierarchical framework and showed that some aspects of the data can be explained better by the direct access model. Specifically, the activation-based cannot predict that, on average, incorrect retrievals would be faster than correct ones. More generally, our work leverages the capabilities of Stan to provide a powerful framework for flexibly developing computational models of competing theories of retrieval, and demonstrates how these models’ predictions can be compared in a Bayesian setting.

DOI

Links:


Hierarchical Gaussian Processes in Stan

  • Authors: Rob Trangucci (Columbia University)

Stan’s library has been expanded with functions that facilitate adding Gaussian processes (GPs) to Stan models. I will share the best practices for coding GPs in Stan, and demonstrate how GPs can be added as one component of a larger model.

DOI

Links:


Modeling the Rate of Public Mass Shootings with Gaussian Processes

  • Authors: Nathan Sanders, Victor Lei (Legendary Entertainment)

We have used Stan to develop a new model for the annualized rate of public mass shootings in the United States based on a Gaussian process with a time-varying mean function. This design yields a predictive model with the full non-parametric flexibility of a Gaussian process, while retaining the direct interpretability of a parametric model for long-term evolution of the mass shooting rate. We apply this model to the Mother Jones database of public mass shootings and explore the posterior consequences of different prior choices and of correlations between hyperparameters. We reach conclusions about the long term evolution of the rate of public mass shootings in the United States and short-term periods deviating from this trend.

DOI

Links:


More Repositories

1

stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
C++
2,589
star
2

rstan

RStan, the R interface to Stan
R
973
star
3

pystan2

PyStan, the Python interface to Stan
Python
918
star
4

example-models

Example models for Stan
HTML
772
star
5

math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
C++
744
star
6

bayesplot

bayesplot R package for plotting Bayesian models
R
431
star
7

rstanarm

rstanarm R package for Bayesian applied regression modeling
R
387
star
8

pystan

PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io
Python
270
star
9

shinystan

shinystan R package and ShinyStan GUI
R
195
star
10

cmdstan

CmdStan, the command line interface to Stan
C++
182
star
11

posteriordb

Database with posteriors of interest for Bayesian inference
Stan
181
star
12

posterior

The posterior R package
R
167
star
13

loo

loo R package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS)
R
150
star
14

cmdstanpy

CmdStanPy is a lightweight interface to Stan for Python users which provides the necessary objects and functions to compile a Stan program and fit the model to data using CmdStan.
Python
146
star
15

cmdstanr

CmdStanR: the R interface to CmdStan
R
144
star
16

stanc3

The Stan transpiler (from Stan to C++ and beyond).
OCaml
140
star
17

projpred

Projection predictive variable selection
R
110
star
18

stan-mode

Emacs mode for Stan.
Emacs Lisp
71
star
19

rstantools

Tools for Developing R Packages Interfacing with Stan
R
51
star
20

docs

Documentation for the Stan language and CmdStan
TeX
37
star
21

httpstan

HTTP interface to Stan, a package for Bayesian inference.
Python
35
star
22

design-docs

33
star
23

MathematicaStan

A Mathematica package to interact with CmdStan
Mathematica
27
star
24

connect22-space-time

StanCon Connect 2022 space and time
HTML
24
star
25

stancon2023

Materials for StanCon 2023
Jupyter Notebook
23
star
26

statastan

Stata interface for Stan.
Stata
20
star
27

nomad

Fast autodiff.
C++
18
star
28

gmo

Inference on marginal distributions using gradient-based optimization
R
13
star
29

posteriordb-python

Python
11
star
30

stat_comp_benchmarks

Benchmark Models for Evaluating Algorithm Accuracy
R
9
star
31

posteriordb-r

R
8
star
32

pystan-wheels

Automated builds of OSX and manylinux wheels for pystan
Shell
8
star
33

performance-tests-cmdstan

Performance testing tools for use with CmdStan
Python
8
star
34

perf-math

C++
7
star
35

logos

Stan logos
HTML
5
star
36

r-packages

Repository for distributing (some) stan-dev R packages
4
star
37

httpstan-wheels

Wheels for httpstan
Shell
4
star
38

visual-diagnostics

Visual diagnostics for HMC using gnuplot.
Shell
4
star
39

sgb

Stan Governing Body issue tracker and meeting notes
4
star
40

atom-language-stan

JavaScript
3
star
41

stan2tfp

Stan2TFP is a work-in-progress alternative backend for Stanc3 which targets TensorFlow Probability
OCaml
2
star
42

.github

Stan organization READMEs and information
1
star
43

jenkins-shared-libraries

Libraries for our Jenkinsfiles
Groovy
1
star
44

stan-discourse-theme-component

HTML
1
star
45

propaganda

Sell sheets and the like
TeX
1
star
46

ci-scripts

Formerly syclik's stan-scripts repo. Contains scripts used by Jenkins as well as the release scripts and performance scripts.
Shell
1
star