rstanarm R package for Bayesian applied regression modeling

rstanarm

Bayesian applied regression modeling (arm) via Stan

This is an R package that emulates other R model-fitting functions but uses Stan (via the rstan package) for the back-end estimation. The primary target audience is people who would be open to Bayesian inference if using Bayesian software were easier but would use frequentist software otherwise.

Fitting models with rstanarm is also useful for experienced Bayesian software users who want to take advantage of the pre-compiled Stan programs that are written by Stan developers and carefully implemented to prioritize numerical stability and the avoidance of sampling problems.

The rstanarm package is an appendage to the rstan package, the R interface to Stan. rstanarm enables many of the most common applied regression models to be estimated using Markov Chain Monte Carlo, variational approximations to the posterior distribution, or optimization. The package allows these models to be specified using the customary R modeling syntax (e.g., like that of glm with a formula and data.frame). Additional arguments are provided for specifying prior distributions.
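As a rough illustration of the glm-like syntax described above, the following sketch fits a Gaussian linear model to the built-in mtcars data. The choice of data set, predictors, and prior scales is an assumption made for this example and is not a recommendation from the package authors; the small number of chains and iterations only keeps the run short.

```r
library(rstanarm)

# Linear regression specified with the usual formula + data.frame syntax.
# The priors below are illustrative choices, not package recommendations.
fit <- stan_glm(
  mpg ~ wt + hp,
  data = mtcars,
  family = gaussian(),
  prior = normal(0, 2.5, autoscale = TRUE),  # prior on the coefficients
  prior_intercept = normal(0, 10),           # prior on the intercept
  prior_aux = exponential(1),                # prior on the residual SD
  chains = 2, iter = 1000, seed = 123, refresh = 0
)

print(fit)          # posterior medians and MAD_SDs
prior_summary(fit)  # the priors that were actually used
```

Calling prior_summary afterwards is a convenient way to confirm which priors were applied, including any rescaling triggered by autoscale = TRUE.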

The set of models supported by rstanarm is large (and will continue to grow), but also limited enough so that it is possible to integrate them tightly with the pp_check function for graphical posterior predictive checks using bayesplot and the posterior_predict function to easily estimate the effect of specific manipulations of predictor variables or to predict the outcome in a training set.

The fitted model objects returned by the rstanarm modeling functions are called stanreg objects. In addition to all of the traditional methods defined for fitted model objects, stanreg objects can also be used with the loo package for leave-one-out cross-validation, model comparison, and model weighting/averaging and the shinystan package for exploring the posterior distribution and model diagnostics with a graphical user interface.
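The workflow around a stanreg object might look like the sketch below. The model, the new predictor values, and the sampler settings are assumptions chosen only to keep the example short; launch_shinystan is left commented out because it opens an interactive session.

```r
library(rstanarm)

# A small model on built-in data, used only to have a stanreg object to work with.
fit <- stan_glm(mpg ~ wt, data = mtcars,
                chains = 2, iter = 1000, seed = 1, refresh = 0)

# Graphical posterior predictive check (drawn with bayesplot).
pp_check(fit)

# Posterior predictive draws for two hypothetical cars.
newdata <- data.frame(wt = c(2.5, 3.5))
draws <- posterior_predict(fit, newdata = newdata)
dim(draws)       # one row per posterior draw, one column per row of newdata
colMeans(draws)  # posterior predictive means

# Approximate leave-one-out cross-validation via the loo package.
loo_fit <- loo(fit)
print(loo_fit)

# Interactive exploration of the posterior and diagnostics:
# launch_shinystan(fit)
```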

Check out the rstanarm vignettes for examples and more details about the entire process.

Modeling functions

The model estimating functions are described in greater detail in their individual help pages and vignettes. Here we provide a very brief overview:

  • stan_lm, stan_aov, stan_biglm

    Similar to lm and aov but with novel regularizing priors on the model parameters that are driven by prior beliefs about R-squared, the proportion of variance in the outcome attributable to the predictors in a linear model.

  • stan_glm, stan_glm.nb

    Similar to glm but with various possible prior distributions for the coefficients and, if applicable, a prior distribution for any auxiliary parameter in a Generalized Linear Model (GLM) that is characterized by a family object (e.g. the shape parameter in Gamma models). It is also possible to estimate a negative binomial model similar to the glm.nb function in the MASS package.

  • stan_glmer, stan_glmer.nb, stan_lmer

    Similar to the glmer, glmer.nb, and lmer functions (lme4 package) in that GLMs are augmented to have group-specific terms that deviate from the common coefficients according to a mean-zero multivariate normal distribution with a highly-structured but unknown covariance matrix (for which rstanarm introduces an innovative prior distribution). MCMC provides more appropriate estimates of uncertainty for models that consist of a mix of common and group-specific parameters.

  • stan_nlmer

    Similar to nlmer (lme4 package) for nonlinear "mixed-effects" models, but flexible priors can be specified for all parameters in the model, including the unknown covariance matrices for the varying (group-specific) coefficients.

  • stan_gamm4

    Similar to gamm4 (gamm4 package), which augments a GLM (possibly with group-specific terms) with nonlinear smooth functions of the predictors to form a Generalized Additive Mixed Model (GAMM). Rather than calling lme4::glmer like gamm4 does, stan_gamm4 essentially calls stan_glmer, which avoids the optimization issues that often crop up with GAMMs and provides better estimates for the uncertainty of the parameter estimates.

  • stan_polr

    Similar to polr (MASS package) in that it models an ordinal response, but the Bayesian model also implies a prior distribution on the unknown cutpoints. Can also be used to model binary outcomes, possibly while estimating an unknown exponent governing the probability of success.

  • stan_betareg

    Similar to betareg (betareg package) in that it models an outcome that is a rate (proportion) but, rather than performing maximum likelihood estimation, full Bayesian estimation is performed by default, with customizable prior distributions for all parameters.

  • stan_clogit

    Similar to clogit (survival package) in that it models a binary outcome where the number of successes and failures is fixed within each stratum by the research design. There are some minor syntactical differences relative to survival::clogit that allow stan_clogit to accept group-specific terms as in stan_glmer.

  • stan_mvmer

    A multivariate form of stan_glmer, whereby the user can specify one or more submodels each consisting of a GLM with group-specific terms. If more than one submodel is specified (i.e. there is more than one outcome variable) then a dependence is induced by assuming that the group-specific terms for each grouping factor are correlated across submodels.

  • stan_jm

    Estimates shared parameter joint models for longitudinal and time-to-event (i.e. survival) data. The joint model can be univariate (i.e. one longitudinal outcome) or multivariate (i.e. more than one longitudinal outcome). A variety of parameterisations are available for linking the longitudinal and event processes (i.e. a variety of association structures).
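To give a feel for how the group-specific functions in the list above mirror their lme4 counterparts, here is a minimal sketch using stan_lmer. It assumes the sleepstudy data shipped with lme4 and a simple varying-intercept model; both are choices made for this example, and the default priors are left in place.

```r
library(rstanarm)

# sleepstudy ships with lme4: reaction times over days of sleep
# deprivation, measured repeatedly for each subject.
data("sleepstudy", package = "lme4")

# Same formula syntax as lme4::lmer, with a varying intercept per subject.
fit_lmer <- stan_lmer(
  Reaction ~ Days + (1 | Subject),
  data = sleepstudy,
  chains = 2, iter = 1000, seed = 123, refresh = 0
)

fixef(fit_lmer)                # common ("fixed") coefficients
head(ranef(fit_lmer)$Subject)  # subject-specific deviations from the intercept
```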

Estimation algorithms

The modeling functions in the rstanarm package take an algorithm argument that can be one of the following:

  • Sampling (algorithm="sampling"):

Uses Markov Chain Monte Carlo (MCMC) --- in particular, Stan's implementation of Hamiltonian Monte Carlo (HMC) with a tuned but diagonal mass matrix --- to draw from the posterior distribution of the parameters. This is the slowest but most reliable of the available estimation algorithms and it is the default and recommended algorithm for statistical inference.

  • Mean-field (algorithm="meanfield"):

Uses mean-field variational inference to draw from an approximation to the posterior distribution. In particular, this algorithm finds the set of independent normal distributions in the unconstrained space that --- when transformed into the constrained space --- most closely approximate the posterior distribution. Then it draws repeatedly from these independent normal distributions and transforms them into the constrained space. The entire process is much faster than HMC and yields independent draws but is not recommended for final statistical inference. It can be useful to narrow the set of candidate models in large problems, particularly when specifying QR=TRUE in stan_glm, stan_glmer, and stan_gamm4, but is only an approximation to the posterior distribution.

  • Full-rank (algorithm="fullrank"):

Uses full-rank variational inference to draw from an approximation to the posterior distribution by finding the multivariate normal distribution in the unconstrained space that --- when transformed into the constrained space --- most closely approximates the posterior distribution. Then it draws repeatedly from this multivariate normal distribution and transforms the draws into the constrained space. This process is slower than mean-field variational inference but is faster than HMC. Although still an approximation to the posterior distribution and thus not recommended for final statistical inference, the approximation is more realistic than that of mean-field variational inference because the parameters are not assumed to be independent in the unconstrained space. Nevertheless, full-rank variational inference is a more difficult optimization problem and the algorithm is more prone to non-convergence or convergence to a local optimum.

  • Optimizing (algorithm="optimizing"):

Finds the posterior mode using a C++ implementation of the L-BFGS algorithm. If there is no prior information, then this is equivalent to maximum likelihood, in which case there is no great reason to use the functions in the rstanarm package over the emulated functions in other packages. However, if priors are specified, then the estimates are penalized maximum likelihood estimates, which may have some redeeming value. Currently, optimization is only supported for stan_glm.
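The sketch below shows how the algorithm argument might be switched for the same stan_glm model so that the resulting point estimates can be put side by side. The model, data, and seeds are assumptions for illustration only, and the variational and optimizing fits are meant for comparison rather than final inference.

```r
library(rstanarm)

# Full MCMC (the default and recommended algorithm).
fit_mcmc <- stan_glm(mpg ~ wt + hp, data = mtcars,
                     algorithm = "sampling",
                     chains = 2, iter = 1000, seed = 1, refresh = 0)

# Mean-field variational approximation: faster, but only approximate.
fit_mf <- stan_glm(mpg ~ wt + hp, data = mtcars,
                   algorithm = "meanfield", seed = 1, refresh = 0)

# Posterior mode via optimization (supported for stan_glm).
fit_opt <- stan_glm(mpg ~ wt + hp, data = mtcars,
                    algorithm = "optimizing", seed = 1)

# Point estimates from the three algorithms, one column each.
estimates <- cbind(
  sampling   = coef(fit_mcmc),
  meanfield  = coef(fit_mf),
  optimizing = coef(fit_opt)
)
round(estimates, 3)
```

Swapping "meanfield" for "fullrank" in the second call gives the full-rank variational approximation.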


Installation

Latest Release

The most recent rstanarm release can be installed from CRAN via

install.packages("rstanarm")

Development Version

To install from GitHub, first make sure that you can install the rstan package and C++ toolchain by following these instructions. Once rstan is successfully installed, you can install rstanarm from GitHub using the remotes package by executing the following in R:

# Change 2 to however many cores you can/want to use to parallelize install
# If you experience crashes or run out of RAM during installation, try changing this to 1
Sys.setenv(MAKEFLAGS = "-j2")
Sys.setenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS" = "true")
remotes::install_github("stan-dev/rstanarm", INSTALL_opts = "--no-multiarch", force = TRUE)

You can add build_vignettes = TRUE to the install_github call, but installation then takes a lot longer and the vignettes are already separately available from the Stan website and CRAN. If installation fails, please let us know by filing an issue.

Survival Analysis Version

The feature/survival branch on GitHub contains a development version of rstanarm that includes survival analysis functionality (via the stan_surv modelling function). Until this functionality is available in the CRAN release of rstanarm, users who wish to use the survival analysis functionality can install a binary version of the survival branch of rstanarm from the Stan R packages repository with:

install.packages("rstanarm", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

Note that this binary is static (i.e. it is not automatically updated) and is only hosted so that users can access the (experimental) survival analysis functionality without needing to go through the time consuming (and sometimes painful) task of installing the development version of rstanarm from source.

Contributing

If you are interested in contributing to the development of rstanarm please see the developer notes page.
