  • Stars: 140
  • Rank: 259,431 (Top 6%)
  • Language: OCaml
  • License: BSD 3-Clause "New...
  • Created almost 6 years ago
  • Updated 24 days ago

Repository Details

The Stan transpiler (from Stan to C++ and beyond).

A New Stan-to-C++ Compiler, stanc3

This repo contains a new compiler for Stan, stanc3, written in OCaml. Since version 2.26, this has been the default compiler for Stan. See this wiki for a list of minor differences between this compiler and the previous Stan compiler.

To read more about why we built this, see this introductory blog post. For some discussion as to how we chose OCaml, see this accidental flamewar. We're testing these models (listed under Test Results) on every pull request.

Documentation

Documentation for users of stanc3 is in the Stan Users' Guide.

The stanc3 developer documentation is available at https://mc-stan.org/stanc3/stanc

Want to contribute? See Getting Started for setup instructions and some useful commands.

High-level concepts, invariants, and 30,000-ft view

Stanc3 has four main packages under src: frontend, middle, analysis_and_optimization, and stan_math_backend.

flowchart
    Stanc --> Frontend & Analysis & Backend <-.-> Middle

The goal is to keep as many details as possible about how Stan is implemented by the core C++ implementation inside the Stan Math backend library. The Middle library contains the MIR and, currently, any types or functions used by the two ends. The entry point for the compiler is src/stanc/stanc.ml, which sequences the various components together.

Distinct stanc Phases

The phases of stanc are summarized in the following information flowchart and list.

flowchart TB

    subgraph frontend[Frontend]
        direction TB
        infile>Source file]
        lexer(frontend/lexer.mll)
        parser(frontend/parser.mly)
        typecheck(frontend/Typechecker.ml)
        lower(frontend/Ast_to_Mir.ml)

        infile --> lexer -->|Tokens| parser
        parser -->|Untyped AST| typecheck -->|Typed AST| lower
    end


    subgraph middle[Middle Representation]
        data{{MIR Data Structures}}
    end

    subgraph analysis[Static Analysis and Optimization]
        optimize(analysis_and_optimization/Optimize.ml)
    end

    subgraph backend[Backend]
        codegen(*_backend/*_code_gen.ml)
        transform(*_backend/Transform_Mir.ml)

        transform -.->|MIR with backend specific code| optimize
        transform --> codegen
        optimize -->|Optimized MIR| codegen
    end

    outfile>Output File, e.g. a .hpp]

    middle --- analysis
    frontend ==> middle =====> backend ==> outfile


    click lexer "https://github.com/stan-dev/stanc3/blob/master/src/frontend/lexer.mll"
    click parser "https://github.com/stan-dev/stanc3/blob/master/src/frontend/parser.mly"
    click typecheck "https://github.com/stan-dev/stanc3/blob/master/src/frontend/Typechecker.ml"
    click lower "https://github.com/stan-dev/stanc3/blob/master/src/frontend/Ast_to_Mir.ml"
    click optimize "https://github.com/stan-dev/stanc3/blob/master/src/analysis_and_optimization/Optimize.ml"
    click data "https://github.com/stan-dev/stanc3/tree/master/src/middle"
    click codegen "https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Stan_math_code_gen.ml"
    click transform "https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Transform_Mir.ml"
  1. Lex the Stan language into tokens.
  2. Parse the Stan language into an AST that represents the syntax quite closely and aids in the development of pretty-printers and linters. Run stanc --debug-ast to print this out.
  3. Typecheck and add type information (Typechecker.ml). Run stanc --debug-decorated-ast to print the result.
  4. Lower into the Middle Intermediate Representation (AST -> MIR). Run stanc --debug-mir (or --debug-mir-pretty) to print the result.
  5. Backend-specific MIR transform (MIR -> MIR) in Transform_Mir.ml. Run stanc --debug-transformed-mir to print the result.
  6. Analyze and optimize (MIR -> MIR).
  7. Code generation (MIR -> C++, or other outputs such as TensorFlow).
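
The phase ordering above can be illustrated with a toy pipeline. This is only a sketch under loud assumptions: the language (single-digit integers, single-letter variables, + and parentheses) and every type and function name below are hypothetical and are not taken from the stanc3 sources. The typecheck step is a placeholder and the backend-specific transform (step 5) is omitted.

```ocaml
(* Hypothetical toy pipeline: none of these names come from stanc3. *)

type token = TInt of int | TVar of char | TPlus | TLParen | TRParen

(* AST: stays close to the syntax, so parentheses are kept. *)
type ast = Lit of int | Var of char | Add of ast * ast | Paren of ast

(* MIR: parentheses are dropped; only the structure remains. *)
type mir = MLit of int | MVar of char | MAdd of mir * mir

(* 1. Lex: characters to tokens. *)
let lex (s : string) : token list =
  String.to_seq s |> List.of_seq
  |> List.filter_map (function
       | ' ' -> None
       | '+' -> Some TPlus
       | '(' -> Some TLParen
       | ')' -> Some TRParen
       | '0' .. '9' as c -> Some (TInt (Char.code c - Char.code '0'))
       | 'a' .. 'z' as c -> Some (TVar c)
       | c -> failwith (Printf.sprintf "unexpected character %c" c))

(* 2. Parse: tokens to AST by recursive descent (left-associative +). *)
let rec parse_expr toks =
  let lhs, rest = parse_atom toks in
  parse_rest lhs rest

and parse_rest lhs = function
  | TPlus :: toks ->
      let rhs, rest = parse_atom toks in
      parse_rest (Add (lhs, rhs)) rest
  | toks -> (lhs, toks)

and parse_atom = function
  | TInt n :: rest -> (Lit n, rest)
  | TVar c :: rest -> (Var c, rest)
  | TLParen :: toks -> (
      match parse_expr toks with
      | e, TRParen :: rest -> (Paren e, rest)
      | _ -> failwith "expected ')'")
  | _ -> failwith "expected an integer, a variable, or '('"

let parse (toks : token list) : ast =
  match parse_expr toks with
  | e, [] -> e
  | _ -> failwith "trailing tokens"

(* 3. Typecheck: a placeholder, since this toy has a single type. *)
let typecheck (e : ast) : ast = e

(* 4. Lower: AST to MIR, discarding purely syntactic nodes. *)
let rec lower : ast -> mir = function
  | Lit n -> MLit n
  | Var c -> MVar c
  | Add (a, b) -> MAdd (lower a, lower b)
  | Paren e -> lower e

(* 6. Optimize: MIR to MIR, here just constant folding. *)
let rec optimize : mir -> mir = function
  | MAdd (a, b) -> (
      match (optimize a, optimize b) with
      | MLit x, MLit y -> MLit (x + y)
      | a', b' -> MAdd (a', b'))
  | e -> e

(* 7. Code generation: MIR to target text. *)
let rec codegen : mir -> string = function
  | MLit n -> string_of_int n
  | MVar c -> String.make 1 c
  | MAdd (a, b) -> Printf.sprintf "(%s + %s)" (codegen a) (codegen b)

(* The driver only sequences the phases. *)
let compile (src : string) : string =
  src |> lex |> parse |> typecheck |> lower |> optimize |> codegen
```

In this sketch, compile "(1 + 2) + x" lexes and parses to an AST that still contains a Paren node, lowers to an MIR without it, folds 1 + 2, and emits "(3 + x)". Each intermediate value can be inspected on its own, which is the property the debug flags above expose in the real compiler.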

The central data structures

  1. src/frontend/Ast.ml defines the AST. The AST is intended to have a direct 1-1 mapping with the syntax, so there are things like parentheses being kept around. The pretty-printer in the frontend uses the AST and attempts to keep user syntax the same while just adjusting whitespace.

    The AST uses a functional programming trick, sometimes called the "two-level types" pattern, to add metadata to the AST (and its other tree types). Essentially, many of the tree variant types are parameterized by a type that ends up being a placeholder not just for metadata but for the recursive type including metadata, sometimes called the fixed point. So instead of recursively referencing expression, you reference the type parameter 'e, which is later filled in with something like type expr_with_meta = metadata expression.

    The AST intends to keep very close to Stan-level semantics and syntax in every way.

  2. src/middle/Program.ml contains the MIR (Middle Intermediate Representation). src/frontend/Ast_to_Mir.ml performs the lowering and attempts to strip out as much Stan-specific semantics and syntax as possible, though this is still something of a work in progress.

    The MIR uses the same two-level types idea to add metadata, notably expression types and autodiff levels as well as locations on many things. The MIR is used as the output data type from the frontend and the input for dataflow analysis, optimization (which also outputs MIR), and code generation.

  3. src/stan_math_backend/Cpp.ml defines a minimal representation of C++ used in code generation.

    This is intentionally simpler than both of the structures above and than a true C++ AST, and it is tailored quite specifically to the C++ generated in our model class.
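
The "two-level types" pattern mentioned for the AST and MIR can be sketched as follows. This is a minimal illustration, not the definitions in Ast.ml or Program.ml: the constructors, the record used to tie the recursive knot, the metadata types, and the toy typing rule are all assumptions made for the example.

```ocaml
(* Hypothetical sketch of two-level types; not copied from stanc3. *)

(* Level one: the shape of one expression node. Recursive positions
   refer to the type parameter 'e instead of to expression itself. *)
type 'e expression =
  | IntLit of int
  | Var of string
  | BinOp of 'e * string * 'e
  | Paren of 'e

(* Level two: the fixed point. Each node pairs one level of structure
   with metadata of type 'm, and the children are nodes of the same kind. *)
type 'm with_meta = { expr : 'm with_meta expression; meta : 'm }

(* Two metadata types give two tree types from one expression shape. *)
type location = { line : int }
type typed_meta = { loc : location; ty : string }

type untyped_expr = location with_meta
type typed_expr = typed_meta with_meta

(* A generic one-level map, written once and reused by every pass. *)
let map_expression (f : 'a -> 'b) : 'a expression -> 'b expression = function
  | IntLit n -> IntLit n
  | Var x -> Var x
  | BinOp (l, op, r) -> BinOp (f l, op, f r)
  | Paren e -> Paren (f e)

(* A toy "typechecker": same tree shape, richer metadata.
   Assumption for this toy only: every variable has type real. *)
let rec annotate (e : untyped_expr) : typed_expr =
  let expr = map_expression annotate e.expr in
  let ty =
    match expr with
    | IntLit _ -> "int"
    | Var _ -> "real"
    | BinOp (l, _, r) ->
        if l.meta.ty = "real" || r.meta.ty = "real" then "real" else "int"
    | Paren inner -> inner.meta.ty
  in
  { expr; meta = { loc = e.meta; ty } }

(* Helper to build untyped nodes on a given line. *)
let at (line : int) expr : untyped_expr = { expr; meta = { line } }

(* 1 + (x), all on line 1. *)
let example = at 1 (BinOp (at 1 (IntLit 1), "+", at 1 (Paren (at 1 (Var "x")))))
let typed = annotate example
```

The point of the design is that expression is defined once, while the metadata attached to each node changes from phase to phase. Here the untyped tree carries only locations and the typed tree carries locations plus a type, and the pass between them reuses the same one-level map.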

Design goals

  • Multiple phases - each with human-readable intermediate representations for easy debugging and optimization design.
  • Optimizing - takes advantage of info known at the Stan language level. Minimize information we must teach users for them to write performant code.
  • Holistic - bring as much of the code as possible into the MIR for whole-program optimization.
  • Research platform - enable a new class of optimizations based on probability theory.
  • Modular - architect & build in a way that makes it easy to outsource things like symbolic differentiation to external libraries and to use parts of the compiler as the basis for other tools built around the Stan language.
  • Simplicity first - When making a choice between correct simplicity and a perceived performance benefit, we want to make the choice for simplicity unless we can show significant (> 5%) benchmark improvements to compile times or run times. Premature optimization is the root of all evil.
