
Exact Inference Engine for Probabilistic Programs

PSI Solver

PSI is a tool for performing exact inference on probabilistic programs. Given a probabilistic program, the tool produces an expression for the joint posterior distribution of the expressions computed by the program.

The PSI solver is described in the research papers listed under "Citing this Work" below (see also http://psisolver.org).

Compatibility Disclaimer

PSI does not currently try to make any guarantees on backwards compatibility. If your project depends on PSI, write down the exact commit hash to ensure others can reproduce your results. The commit of HEAD at build time is displayed when running ./psi --help.

Build Instructions

GNU/Linux

Quick build

  1. Run dependencies-release.sh to download the LDC D compiler and extract the archive.

  2. Run build-release.sh to build PSI.

$ ./dependencies-release.sh && ./build-release.sh
Additional information

PSI is written in the D programming language. D compilers are available at http://dlang.org/download.html.

./build.sh will use DMD from inside the PSI directory if it exists; otherwise, it will attempt to use DMD from your PATH.

./build.sh creates a debug build. ./build-release.sh creates a release build.

Using Docker

Build with docker build . -t psi, then launch the container with docker run -it psi.

You can share folders between your file system and the container using the -v option (which must precede the image name): docker run -it -v host/path:container/path psi.

Using Homebrew on macOS

You can also install PSI using Homebrew. See the following link for more information: https://github.com/bvoq/homebrew-bvoq/

Other platforms

The build instructions given here are for GNU/Linux. PSI also works on other platforms. Feel free to write a pull request with working build scripts for your favourite platform.

Running the tests

cd test
./runtests

Using PSI on your own models

Run ./psi example.psi, where example.psi is a file containing the source code for the model. The next section ("Quick language guide") briefly introduces the most important language features.

Additional command-line options

Run ./psi --help to display information about supported command-line options.

Examples

$ cat example.psi
def main(){
    a := gauss(0,1);
    b := gauss(0,1);
    return if a > b { a } else { b };
}

$ psi example.psi
p(r) = (d/dx)⁻¹[e^(-x²)](r·⅟√̅2)·⅟e^(1/2·r²)·⅟π·√̅2

$ psi example.psi --mathematica --cdf
p[r1_] := 1/Pi*Sqrt[Pi]*(Erf[1/2^(1/2)*r1]+1)/2^2

$ cat coin_bias_small.psi
def main(){
    p := uniform(0,1);
    x1 := flip(p);
    observe(x1==1);
    x2 := flip(p);
    observe(x2==0);
    x3 := flip(p);
    observe(x3==1);
    return p;
}

$ psi coin_bias_small.psi --plot
p(p) = (-12·p³+12·p²)·[-1+p≤0]·[-p≤0]
plotting... (PDF)
command: 
set pointsize 20
set xlabel "p"
set ylabel "density"
set samples 500, 500
set isosample 200
unset key
plot [-10:10] (-1+x<=0)*(-30*x+30)*(-x<=0)*x**4

Quick language guide

PSI operates on files with the *.psi extension.

Such a file contains one or multiple procedure definitions:

def procedureName(param1,param2,..., paramn){
    statements
}

There should be a procedure named main, which denotes the program entry and exit point.
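
For instance, a minimal complete program (a sketch for illustration, not taken from the shipped examples) looks like this:

def main(){
    x := uniform(0,1); // sample a uniform random number
    return x;          // PSI reports the distribution of the returned value
}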

Statements

Function bodies consist of a sequence of statements, possibly referencing the variables param1,param2,...,paramn.

  • variable := expression; introduces a new variable of name variable

  • variable = expression; assigns to an existing variable. +=, -=, *=, /=, div=, %=, ^=, &&=, ||=, ... are also supported.

  • if expression { statements } else { statements } is a conditional. else { statements } is optional.

  • repeat expression { statements } repeats the given statements a number of times given by the expression. The expression should be an integer constant.

  • for variable in [start..end){ statement } is a for-loop over a range. Start and end should be integer constants. If the exact number of iterations is not known ahead of time, an if statement can be nested within the for loop in order to cut off some of the iterations.

  • while expression { statement } is a while-loop. Currently, while loops are only supported with the --dp switch and they must have bounded running time.

  • observe(expression); conditions the joint probability distribution of the program variables on the event that the expression is not equal to zero (see the combined sketch after this list). Note: The probability that the expression is not equal to zero should be positive (enforced by PSI during normalization).

  • cobserve(expression,value); conditions the joint probability distribution of the program variables on the event that the given expression equals the given value. The probability that the expression is equal to the given value must be zero, but the probability density at that point must be positive. Note: This is not currently enforced automatically by PSI, therefore some care is required when using this primitive.

  • assert(expression); puts the program in the error state in case the expression is equal to zero (PSI will then report the probability of error).

  • return expression; terminates the procedure with the given result. If no return statement is provided at the end of a procedure, return (); is automatically inserted.
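
The following sketch (for illustration; not part of the original examples) combines several of these statements:

def main(){
    x := gauss(0,1);   // introduce a new variable
    y := 0;
    if x > 0 {         // conditional; the else branch is omitted
        y = 1;         // assign to an existing variable
    }
    observe(y == 1);   // condition on the event x > 0
    return x;          // posterior of x given x > 0
}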

Special statements

  • a := readCSV(filename); where filename is a string literal (such as "data.csv"), reads the given file containing a comma-separated list of values into a read-only array a. Indexing into such arrays requires deterministic integer indices.
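
For example, assuming a file data.csv that contains at least two numeric values, one might write:

def main(){
    a := readCSV("data.csv");       // read-only array holding the values from the file
    return a[0] + a[1]*gauss(0,1);  // indices into a must be deterministic integers
}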

Expressions

  • The following operators are supported and denote the corresponding operations on real numbers with standard C operator precedence (where applicable): expression + expression, expression - expression, expression * expression, expression / expression, expression div expression, expression ^ expression, expression | expression, expression ⊕ expression, expression & expression

    -expression

    expression < expression, expression > expression, expression <= expression, expression >= expression, expression == expression, expression != expression

(div is integer division: a div b is the same as floor(a/b). ⊕ is bitwise xor. The non-unicode syntax is a xorb b.)

  • There are logical operators !, && and ||, also with standard C operator precedence. (Warning: with the --dp switch, && and || do not currently use short-circuit evaluation.)

  • Conditional expressions are supported: if expression { expression } else { expression }. Only one branch is evaluated.

  • Expressions can be parenthesized in order to change order of precedence, e.g. a*(b+c).
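
A small sketch combining several of these operators (for illustration only):

def main(){
    a := uniformInt(0,3);
    b := 2^a + a div 2;               // power and integer division
    c := if a >= 2 { b } else { -b }; // conditional expression
    return (a < 2) && (c != 0);       // comparisons and logical operators
}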

Primitive Distributions

Primitive distributions can be sampled arbitrarily within expressions. They are exposed as procedures with special names. If arguments do not conform to the constraints of the distribution, the program enters the error state.

  • gauss(mean,variance) samples from the Gaussian distribution with the given mean and non-negative variance.

  • uniform(a,b) samples a uniform real number between a and b, with a<=b.

  • bernoulli(p) is 1 with probability p and 0 with probability 1-p, where 0<=p<=1.

  • uniformInt(a,b) samples a uniform integer between a and b. There should be at least one integer n with a<=n<=b.

  • categorical(p) is i with probability p[i] for i in [0..p.length). p should be an array of real numbers of positive length.

  • exponential(rate) samples from the Exponential distribution with the given positive rate (inverse scale).

  • beta(alpha,beta) samples from the Beta distribution with the given positive shape parameters.

  • gamma(shape,rate) samples from the Gamma distribution with the given shape and rate.

  • laplace(location,scale) samples from the Laplace distribution with the given location and positive scale.

  • cauchy(location,scale) samples from the Cauchy distribution with the given location and positive scale.

  • pareto(shape,scale) samples from the Pareto distribution with the given shape and scale which should be positive real numbers.

  • studentT(degrees_of_freedom) samples from the StudentT distribution with the given positive degrees of freedom parameter.

  • weibull(scale,shape) samples from the Weibull distribution with the given positive scale and shape parameters.

  • rayleigh(sigma_squared) samples from the Rayleigh distribution with the given non-negative squared scale parameter.

  • binomial(n,p) samples from the Binomial distribution with the given number of trials and success probability.

  • negBinomial(r,p) samples from the Negative Binomial distribution with the given number of failures and success probability.

  • geometric(p) samples from the Geometric distribution with the given success probability. Returns the number of failures before the first success.

  • poisson(mean) samples from the Poisson distribution with the given positive mean. (At this time, expect less-than optimal simplification from PSI for expressions generated by programs sampling from this distribution.)
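
As an illustration (a sketch, not one of the shipped examples), primitive distributions can be freely combined within expressions:

def main(){
    p := beta(2,3);            // prior over a success probability
    k := binomial(10,p);       // number of successes in 10 trials
    noise := gauss(0,1/4);     // Gaussian measurement noise
    return (p, k + noise);
}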

Built-in deterministic functions

  • floor, ceil, exp, log, abs.
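
These can be applied to arbitrary expressions, for example (an illustrative sketch):

def main(){
    x := exponential(1);
    return (floor(x), abs(log(x))); // deterministic functions of a random value
}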

Special Expressions

  • sampleFrom("(variable1,...,variablen;parameter1,...,parameterm) => pdf",argument1,...,argumentm): type. This primitive samples an n-tuple of values from the given probability density function. The density function can depend both on the values and on a set of parameters. To specify the pdf, the mathematical operators +-*/^, the constant e, the Dirac delta delta(expr), absolute values |expression| and Iverson brackets of the shape [expression<=expression], [expression<expression], [expression=expression] and [expression != expression] can be used, among others.

    Example:

    x := sampleFrom("(x;y) => [-y<=x]*[x<=y]*x^2",y);
    (x,y,z) := sampleFrom("(x,y,z;w) => [-1<=x]*[x<=1]*[-1<=y]*[y<=1]*|x|*|y|*delta(z-w)",w);
    

    Note: PSI does not currently verify that the given density expression is in fact a generalized probability density, so some care is required when using this primitive. Furthermore, PSI will trust any type annotation on sampleFrom. The default type is ℝ.

  • Marginal(expression): Distribution[type] returns the marginal distribution of the given expression, conditioned on the execution path in the current function. sample(Marginal(expression)) creates an independent value with the same marginal distribution as expression.

  • Expectation(expression) is a shortcut for expectation(Marginal(expression)). Note: PSI does not currently verify that the expectation of the given expression in fact exists, so some care is required when using this primitive. (E.g., Expectation(gauss(0,1)/gauss(0,1)) does not converge.)
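
A brief sketch (subject to the caveats above) using Marginal and Expectation:

def main(){
    x := uniform(0,1);
    y := sample(Marginal(x)); // independent value with the same marginal as x
    m := Expectation(x^2);    // expectation(Marginal(x^2)); here 1/3
    return (x, y, m);
}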

Further language features

Method calls and tuples

def swapped(a,b){
    return (b,a);
}

def main(){
    (z,w) = swapped(1,2);
    return ((z,w),swapped(z,w)); // p(r₁,r₂) = δ_r₁[(2,1)]·δ_r₂[(1,2)]
}

Arrays

x := [1,2,3,4,5,6]; // declare array
return x[uniformInt(0,x.length-1)]; // index randomly into array

y := ([] : R[]); // declare empty array of real numbers
z := x ~ [1]; // z is the concatenation of x with [1]. ~= is also supported

v := array(uniformInt(1,3),[1,2,3]); // declare array of arrays of random size, initialized with [1,2,3] at all indices

w := x[2..x.length-1]; // slice array (w is [3,4,5])

Note: Verifying that array indices and lengths are integers within the array bounds can be costly. Use --noboundscheck to disable bounds checking at the cost of undefined results in case a bounds violation actually occurs with positive probability.

The length of an array a can be obtained using the expression a.length.

First-class functions and lambda functions

PSI functions can be used as first-class values:

def const(x: R){
    def retX(y: R){
        return x;
    }
    return retX;
}

Lambda syntax can be used to create anonymous function values:

def const(x:R){
    return (y: R){ return x; }
}

Function bodies that return an expression can be abbreviated using => syntax:

def const(x:R)=>(y:R)=>x;

PSI function declarations and lambdas can have an arbitrary number of argument lists:

def const(x:R)(y:R)=>x;

Generic functions

Square argument lists introduce generic type parameters:

def const[a](x:a)[b](y:b)=>x;

def main(){
    return const[R](3)[R x R x R](1,2,3);
}

Generic type parameters are deduced automatically when the square argument list is omitted:

def const[a](x:a)[b](y:b)=>x;

def main(){
    return const(3)(1,2,3);
}
def id[a](x:a) => x;

def main(){
    return (id(2), id(id)(1,2)); // p(r₁,r₂) = δ[-r₁+2]·δ_r₂[(1,2)]
}

Nested inference and the Distribution[a] type

PSI supports the built-in functions:

infer[a](f: 1 -> a): Distribution[a]

sample[a](p: Distribution[a]): a

expectation(p: Distribution[R]): R

errorPr(p: Distribution[R]): R

The function infer performs nested inference on the given program f and returns the distribution of its results. (This is precisely what PSI does, but with infer it can be used arbitrarily within a PSI program.)

Note: For technical reasons that will be resolved soon, the implementation of error tracking in the presence of nested inference is very slow. Use the --nocheck flag to disable error tracking and improve performance, at the cost of getting undefined results if an error actually occurs with positive probability.

The function sample samples a value from the given distribution.

The function expectation computes the expectation of the given distribution.

Note: PSI does not currently verify that the expectation of the expression in fact exists, so some care is required when using this primitive. (e.g. expectation(infer(()=>gauss(0,1)/gauss(0,1))) does not converge.)

The function errorPr computes the probability of error for the given distribution.
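
The following sketch combines these built-ins (the procedure name biasedCoin is chosen purely for illustration):

def biasedCoin(){
    p := uniform(0,1);
    observe(bernoulli(p) == 1); // condition on observing one success
    return p;
}

def main(){
    d := infer(biasedCoin);     // Distribution[R] over the result of biasedCoin
    return (expectation(d), sample(d), errorPr(d)); // posterior mean 2/3, a fresh sample, error probability 0
}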

Type annotations

Any expression can be annotated with a type using the (expression : Type) annotation.

A type can be R (real numbers), an array type Type[], the type of a tuple Type1 x Type2 x ... x Typen, Distribution[a], a function type, or the name of a custom data type (see below). The empty tuple has type 1.
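
For example (an illustrative sketch), expressions and tuples can be annotated directly:

def main(){
    x := (uniform(0,1) : R);   // annotate an expression with type R
    return ((x, 2*x) : R x R); // annotate the returned tuple
}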

One can annotate method return types:

def main(): R x R{
    return (0,1);
}

Prelude library

All built-in functions (except operators) are implemented in terms of the sampleFrom special expression. Their implementations can be found in library/prelude.psi (or library/prelude-nocheck.psi if PSI is called with --nocheck). By default, PSI will attempt to read the prelude library from a library folder in the same location as the binary. (This can be overridden by changing the default import path in psi.d.) Both library/prelude.psi and library/prelude-nocheck.psi are baked into the PSI binary at build time, to be used as a fallback.

Experimental language features

Data abstraction

E.g.

dat Foo{
    x : R;
    y : R[];
    def Foo(x: R, y: R[]){ // constructor
        this.x = x;
        this.y = y;
    }
    def method(){
        y[1]+=y[0];
        y[0]+=x;
        return (x,y[0]);
    }
}

def main(){
    x := Foo(1,[1,2,3]);
    return x.method(); // p(r₁) = δ_r₁[(1,2)]
}

Source Code Guide

This section gives a quick overview of the source code.

psi.d:

Contains the program entry point, interprets the command line arguments and calls the other components.

expression.d, type.d, declaration.d:

Contains the declaration of source language AST nodes.

lexer.d, parser.d:

Lexer and parser for the source language. (Note: Contains some cruft as it was quickly adapted from a parser for the D programming language.)

semantic_.d:

Semantic analyzer, performs type checking and annotates the AST generated by the parser.

backend.d:

Defines the backend interface.

symbolic.d:

Symbolic backend. Translates the annotated AST to distribution expressions.

distrib.d:

Contains the declaration of generalized probability density functions for the primitive distributions supported by PSI. Furthermore, this file contains the class Distribution which encapsulates a probability distribution over program variables with an additional error state.

dexpr.d:

Contains the implementation of the intermediate language of distribution expressions and most of the simplification code.

dparse.d:

Parser for distribution expressions. (Used for debugging and for the sampleFrom primitive; it aims to invert the default toString method for distribution expressions, while also allowing different notations.)

differentiation.d:

Symbolic differentiation engine.

integration.d:

Symbolic integration engine.

summation.d:

Symbolically evaluates sum expressions.

limits.d, asymptotics.d:

Limit evaluation engine.

terminal.d:

Simple colored terminal output.

hashtable.d:

Hash table implementation with optimization for small tables.

help.d:

Implements the --help option and a few other options.

util.d:

Utility functions not available in the standard library (not everything in this file is used by PSI).

Adding new tests

Add a *.psi file into the test directory.

Citing this Work

You are encouraged to cite the CAV 2016 and PLDI 2020 publications using the following BibTeX entries if you use PSI for academic research.

@inproceedings{gehr2016psi,
  title={{PSI}: Exact Symbolic Inference for Probabilistic Programs},
  author={Gehr, Timon and Misailovic, Sasa and Vechev, Martin},
  booktitle={International Conference on Computer Aided Verification},
  pages={62--83},
  year={2016},
  organization={Springer}
}
@inproceedings{gehr2020lpsi,
  title = {$\lambda${PSI}: Exact Inference for Higher-Order Probabilistic Programs},
  author = {Gehr, Timon and Steffen, Samuel and Vechev, Martin},
  doi = {10.1145/3385412.3386006},
  booktitle = {Proceedings of the 41st {ACM} {SIGPLAN} {Conference} on {Programming} {Language} {Design} and {Implementation}},
  publisher = {ACM},
  month = jun,
  year = {2020}
} 
