  • Stars: 322
  • Rank: 130,398 (Top 3 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 11 years ago
  • Updated 10 months ago

Repository Details

A domain-general, Bayesian method for analyzing high-dimensional data tables

CrossCat

CrossCat is a domain-general, Bayesian method for analyzing high-dimensional data tables. CrossCat estimates the full joint distribution over the variables in the table from the data, via approximate inference in a hierarchical, nonparametric Bayesian model, and provides efficient samplers for every conditional distribution. CrossCat combines strengths of nonparametric mixture modeling and Bayesian network structure learning: it can model any joint distribution given enough data by positing latent variables, but also discovers independencies between the observable variables.

A range of exploratory analysis and predictive modeling tasks can be addressed via CrossCat, including detecting predictive relationships between variables, finding multiple overlapping clusterings, imputing missing values, and simultaneously selecting features and classifying rows. Research on CrossCat has shown that it is suitable for analysis of real-world tables of up to 10 million cells, including hospital cost and quality measures, voting records, handwritten digits, and state-level unemployment time series.
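
The structure behind the description above can be illustrated with a toy sketch. In the published CrossCat model, columns are partitioned into views and, within each view, rows are partitioned into clusters, with a Chinese restaurant process (CRP) prior on each partition. Columns in different views are modeled as independent, and each view carries its own clustering of the rows, which is where the "multiple clusterings" come from. The Python 3 code below only samples such a nested partition from the prior; it performs no inference and does not use the crosscat package. The names crp_partition and sample_structure, and the concentration parameters, are illustrative assumptions rather than the library's API.

```python
import random


def crp_partition(n_items, alpha, rng):
    """Sample block labels for n_items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter: larger values favor more blocks.
    Returns a list where entry i is the block index of item i.
    """
    assignments = []
    counts = []  # counts[k] = number of items already placed in block k
    for _ in range(n_items):
        # An existing block k is chosen with weight counts[k];
        # a brand-new block is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new block
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


def sample_structure(n_rows, n_cols, alpha_cols=1.0, alpha_rows=1.0, seed=0):
    """Sample a nested partition: columns into views, rows into clusters per view."""
    rng = random.Random(seed)
    col_to_view = crp_partition(n_cols, alpha_cols, rng)
    n_views = max(col_to_view) + 1
    # Each view gets its own, independently sampled clustering of the rows.
    row_clusters = [crp_partition(n_rows, alpha_rows, rng) for _ in range(n_views)]
    return col_to_view, row_clusters


if __name__ == "__main__":
    col_to_view, row_clusters = sample_structure(n_rows=6, n_cols=5, seed=1)
    print("column -> view:", col_to_view)
    for view, clusters in enumerate(row_clusters):
        print("view", view, "row -> cluster:", clusters)
```

Two columns that land in the same view share one row clustering, while columns in different views are clustered separately. Inference in the real system has to search over these partitions given data, which this sketch does not attempt.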

Installation

Local (Ubuntu)

You can install CrossCat using pip (no need to clone from git):

$ pip install crosscat

If you'd like to install from source, CrossCat can be successfully installed locally on bare Ubuntu server 14.04 systems with:

$ sudo apt-get install build-essential cython python
$ sudo apt-get install python-setuptools python-numpy
$ git clone https://github.com/probcomp/crosscat.git

$ cd crosscat
$ python setup.py build
$ python setup.py install  # or python setup.py develop

CrossCat can also be installed in a local Python virtual environment:

$ cd crosscat
$ virtualenv --system-site-packages /path/to/venv
$ . /path/to/venv/bin/activate
$ python setup.py build
$ python setup.py install  # or python setup.py develop

A similar process has been found to work on OS X.
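
After either install route, a quick check that the package is visible to the active interpreter can be sketched with the standard library alone. The helper name is_importable is hypothetical, and the sketch assumes the import name is crosscat, matching the name used with pip above.

```python
import importlib


def is_importable(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: the installed package is imported as "crosscat".
    print("crosscat importable: %s" % is_importable("crosscat"))
```

Run it with the same interpreter (or activated virtual environment) that was used for the install, since a different interpreter will not see the package.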

Tests

To run the automatic tests:

$ ./check.sh

Documentation

Note: The VM is only meant to provide an out-of-the-box usable system setup. Its resources are limited and large jobs will fail due to memory errors. To run larger jobs, increase the VM resources or install directly to your system.

Python Client

C++ backend

Example

dha_example.py is a basic example of analysis using CrossCat. For a first test, run the following from the directory above the top-level crosscat directory:

$ python crosscat/examples/dha_example.py crosscat/www/data/dha.csv --num_chains 2 --num_transitions 2

Note: the default argument values take a considerable amount of time to run and are best suited to a cluster.

License

Apache License, Version 2.0

More Repositories

1. Gen.jl (Julia, 1,794 stars): A general-purpose probabilistic programming system with programmable inference
2. bayeslite (Python, 922 stars): BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
3. BayesDB (889 stars): A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself. New implementation in http://github.com/probcomp/bayeslite
4. PClean (Julia, 217 stars): A domain-specific probabilistic programming language for scalable Bayesian data cleaning
5. metaprob (JavaScript, 168 stars): An embedded language for probabilistic programming and meta-programming.
6. gen-quickstart (Jupyter Notebook, 128 stars): Gen learning material as Jupyter notebooks
7. LLaMPPL (Python, 110 stars): A domain-specific probabilistic programming language for modeling and inference with language models
8. hfppl (Python, 86 stars): Probabilistic programming with HuggingFace language models
9. adev (Haskell, 64 stars): Haskell prototype to accompany the paper "ADEV: Sound Automatic Differentiation of Expected Values of Probabilistic Programs"
10. sppl (Python, 63 stars): Probabilistic programming system for fast and exact symbolic probabilistic inference
11. Genify.jl (Julia, 47 stars): Automatically convert Julia methods to Gen functions.
12. fast-loaded-dice-roller (C, 44 stars): The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
13. trcrpm (Jupyter Notebook, 37 stars): Temporally-reweighted Chinese restaurant process mixture models for multivariate time series
14. Venturecxx (C++, 28 stars): Primary implementation of the Venture probabilistic programming system
15. cgpm (Python, 25 stars): Library of composable generative population models which serve as the modeling and inference backend of BayesDB.
16. bayes3d (Jupyter Notebook, 22 stars)
17. GenParticleFilters.jl (Julia, 21 stars): Building blocks for simple and advanced particle filtering in Gen.
18. GenSMCP3.jl (Julia, 19 stars): Automated SMC with Probabilistic Program Proposals, for the Gen PPL.
19. GenGPT3.jl (Julia, 18 stars): GPT-3 as a generative function in Gen.
20. GenExperimental.jl (Julia, 17 stars): Featherweight embedded probabilistic programming language and compositional inference programming library
21. notebook (Jupyter Notebook, 17 stars): jupyter/datascience-notebook with probcomp libraries
22. Gen.clj (Clojure, 17 stars): A general-purpose probabilistic programming system with programmable inference.
23. ThreeDP3 (Jupyter Notebook, 15 stars)
24. iventure (Python, 14 stars): An interactive, browser-based probabilistic programming environment.
25. optimal-approximate-sampling (Python, 14 stars): Optimal Approximate Sampling from Discrete Probability Distributions
26. autoimcmc (Python, 12 stars): Code accompanying the paper "Automating Involutive MCMC using Probabilistic and Differentiable Programming"
27. programmable-vi-pldi-2024 (Jupyter Notebook, 12 stars): Probabilistic programming with programmable variational inference.
28. Cloudless (Python, 11 stars): Distributed computational science made easy, in Python
29. CLIPS.jl (Julia, 11 stars): Cooperative Language-Guided Inverse Plan Search (CLIPS).
30. GenTF (Julia, 10 stars): TensorFlow plugin for Gen probabilistic programming system.
31. haskell-trace-types (Haskell, 10 stars): Prototype of the system described in "Trace Types and Denotational Semantics for Sound Programmable Inference in Probabilistic Languages"
32. developer (Makefile, 9 stars): Developer environment for probcomp repos
33. bdbcontrib (Python, 9 stars): BayesDB contributions, including plotting, helper methods, and examples
34. ADEV.jl (Julia, 9 stars): Experimental port of ADEV to Julia
35. GenViz (Julia, 7 stars): A visualization library for probabilistic programming in Gen.
36. pldi2019-gen-experiments (Jupyter Notebook, 7 stars): Experiments for PLDI 2019 submission on Gen
37. InversePlanning.jl (Julia, 7 stars): Agent modeling and inverse planning, using PDDL and Gen.
38. b3d (Jupyter Notebook, 7 stars): Bayes3D
39. haxcat (Haskell, 6 stars): Experimental educational implementation of CrossCat in Haskell
40. SPPL.jl (Julia, 6 stars): A small DSL for programming sppl across PythonCall.jl
41. packaging (Python, 5 stars): Packaging for probcomp software.
42. PoseComposition.jl (Julia, 5 stars)
43. GenVariableElimination.jl (Julia, 5 stars): Experimental package for variable elimination in factor graphs derived from generative functions
44. SpikingInferenceCircuits.jl (Julia, 5 stars)
45. GenDistributions.jl (Julia, 5 stars): Use Distributions.jl distributions from within Gen
46. GenTraceKernelDSL.jl (Julia, 5 stars): A DSL for defining stochastic maps between traces of Gen generative functions
47. gen-finance (Clojure, 5 stars)
48. GenPyTorch.jl (Julia, 5 stars): Gen plugin to allow PyTorch computations to be used as Gen generative functions.
49. probcomp-stack (Shell, 4 stars): MIT Probabilistic Computing Project software stack
50. GenSP.jl (Julia, 4 stars): Probabilistic programming library extending Gen with support for Stochastic Probabilities
51. Gen2DAgentMotion.jl (Julia, 4 stars): Components for building generative models of the motion of an agent moving around a 2D environment.
52. GenExamples.jl (Julia, 3 stars): Gen examples with a Travis CI build that tests that they run
53. GenFlux.jl (Julia, 3 stars)
54. InverseGraphics (Jupyter Notebook, 3 stars)
55. curve-fitting (Clojure, 3 stars): A simple application demonstrating some of the capabilities of the Metaprob probabilistic programming language
56. bayesrest (Python, 3 stars)
57. cgpm2 (Jupyter Notebook, 3 stars): Minimal implementation of composable generative population models for Bayesian synthesis of probabilistic programs.
58. TracedRandom.jl (Julia, 3 stars): Make Julia code probabilistic-programming-ready by allowing calls to `rand` to be annotated with traced addresses.
59. nips2017-aide-experiments (Julia, 3 stars): Experiments and figure generation for NIPS 2017 paper on AIDE
60. parallel_map (Python, 2 stars): Simple parallel mapping utility for Python 3.
61. gen-examples-perception (Julia, 2 stars): Examples of Gen applied to perception problems
62. GenFluxOptimizers.jl (Julia, 2 stars): A Gen plugin for using Flux's optimizers to fit a probabilistic program's parameters
63. aistats2023-smcp3 (Julia, 2 stars)
64. Circuits.jl (Julia, 2 stars)
65. GenDirectionalStats.jl (Julia, 2 stars): Distributions on spaces of rotations and other spatial spaces.
66. tutorial_highlighter (Python, 2 stars): Python package for generating PNGs of code and math with custom highlighted regions using LaTeX
67. ravi-uai-2022 (Julia, 2 stars): Code to accompany the paper "Recursive Monte Carlo and Variational Inference with Auxiliary Variables"
68. SMC.jl (Julia, 1 star): A Julia implementation of generic sequential Monte Carlo (SMC) and conditional SMC.
69. inferenceql.viz (Clojure, 1 star)
70. DynamicForwardDiff.jl (Julia, 1 star): An experimental fork of ForwardDiff.jl to support differentiation with respect to an a-priori unknown number of parameters
71. GenRedner.jl (Julia, 1 star): Gen.jl wrapper for the Redner differentiable renderer
72. bayeslite-apsw (C, 1 star)
73. GenPOMDPs.jl (Julia, 1 star)
74. GLRenderer.jl (Julia, 1 star): High FPS rendering. Supports Depth, RGB, and RGB+Texture
75. DepthRenderer (Julia, 1 star): Minimal OpenGL-based 3D depth renderer in Julia
76. durablevs (Jupyter Notebook, 1 star): DURableVS: Data-efficient Unsupervised Recalibrating Visual Servoing via online learning in a structured generative model
77. JAX.jl (Julia, 1 star): A wrapper package for using JAX from Julia via PythonCall.
78. GenPseudoMarginal.jl (Julia, 1 star): Sequential Monte Carlo and annealed importance sampling inference library for Gen