  • Stars: 110
  • Rank: 316,770 (top 7%)
  • Language: Python
  • License: MIT License
  • Created: over 1 year ago
  • Updated: about 1 year ago

Repository Details

A domain-specific probabilistic programming language for modeling and inference with language models

LLaMPPL: A Large Language Model Probabilistic Programming Language

LLaMPPL is a research prototype for language model probabilistic programming: specifying language generation tasks by writing probabilistic programs that combine calls to LLMs, symbolic program logic, and probabilistic conditioning. To solve these tasks, LLaMPPL uses a specialized sequential Monte Carlo inference algorithm. This technique, SMC steering, is described in our paper: https://arxiv.org/abs/2306.03081.
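
For intuition, sequential Monte Carlo maintains a population of partial generations ("particles"), advancing and reweighting each one step at a time and periodically resampling so that computation concentrates on high-weight continuations. Below is a minimal, self-contained sketch of such a particle loop; it is not the library's actual implementation (which is described in the paper), and the done and weight attributes are hypothetical:

import copy
import random

def smc_sketch(make_model, n_particles):
    # Each particle is a model instance; step() extends its generation,
    # and conditioning updates its (nonnegative) weight.
    particles = [make_model() for _ in range(n_particles)]
    while any(not p.done for p in particles):
        for p in particles:
            if not p.done:
                p.step()
        weights = [p.weight for p in particles]
        # Resample in proportion to weight, duplicating promising
        # particles and dropping improbable ones.
        particles = [copy.deepcopy(p) for p in
                     random.choices(particles, weights=weights, k=n_particles)]
    return particles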

Note: A new version of this library is available at https://github.com/probcomp/hfppl that integrates with HuggingFace language models and supports GPU acceleration.

Installation

Clone this repository and run pip install -e . in the root directory, or python setup.py develop to install in development mode. Then run python examples/{example}.py for one of our examples (constraints.py, infilling.py, or prompt_intersection.py) to test the installation. You will be prompted for the path to the weights of a pretrained LLaMA model, in GGML format. If you have access to Meta's LLaMA weights, you can follow the instructions here to convert them to the proper format.

Usage

A LLaMPPL program is a subclass of the llamppl.Model class.

from llamppl import Model, Transformer, EOS, TokenCategorical

# A LLaMPPL model subclasses the Model class
class MyModel(Model):

    # The __init__ method is used to process arguments
    # and initialize instance variables.
    def __init__(self, prompt, forbidden_letter):
        super().__init__()

        # The string we will be generating
        self.s         = ""
        # A stateful context object for the LLM, initialized with the prompt
        self.context   = self.new_context(prompt)
        # The forbidden letter
        self.forbidden = forbidden_letter
    
    # The step method is used to perform a single 'step' of generation.
    # This might be a single token, a single phrase, or any other division.
    # Here, we generate one token at a time.
    def step(self):
        # Sample a token from the LLM -- automatically extends `self.context`
        token = self.sample(Transformer(self.context), proposal=self.proposal())

        # Condition on the token not having the forbidden letter
        self.condition(self.forbidden not in str(token).lower())

        # Update the string
        self.s += token

        # Check for EOS or end of sentence
        if token == EOS or str(token) in ['.', '!', '?']:
            # Finish generation
            self.finish()
    
    # Helper method to define a custom proposal
    def proposal(self):
        logits = self.context.logits().copy()
        forbidden_token_ids = [i for (i, v) in enumerate(self.vocab()) if self.forbidden in str(v).lower()]
        logits[forbidden_token_ids] = -float('inf')
        return TokenCategorical(logits)

The Model class provides a number of useful methods for specifying a LLaMPPL program:

  • self.sample(dist[, proposal]) samples from the given distribution. Providing a proposal does not modify the task description, but can improve inference. Here, for example, we use a proposal that pre-emptively avoids the forbidden letter.
  • self.condition(cond) conditions on the given Boolean expression.
  • self.new_context(prompt) creates a new context object, initialized with the given prompt.
  • self.finish() indicates that generation is complete.
  • self.observe(dist, obs) performs a form of 'soft conditioning' on the given distribution. It is equivalent to (but more efficient than) sampling a value v from dist and then immediately running condition(v == obs). A minimal sketch follows this list.
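
For example, observe can softly condition generation on a known next token. The following sketch is hypothetical (ObserveNextToken is not part of the library, and observed_token is assumed to be a token in the model's vocabulary):

from llamppl import Model, Transformer

# Hypothetical model: weight each particle by the probability the LLM
# assigns to a known next token, instead of sampling a token.
class ObserveNextToken(Model):
    def __init__(self, prompt, observed_token):
        super().__init__()
        self.context        = self.new_context(prompt)
        self.observed_token = observed_token

    def step(self):
        # Equivalent to sampling from Transformer(self.context) and then
        # conditioning on equality with observed_token, but reweights the
        # particle directly by the observed token's probability.
        self.observe(Transformer(self.context), self.observed_token)
        self.finish()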

To run inference, we use the smc_steer method:

from llamppl import smc_steer, LLaMAConfig
# Initialize the model with weights
LLaMAConfig.set_model_path("path/to/weights.ggml")
# Create a model instance
model = MyModel("The weather today is expected to be", "e")
# Run inference
particles = smc_steer(model, 5, 3) # N = 5 particles, beam factor K = 3

Sample output:

sunny.
sunny and cool.
34° (81°F) in Chicago with winds at 5mph.
34° (81°F) in Chicago with winds at 2-9 mph.
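
Output like the above can be printed by iterating over the returned particles. A minimal sketch, assuming each element of particles is a completed copy of MyModel whose s attribute holds its generated string:

for particle in particles:
    print(particle.s)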

More Repositories

1. Gen.jl (Julia, 1,794 stars): A general-purpose probabilistic programming system with programmable inference
2. bayeslite (Python, 922 stars): BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
3. BayesDB (889 stars): A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself. New implementation in http://github.com/probcomp/bayeslite
4. crosscat (Python, 322 stars): A domain-general, Bayesian method for analyzing high-dimensional data tables
5. PClean (Julia, 217 stars): A domain-specific probabilistic programming language for scalable Bayesian data cleaning
6. metaprob (JavaScript, 168 stars): An embedded language for probabilistic programming and meta-programming.
7. gen-quickstart (Jupyter Notebook, 128 stars): Gen learning material as Jupyter notebooks
8. hfppl (Python, 86 stars): Probabilistic programming with HuggingFace language models
9. adev (Haskell, 64 stars): Haskell prototype to accompany the paper "ADEV: Sound Automatic Differentiation of Expected Values of Probabilistic Programs"
10. sppl (Python, 63 stars): Probabilistic programming system for fast and exact symbolic probabilistic inference
11. Genify.jl (Julia, 47 stars): Automatically convert Julia methods to Gen functions.
12. fast-loaded-dice-roller (C, 44 stars): The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
13. trcrpm (Jupyter Notebook, 37 stars): Temporally-reweighted Chinese restaurant process mixture models for multivariate time series
14. Venturecxx (C++, 28 stars): Primary implementation of the Venture probabilistic programming system
15. cgpm (Python, 25 stars): Library of composable generative population models which serve as the modeling and inference backend of BayesDB.
16. bayes3d (Jupyter Notebook, 22 stars)
17. GenParticleFilters.jl (Julia, 21 stars): Building blocks for simple and advanced particle filtering in Gen.
18. GenSMCP3.jl (Julia, 19 stars): Automated SMC with Probabilistic Program Proposals, for the Gen PPL.
19. GenGPT3.jl (Julia, 18 stars): GPT-3 as a generative function in Gen.
20. GenExperimental.jl (Julia, 17 stars): Featherweight embedded probabilistic programming language and compositional inference programming library
21. notebook (Jupyter Notebook, 17 stars): jupyter/datascience-notebook with probcomp libraries
22. Gen.clj (Clojure, 17 stars): A general-purpose probabilistic programming system with programmable inference.
23. ThreeDP3 (Jupyter Notebook, 15 stars)
24. iventure (Python, 14 stars): An interactive, browser-based probabilistic programming environment.
25. optimal-approximate-sampling (Python, 14 stars): Optimal Approximate Sampling from Discrete Probability Distributions
26. autoimcmc (Python, 12 stars): Code accompanying the paper "Automating Involutive MCMC using Probabilistic and Differentiable Programming"
27. programmable-vi-pldi-2024 (Jupyter Notebook, 12 stars): Probabilistic programming with programmable variational inference.
28. Cloudless (Python, 11 stars): Distributed computational science made easy, in Python
29. CLIPS.jl (Julia, 11 stars): Cooperative Language-Guided Inverse Plan Search (CLIPS).
30. GenTF (Julia, 10 stars): TensorFlow plugin for Gen probabilistic programming system.
31. haskell-trace-types (Haskell, 10 stars): Prototype of the system described in "Trace Types and Denotational Semantics for Sound Programmable Inference in Probabilistic Languages"
32. developer (Makefile, 9 stars): Developer environment for probcomp repos
33. bdbcontrib (Python, 9 stars): BayesDB contributions, including plotting, helper methods, and examples
34. ADEV.jl (Julia, 9 stars): Experimental port of ADEV to Julia
35. GenViz (Julia, 7 stars): A visualization library for probabilistic programming in Gen.
36. pldi2019-gen-experiments (Jupyter Notebook, 7 stars): Experiments for PLDI 2019 submission on Gen
37. InversePlanning.jl (Julia, 7 stars): Agent modeling and inverse planning, using PDDL and Gen.
38. b3d (Jupyter Notebook, 7 stars): Bayes3D
39. haxcat (Haskell, 6 stars): Experimental educational implementation of CrossCat in Haskell
40. SPPL.jl (Julia, 6 stars): A small DSL for programming sppl across PythonCall.jl
41. packaging (Python, 5 stars): Packaging for probcomp software.
42. PoseComposition.jl (Julia, 5 stars)
43. GenVariableElimination.jl (Julia, 5 stars): Experimental package for variable elimination in factor graphs derived from generative functions
44. SpikingInferenceCircuits.jl (Julia, 5 stars)
45. GenDistributions.jl (Julia, 5 stars): Use Distributions.jl distributions from within Gen
46. GenTraceKernelDSL.jl (Julia, 5 stars): A DSL for defining stochastic maps between traces of Gen generative functions
47. gen-finance (Clojure, 5 stars)
48. GenPyTorch.jl (Julia, 5 stars): Gen plugin to allow PyTorch computations to be used as Gen generative functions.
49. probcomp-stack (Shell, 4 stars): MIT Probabilistic Computing Project software stack
50. GenSP.jl (Julia, 4 stars): Probabilistic programming library extending Gen with support for Stochastic Probabilities
51. Gen2DAgentMotion.jl (Julia, 4 stars): Components for building generative models of the motion of an agent moving around a 2D environment.
52. GenExamples.jl (Julia, 3 stars): Gen examples with a Travis CI build that tests that they run
53. GenFlux.jl (Julia, 3 stars)
54. InverseGraphics (Jupyter Notebook, 3 stars)
55. curve-fitting (Clojure, 3 stars): A simple application demonstrating some of the capabilities of the Metaprob probabilistic programming language
56. bayesrest (Python, 3 stars)
57. cgpm2 (Jupyter Notebook, 3 stars): Minimal implementation of composable generative population models for Bayesian synthesis of probabilistic programs.
58. TracedRandom.jl (Julia, 3 stars): Make Julia code probabilistic-programming-ready by allowing calls to `rand` to be annotated with traced addresses.
59. nips2017-aide-experiments (Julia, 3 stars): Experiments and figure generation for NIPS 2017 paper on AIDE
60. parallel_map (Python, 2 stars): Simple parallel mapping utility for Python 3.
61. gen-examples-perception (Julia, 2 stars): Examples of Gen applied to perception problems
62. GenFluxOptimizers.jl (Julia, 2 stars): A Gen plugin for using Flux's optimizers to fit a probabilistic program's parameters
63. aistats2023-smcp3 (Julia, 2 stars)
64. Circuits.jl (Julia, 2 stars)
65. GenDirectionalStats.jl (Julia, 2 stars): Distributions on spaces of rotations and other spatial spaces.
66. tutorial_highlighter (Python, 2 stars): Python package for generating PNGs of code and math with custom highlighted regions using LaTeX
67. ravi-uai-2022 (Julia, 2 stars): Code to accompany the paper "Recursive Monte Carlo and Variational Inference with Auxiliary Variables"
68. SMC.jl (Julia, 1 star): A Julia implementation of generic sequential Monte Carlo (SMC) and conditional SMC.
69. inferenceql.viz (Clojure, 1 star)
70. DynamicForwardDiff.jl (Julia, 1 star): An experimental fork of ForwardDiff.jl to support differentiation with respect to an a-priori unknown number of parameters
71. GenRedner.jl (Julia, 1 star): Gen.jl wrapper for the Redner differentiable renderer
72. bayeslite-apsw (C, 1 star)
73. GenPOMDPs.jl (Julia, 1 star)
74. GLRenderer.jl (Julia, 1 star): High FPS rendering. Supports Depth, RGB, and RGB+Texture
75. DepthRenderer (Julia, 1 star): Minimal OpenGL-based 3D depth renderer in Julia
76. durablevs (Jupyter Notebook, 1 star): DURableVS: Data-efficient Unsupervised Recalibrating Visual Servoing via online learning in a structured generative model
77. JAX.jl (Julia, 1 star): A wrapper package for using JAX from Julia via PythonCall.
78. GenPseudoMarginal.jl (Julia, 1 star): Sequential Monte Carlo and annealed importance sampling inference library for Gen