  • Stars: 106
  • Language: Python
  • License: MIT License
  • Created about 1 year ago
  • Updated 8 months ago

Repository Details

A domain-specific probabilistic programming language for modeling and inference with language models

LLaMPPL: A Large Language Model Probabilistic Programming Language

LLaMPPL is a research prototype for language model probabilistic programming: specifying language generation tasks by writing probabilistic programs that combine calls to LLMs, symbolic program logic, and probabilistic conditioning. To solve these tasks, LLaMPPL uses a specialized sequential Monte Carlo inference algorithm. This technique, SMC steering, is described in our paper: https://arxiv.org/abs/2306.03081.

Note: A new version of this library is available at https://github.com/probcomp/hfppl that integrates with HuggingFace language models and supports GPU acceleration.

Installation

Clone this repository and run pip install -e . in the root directory (or python setup.py develop) to install it in development mode. Then test the installation by running python examples/{example}.py, where {example}.py is one of our examples: constraints.py, infilling.py, or prompt_intersection.py. You will be prompted for the path to the weights of a pretrained LLaMA model in GGML format. If you have access to Meta's LLaMA weights, you can follow the instructions here to convert them to the proper format.

Usage

A LLaMPPL program is a subclass of the llamppl.Model class.

from llamppl import Model, Transformer, EOS, TokenCategorical

# A LLaMPPL model subclasses the Model class
class MyModel(Model):

    # The __init__ method is used to process arguments
    # and initialize instance variables.
    def __init__(self, prompt, forbidden_letter):
        super().__init__()

        # The string we will be generating
        self.s         = ""
        # A stateful context object for the LLM, initialized with the prompt
        self.context   = self.new_context(prompt)
        # The forbidden letter
        self.forbidden = forbidden_letter
    
    # The step method is used to perform a single 'step' of generation.
    # This might be a single token, a single phrase, or any other division.
    # Here, we generate one token at a time.
    def step(self):
        # Sample a token from the LLM -- automatically extends `self.context`
        token = self.sample(Transformer(self.context), proposal=self.proposal())

        # Condition on the token not having the forbidden letter
        self.condition(self.forbidden not in str(token).lower())

        # Update the string
        self.s += str(token)

        # Check for EOS or end of sentence
        if token == EOS or str(token) in ['.', '!', '?']:
            # Finish generation
            self.finish()
    
    # Helper method to define a custom proposal
    def proposal(self):
        logits = self.context.logits().copy()
        forbidden_token_ids = [i for (i, v) in enumerate(self.vocab()) if self.forbidden in str(v).lower()]
        logits[forbidden_token_ids] = -float('inf')
        return TokenCategorical(logits)
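
The proposal() helper above sets the logits of every token containing the forbidden letter to -inf before building a TokenCategorical. The plain-Python sketch below illustrates why such a proposal improves inference without changing the task. It uses a toy vocabulary and toy logits, and the helpers softmax and masked_proposal are hypothetical; it is not how LLaMPPL computes weights internally. Under standard importance sampling, the weight p(x)/q(x) is the same constant for every token the masked proposal can produce: the probability mass the model assigns to allowed tokens. The masked distribution is therefore exactly the model's distribution conditioned on avoiding the letter.

```python
import math

def softmax(logits):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

def masked_proposal(logits, vocab, forbidden):
    """Hypothetical helper: copy the logits, then send every token
    containing the forbidden letter to -inf (mirrors proposal() above)."""
    masked = list(logits)
    for i, tok in enumerate(vocab):
        if forbidden in tok.lower():
            masked[i] = -math.inf
    return softmax(masked)

vocab = ["sun", "cloud", "rain", "hot", "wet"]  # hypothetical toy vocabulary
logits = [2.0, 1.0, 0.5, 0.0, -1.0]            # hypothetical model logits

p = softmax(logits)                      # the model's next-token distribution
q = masked_proposal(logits, vocab, "e")  # proposal that never emits "e"

# Importance weight p/q for every token the proposal can actually produce
weights = [p[i] / q[i] for i in range(len(vocab)) if q[i] > 0]
# Probability mass the model puts on tokens without the forbidden letter
allowed_mass = sum(p[i] for i in range(len(vocab)) if "e" not in vocab[i])
```

Because every allowed token receives the same weight, a masked proposal spends no particles on tokens that condition() would reject anyway.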

The Model class provides a number of useful methods for specifying a LLaMPPL program:

  • self.sample(dist[, proposal]) samples from the given distribution. Providing a proposal does not modify the task description, but can improve inference. Here, for example, we use a proposal that pre-emptively avoids the forbidden letter.
  • self.condition(cond) conditions on the given Boolean expression.
  • self.new_context(prompt) creates a new context object, initialized with the given prompt.
  • self.finish() indicates that generation is complete.
  • self.observe(dist, obs) performs a form of 'soft conditioning' on the given distribution. It is equivalent to (but more efficient than) sampling a value v from dist and then immediately running condition(v == obs).
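The equivalence between observe and sampling followed by condition can be checked in terms of particle weights. In the sketch below, the function names and the toy distribution are hypothetical, and the code does not reflect how LLaMPPL implements observe. Observing a value multiplies a particle's weight by that value's probability. Sampling and then conditioning gives weight 1 with that same probability and weight 0 otherwise. Both have the same expected weight, but the first introduces no sampling noise, which is one way to see why observe is more efficient.

```python
import random

def observe_weight(probs, obs):
    """Soft conditioning: scale the particle weight by P(obs)."""
    return probs[obs]

def sample_then_condition_weight(probs, obs, rng):
    """Sample v ~ probs, then condition on v == obs: weight 1 if it matches, else 0."""
    values = list(probs)
    v = rng.choices(values, weights=[probs[x] for x in values])[0]
    return 1.0 if v == obs else 0.0

# Hypothetical next-word distribution
probs = {"sunny": 0.6, "rainy": 0.3, "snowy": 0.1}

rng = random.Random(0)
n = 100_000
# Monte Carlo average of the 0/1 weights from sample-then-condition
mc_estimate = sum(sample_then_condition_weight(probs, "rainy", rng) for _ in range(n)) / n
# Deterministic weight from soft conditioning
exact = observe_weight(probs, "rainy")
```

Averaged over many runs, the 0/1 weights approach the single deterministic weight that observe assigns.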

To run inference, we use the smc_steer function:

from llamppl import smc_steer, LLaMAConfig
# Point LLaMPPL at the pretrained LLaMA weights (GGML format)
LLaMAConfig.set_model_path("path/to/weights.ggml")
# Create a model instance: prompt and forbidden letter
model = MyModel("The weather today is expected to be", "e")
# Run inference with N = 5 particles and beam factor K = 3
particles = smc_steer(model, 5, 3)

Sample output:

sunny.
sunny and cool.
34° (81°F) in Chicago with winds at 5mph.
34° (81°F) in Chicago with winds at 2-9 mph.
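smc_steer takes a number of particles N and a beam factor K. The sketch below assumes that K means each particle is expanded into K candidate continuations before the population is cut back to N, and shows that idea in generic SMC terms. It is a simplified illustration, not LLaMPPL's implementation. The helpers smc_step_nk and extend_no_e are hypothetical, and the plain multinomial resampling used here is a stand-in for the more careful resampling step that the SMC steering paper describes.

```python
import math
import random

def smc_step_nk(particles, extend, k, rng):
    """Hypothetical 'extend K ways, keep N' step.

    particles: list of (state, log_weight) pairs; N = len(particles)
    extend(state, rng) -> (new_state, log_incremental_weight)
    """
    n = len(particles)
    candidates = []
    for state, log_w in particles:
        for _ in range(k):
            new_state, log_inc = extend(state, rng)
            # Split the parent's weight evenly across its K children
            candidates.append((new_state, log_w + log_inc - math.log(k)))
    max_lw = max(lw for _, lw in candidates)
    if max_lw == -math.inf:
        raise RuntimeError("every candidate has zero weight")
    # Relative weights, shifted by the max log weight for numerical stability
    rel = [math.exp(lw - max_lw) for _, lw in candidates]
    # Multinomial resampling: draw N survivors from the N*K candidate pool
    survivors = rng.choices([s for s, _ in candidates], weights=rel, k=n)
    # Each survivor carries an equal share of the pool's total weight
    log_share = max_lw + math.log(sum(rel)) - math.log(n)
    return [(s, log_share) for s in survivors]

def extend_no_e(state, rng):
    """Toy step: append a random character; weight 0 (log -inf) if it is 'e'."""
    tok = rng.choice("aeo.")
    return state + tok, (-math.inf if tok == "e" else 0.0)

rng = random.Random(1)
particles = [("", 0.0)] * 5  # N = 5 empty particles
for _ in range(3):
    particles = smc_step_nk(particles, extend_no_e, k=3, rng=rng)
```

In this sketch, a larger K gives each resampling step a bigger pool to choose from, at the cost of K times as many extension calls per step.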

More Repositories

  • Gen.jl (Julia, 1,766 stars): A general-purpose probabilistic programming system with programmable inference
  • bayeslite (Python, 914 stars): BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
  • BayesDB (887 stars): A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself. New implementation in http://github.com/probcomp/bayeslite
  • crosscat (Python, 321 stars): A domain-general, Bayesian method for analyzing high-dimensional data tables
  • PClean (Julia, 215 stars): A domain-specific probabilistic programming language for scalable Bayesian data cleaning
  • metaprob (JavaScript, 166 stars): An embedded language for probabilistic programming and meta-programming.
  • gen-quickstart (Jupyter Notebook, 125 stars): Gen learning material as Jupyter notebooks
  • sppl (Python, 63 stars): Probabilistic programming system for fast and exact symbolic probabilistic inference
  • adev (Haskell, 62 stars): Haskell prototype to accompany the paper "ADEV: Sound Automatic Differentiation of Expected Values of Probabilistic Programs"
  • hfppl (Python, 59 stars): Probabilistic programming with HuggingFace language models
  • Genify.jl (Julia, 47 stars): Automatically convert Julia methods to Gen functions.
  • fast-loaded-dice-roller (C, 44 stars): The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions
  • trcrpm (Jupyter Notebook, 36 stars): Temporally-reweighted Chinese restaurant process mixture models for multivariate time series
  • Venturecxx (C++, 27 stars): Primary implementation of the Venture probabilistic programming system
  • cgpm (Python, 25 stars): Library of composable generative population models which serve as the modeling and inference backend of BayesDB.
  • notebook (Jupyter Notebook, 17 stars): jupyter/datascience-notebook with probcomp libraries
  • bayes3d (Jupyter Notebook, 17 stars)
  • GenParticleFilters.jl (Julia, 16 stars): Building blocks for simple and advanced particle filtering in Gen.
  • GenExperimental.jl (Julia, 16 stars): Featherweight embedded probabilistic programming language and compositional inference programming library
  • ThreeDP3 (Jupyter Notebook, 14 stars)
  • iventure (Python, 14 stars): An interactive, browser-based probabilistic programming environment.
  • optimal-approximate-sampling (Python, 14 stars): Optimal Approximate Sampling from Discrete Probability Distributions
  • GenSMCP3.jl (Julia, 12 stars): Automated SMC with Probabilistic Program Proposals, for the Gen PPL.
  • GenGPT3.jl (Julia, 12 stars): GPT-3 as a generative function in Gen.
  • Gen.clj (Clojure, 12 stars): A general-purpose probabilistic programming system with programmable inference.
  • autoimcmc (Python, 12 stars): Code accompanying the paper "Automating Involutive MCMC using Probabilistic and Differentiable Programming"
  • Cloudless (Python, 11 stars): Distributed computational science made easy, in Python
  • GenTF (Julia, 10 stars): TensorFlow plugin for Gen probabilistic programming system.
  • haskell-trace-types (Haskell, 10 stars): Prototype of the system described in "Trace Types and Denotational Semantics for Sound Programmable Inference in Probabilistic Languages"
  • developer (Makefile, 9 stars): Developer environment for probcomp repos
  • ADEV.jl (Julia, 9 stars): Experimental port of ADEV to Julia
  • bdbcontrib (Python, 9 stars): BayesDB contributions, including plotting, helper methods, and examples
  • pldi2019-gen-experiments (Jupyter Notebook, 8 stars): Experiments for PLDI 2019 submission on Gen
  • hierarchical-irm (C++, 8 stars): Hierarchical infinite relational model: Probabilistic structure discovery for rich relational systems
  • GenViz (Julia, 7 stars): A visualization library for probabilistic programming in Gen.
  • haxcat (Haskell, 6 stars): Experimental educational implementation of CrossCat in Haskell
  • SPPL.jl (Julia, 6 stars): A small DSL for programming sppl across PythonCall.jl
  • packaging (Python, 5 stars): Packaging for probcomp software.
  • PoseComposition.jl (Julia, 5 stars)
  • GenVariableElimination.jl (Julia, 5 stars): Experimental package for variable elimination in factor graphs derived from generative functions
  • SpikingInferenceCircuits.jl (Julia, 5 stars)
  • GenDistributions.jl (Julia, 5 stars): Use Distributions.jl distributions from within Gen
  • gen-finance (Clojure, 5 stars)
  • InversePlanning.jl (Julia, 5 stars): Agent modeling and inverse planning, using PDDL and Gen.
  • GenPyTorch.jl (Julia, 5 stars): Gen plugin to allow PyTorch computations to be used as Gen generative functions.
  • probcomp-stack (Shell, 4 stars): MIT Probabilistic Computing Project software stack
  • GenTraceKernelDSL.jl (Julia, 4 stars): A DSL for defining stochastic maps between traces of Gen generative functions
  • GenSP.jl (Julia, 4 stars)
  • cgpm2 (Jupyter Notebook, 4 stars): Minimal implementation of composable generative population models for Bayesian synthesis of probabilistic programs.
  • Gen2DAgentMotion.jl (Julia, 4 stars): Components for building generative models of the motion of an agent moving around a 2D environment.
  • GenExamples.jl (Julia, 3 stars): Gen examples with a Travis CI build that tests that they run
  • GenFlux.jl (Julia, 3 stars)
  • InverseGraphics (Jupyter Notebook, 3 stars)
  • curve-fitting (Clojure, 3 stars): A simple application demonstrating some of the capabilities of the Metaprob probabilistic programming language
  • bayesrest (Python, 3 stars)
  • TracedRandom.jl (Julia, 3 stars): Make Julia code probabilistic-programming-ready by allowing calls to `rand` to be annotated with traced addresses.
  • nips2017-aide-experiments (Julia, 3 stars): Experiments and figure generation for NIPS 2017 paper on AIDE
  • CLIPS.jl (Julia, 3 stars): Cooperative Language-Guided Inverse Plan Search (CLIPS).
  • parallel_map (Python, 2 stars): Simple parallel mapping utility for Python 3.
  • gen-examples-perception (Julia, 2 stars): Examples of Gen applied to perception problems
  • GenFluxOptimizers.jl (Julia, 2 stars): A Gen plugin for using Flux's optimizers to fit a probabilistic program's parameters
  • ravi-uai-2022 (Julia, 2 stars): Code to accompany the paper "Recursive Monte Carlo and Variational Inference with Auxiliary Variables"
  • aistats2023-smcp3 (Julia, 2 stars)
  • Circuits.jl (Julia, 2 stars)
  • GenDirectionalStats.jl (Julia, 2 stars): Distributions on spaces of rotations and other spatial spaces.
  • tutorial_highlighter (Python, 2 stars): Python package for generating PNGs of code and math with custom highlighted regions using LaTeX
  • SMC.jl (Julia, 1 star): A Julia implementation of generic sequential Monte Carlo (SMC) and conditional SMC.
  • inferenceql.viz (Clojure, 1 star)
  • DynamicForwardDiff.jl (Julia, 1 star): An experimental fork of ForwardDiff.jl to support differentiation with respect to an a-priori unknown number of parameters
  • bayeslite-apsw (C, 1 star)
  • GenRedner.jl (Julia, 1 star): Gen.jl wrapper for the Redner differentiable renderer
  • GenPOMDPs.jl (Julia, 1 star)
  • GLRenderer.jl (Julia, 1 star): High FPS rendering. Supports Depth, RGB, and RGB+Texture
  • DepthRenderer (Julia, 1 star): Minimal OpenGL-based 3D depth renderer in Julia
  • GenPseudoMarginal.jl (Julia, 1 star): Sequential Monte Carlo and annealed importance sampling inference library for Gen
  • b3d (C++, 1 star): Bayes3D