• Stars
    star
    142
  • Rank 249,121 (Top 6 %)
  • Language
    Scala
  • License
    MIT License
  • Created over 8 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Beautiful trees, without the landscaping.

Bonsai

Beautiful trees, without the landscaping. Bonsai is a Scala library for transforming arbitrary tree structures into read-only versions that take up a fraction of the space.

Overview

Bonsai compresses trees in 2 ways: by using significantly less space to store the tree structure itself (tree compression), and by encoding the node labels in a memory efficient structure (label compression).

What is a "Tree"?

Bonsai works over arbitrary trees, so it assumes a fairly generic interface for interacting with trees. In Bonsai a tree;

  • has 0 or 1 root nodes
  • each node has 0 or more children
  • each node has a label attached to it

The actual type of the node is unimportant. What is important is the node labels and the relationships between the nodes (parent, child, sibling, etc). This structure is enough to describe most of the types of trees you are familiar with.

Bonsai encodes this notion of trees with the TreeOps type class. Here is a truncated version of the type class:

trait TreeOps[Tree, Label] {

  /** The type of the nodes in the tree. */
  type Node

  /**
   * Returns the root node of the tree.
   */
  def root(t: Tree): Option[Node]

  /**
   * Returns all the direct children of the given node. The order may or may
   * not matter. TreeOps does not provide any guarantees here.
   */
  def children(node: Node): Iterable[Node]

  /**
   * Returns the label attached to the given node.
   */
  def label(node: Node): Label

  ...
}

The type T is our actual tree type. The Node type is the way we reference internal nodes in our tree T. The actual type of Node isn't important, however, and is mostly an implementation detail. The important bit is the Label type, which is the user-facing data associated with each node.

The bonsai-example subproject has an example of a Huffman tree. A Huffman tree is used to store a Huffman coding for decoding a compressed message (a bitstring). We decode the bitstring, bit-by-bit, using the tree.

Starting at the root of the tree, we follow the left child if the current bit is a 0 and the right child if it is a 1. We continue until reaching a leaf node, at which poitn we output the symbol associated with it, then start back at the beginning of the tree. When we've exhausted the entire bitstring, we'll have our decoded message.

Here is how we may implement a Huffman tree in Scala:

sealed trait HuffmanTree[+A]
case class Branch[+A](zero: HuffmanTree[A], one: HuffmanTree[A]) extends HuffmanTree[A]
case class Leaf[+A](value: A) extends HuffmanTree[A]

And here is how we would implement its TreeOps instance:

import com.stripe.bonsai.TreeOps

object HuffmanTree {
  implicit def huffmanTreeOps[A]: TreeOps[HuffmanTree[A], Option[A]] =
    new TreeOps[HuffmanTree[A], Option[A]] {
      type Node = HuffmanTree[A]

      def root(tree: HuffmanTree[A]): Option[HuffmanTree[A]] = Some(tree)
      def children(tree: HuffmanTree[A]): Iterable[HuffmanTree[A]] = tree match {
        case Branch(l, r) => l :: r :: Nil
        case _ => Nil
      }
      def label(tree: HuffmanTree[A]): Option[A] = tree match {
        case Leaf(value) => Some(value)
        case _ => None
      }
    }
}

As long as we are careful to implement all our operations on a Huffman tree by using its more generic TreeOps interface, rather than HuffmanTree directly, we can then swap out the actual tree data structure, without affecting the code using it.

For example, below we implement a decode operation as an implicit class using just TreeOps.

import scala.collection.immutable.BitSet

implicit class HuffmanTreeOps[T, A](tree: T)(implicit treeOps: TreeOps[T, Option[A]]) {
  // Importing treeOps gives us some useful methods on `tree`
  import treeOps._

  def decode(bits: BitSet, len: Int): Vector[A] = {
    val root = tree.root.get
    val (_, result) = (0 until len)
      .foldLeft((root, Vector.empty[A])) { case ((node, acc), i) =>
        node.label match {
          case Some(value) => (root, acc :+ value)
          case None if bits(i) => (node.children.head, acc)
          case None => (node.children.iterator.drop(1).next, acc)
        }
      }
    result
  }
}

The goal of this indirection through TreeOps is to let us use a compressed version of the tree instead of an actual HuffmanTree, which will see below.

Tree Compression

Bonsai's tree compression is based off of a succinct data structure for binary trees. Bonsai supports k-ary trees by first transforming the original tree into a left-child right-sibling tree, which preserves all the relationships from the original tree, but ensures we have at most 2 children per node. You can read more about the details of the actual compression algorithm used in "Practical Implementation of Rank and Select Queries". The upshot is that we can store the entire structure of a tree in only ~2.73bits per node. This replaces the normal strategy of using JVM objects for nodes and references to store the relationships.

We actually compress trees by transforming them into Bonsai Trees. Bonsai's Tree constructor takes any arbitrary tree T that has a TreeOps[T] available and will return a Tree with the same structure and labels (and Label type) as the original tree. However, the entire structure and labels of the tree will have been compressed, so this new tree requires significantly less space.

In the example in bonsai-example, we use the Huffman encoding described above to construct a simple Huffman tree for the printable ASCII characters (0x20 -> 0x7E) and compress it using Bonsai's Tree. The result is a 11x reduction in memory requirements. Since our decode operation was implemented using TreeOps, we can use this compressed tree just as we would've used the original tree.

This example is a bit contrived, since the trees are small to begin with, but you can imagine that applying this to a large random forest yields great results.

Label Compression

Bonsai provides a Layout type class, along with some simple combinators, for describing how to (de)serialize your labels. At the lowest level are a set of Layout "primitives" that can encode simple data types into compact data structures. The combinators then allow more complex structures to be described (tuples, Either, mappings to case classes, etc), without adding much, if any, overhead.

Here is an example of a Layout for some Widget type we made up:

import com.stripe.bonsai.Layout

sealed trait Widget
case class Sprocket(radius: Int, weight: Option[Double]) extends Widget
case class Doodad(length: Int, width: Int, weight: Option[Double]) extends Widget

object Widget {
  implicit val WidgetLayout: Layout[Widget] = {
    Layout[Either[(Int, Option[Double]), ((Int, Int), Option[Double])]].transform(
      {
        case Left((r, wt)) => Sprocket(r, wt)
        case Right(((l, w), wt)) => Doodad(l, w, wt)
      },
      {
        case Sprocket(r, wt) => Left((r, wt))
        case Doodad(l, w, wt) => Right(((l, w), wt))
      }
    )
  }
}

You can see the full Widget code/example in the bonsai-example sub project. In that example, we compress a Vector[Option[Widget]] using the layout and end up with over a 6x reduction in memory requirements.

Currently, Bonsai focuses mainly on compressing the overhead of the structure your data requires (eg options, eithers, tuples), rather than the data itself. This will likely change in future releases, and we'll support better compression for primitive types, as well as things like dictionary encoding for all types.

Using Bonsai in SBT or Maven

Bonsai is published on sonatype. To use it in your SBT project, you can add the following to your build.sbt:

libraryDependencies += "com.stripe" %% "bonsai" % "0.3.0"

Miscellaneous

Bonsai is Open Source and available under the MIT License.

For more help, feel free to contact the authors or create an issue.

More Repositories

1

stripe-node

Node.js library for the Stripe API.
TypeScript
3,591
star
2

stripe-php

PHP library for the Stripe API.
PHP
3,558
star
3

stripe-ios

Stripe iOS SDK
Swift
2,023
star
4

stripe-go

Go library for the Stripe API.
Go
1,948
star
5

stripe-ruby

Ruby library for the Stripe API.
Ruby
1,866
star
6

veneur

A distributed, fault-tolerant pipeline for observability data
Go
1,690
star
7

react-stripe-js

React components for Stripe.js and Stripe Elements
TypeScript
1,612
star
8

stripe-cli

A command-line tool for Stripe
Go
1,545
star
9

stripe-python

Python library for the Stripe API.
Python
1,507
star
10

stripe-dotnet

Stripe.net is a sync/async .NET 4.6.1+ client, and a portable class library for stripe.com.
C#
1,307
star
11

stripe-mock

stripe-mock is a mock HTTP server that responds like the real Stripe API. It can be used instead of Stripe's testmode to make test suites integrating with Stripe faster and less brittle.
Go
1,278
star
12

stripe-android

Stripe Android SDK
Kotlin
1,207
star
13

stripe-react-native

React Native library for Stripe.
TypeScript
1,200
star
14

smokescreen

A simple HTTP proxy that fogs over naughty URLs
Go
988
star
15

elements-examples

Stripe Elements examples.
HTML
987
star
16

stripe-java

Java library for the Stripe API.
Java
735
star
17

skycfg

Skycfg is an extension library for the Starlark language that adds support for constructing Protocol Buffer messages.
Go
629
star
18

stripe-connect-rocketrides

Sample on-demand platform built on Stripe: Connect onboarding for pilots, iOS app for passengers to request rides.
Swift
614
star
19

stripe-js

Loading wrapper for Stripe.js
TypeScript
561
star
20

poncho

Easily create REST APIs
Ruby
518
star
21

rainier

Bayesian inference in Scala.
Scala
435
star
22

openapi

An OpenAPI specification for the Stripe API.
351
star
23

stripe-billing-typographic

โšก๏ธTypographic is a webfont service (and demo) built with Stripe Billing.
JavaScript
216
star
24

subprocess

A port of Python's subprocess module to Ruby
Ruby
193
star
25

carbon-removal-source-materials

Source materials supporting Stripe Climate carbon removal purchases (http://stripe.com/climate)
189
star
26

dagon

Tools for rewriting and optimizing DAGs (directed-acyclic graphs) in Scala
Scala
148
star
27

pg-schema-diff

Go library for diffing Postgres schemas and generating SQL migrations
Go
141
star
28

ios-dashboard-ui

[DEPRECATED] UI components from the Stripe Dashboard iOS app
Swift
136
star
29

stripe-apps

Stripe Apps lets you embed custom user experiences directly in the Stripe Dashboard and orchestrate the Stripe API.
TypeScript
134
star
30

vscode-stripe

Stripe for Visual Studio Code
TypeScript
117
star
31

example-mobile-backend

A simple, easy-to-deploy backend that you can use to demo our example mobile apps.
Ruby
105
star
32

stripe.github.io

A landing page for Stripe's GitHub organization
HTML
94
star
33

stripe-terminal-ios

Stripe Terminal iOS SDK
Objective-C
92
star
34

stripe-demo-connect-roastery-saas-platform

Roastery demo SaaS platform using Stripe Connect
JavaScript
89
star
35

stripe-terminal-react-native

React Native SDK for Stripe Terminal
TypeScript
84
star
36

example-terminal-backend

A simple, easy-to-deploy backend that you can use to run the Stripe Terminal example apps
Ruby
79
star
37

stripe-terminal-android

Stripe Terminal Android SDK
Java
77
star
38

unilog

A logger for use with daemontools.
Go
77
star
39

stripe-terminal-js-demo

Demo app for the Stripe Terminal JS SDK
JavaScript
75
star
40

stripe-postman

Postman collection for Stripe's API
73
star
41

goforit

A feature flags client library for Go
Go
68
star
42

stripe-connect-custom-rocketdeliveries

Sample on-demand platform built on Stripe Connect: Custom Accounts and Connect Onboarding for deliveries. https://rocketdeliveries.io
JavaScript
62
star
43

tracer-objc

Generic record & playback framework for Objective-C
Objective-C
51
star
44

mobile-viewport-control

Dynamically control the mobile viewport
JavaScript
47
star
45

log4j-remediation-tools

Tools for remediating the recent log4j2 RCE vulnerability (CVE-2021-44228)
Go
41
star
46

terminal-js

Loading wrapper for the Terminal JS SDK
TypeScript
30
star
47

stripe-reachability

A bash script to test access to the Stripe API
Shell
29
star
48

krl

OpenSSH Key Revocation List support for Go
Go
29
star
49

stripe-identity-react-native

React Native library for Stripe Identity
TypeScript
28
star
50

stripe-magento2-releases

22
star
51

checkout-sales-demo

Sales demo of Stripe Checkout with different locales around the world.
HTML
12
star
52

stripe-ios-spm

Swift Package Manager mirror for the Stripe iOS SDK. See http://github.com/stripe/stripe-ios for details.
12
star
53

connect-js

Loading wrapper for Connect.js
TypeScript
11
star
54

react-connect-js

React components for Connect.js and Connect embedded components
TypeScript
11
star
55

salesforce-connector-examples

Stripe Salesforce Connector examples
JavaScript
10
star
56

stripe-mirakl-connector

Official Stripe Mirakl Connector
PHP
10
star
57

homebrew-stripe-mock

Homebrew tap for stripestub
Ruby
6
star
58

homebrew-stripe-cli

Ruby
6
star
59

.github

Stripe uses the Contributor Covenant Code of Conduct for our open-source community.
6
star
60

scoop-stripe-cli

4
star
61

ssf-ruby

A Ruby client for the Sensor Sensibility Format
Ruby
4
star
62

stripe-sfcc-b2c-connector

JavaScript
2
star