• Stars
    star
    141
  • Rank 259,971 (Top 6 %)
  • Language
    Scala
  • License
    MIT License
  • Created about 9 years ago
  • Updated about 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Beautiful trees, without the landscaping.

Bonsai

Beautiful trees, without the landscaping. Bonsai is a Scala library for transforming arbitrary tree structures into read-only versions that take up a fraction of the space.

Overview

Bonsai compresses trees in 2 ways: by using significantly less space to store the tree structure itself (tree compression), and by encoding the node labels in a memory efficient structure (label compression).

What is a "Tree"?

Bonsai works over arbitrary trees, so it assumes a fairly generic interface for interacting with trees. In Bonsai a tree;

  • has 0 or 1 root nodes
  • each node has 0 or more children
  • each node has a label attached to it

The actual type of the node is unimportant. What is important is the node labels and the relationships between the nodes (parent, child, sibling, etc). This structure is enough to describe most of the types of trees you are familiar with.

Bonsai encodes this notion of trees with the TreeOps type class. Here is a truncated version of the type class:

trait TreeOps[Tree, Label] {

  /** The type of the nodes in the tree. */
  type Node

  /**
   * Returns the root node of the tree.
   */
  def root(t: Tree): Option[Node]

  /**
   * Returns all the direct children of the given node. The order may or may
   * not matter. TreeOps does not provide any guarantees here.
   */
  def children(node: Node): Iterable[Node]

  /**
   * Returns the label attached to the given node.
   */
  def label(node: Node): Label

  ...
}

The type T is our actual tree type. The Node type is the way we reference internal nodes in our tree T. The actual type of Node isn't important, however, and is mostly an implementation detail. The important bit is the Label type, which is the user-facing data associated with each node.

The bonsai-example subproject has an example of a Huffman tree. A Huffman tree is used to store a Huffman coding for decoding a compressed message (a bitstring). We decode the bitstring, bit-by-bit, using the tree.

Starting at the root of the tree, we follow the left child if the current bit is a 0 and the right child if it is a 1. We continue until reaching a leaf node, at which poitn we output the symbol associated with it, then start back at the beginning of the tree. When we've exhausted the entire bitstring, we'll have our decoded message.

Here is how we may implement a Huffman tree in Scala:

sealed trait HuffmanTree[+A]
case class Branch[+A](zero: HuffmanTree[A], one: HuffmanTree[A]) extends HuffmanTree[A]
case class Leaf[+A](value: A) extends HuffmanTree[A]

And here is how we would implement its TreeOps instance:

import com.stripe.bonsai.TreeOps

object HuffmanTree {
  implicit def huffmanTreeOps[A]: TreeOps[HuffmanTree[A], Option[A]] =
    new TreeOps[HuffmanTree[A], Option[A]] {
      type Node = HuffmanTree[A]

      def root(tree: HuffmanTree[A]): Option[HuffmanTree[A]] = Some(tree)
      def children(tree: HuffmanTree[A]): Iterable[HuffmanTree[A]] = tree match {
        case Branch(l, r) => l :: r :: Nil
        case _ => Nil
      }
      def label(tree: HuffmanTree[A]): Option[A] = tree match {
        case Leaf(value) => Some(value)
        case _ => None
      }
    }
}

As long as we are careful to implement all our operations on a Huffman tree by using its more generic TreeOps interface, rather than HuffmanTree directly, we can then swap out the actual tree data structure, without affecting the code using it.

For example, below we implement a decode operation as an implicit class using just TreeOps.

import scala.collection.immutable.BitSet

implicit class HuffmanTreeOps[T, A](tree: T)(implicit treeOps: TreeOps[T, Option[A]]) {
  // Importing treeOps gives us some useful methods on `tree`
  import treeOps._

  def decode(bits: BitSet, len: Int): Vector[A] = {
    val root = tree.root.get
    val (_, result) = (0 until len)
      .foldLeft((root, Vector.empty[A])) { case ((node, acc), i) =>
        node.label match {
          case Some(value) => (root, acc :+ value)
          case None if bits(i) => (node.children.head, acc)
          case None => (node.children.iterator.drop(1).next, acc)
        }
      }
    result
  }
}

The goal of this indirection through TreeOps is to let us use a compressed version of the tree instead of an actual HuffmanTree, which will see below.

Tree Compression

Bonsai's tree compression is based off of a succinct data structure for binary trees. Bonsai supports k-ary trees by first transforming the original tree into a left-child right-sibling tree, which preserves all the relationships from the original tree, but ensures we have at most 2 children per node. You can read more about the details of the actual compression algorithm used in "Practical Implementation of Rank and Select Queries". The upshot is that we can store the entire structure of a tree in only ~2.73bits per node. This replaces the normal strategy of using JVM objects for nodes and references to store the relationships.

We actually compress trees by transforming them into Bonsai Trees. Bonsai's Tree constructor takes any arbitrary tree T that has a TreeOps[T] available and will return a Tree with the same structure and labels (and Label type) as the original tree. However, the entire structure and labels of the tree will have been compressed, so this new tree requires significantly less space.

In the example in bonsai-example, we use the Huffman encoding described above to construct a simple Huffman tree for the printable ASCII characters (0x20 -> 0x7E) and compress it using Bonsai's Tree. The result is a 11x reduction in memory requirements. Since our decode operation was implemented using TreeOps, we can use this compressed tree just as we would've used the original tree.

This example is a bit contrived, since the trees are small to begin with, but you can imagine that applying this to a large random forest yields great results.

Label Compression

Bonsai provides a Layout type class, along with some simple combinators, for describing how to (de)serialize your labels. At the lowest level are a set of Layout "primitives" that can encode simple data types into compact data structures. The combinators then allow more complex structures to be described (tuples, Either, mappings to case classes, etc), without adding much, if any, overhead.

Here is an example of a Layout for some Widget type we made up:

import com.stripe.bonsai.Layout

sealed trait Widget
case class Sprocket(radius: Int, weight: Option[Double]) extends Widget
case class Doodad(length: Int, width: Int, weight: Option[Double]) extends Widget

object Widget {
  implicit val WidgetLayout: Layout[Widget] = {
    Layout[Either[(Int, Option[Double]), ((Int, Int), Option[Double])]].transform(
      {
        case Left((r, wt)) => Sprocket(r, wt)
        case Right(((l, w), wt)) => Doodad(l, w, wt)
      },
      {
        case Sprocket(r, wt) => Left((r, wt))
        case Doodad(l, w, wt) => Right(((l, w), wt))
      }
    )
  }
}

You can see the full Widget code/example in the bonsai-example sub project. In that example, we compress a Vector[Option[Widget]] using the layout and end up with over a 6x reduction in memory requirements.

Currently, Bonsai focuses mainly on compressing the overhead of the structure your data requires (eg options, eithers, tuples), rather than the data itself. This will likely change in future releases, and we'll support better compression for primitive types, as well as things like dictionary encoding for all types.

Using Bonsai in SBT or Maven

Bonsai is published on sonatype. To use it in your SBT project, you can add the following to your build.sbt:

libraryDependencies += "com.stripe" %% "bonsai" % "0.3.0"

Miscellaneous

Bonsai is Open Source and available under the MIT License.

For more help, feel free to contact the authors or create an issue.

More Repositories

1

stripe-node

Node.js library for the Stripe API.
TypeScript
3,874
star
2

stripe-php

PHP library for the Stripe API.
PHP
3,736
star
3

stripe-go

Go library for the Stripe API.
Go
2,144
star
4

stripe-ios

Stripe iOS SDK
Swift
2,110
star
5

stripe-ruby

Ruby library for the Stripe API.
Ruby
1,952
star
6

react-stripe-js

React components for Stripe.js and Stripe Elements
TypeScript
1,757
star
7

veneur

A distributed, fault-tolerant pipeline for observability data
Go
1,734
star
8

stripe-python

Python library for the Stripe API.
Python
1,669
star
9

stripe-cli

A command-line tool for Stripe
Go
1,609
star
10

stripe-mock

stripe-mock is a mock HTTP server that responds like the real Stripe API. It can be used instead of Stripe's testmode to make test suites integrating with Stripe faster and less brittle.
Go
1,374
star
11

stripe-dotnet

Stripe.net is a sync/async .NET 4.6.1+ client, and a portable class library for stripe.com.
C#
1,362
star
12

stripe-android

Stripe Android SDK
Kotlin
1,277
star
13

stripe-react-native

React Native library for Stripe.
TypeScript
1,272
star
14

smokescreen

A simple HTTP proxy that fogs over naughty URLs
Go
1,081
star
15

elements-examples

Stripe Elements examples.
HTML
996
star
16

stripe-java

Java library for the Stripe API.
Java
818
star
17

skycfg

Skycfg is an extension library for the Starlark language that adds support for constructing Protocol Buffer messages.
Go
646
star
18

stripe-connect-rocketrides

Sample on-demand platform built on Stripe: Connect onboarding for pilots, iOS app for passengers to request rides.
Swift
634
star
19

stripe-js

Loading wrapper for Stripe.js
TypeScript
625
star
20

poncho

Easily create REST APIs
Ruby
516
star
21

rainier

Bayesian inference in Scala.
Scala
433
star
22

openapi

An OpenAPI specification for the Stripe API.
391
star
23

pg-schema-diff

Go library for diffing Postgres schemas and generating SQL migrations
Go
364
star
24

stripe-billing-typographic

โšก๏ธTypographic is a webfont service (and demo) built with Stripe Billing.
JavaScript
216
star
25

subprocess

A port of Python's subprocess module to Ruby
Ruby
208
star
26

carbon-removal-source-materials

Source materials supporting Stripe Climate carbon removal purchases (http://stripe.com/climate)
190
star
27

stripe-apps

Stripe Apps lets you embed custom user experiences directly in the Stripe Dashboard and orchestrate the Stripe API.
TypeScript
148
star
28

dagon

Tools for rewriting and optimizing DAGs (directed-acyclic graphs) in Scala
Scala
148
star
29

ios-dashboard-ui

[DEPRECATED] UI components from the Stripe Dashboard iOS app
Swift
138
star
30

vscode-stripe

Stripe for Visual Studio Code
TypeScript
123
star
31

stripe.github.io

A landing page for Stripe's GitHub organization
HTML
117
star
32

stripe-terminal-react-native

React Native SDK for Stripe Terminal
TypeScript
108
star
33

example-mobile-backend

A simple, easy-to-deploy backend that you can use to demo our example mobile apps.
Ruby
106
star
34

stripe-terminal-ios

Stripe Terminal iOS SDK
Objective-C
102
star
35

stripe-demo-connect-roastery-saas-platform

Roastery demo SaaS platform using Stripe Connect
JavaScript
94
star
36

stripe-terminal-android

Stripe Terminal Android SDK
Java
93
star
37

example-terminal-backend

A simple, easy-to-deploy backend that you can use to run the Stripe Terminal example apps
Ruby
88
star
38

stripe-terminal-js-demo

Demo app for the Stripe Terminal JS SDK
JavaScript
82
star
39

stripe-postman

Postman collection for Stripe's API
77
star
40

unilog

A logger for use with daemontools.
Go
77
star
41

goforit

A feature flags client library for Go
Go
70
star
42

stripe-connect-custom-rocketdeliveries

Sample on-demand platform built on Stripe Connect: Custom Accounts and Connect Onboarding for deliveries. https://rocketdeliveries.io
JavaScript
69
star
43

tracer-objc

Generic record & playback framework for Objective-C
Objective-C
52
star
44

mobile-viewport-control

Dynamically control the mobile viewport
JavaScript
48
star
45

log4j-remediation-tools

Tools for remediating the recent log4j2 RCE vulnerability (CVE-2021-44228)
Go
41
star
46

terminal-js

Loading wrapper for the Terminal JS SDK
TypeScript
34
star
47

stripe-connect-furever-demo

Code sample demo built on Stripe Connect embedded components.
TypeScript
34
star
48

stripe-identity-react-native

React Native library for Stripe Identity
TypeScript
33
star
49

stripe-reachability

A bash script to test access to the Stripe API
Shell
32
star
50

krl

OpenSSH Key Revocation List support for Go
Go
30
star
51

stripe-magento2-releases

27
star
52

react-connect-js

React components for Connect.js and Connect embedded components
TypeScript
22
star
53

connect-js

Loading wrapper for Connect.js
TypeScript
21
star
54

stripe-mirakl-connector

Official Stripe Mirakl Connector
PHP
13
star
55

salesforce-connector-examples

Stripe Salesforce Connector examples
Apex
12
star
56

checkout-sales-demo

Sales demo of Stripe Checkout with different locales around the world.
HTML
12
star
57

stripe-ios-spm

Swift Package Manager mirror for the Stripe iOS SDK. See http://github.com/stripe/stripe-ios for details.
12
star
58

.github

Stripe uses the Contributor Covenant Code of Conduct for our open-source community.
8
star
59

homebrew-stripe-mock

Homebrew tap for stripestub
Ruby
7
star
60

scoop-stripe-cli

7
star
61

homebrew-stripe-cli

Ruby
7
star
62

stripe-sfcc-b2c-connector

JavaScript
7
star
63

stripe-commercetools-connect-app

TypeScript
5
star
64

ssf-ruby

A Ruby client for the Sensor Sensibility Format
Ruby
5
star
65

terminal-connector-releases

Release binaries for Stripe Terminal connectors
3
star
66

open-banking-v2-docs

Open banking docs for V2 APIs
CSS
2
star
67

vscode-endsmart

vscode extension to smartly add `end` keywords
TypeScript
1
star