• Stars
    star
    296
  • Rank 140,464 (Top 3 %)
  • Language
    Java
  • License
    BSD 2-Clause "Sim...
  • Created over 9 years ago
  • Updated over 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Cross language bloom filter implementation

inbloom

inbloom - a cross language Bloom filter implementation (https://en.wikipedia.org/wiki/Bloom_filter).

inbloom

What's a Bloom Filter?

A Bloom filter is a probabalistic data structure which provides an extremely space-efficient method of representing large sets. It can have false positives but never false negatives which means a query returns either "possibly in set" or "definitely not in set".

You can tune a Bloom filter to the desired error rate, it's basically a tradeoff between size and accuracy (See example: http://hur.st/bloomfilter). For example, a filter for about 100 keys with 1% error rate can be compressed to just 120 bytes. With 5% error rate it can be compressed to 78 bytes.

Why Cross Language?

At EverythingMe we have an Android client written in Java, and servers written mostly in Python and Go. When we wanted to pass filters from the client to the server to avoid saving some state on the server side, we needed an efficient implementation that can read and write Bloom Filters in all three languages at least, and found none.

Having such a library allows us to send filters between clients and any server component easily.

So we decided to build on top of an existing simple implementation in C called libbloom (https://github.com/jvirkki/libbloom) and expand it to all 3 langauges. We chose to use the original C implementation for the Python version only, and translated the code to pure Java and Go, without calling any C code. We chose this approach because the original C code is fairly short and straightforward, so porting it to other languages was a simple task; and avoiding calling C from Java and Go simplifies and shortens the build process, and reduces executable size - in both cases.

Filter headers

InBloom provides utilities for serializing / deserializing Bloom filters so they can be sent over the network. Since when you create a Bloom filter, you need to initialize it with parameters of expected cardinality and false positive rates, they are also needed to read a filter written by another party. Instead of choosing fixed parameters in our configurations, we opted for encoding those parameters as a header when serizlizing the filter. We've added a 16 bit checksum for good measure as part of the header.

Serialized filter structure:

Field Type bits
checksum ushort 16
errorRate (1/N) ushort 16
cardinality int 32
data byte[] ?

Installation

Python

pip install inbloom

Go

go get github.com/EverythingMe/inbloom/go/inbloom

Java

Add the following lines to your build.gradle script.

repositories {
    jcenter {
        url 'http://dl.bintray.com/everythingme/generic'
    }
}

dependencies {
    compile 'me.everything:inbloom:0.1'
}

Example Usage

Python

import inbloom
import base64
import requests

# Basic usage
bf = inbloom.Filter(entries=100, error=0.01)
bf.add("abc")
bf.add("def")

assert bf.contains("abc")
assert bf.contains("def")
assert not bf.contains("ghi")

bf2 = inbloom.Filter(entries=100, error=0.01, data=bf.buffer())
assert bf2.contains("abc")
assert bf2.contains("def")
assert not bf2.contains("ghi")


# Serialization
payload = 'Yg0AZAAAABQAAAAAACAAEAAIAAAAAAAAIAAQAAgABAA='
assert base64.b64encode(inbloom.dump(inbloom.load(base64.b64decode(payload)))) == payload

# Sending it over HTTP
serialized = base64.b64encode(inbloom.dump(bf))
requests.get('http://api.endpoint.me', params={'filter': serialized})

Go

// create a blank filter - expecting 20 members and an error rate of 1/100
f, err := NewFilter(20, 0.01)
if err != nil {
    panic(err)
}

// the size of the filter
fmt.Println(f.Len())

// insert some values
f.Add("foo")
f.Add("bar")

// test for existence of keys
fmt.Println(f.Contains("foo"))
fmt.Println(f.Contains("wat"))

fmt.Println("marshaled data:", f.MarshalBase64())

// Output:
// 24
// true
// false
// marshaled data: oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA=
// a 20 cardinality 0.01 precision filter with "foo" and "bar" in it
data := "oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA="

// load it from base64
f, err := UnmarshalBase64(data)
if err != nil {
    panic(err)
}

// test it...
fmt.Println(f.Contains("foo"))
fmt.Println(f.Contains("wat"))
fmt.Println(f.Len())

// dump to pure binary
fmt.Printf("%x\n", f.Marshal())
// Output:
// true
// false
// 24
// a14e006400000014000000000042000011001804000200200000301000090000

Java

import me.everything.inbloom.BloomFilter;
import me.everything.inbloom.BinAscii;  // Optional - for hex representation

// The basics
BloomFilter bf = new BloomFilter(20, 0.01);
bf.add("foo");
bf.add("bar");

assertTrue(bf.contains("foo"));
assertTrue(bf.contains("bar"));
assertFalse(bf.contains("baz"));


BloomFilter bf2 = new BloomFilter(bf.bf, bf.entries, bf.error);
assertTrue(bf2.contains("foo"));
assertTrue(bf2.contains("bar"));
assertFalse(bf2.contains("baz"));

// Serialization
String serialized = BinAscii.hexlify(BloomFilter.dump(bf));
System.out.printf("Serialized: %s\n", serialized);

String hexPayload = "620d006400000014000000000020001000080000000000002000100008000400";
BloomFilter deserialized = BloomFilter.load(BinAscii.unhexlify(hexPayload));
String dump = BinAscii.hexlify(BloomFilter.dump(deserialized));
System.out.printf("Re-Serialized: %s\n", dump);
assertEquals(dump.toLowerCase(), hexPayload);

assertEquals(deserialized.entries, 20);
assertEquals(deserialized.error, 0.01);
assertTrue(deserialized.contains("abc"));

More Repositories

1

overscroll-decor

Android: iOS-like over-scrolling effect applicable over almost all scrollable Android views.
Java
2,833
star
2

easy-content-providers

Easy integration with Android's built-in and custom content providers data
Java
351
star
3

geodis

A redis based geo-resolving library
Python
322
star
4

webp-android

webp support for Android API Level 4 and up, includes the native library and a WebPImageView to render webp in your app
C
266
star
5

OverScrollView

A ScrollView with over scroll capabilities, a complete replacement for android's ScrollView.
Java
264
star
6

kickass-redis

Loose python framework for kickass redis patterns
Python
154
star
7

plaxien

An Android library to create Explain Views for algorithms
Java
142
star
8

go-disque

Go client for Disque
Go
136
star
9

ncdu-s3

Run ncdu on S3 buckets
Python
110
star
10

redshift_console

Redshift Ops Console
JavaScript
92
star
11

openspace

An open source app to showcase your open source work (yo dawg)
JavaScript
74
star
12

fabric-aws

fabric AWS integration
Python
72
star
13

rainbow

Cloudformation on steroids
Python
55
star
14

vertex

Go API management framework
JavaScript
54
star
15

probe

Android performance instrumentation tool
Python
42
star
16

webp-test

Testing WebP compression for app icons
Ruby
33
star
17

click-config

Config parsing for click cli applications
Python
31
star
18

recat

logcat alternative with on the fly log deobfuscation!
Python
31
star
19

meduza

A fast data store on top of redis
Go
29
star
20

teleport

Execute Python code in country-specific networking context
Python
21
star
21

pyretrace

A python reimplementation on Proguard's Retrace
Python
20
star
22

lobo

Our Dev Flow Tool
Python
20
star
23

TouchyJS

Mobile web application framework
JavaScript
18
star
24

disposable-redis

Create disposable redis servers on the fly for testing
Go
16
star
25

pystream

Stream backups directly to/from S3/HDFS without wasting disk space during the process
Python
14
star
26

gofigure

A simple config file reading library for Go
Go
11
star
27

pyfrank

python binding for iOS automation using frank.
Python
9
star
28

Screaction

A js script enabling you to change css values as the page scrolls
JavaScript
8
star
29

jitter

Jitt Command Line Tool
Python
7
star
30

everythingme.github.io

A showcase of our open source work, built with openspace
JavaScript
7
star
31

android-logger

Java
6
star
32

ffos-notes

Notes App for FirefoxOS
JavaScript
5
star
33

pytest-blocker

A pytest plugin to mark a test as blocker and skip all tests after it upon failure.
Python
4
star
34

sublime.me

SublimeText3 setup for Everything.me FEDS
JavaScript
3
star
35

jitt

A crowd-sourcing oriented localization service for Andorid applications
CSS
3
star
36

python-disposable-redis

Disposable Redis for your Unittests
Python
3
star
37

logstash-scribeinput

A logstash input plugin which receives scribe log entries via thrift
Java
2
star
38

generator-savedata

Photoshop plugin to save document data for BabelFish
JavaScript
2
star
39

cql_dump

Python
1
star
40

go-accuweather

Go wrapper for the Accuweather API.
Go
1
star