• Stars
    star
    472
  • Rank 93,034 (Top 2 %)
  • Language
    C
  • Created almost 16 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

BloomFilter(s) in Ruby: Native counting filter + Redis counting/non-counting filters

BloomFilter(s) in Ruby

  • Native (MRI/C) counting bloom filter
  • Redis-backed getbit/setbit non-counting bloom filter
  • Redis-backed set-based counting (+TTL) bloom filter

Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. For more detail, check the wikipedia article. Instead of using k different hash functions, this implementation seeds the CRC32 hash with k different initial values (0, 1, ..., k-1). This may or may not give you a good distribution, it all depends on the data.

Performance of the Bloom filter depends on a number of variables:

  • size of the bit array
  • size of the counter bucket
  • number of hash functions

Resources


MRI/C API Example

MRI/C implementation which creates an in-memory filter which can be saved and reloaded from disk.

require 'bloomfilter-rb'

bf = BloomFilter::Native.new(:size => 100, :hashes => 2, :seed => 1, :bucket => 3, :raise => false)
bf.insert("test")
bf.include?("test")     # => true
bf.include?("blah")     # => false

bf.delete("test")
bf.include?("test")     # => false

# Hash with a bloom filter!
bf["test2"] = "bar"
bf["test2"]             # => true
bf["test3"]             # => false

bf.stats
# => Number of filter bits (m): 10
# => Number of filter elements (n): 2
# => Number of filter hashes (k) : 2
# => Predicted false positive rate = 10.87%

Redis-backed setbit/getbit bloom filter

Uses getbit/setbit on Redis strings - efficient, fast, can be shared by multiple/concurrent processes.

bf = BloomFilter::Redis.new

bf.insert('test')
bf.include?('test')     # => true
bf.include?('blah')     # => false

bf.delete('test')
bf.include?('test')     # => false

Memory footprint

  • 1.0% error rate for 1M items, 10 bits/item: 2.5 mb
  • 1.0% error rate for 150M items, 10 bits per item: 358.52 mb
  • 0.1% error rate for 150M items, 15 bits per item: 537.33 mb

Redis-backed counting bloom filter with TTLs

Uses regular Redis get/set counters to implement a counting filter with optional TTL expiry. Because each "bit" requires its own key in Redis, you do incur a much larger memory overhead.

bf = BloomFilter::CountingRedis.new(:ttl => 2)

bf.insert('test')
bf.include?('test')     # => true

sleep(2)
bf.include?('test')     # => false

Credits

Tatsuya Mori [email protected] (Original C implementation: http://vald.x0.com/sb/)

License

MIT License - Copyright (c) 2011 Ilya Grigorik

More Repositories

1

videospeed

HTML5 video speed controller (for Google Chrome)
JavaScript
3,812
star
2

ga-beacon

Google Analytics collector-as-a-service (using GA measurement protocol).
Go
3,536
star
3

gharchive.org

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
Ruby
2,680
star
4

em-websocket

EventMachine based WebSocket server
Ruby
1,690
star
5

decisiontree

ID3-based implementation of the ML Decision Tree algorithm
Ruby
1,437
star
6

em-http-request

Asynchronous HTTP Client (EventMachine + Ruby)
Ruby
1,219
star
7

em-synchrony

Fiber aware EventMachine clients and convenience classes
Ruby
1,041
star
8

http-2

Pure Ruby implementation of HTTP/2 protocol
Ruby
894
star
9

bugspots

Implementation of simple bug prediction hotspot heuristic
Ruby
853
star
10

agent

Agent is an attempt at modelling Go-like concurrency, in Ruby
Ruby
729
star
11

vimgolf

Real Vim ninjas count every keystroke - do you?
Ruby
678
star
12

em-proxy

EventMachine Proxy DSL for writing high-performance transparent / intercepting proxies in Ruby
Ruby
662
star
13

node-spdyproxy

SPDY forwarding proxy - fast and secure
JavaScript
527
star
14

async-rails

async Rails 3 stack demo
Ruby
465
star
15

istlsfastyet.com

Is TLS fast yet? Yes, yes it is.
HTML
422
star
16

hackernews-button

Embeddable Hacker News button + vote counter for your site
Go
415
star
17

http-client-hints

Ruby
402
star
18

spdy

SPDY is a protocol designed to reduce latency of web pages
Ruby
315
star
19

hpbn.co

High Performance Browser Networking (O'Reilly)
HTML
299
star
20

webp-detect

WebP with Accept negotiation
C++
242
star
21

zeroconf-router

Zero-config reverse proxies: let's get there!
Ruby
206
star
22

autoperf

Ruby driver for httperf - automated load and performance testing
Ruby
179
star
23

PubSubHubbub

Asynchronous PubSubHubbub Ruby Client
Ruby
175
star
24

heroku-buildpack-dart

Heroku buildpack for Dart
Shell
170
star
25

rack-speedtracer

SpeedTracer middleware for server side debugging
Ruby
155
star
26

textquery

Evaluate any text against a collection of match rules
Ruby
145
star
27

tokyo-recipes

Lean & mean Tokyo Cabinet recipes (with Lua)
Lua
143
star
28

slowgrowl

Surface slow code paths in your Rails 3 app via Growl
Ruby
116
star
29

mneme

Mneme is an HTTP web-service for recording and identifying previously seen records - aka, duplicate detection.
Ruby
108
star
30

RRRDTool

Round robin database pattern via Redis sorted sets
Ruby
79
star
31

pregel

Single-node implementation of Google's Pregel framework for graph processing.
Ruby
74
star
32

gmetric

Pure Ruby interface for generating Ganglia gmetric packets
Ruby
69
star
33

rack-aggregate

Rack response-time statistics aggregator middleware
Ruby
67
star
34

em-jack

An Evented Beanstalk Client
Ruby
64
star
35

rb-pagerank

Code from RailsConf '09 pres: Building Mini Google in Ruby
Ruby
54
star
36

closure-sprockets

Sprockets processor for Google's Closure tools
Python
54
star
37

netinfo-monitor

Displays network quality as reported by Network Information API.
JavaScript
48
star
38

shopify-core-web-vitals

This embedded app provides a report on how real-world Google Chrome users experience the Shopify-powered storefront, as captured by the Chrome UX Report, and enables the site owner to benchmark their site against a custom list of competitors.
Ruby
48
star
39

libsnappy

Snappy, a fast compressor/decompressor (courtesy of Google)
Ruby
46
star
40

hydra5

Load-balanced (multi-headed) SOCKS5 proxy
Ruby
42
star
41

zdevice

ZDevice is a Ruby DSL for assembling ZeroMQ routing devices, with support for the ZDCF configuration syntax
Ruby
42
star
42

ruby2lolz

Ruby to Lolcode translator, kthnxbai.
Ruby
38
star
43

bmr-wordcount

Browser Map-Reduce: distributed word count example
Ruby
33
star
44

resource-hints

Moved to...
JavaScript
32
star
45

gitter

XML history generator for CodeSwarm
32
star
46

em-socksify

Transparent proxy support for any EventMachine protocol
Ruby
31
star
47

em-handlersocket

EventMachine HandlerSocket MySQL plugin for direct read/write of InnoDB tables
Ruby
29
star
48

canicrawl

Hosted robots.txt permissions verifier
Go
23
star
49

udacity-webperf

JavaScript
17
star
50

omnipipe

web pipes for your browser's omnibar!
Ruby
12
star
51

issue-tracker

W3C webperf issue tracker
JavaScript
11
star
52

contextual

runtime contextual HTML autoescaper
Ruby
10
star
53

presentations

Slides, notes, code examples from some of the bigger conferences & talks.
9
star
54

libgeohash

Ruby FFI wrapper for libgeohash
Ruby
7
star
55

performance-observer

JavaScript
7
star
56

ImageQuote

Convert text quotes to images
Ruby
7
star
57

resourcehints.info

HTML
2
star
58

igrigorik

1
star