• Stars
    star
    108
  • Rank 309,760 (Top 7 %)
  • Language
    Ruby
  • Created about 13 years ago
  • Updated almost 11 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Mneme is an HTTP web-service for recording and identifying previously seen records - aka, duplicate detection.

Mneme

mneme (n.)  mne·me
   1. Psychology: the retentive basis or basic principle in a mind or organism accounting for memory, persisting effect of memory of past events.
   2. Mythology: the Muse of memory, one of the original three Muses. Cf."Aoede, Melete."

Mneme is an HTTP web-service for recording and identifying previously seen records - aka, duplicate detection. To achieve this goal in a scalable, and zero-maintenance manner, it is implemented via a collection of automatically rotated bloomfilters. By using a collection of bloomfilters, you can customize your false-positive error rate, as well as the amount of time you want your memory to perist (ex: remember all keys for the last 6 hours).

To minimize the require memory footprint, mneme does not store the actual key names, instead each specified key is hashed and mapped onto the bloomfilter. For data storage, we use Redis getbit/setbit to efficiently store and retrieve bit-level data for each key. Couple this with Goliath app-server, and you have an out-of-the-box, high-performance, customizable duplicate filter.

For more details: Mneme: Scalable Duplicate Filtering Service

Sample configuration

# example_config.rb

config['namespace'] = 'default' # namespace for your app (if you're sharing a redis instance)
config['periods'] = 3           # number of periods to store data for
config['length']  = 60          # length of a period in seconds (length = 60, periods = 3.. 180s worth of data)

config['size']    = 1000        # desired size of the bloomfilter
config['bits']    = 10          # number of bits allocated per key
config['hashes']  = 7           # number of times each key will be hashed
config['seed']    = 30          # seed value for the hash function

config['pool']    = 2           # number of concurrent Redis connections

To learn more about Bloom filter configuration: Scalable Datasets: Bloom Filters in Ruby

Launching mneme

$> redis-server
$> gem install mneme
$> mneme -p 9000 -sv -c config.rb   # run with -h to see all options

That's it! You now have a mneme web service running on port 9000. Let's try querying and inserting some data:

$> curl "http://127.0.0.1:9000?key=abcd"
{"found":[],"missing":["abcd"]}

# -d creates a POST request with key=abcd, aka insert into filter
$> curl "http://127.0.0.1:9000?key=abcd" -d' '

$> curl "http://127.0.0.1:9000?key=abcd"
{"found":["abcd"],"missing":[]}

Performance & Memory requirements

  • Redis is used as an in-memory datastore of the bloomfilter

  • Goliath provides the high-performance HTTP frontend

  • The speed of storing a new key is: O(number of BF hashes) - aka, O(1)

  • The speed of retrieving a key is: O(number of filters * number of BF hashes) - aka, O(1)

  • Sample ab benchmarks for single key lookup: https://gist.github.com/895326

Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Because we are using Redis as a backend, in-memory store for the filters, there is some extra overhead. Sample memory requirements:

  • 1.0% error rate for 1M items, 10 bits/item: 2.5 mb
  • 1.0% error rate for 150M items, 10 bits per item: 358.52 mb
  • 0.1% error rate for 150M items, 15 bits per item: 537.33 mb

Ex: If you wanted to store up to 24 hours (with 1 hour = 1 bloom filter) of keys, where each hour can have up to 1M keys, and you are willing to accept a 1.0% error rate, then your memory footprint is: 24 * 2.5mb = 60mb of memory. The footprint will not change after 24 hours, because Mneme will automatically rotate and delete old filters for you!

License

(MIT License) - Copyright (c) 2011 Ilya Grigorik

More Repositories

1

videospeed

HTML5 video speed controller (for Google Chrome)
JavaScript
3,654
star
2

ga-beacon

Google Analytics collector-as-a-service (using GA measurement protocol).
Go
3,527
star
3

gharchive.org

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
Ruby
2,567
star
4

em-websocket

EventMachine based WebSocket server
Ruby
1,690
star
5

decisiontree

ID3-based implementation of the ML Decision Tree algorithm
Ruby
1,414
star
6

em-http-request

Asynchronous HTTP Client (EventMachine + Ruby)
Ruby
1,216
star
7

em-synchrony

Fiber aware EventMachine clients and convenience classes
Ruby
1,040
star
8

http-2

Pure Ruby implementation of HTTP/2 protocol
Ruby
876
star
9

bugspots

Implementation of simple bug prediction hotspot heuristic
Ruby
841
star
10

agent

Agent is an attempt at modelling Go-like concurrency, in Ruby
Ruby
729
star
11

em-proxy

EventMachine Proxy DSL for writing high-performance transparent / intercepting proxies in Ruby
Ruby
664
star
12

vimgolf

Real Vim ninjas count every keystroke - do you?
Ruby
632
star
13

node-spdyproxy

SPDY forwarding proxy - fast and secure
JavaScript
528
star
14

bloomfilter-rb

BloomFilter(s) in Ruby: Native counting filter + Redis counting/non-counting filters
C
468
star
15

async-rails

async Rails 3 stack demo
Ruby
467
star
16

hackernews-button

Embeddable Hacker News button + vote counter for your site
Go
417
star
17

istlsfastyet.com

Is TLS fast yet? Yes, yes it is.
HTML
417
star
18

http-client-hints

Ruby
401
star
19

spdy

SPDY is a protocol designed to reduce latency of web pages
Ruby
317
star
20

hpbn.co

High Performance Browser Networking (O'Reilly)
HTML
286
star
21

webp-detect

WebP with Accept negotiation
C++
242
star
22

zeroconf-router

Zero-config reverse proxies: let's get there!
Ruby
205
star
23

autoperf

Ruby driver for httperf - automated load and performance testing
Ruby
179
star
24

PubSubHubbub

Asynchronous PubSubHubbub Ruby Client
Ruby
174
star
25

heroku-buildpack-dart

Heroku buildpack for Dart
Shell
166
star
26

rack-speedtracer

SpeedTracer middleware for server side debugging
Ruby
156
star
27

textquery

Evaluate any text against a collection of match rules
Ruby
145
star
28

tokyo-recipes

Lean & mean Tokyo Cabinet recipes (with Lua)
Lua
144
star
29

slowgrowl

Surface slow code paths in your Rails 3 app via Growl
Ruby
117
star
30

RRRDTool

Round robin database pattern via Redis sorted sets
Ruby
79
star
31

pregel

Single-node implementation of Google's Pregel framework for graph processing.
Ruby
72
star
32

gmetric

Pure Ruby interface for generating Ganglia gmetric packets
Ruby
70
star
33

rack-aggregate

Rack response-time statistics aggregator middleware
Ruby
67
star
34

em-jack

An Evented Beanstalk Client
Ruby
65
star
35

rb-pagerank

Code from RailsConf '09 pres: Building Mini Google in Ruby
Ruby
54
star
36

closure-sprockets

Sprockets processor for Google's Closure tools
Python
54
star
37

netinfo-monitor

Displays network quality as reported by Network Information API.
JavaScript
48
star
38

shopify-core-web-vitals

This embedded app provides a report on how real-world Google Chrome users experience the Shopify-powered storefront, as captured by the Chrome UX Report, and enables the site owner to benchmark their site against a custom list of competitors.
Ruby
48
star
39

libsnappy

Snappy, a fast compressor/decompressor (courtesy of Google)
Ruby
45
star
40

hydra5

Load-balanced (multi-headed) SOCKS5 proxy
Ruby
42
star
41

zdevice

ZDevice is a Ruby DSL for assembling ZeroMQ routing devices, with support for the ZDCF configuration syntax
Ruby
42
star
42

ruby2lolz

Ruby to Lolcode translator, kthnxbai.
Ruby
38
star
43

bmr-wordcount

Browser Map-Reduce: distributed word count example
Ruby
33
star
44

em-socksify

Transparent proxy support for any EventMachine protocol
Ruby
32
star
45

resource-hints

Moved to...
JavaScript
32
star
46

gitter

XML history generator for CodeSwarm
31
star
47

em-handlersocket

EventMachine HandlerSocket MySQL plugin for direct read/write of InnoDB tables
Ruby
29
star
48

canicrawl

Hosted robots.txt permissions verifier
Go
23
star
49

udacity-webperf

JavaScript
17
star
50

omnipipe

web pipes for your browser's omnibar!
Ruby
12
star
51

issue-tracker

W3C webperf issue tracker
JavaScript
11
star
52

contextual

runtime contextual HTML autoescaper
Ruby
10
star
53

presentations

Slides, notes, code examples from some of the bigger conferences & talks.
9
star
54

performance-observer

JavaScript
7
star
55

libgeohash

Ruby FFI wrapper for libgeohash
Ruby
7
star
56

ImageQuote

Convert text quotes to images
Ruby
7
star
57

resourcehints.info

HTML
2
star
58

igrigorik

1
star