• Stars
    star
    34
  • Rank 741,938 (Top 16 %)
  • Language
    Crystal
  • License
    MIT License
  • Created over 8 years ago
  • Updated 9 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Bloom filter implementation in Crystal lang

Bloom Filter Build Status

Implementation of Bloom Filter in Crystal lang.

Installation

Add this to your application's shard.yml:

dependencies:
  bloom_filter:
    github: crystal-community/bloom_filter

Usage

Basic

require "bloom_filter"

# Create filter with bitmap size of 32 bytes and 3 hash functions.
filter = BloomFilter.new(bytesize = 32, hash_num = 3)

# Insert elements
filter.insert("Esperanto")
filter.insert("Toki Pona")

# Check elements presence
filter.has?("Esperanto")  # => true
filter.has?("Toki Pona")  # => true
filter.has?("Englsh")     # => false

Creating a filter with optimal parameters

Based on your needs(expected number of items and desired probability of false positives), your can create an optimal bloom filter:

# Create a filter, that with one million inserted items, gives 2% of false positives for #has? method
filter = BloomFilter.new_optimal(1_000_000, 0.02)
filter.bytesize # => 1017796 (993Kb)
filter.hash_num # => 6

Dumping into a file and loading

It's possible to save existing bloom filter as a binary file and then load it back.

filter = BloomFilter.new_optimal(2, 0.01)
filter.insert("Esperanto")
filter.dump_file("/tmp/bloom_languages")

loaded_filter = BloomFilter.load_file("/tmp/bloom_languages")
loaded_filter.has?("Esperanto") # => true
loaded_filter.has?("English")   # => false

Union and intersection

Having two filters of the same size and number of hash functions, it's possible to perform union and intersection operations:

f1 = BloomFilter.new(32, 3)
f1.insert("Esperanto")
f1.insert("Spanish")

f2 = BloomFilter.new(32, 3)
f2.insert("Esperanto")
f2.insert("English")

# Union
f3 = f1 | f2
f3.has?("Esperanto") # => true
f3.has?("Spanish")   # => true
f3.has?("English")   # => true

# Intersection
f4 = f1 & f2
f4.has?("Esperanto") # => true
f4.has?("Spanish")   # => false
f4.has?("English")   # => false

Visualization

If you want to see how your filter looks like, you can visualize it:

f1 = BloomFilter.new(16, 2)
f1.insert("Esperanto")
puts "f1 = (Esperanto)"
puts f1.visualize

f2 = BloomFilter.new(16, 2)
f2.insert("Spanish")
puts "f2 = (Spanish)"
puts f2.visualize

f3 = f1 | f2
puts "f3 = f1 | f2 = (Esperanto, Spanish)"
puts f3.visualize

Output:

f1 = (Esperanto)
░░░░░░░░ ░░░░░░█░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░
â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–ˆ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘

f2 = (Spanish)
â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘
░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░█░ ░█░░░░░░

f3 = f1 | f2 = (Esperanto, Spanish)
░░░░░░░░ ░░░░░░█░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░
░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░█ ░░░░░░░░ ░░░░░░█░ ░█░░░░░░

In this way, you can actually see which bits are set:)

Benchmark

Performance of Bloom filter depends on the following parameters:

  • Size of the filter
  • Number of hash functions
  • Length of the input string

To run benchmark from ./samples/benchmark.cr, simply run make task:

$ make benchmark

Number of items: 100000000
Filter size: 117005Kb
Hash functions: 7
String size: 13

                     user     system      total        real
insert           0.004227   0.000000   0.004227 (  2.769349)
has? (present)   0.007980   0.000000   0.007980 (  5.223778)
has? (missing)   0.004318   0.000000   0.004318 (  2.829521)

Contributors

More Repositories

1

icr

Interactive console for Crystal programming language
Crystal
500
star
2

crystal-patterns

📖 Examples of GOF patterns written in Crystal
Crystal
291
star
3

jwt

JWT implementation in Crystal
Crystal
204
star
4

crystal-libraries-needed

A list of libraries that are needed or wanted for the Crystal-Language
141
star
5

msgpack-crystal

MessagePack implementation in Crystal msgpack.org[Crystal]
Crystal
133
star
6

cossack

Simple and flexible HTTP client for Crystal with middleware and test support.
Crystal
105
star
7

hardware

Get CPU, Memory and Network informations of the running OS and its processes
Crystal
71
star
8

kiwi

A unified Crystal interface for key-value stores.
Crystal
61
star
9

toml.cr

TOML parser for Crystal
Crystal
58
star
10

zeromq-crystal

Crystal
46
star
11

crystal-ann

Web site to announce new Crystal projects, blog posts, updates and other work activities
CSS
45
star
12

leveldb

Crystal binding for LevelDB
Crystal
38
star
13

future.cr

Crystal
34
star
14

timecop.cr

A testing library that allows "time travel," "freezing time," and "time acceleration". Inspired by the ruby-timecop library.
Crystal
19
star
15

crystal-kcov

Crystal
16
star
16

autolink.cr

🔗 Auto link for Crystal
Crystal
16
star
17

crystal-notifications

A library for notifications, this started as a port from ActiveSupport::Notifications
Crystal
16
star
18

bluetooth

Bluetooth Bluez binding in Crystal
Crystal
10
star
19

community

The crystal community
10
star
20

cr-config

An all-in-one configuration library to handle any possible configuration need
Crystal
9
star
21

snappy-crystal

Snappy bindings for Crystal
Crystal
6
star
22

cr-i18n

Crystal
3
star
23

kilt-components

Crystal
2
star