CuckooFilter
Pure Ruby implementation of the Cuckoo Filter - a probabilistic datastructure which is objectively better than Bloom Filters for set-membership queries.
What the heck is a Cuckoo Filter?
It is a probabilistic data structure which is used to determine set-membership, i.e. finding out if a given element exists in a given set.
For practical uses think - checking if an item is present in a cache, if it is present in a database or YouTube trying to figure out if you have already watched some video before it is recommended to you.
It is closely related to Bloom Filters - but outshines them in performance and space efficiency. You can read more from the original paper that introduced it in 2014. Yes, it is that young!
Installation
Add this line to your application's Gemfile:
gem 'cuckoo_filter'
And then execute:
$ bundle
Or install it yourself as:
$ gem install cuckoo_filter
Usage
# Create a filter with 1024 (next power of 2) slots with each bucket holding 4
cf = CuckooFilter.make(size: 1000, kicks: 500, bucket_size: 4)
# => returns a CuckooFilter::CuckooFilter instance
# Insert an element into the filter
cf.insert("foo")
# => true
# Lookup the existence of an element
cf.lookup("foo")
# => true
cf.lookup("bar")
# => false
# Delete an existing element
cf.delete("foo")
# => true
Frequenty Anticipated Questions
-
Q: Is this useful?
A: Yes, but have a look at the benchmarks below to see if it satisfies your performance requirements before using it in a real world app. -
Q: Why did you make this?
A: For fun and education, of course! Although that is not stopping it from having any real world usage! -
Q: But why Ruby? Why not Go/Rust/Elixir/FooBar
A: Because why not? I like Ruby and I couldn't find a full blown implementation in it. -
Q: Can I use it in a real project?
A: If it satisfies your criteria, then why not? Have a look at the benchmarks for performance stats. Let me know if you do!
Benchmarks
You can run the benchmark script to see the iterations per second performance of different methods on an initially half full filter.
$ ruby test/benchmark.rb
Setting up for benchmarking...
Done.
Warming up --------------------------------------
# warmup stats -> can be ignored
Calculating -------------------------------------
Iterations per second - Insertions
214.863M (±47.1%) i/s - 1.214B in 9.859735s
Iterations per second - Lookups
355.392M (±10.9%) i/s - 3.371B in 9.697599s
Iterations per second - Deletions
343.890M (± 9.6%) i/s - 3.286B
Development
After checking out the repo, run bin/setup
to install dependencies. Then, run rake test
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/pawandubey/cuckoo_filter.
License
Copyright 2017 Pawan Dubey
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.