• Stars
    star
    20
  • Rank 1,083,714 (Top 22 %)
  • Language
    Elixir
  • License
    MIT License
  • Created almost 8 years ago
  • Updated over 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Fast HyperLogLog implementation for Elixir/Erlang

Hypex

Build Status Coverage Status Hex.pm Version Documentation

Hypex is a fast HyperLogLog implementation in Elixir which provides an easy way to count unique values with a small memory footprint. This library is based on the paper documenting the algorithm written by Philippe Flajolet et al.

Installation

Hypex is available on Hex. You can install the package via:

  1. Add hypex to your list of dependencies in mix.exs:
def deps do
  [{ :hypex, "~> 1.1" }]
end
  1. Ensure hypex is started before your application:
def application do
  [applications: [:hypex]]
end

Usage

Hypex is extremely straightforward to use, you simply create a new Hypex instance and start adding values to it:

iex> hypex = Hypex.new(4)
{Hypex.Array, 4, {:array, 16, 0, 0, 100}}
iex> hypex = Hypex.update(hypex, "my term")
{Hypex.Array, 4,
 {:array, 16, 0, 0,
  {10, {0, 2, 0, 0, 0, 0, 0, 0, 0, 0}, 10, 10, 10, 10, 10, 10, 10, 10, 10}}}
iex> hypex |> Hypex.cardinality |> round
1

The 4 being passed to Hypex.new/1 is the width which determines the underlying memory structure of a Hypex instance. This value can be within the range 4 <= width <= 16, per the HyperLogLog algorithm. If you don't provide a width, it defaults to 16. Be aware that you should typically scale this number higher based upon the more unique values you expect to see.

For any other examples of how to use Hypex, please read the documentation.

Memory Optimization

As of v1.1.0, the default implementation has moved from a Bitstring to an Erlang Array. This is mainly due to Arrays performing faster on all operations when compared with Bitstrings. However in the case that you're operating in a low-memory environment (or simply want predictable memory usage), you might still wish to use the Bitstring implementation. You can do this by simply using Hypex.new(4, Bitstring) when creating a Hypex.

A rough memory estimate (in bytes) for a Bitstring Hypex can be calculated using the formula ((2 ^ width) * width) / 8 - although this will only include the memory of the registers and not the rest of the tuple structure (which should be minimal). This means that using the highest width available of 16, your memory usage will still only be 131,072 bytes.

At this point I don't know of a good way to measure the size of the Array implementation, but a rough estimate would suggest that it's probably within the range of 6-8 times more memory (if anyone can help measure, I'd appreciate it). Still, this amount of memory shouldn't pose an issue for most systems, and the throughput likely matters more to most users.

Rough Benchmarks

Below are some rough benchmarks for Hypex instances with the different underlying structures. Note that the update/2 tests are inserting a unique value - in the case a duplicate value is inserted, the operation is typically constant across widths at under 0.5 µs/op.

These tests use a width of 4, so it should be noted that larger widths will have slower performance. However, these benchmarks are for reference only and you should gauge which widths work best for the data you're operating with, rather than the performance shown below.

## Array Hypex

Array Hypex.new/1                0.53 µs/op
Array Hypex.update/2             2.13 µs/op
Array Hypex.cardinality/1        6.87 µs/op
Array Hypex.merge/2              16.61 µs/op

## Bitstring Hypex

Bitstring Hypex.new/1            0.46 µs/op
Bitstring Hypex.update/2         2.13 µs/op
Bitstring Hypex.cardinality/1    6.70 µs/op
Bitstring Hypex.merge/2          8.69 µs/op

Contributions

If you feel something can be improved, or have any questions about certain behaviours or pieces of implementation, please feel free to file an issue. Proposed changes should be taken to issues before any PRs to avoid wasting time on code which might not be merged upstream.

If you do make changes to the codebase, please make sure you test your changes thoroughly, and include any unit tests alongside new or changed behaviours. Hypex currently uses the excellent excoveralls to track code coverage.

$ mix test
$ mix coveralls
$ mix coveralls.html && open cover/excoveralls.html

More Repositories

1

cachex

A powerful caching library for Elixir with support for transactions, fallbacks and expirations
Elixir
1,441
star
2

runiq

An efficient way to filter duplicate lines from input, à la uniq.
Rust
203
star
3

local-cluster

Easy local cluster creation for Elixir to aid in unit testing
Elixir
193
star
4

eternal

Keep your ETS tables running forever using bouncing GenServers
Elixir
84
star
5

bytelines

Read input lines as byte slices for high efficiency
Rust
61
star
6

jen

A fast utility to generate fake/test documents based on a template
Rust
59
star
7

sleeplocks

BEAM friendly spinlocks for Elixir/Erlang
Erlang
53
star
8

s3-utils

Utilities and tools based around Amazon S3 to provide convenience APIs in a CLI
Rust
53
star
9

stash

A small and user-friendly ETS wrapper for caching in Elixir
Elixir
52
star
10

limber

A simple (but quick) tool for backing up Elasticsearch documents
Rust
50
star
11

retainer

Minimal async cache in Rust with support for key expirations
Rust
49
star
12

s3-meta

Gather metadata about your S3 buckets
Rust
49
star
13

tiny

A small, fast and fully compliant JSON parser in Elixir
Elixir
47
star
14

s3-concat

Concatenate Amazon S3 files remotely using flexible patterns
Rust
38
star
15

usher

Parameterized routing for generic resources in Rust
Rust
37
star
16

efflux

Easy Hadoop Streaming and MapReduce interfaces in Rust
Rust
34
star
17

siphash-java

SipHash in Java; zero-allocation and streaming implementations
Java
29
star
18

expool

Extremely simple Process pooling and task submission in Elixir
Elixir
25
star
19

vessel

Elixir MapReduce interfaces with Hadoop Streaming integration
Elixir
23
star
20

neek

A simple way to filter duplicate lines from a list, à la uniq.
JavaScript
22
star
21

it.each

A Mocha extension allowing for test looping with (a)sync calls
JavaScript
21
star
22

siphash-elixir

An Elixir implementation of the SipHash cryptographic hash family
Elixir
18
star
23

detox

Quickly clean up your development directories before backups
Rust
18
star
24

RSSDemo

A simple demonstration of an RSS reader for Android.
Java
16
star
25

capture-console

Simple and easy stdio capture for Node.js
JavaScript
16
star
26

sentix

A cross-platform file watcher for Elixir based on fswatch.
Elixir
15
star
27

unsafe

Generate unsafe (!) bindings for Elixir functions
Elixir
12
star
28

siphash-cpp

A small C++ implementation of SipHash using a streaming algorithm with CLI
C++
12
star
29

dot-notes-js

Simple dot/bracket notation parsing/conversion for JSON
JavaScript
12
star
30

jumper

Jump consistent hash implementation in Elixir (without NIFs)
Elixir
12
star
31

rabbitmq-delimiter-exchange

Performant RabbitMQ exchange with support for multiple routing keys per message
Makefile
12
star
32

deppie

Elixir's coolest deprecation logger
Elixir
11
star
33

pre_plug

Guarantee your Elixir Plugs execute on every request
Elixir
9
star
34

docker-geoipupdate

Minimal container for updating GeoIP databases on your host system
Shell
8
star
35

global-flags

Write once global flags for Erlang and Elixir.
Erlang
8
star
36

kscrash-converter

Converts KSCrash JSON output to Apple format
JavaScript
7
star
37

gen_delegate

Macro based delegates for GenServer functions
Elixir
7
star
38

dot-notes-elixir

Simple dot/bracket notation parsing/conversion for Maps/Lists
Elixir
7
star
39

andrest

Basic REST protocol implementation for Android.
Java
6
star
40

phoenix_mongo

An example of setting up Phoenix to use the unofficial Mongo Ecto library
CSS
6
star
41

rabbitmq-manager

A RabbitMQ monitoring system written in NodeJS.
JavaScript
6
star
42

elasticsearch-bulk-operator

Elasticsearch Bulk API bindings using the Java REST client
Java
5
star
43

lowdb-titanium-adapter

Titanium SDK adapter for the LowDB embedded database
JavaScript
4
star
44

loki-titanium-adapter

Titanium SDK adapter for the LokiJS embedded database
JavaScript
4
star
45

dot-notes-java

Jackson JSON flattening/iteration/inflation using simple key parsing
Java
4
star
46

siphash-c

A C (89) implementation of the SipHash cryptographic hash family, using a single pass algorithm
C
3
star
47

child-spec-compat

Compatibility macros for Elixir v1.5+ child specifications
Elixir
3
star
48

global-lazy

Lazy global initialization for Elixir, without state
Elixir
3
star
49

argle

Convenient argument shifting because you know you need it
JavaScript
2
star
50

dropwizard-environment-substitutor

Global environment overrides for Dropwizard configuration
Java
2
star
51

native-hashset

A native HashSet implementation for Node.js and io.js.
C++
2
star
52

luger

Handy logging plug for Elixir with IP and status support
Elixir
2
star
53

dottie

A small library for dealing with JSON dot notation.
Java
2
star
54

dep-validate

Dependency verification for npm packages with Gulp support
JavaScript
1
star
55

archive_bundle

A small task to bundle dependencies into an archive build in Elixir
Elixir
1
star
56

granular-logger

A Winston wrapper to allow time-based log files
JavaScript
1
star
57

noddle

Simple loading of the current project workspace into a REPL
JavaScript
1
star
58

expansion-js

Simple library for NodeJS to expand (and make it easier to expand) the functionality of objects
JavaScript
1
star
59

neps

Node.js REPL with package installation and automatic loading support
JavaScript
1
star
60

appddl

Extremely small CLI tool used to update agent archives from AppDynamics
Rust
1
star
61

svn-to-git

Scripts to convert a Subversion repository to Git.
Shell
1
star
62

json-output-format

JSON output formats for Hadoop MapReduce jobs
Java
1
star
63

projects

A collection of notes and whatnot in regards to either planned or existing projects
JavaScript
1
star
64

waterline-express-example

A base MVC layout using Waterline and Express
JavaScript
1
star