• Stars
    star
    302
  • Rank 138,002 (Top 3 %)
  • Language
    Go
  • License
    MIT License
  • Created almost 7 years ago
  • Updated over 6 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

HyperMinHash: Bringing intersections to HyperLogLog

HyperMinSketch

Besides being a compact and pretty speedy HyperLogLog implementation for cardinality counting, this modified HyperLogLog allows intersection and similarity estimation of different HyperLogLogs.

Details

A simple implementation of HyperLogLog (LogLog-Beta to be specific):

  • 16 bit registers instead of 6 bit, the new 10 bit are for b-bit signatures
  • Similarity function estimates Jaccard indices (a number between 0-1) of 0.01 for set cardinalities on the order of 1e9 with accuracy around 5%
  • Intersection applies the Jaccard index on the union of the sets to return the intersecting set cardinality

The work is based on "HyperMinHash: Jaccard index sketching in LogLog space - Yun William Yu, Griffin M. Weber"

Example Usage

sk1 := hyperminhash.New()
sk2 := hyperminhash.New()

for i := 0; i < 10000; i++ {
    sk1.Add([]byte(strconv.Itoa(i)))
}

sk1.Cardinality() // 10001 (should be 10000)

for i := 3333; i < 23333; i++ {
    sk2.Add([]byte(strconv.Itoa(i)))
}

sk2.Cardinality()     // 19977 (should be 20000)
sk1.Similarity(sk2)   // 0.284589082 (should be 0.2857326533)
sk1.Intersection(sk2) // 6623 (should be 6667)

sk1.Merge(sk2)
sk1.Cardinality() // 23271 (should be 23333)

Results

Max Cardinality 1e3

Set1 HLL1 Set2 HLL2 S1 ∪ S2 HLL1 ∪ HLL2 S1 ∩ S2 HLL1 ∩ HLL2
350 352 752 752 831 832 271 (32.611312%) 274 (32.932692%)
746 748 591 590 834 835 503 (60.311751%) 501 (60.000000%)
248 248 789 791 897 899 140 (15.607581%) 144 (16.017798%)
9 9 818 818 824 825 3 (0.364078%) 3 (0.363636%)
408 411 412 408 771 771 49 (6.355383%) 47 (6.095979%)

Max Cardinality 1e4

Set1 HLL1 Set2 HLL2 S1 ∪ S2 HLL1 ∪ HLL2 S1 ∩ S2 HLL1 ∩ HLL2
2126 2138 1162 1158 3063 3060 225 (7.345739%) 223 (7.287582%)
7767 7706 7054 7064 8889 8887 5932 (66.734166%) 5888 (66.254079%)
842 844 5183 5135 5880 5842 145 (2.465986%) 135 (2.310852%)
6833 6791 664 666 7410 7345 87 (1.174089%) 89 (1.211709%)
1814 1820 6214 6169 7697 7639 331 (4.300377%) 320 (4.189030%)

Max Cardinality 1e5

Set1 HLL1 Set2 HLL2 S1 ∪ S2 HLL1 ∪ HLL2 S1 ∩ S2 HLL1 ∩ HLL2
29667 29540 88700 88167 92444 91667 25923 (28.041842%) 25036 (27.311901%)
79242 78731 30216 30137 83502 82953 25956 (31.084285%) 25995 (31.337022%)
57830 57223 79550 79194 82112 81595 55268 (67.308067%) 54684 (67.018812%)
64610 63501 21696 21729 75895 74816 10411 (13.717636%) 10083 (13.477064%)
92204 91453 96417 95556 165025 163370 23596 (14.298440%) 24130 (14.770154%)

Max Cardinality 1e6

Set1 HLL1 Set2 HLL2 S1 ∪ S2 HLL1 ∪ HLL2 S1 ∩ S2 HLL1 ∩ HLL2
150443 149810 974366 979514 1088517 1096991 36292 (3.334077%) 37417 (3.410876%)
156337 155347 19083 19070 167353 165433 8067 (4.820350%) 8017 (4.846071%)
800969 802044 51053 51244 851388 853396 634 (0.074467%) 511 (0.059878%)
176155 174707 520111 516822 570092 569289 126174 (22.132217%) 123766 (21.740452%)
485954 481362 967341 972651 1081990 1091296 371305 (34.316861%) 376007 (34.455088%)

Max Cardinality 1e7

Set1 HLL1 Set2 HLL2 S1 ∪ S2 HLL1 ∪ HLL2 S1 ∩ S2 HLL1 ∩ HLL2
7132942 7150720 122116 121539 7243153 7261709 11905 (0.164362%) 12550 (0.172824%)
8646240 8649049 1277784 1295017 9821480 9854242 102544 (1.044079%) 99163 (1.006298%)
4192390 4164637 2788913 2779975 4526476 4499897 2454827 (54.232630%) 2454356 (54.542493%)
9803344 9826412 1705700 1715798 10255010 10262719 1254034 (12.228501%) 1273821 (12.412120%)
1308849 1322604 9940327 9971519 11179030 11201850 70146 (0.627478%) 80717 (0.720568%)

Max Cardinality 1e8

Set1 HLL1 Set2 HLL2 S1 ∪ S2 HLL1 ∪ HLL2 S1 ∩ S2 HLL1 ∩ HLL2
13237748 13298469 57073758 57124720 59474437 59394847 10837069 (18.221390%) 11143669 (18.762013%)
90757994 88576114 5717797 5701796 95061178 93016636 1414613 (1.488108%) 1350058 (1.451416%)
60150663 60033013 79238333 77672994 110438475 108311818 28950521 (26.214162%) 27666946 (25.543792%)
30187492 30718889 37756209 37153655 67443566 66938074 500135 (0.741561%) 447406 (0.668388%)
53196095 53461989 48344583 47535284 93284291 91321031 8256387 (8.850780%) 8036467 (8.800237%)

More Repositories

1

hyperloglog

HyperLogLog with lots of sugar (Sparse, LogLog-Beta bias correction and TailCut space reduction) brought to you by Axiom
Go
920
star
2

next-axiom

The official Next.js library for Axiom.
TypeScript
353
star
3

rust-cuckoofilter

Cuckoo Filter: Practically Better Than Bloom (In Rust)
Rust
270
star
4

quantiles

Optimal Quantile Approximation in Streams
Go
162
star
5

axiom-js

Official language bindings and library extensions for Axiom
TypeScript
85
star
6

zig-hyperloglog

Zig library for HyperLogLog estimation
Zig
82
star
7

axiom-node

[DEPRECATED] Use axiomhq/axiom-js instead.
TypeScript
78
star
8

axiom-go

Official Go bindings for the Axiom API
Go
47
star
9

cli

The power of Axiom on the command line.
Go
41
star
10

flipcounter

A counter data structure that knows when to start estimating to save space
Go
36
star
11

axiom-rs

Official Rust bindings for the Axiom API
Rust
31
star
12

axiom-py

The official Python bindings for the Axiom API
Python
23
star
13

axiom-cloudflare-workers

Send logs from Cloudflare Workers to Axiom
JavaScript
22
star
14

tracing-axiom

The official Rust tracing layer for Axiom
Rust
18
star
15

puppeteer-request-intercepter

Intercept API Requests and return Mocked Data
TypeScript
16
star
16

prisma-axiom

Axiom observability for Prisma
TypeScript
16
star
17

variance

Go implementation of variance's method for one-pass variance computation with D. H. D. West improved methods which features merging of several multiple sets of statistics and adding weighted values.
Go
16
star
18

hypertwobits

HyperTwoBits implementation
Rust
12
star
19

awesome-axiom

A curated list of awesome Axiom Platform, libraries, open source repos, guides, blogs, Documentation and other resources.
12
star
20

axiom-demo

Take a look at Axiom on your local machine.
Shell
11
star
21

ngbuild

Dream builders 😴💭
Go
9
star
22

axiom-lambda-extension

Ingest logs and platform events from your lambda functions
Go
9
star
23

axiom-syslog-proxy

A syslog push interface to Axiom.
Go
8
star
24

axiom-honeycomb-proxy

A log forwarder/multiplexer for Axiom and Honeycomb.
Go
7
star
25

deno-client

Minimal deno library for sending events and logs to Axiom
TypeScript
6
star
26

axiom-ai

The official package to send events from AI libraries to Axiom
TypeScript
6
star
27

axiom-grafana

The official Axiom datasource plugin for Grafana.
TypeScript
5
star
28

splitblockbloom

Go
5
star
29

pkg

Commonly used Go packages for Axiom projects.
Go
4
star
30

axiom-loki-multiplexer

A push interface to Axiom via Loki endpoint.
Go
4
star
31

axiom-elements

TypeScript
4
star
32

topkapi

Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements
Go
3
star
33

axiom-cloudwatch-forwarder

Forward CloudWatch Logs to Axiom.
Python
3
star
34

randhn

Random new or top 500 HN story
TypeScript
3
star
35

setup-axiom

Set up a local Axiom stack for testing your integration.
JavaScript
2
star
36

annotation-action

This action allows you to create an annotation in Axiom.
TypeScript
2
star
37

terraform-provider-axiom

Axiom Terraform Provider
Go
2
star
38

homebrew-tap

Collection of Homebrew formulas for Axiom, Inc. open source projects.
Ruby
1
star
39

golang-sync-bench

Benchmarking Synchronization Primitives in Go
Go
1
star
40

axiom-nomad

Axiom 🤝 HashiCorp
HCL
1
star
41

axiom-helm-charts

Axiom Helm Charts
Smarty
1
star
42

logmanager

Yet another Go logging library.
Go
1
star
43

terraform-aws-axiom

Setup required infrastructure and install Axiom Enterprise on AWS using Terraform.
1
star
44

monaco-kusto

JavaScript
1
star
45

terraform-aws-axiom-cloudwatch-forwarder

Official Terraform modules for Axiom Cloudwatch Forwarder
HCL
1
star