• Stars
    star
    316
  • Rank 132,587 (Top 3 %)
  • Language
    C
  • License
    zlib License
  • Created over 1 year ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

The PolymurHash universal hash function.

PolymurHash

PolymurHash is a 64-bit universal hash function designed for use in hash tables. It has a couple desirable properties:

  • It is mathematically proven to have a statistically low collision rate. When initialized with an independently chosen random seed, for any distinct pair of inputs m and m' of up to n bytes the probability that h(m) = h(m') is at most n * 2^-60.2. This is known as an almost-universal hash function. In fact PolymurHash has a stronger property: it is almost pairwise independent. For any two distinct inputs m and m' the probability they hash to the pair of specific 64-bit hash values (x, y) is at most n * 2^-124.2.

  • It is very fast for short inputs, while being no slouch for longer inputs. On an Apple M1 machine it can hash any input <= 49 bytes in 21 cycles, and processes 33.3 GiB/sec (11.6 bytes / cycle) for long inputs.

  • It is cross-platform, using no extended instruction sets such as CLMUL or AES-NI. For good speed it only requires native 64 x 64 -> 128 bit multiplication, which almost all 64-bit processors have.

  • It is small in code size and space. Ignoring cross-platform compatibility definitions, the hash function and initialization procedure is just over 100 lines of C code combined. The initialized hash function uses 32 bytes of memory to store its parameters, and it uses only a small constant amount of stack memory when computing a hash.

To my knowledge PolymurHash is the first hash function to have all those properties. There are already very fast universal hash functions, such as CLHASH, umash, HalftimeHash, etc., but they all require a large amount of space (1KiB+) to store the hash function parameters, are not optimized for hashing small strings, and/or require specific instruction sets such as CLMUL. There are also very fast cross-platform hashes such as xxHash3, komihash or wyhash, but they do not come with proofs of universality. SipHash claims to be cryptographically secure, but is relatively slow, leading people to use reduced-round variants with unknown cryptanalysis, or to use a fast but insecure hash altogether.

Needless to say, PolymurHash passes the full SMHasher suite without any failures*. For the proof of the collision rate, see extras/universality-proof.md.

How to use it

PolymurHash is provided as a header-only C library polymur-hash.h. Simply include it and you are good to go. First the hash function must have its PolymurHashParams initialized from a seed, for which there are two functions. PolymurHash uses two 64-bit secrets, but provides a convenient function to expand a single 64-bit seed to that:

void polymur_init_params(PolymurHashParams* p, uint64_t k, uint64_t s);
void polymur_init_params_from_seed(PolymurHashParams* p, uint64_t seed);

The proof of almost universality applies to both, but for the proof of almost pairwise independence to hold you must provide 128 bits of entropy. Once initialization is complete, you can compute as many hashes as you want with it:

// Computes the full hash of buf. The tweak is added to the hash before final
// mixing, allowing different outputs much faster than re-seeding. No claims are
// made about the collision probability between hashes with different tweaks.
uint64_t polymur_hash(const uint8_t* buf, size_t len, const PolymurHashParams* p, uint64_t tweak);

The tweak allows you to have a different hash function for each hash table without having to calculate new parameters from another seed. This can prevent certain worst-case problems when inserting into a second hash table while iterating over another.

License

PolymurHash is available under the zlib license, included in polymur-hash.h.

How it works and why it's fast

At its core, PolymurHash is a Carter-Wegman style polynomial hash that treats the input as a series of coefficients for a polynomial, and then evaluates that polynomial at a secret key k modulo some prime p. The result of this universal hash is then fed into a Murmur-style permutation followed by the addition of a second secret key s. The polynomial part of the hash provides the universality guarantee, and the Murmur-style bit mixing part provides good bit avalanching and uniformity over the full 64 bit output. The final addition of s provides pairwise independence and resistance against cryptanalysis by making the otherwise trivially invertible permutation a lot harder to invert.

Now to make the above fast, a couple tricks are employed. The prime used is p = 2^61 - 1, a Mersenne prime. By expressing any number x as 2^61 a + b where b < 2^61, we notice that mod p this is equal to just a + b. With this we can do efficient reduction:

reduce(x) = (x >> 61) + (x & P611)

This allows us to keep the numbers small very efficiently. Furthermore we also limit k during initialization in such a way that we overflow 64/128 bit numbers less often and thus need to perform the above reduction less often.

We also use a trick similar to the one found in "Polynomial evaluation and message authentication" by Daniel J. Bernstein, where we forego computing an exact polynomial m[0]k + m[1]k^2 + m[2]k^3 + ... and instead compute any polynomial that is injective. That is, we're good as long as each input maps to a distinct polynomial. Then we can use the 'pseudo-dot-product' to compute

(k + m[0])*(k^2 + m[1]) + (k^3 + m[2])*(k^4 + m[3]) + ...

which allows us to process twice as much data per multiplication. It also allows us to pre-compute a couple powers of k to then use instruction-level parallelism to further increase throughput. The loop used for large inputs

m[i] = loadword(buf + 7*i) & ((1 << 56) - 1)
f =  (f   + m[6]) * k^7
f += (k   + m[0]) * (k^6 + m[1])
f += (k^2 + m[2]) * (k^5 + m[3])
f += (k^3 + m[4]) * (k^4 + m[5])
f = reduce(f)

processes 49 bytes of input using seven parallel 64-bit additions and binary ANDs, four parallel 64 x 64 -> 128 bit multiplications, three 128-bit additions and one 128-bit to 64-bit modular reduction. For small inputs we use custom injective polynomials that are fast to evaluate, filled with input from potentially overlapping reads to avoid branches on the input length.

Resistance against cryptanalysis

PolymurHash has strong collision guarantees for input chosen independently from its random seed. An interactive attacker however can craft its input based on earlier seen hashes. PolymurHash is not a cryptographically secure collision-resistant hash if the attacker can see (or worse, request) hash values at will. This is not a failure of the underlying hash, for example the well-known Poly1305 hash used to secure TLS suffers from the same problem. It solves this by hiding its hash values from attackers by adding an encrypted nonce acting as a one-time-pad.

PolymurHash is not intended to be used in a context where an attacker can see the hash values. Its main intended use case is as a hash function for DoS-resistant hash tables. However, a clever attacker might still acquire some information about the hash values by using side-channels such as hash table iteration order, or timing attacks on collisions.

To protect against this PolymurHash is structured in a way so that it is not trivial to invert the hash, nor to set up controlled informative experiments through side-channels. Roughly speaking, every bit of input is first mixed with the secret key k by modular multiplication and addition modulo a prime. This is a linear process, so to hide the linear structure of the underlying hash we pass the resulting value through a Murmur-style bit-mixing permutation which is highly non-linear. Finally, to ensure the permutation is not trivially invertible, we add the second secret key s.

The above process is by no means a strong multi-round cipher and would likely not hold up to proper cryptanalysis in a standard context. But the underlying structure is sound (e.g. the ChaCha cipher has the form mix(secret + input) + secret where mix is an unkeyed permutation), and I believe that extrapolating what little information you can get from side-channels to recover k, s to execute a HashDoS attack is difficult.

No weak keys

Polynomial hashing has one potential issue: weak keys. The multiplicative group of numbers mod p = 2^61-1 has subgroups of (much) smaller size. This means that for some keys k^(i + d) mod p == k^i mod p for small constant d. In other words, you can swap the 7-byte block at index i with the one at i + d without changing the hash value for such a weak key.

What is the probability you choose such a weak key if you pick a key at random from the 2^61 - 2 possible keys? If d is a divisor of p - 1, then there are totient(d) such keys with the above property. And of course, for the attack to work we need d <= n / 7. Here is a small table showing the probabilities:

Max length of input   Divisors   Weak key probability
64 bytes              8          2^-56.5
1 kilobyte            49         2^-50.5
1 megabyte            811        2^-37.3
1 gigabyte            3420       2^-26.1

These probabilities are very small. Nevertheless, it didn't sit well with me that the possibility existed of picking a key that has such a flaw. So, the seeding algorithm for PolymurHash makes sure to select a key that is a generator of the multiplicative group, that is, the only solution to k^(i + d) mod p == k^i mod p is d = 2^61 - 1.

In other words, PolymurHash does not suffer from weak keys. As a trade-off our key space is slightly more limited: in combination with the fact we want k^7 < 2^60 - 2^56 for efficiency reasons we get a total key space of totient(2^61 - 2) * (2^60 - 2^56) / (2^61 - 1) ~= 2^57.4. Additionally, initialization is also slower than simply selecting a random key, on an Apple M1 it takes ~300 cycles on average. If you feel the need to seed many different hashes, consider looking at the tweak parameter instead to see if it fits your criteria.

Acknowledgements

I am standing on the shoulders of giants, and in the well-researched field of (universal) hash functions there are a lot of them. J. Lawrence Carter, Mark N. Wegman, Ted Krovetz, Phillip Rogaway, Mikkel Thorup, Daniel J. Bernstein, Daniel Lemire, Martin Dietzfelbinger, Austin Appleby, many names come to mind. I have read many publications by them, and borrowed ideas from all of them.

More Repositories

1

pdqsort

Pattern-defeating quicksort.
C++
2,334
star
2

glidesort

A Rust implementation of Glidesort, my stable adaptive quicksort/mergesort hybrid sorting algorithm.
Rust
1,571
star
3

slotmap

Slotmap data structure for Rust
Rust
1,104
star
4

ed25519

Portable C implementation of Ed25519, a high-speed high-security public-key signature system.
C
487
star
5

dev-on-windows

An opiniated guide to set up a development environment on Windows.
226
star
6

matt-parker-five-letter-clique

Rust
45
star
7

devector

Resizable contiguous sequence container with fast appends on either end.
C++
37
star
8

recursive

Easy recursion in Rust, without stack overflows.
Rust
28
star
9

num-ord

A wrapper type for cross-type numeric comparisons.
Rust
27
star
10

peekread

Rust crate for making Read streams peekable.
Rust
26
star
11

ReducePing

ReducePing is a small utility to tune the "TcpAckFrequency" setting of Windows to get better latency in TCP networked games.
C
20
star
12

aoc2022

My Advent of Code 2022 solutions, in Rust.
Rust
20
star
13

golf-cpu

Reference implementation of the GOLF CPU.
Python
17
star
14

pyglfw

Python bindings for GLFW
C
16
star
15

PyGG2

A Python rewrite of Gang Garrison 2
Python
14
star
16

pyth5

Clean-sheet rewrite of Pyth.
Python
13
star
17

bitwise-binary-search

Accompanying code for https://orlp.net/blog/bitwise-binary-search/.
C++
12
star
18

dotfiles

My dotfiles.
HTML
10
star
19

pygrafix

pygrafix is a Python/Cython hardware-accelerated 2D graphics library.
C
10
star
20

xcharter

XCharter font build.
Python
10
star
21

vim-bunlink

A replacement for :bdelete that decouples the concept of 'deleting a buffer' from 'closing a window'.
Vim Script
9
star
22

libop

My personal C++ library.
Objective-C
8
star
23

aoc2023

My Advent of Code 2023 solutions, in Rust.
Rust
7
star
24

vim-quick-replace

A quick find/replace plugin for Vim.
Vim Script
6
star
25

secudht

A secure design and implementation of the Kademlia DHT.
C
6
star
26

sum-bench

This is the code accompanying https://orlp.net/blog/taming-float-sums/.
Rust
6
star
27

multilive

Multiple poe.trade live searches at once.
JavaScript
5
star
28

iwyu

A small utility that helps you include the right C++ headers.
Python
4
star
29

aoc2021

My Advent of Code 2021 solutions, in Rust.
Rust
4
star
30

commonc

Various common C algorithms and things
C
3
star
31

stable-alloc-shim

A stable Rust shim for the unstable Allocator API.
Rust
3
star
32

synth

A real-time self-hosting MIDI software synth written in Rust from scratch.
Rust
3
star
33

qcon

Quake-style console for windows.
AutoHotkey
3
star
34

ncUI

A lightweight user interface for World of Warcraft - DEAD
Lua
3
star
35

deps

deps is a minimalistic building system for any process which consists of smaller processes that depend on each other.
Python
2
star
36

euler

My solutions for Project Euler.
C++
1
star
37

osrs-fragment-calc

OSRS fragment set calculator
HTML
1
star
38

hades-boons

Python
1
star
39

picture-in-picture

Java
1
star
40

Robotics2020-Final

This repository hosts my final project for the 2020 Robotics course at Leiden university.
JavaScript
1
star
41

boost-win-builds

Some Windows builds for Boost.
Shell
1
star
42

lolpriority

Little tool for Leage of Legends, automatically puts the LoLClient.exe process on low priority when the in-game client is open for extra performance.
C
1
star
43

amazons

Web-based Game of the Amazons
CSS
1
star
44

distris

Distributed socializing.
C++
1
star
45

p

C++
1
star
46

poe-trade-qol

Quality of life for PoE's trade site.
JavaScript
1
star
47

poedps

A Path of Exile weapon DPS calculator with optional crafting.
JavaScript
1
star
48

StrongholdCoords

An application for finding out the exact coordinates of end portal frames in Minecraft seeds.
Java
1
star
49

unite-git-repo

A source for Unite.vim that lists all files from the git repository root.
Vim Script
1
star
50

pyflat

pyflat is a Python hardware-accelerated 2D graphics library.
C
1
star