• Stars
    star
    672
  • Rank 66,609 (Top 2 %)
  • Language
    C
  • License
    The Unlicense
  • Created about 6 years ago
  • Updated 6 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Automated integer hash function discovery

Hash Function Prospector

This is a little tool for automated integer hash function discovery. It generates billions of integer hash functions at random from a selection of nine reversible operations (also). The generated functions are JIT compiled and their avalanche behavior is evaluated. The current best function is printed out in C syntax.

The avalanche score is the number of output bits that remain "fixed" on average when a single input bit is flipped. Lower scores are better. Ideally the score is 0 — e.g. every output bit flips with a 50% chance when a single input bit is flipped.

Prospector can generate both 32-bit and 64-bit integer hash functions. Check the usage (-h) for the full selection of options. Due to the JIT compiler, only x86-64 is supported, though the functions it discovers can, of course, be used anywhere.

Article: Prospecting for Hash Functions

Discovered Hash Functions

There are two useful classes of hash functions discovered by the prospector and the other helper utilities here. Both use an xorshift-multiply-xorshift construction, but with a different number of rounds.

Two round functions

Update: TheIronBorn has used combinatorial optimization to discover the best known parameters for this construction:

[16 21f0aaad 15 d35a2d97 15] = 0.10760229515479501

This 32-bit, two-round permutation has a particularly low bias and even beats the venerable MurmurHash3 32-bit finalizer by a tiny margin. The hash function construction was discovered by the prospector, then the parameters were tuned using hill climbing and a genetic algorithm.

// exact bias: 0.17353355999581582
uint32_t
lowbias32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352d;
    x ^= x >> 15;
    x *= 0x846ca68b;
    x ^= x >> 16;
    return x;
}

// inverse
uint32_t
lowbias32_r(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x43021123;
    x ^= x >> 15 ^ x >> 30;
    x *= 0x1d69e2a5;
    x ^= x >> 16;
    return x;
}

More 2-round constants with low bias, some even better than lowbias32:

[15 d168aaad 15 af723597 15] = 0.15983776156606694
[17 9e485565 16 ef1d6b47 16] = 0.16143129787074881
[16 604baa5d 15 43d6ce97 15] = 0.16491052655811722
[16 a812d533 15 b278e4ad 17] = 0.16540778981744320
[16 9c8f2d35 15 5d1346b5 17] = 0.16835348823718840
[16 88c0a94b 14 9d06da59 17] = 0.16898511658356749
[16 a52fb2cd 15 551e4d49 16] = 0.17162579707098322
[16 b237694b 15 eb5b4593 15] = 0.17274184020173433
[16 7feb352d 15 846ca68b 16] = 0.17353355999581582
[16 4bdc9aa5 15 2729b469 16] = 0.17355424787865850
[16 dc63b4d3 15 2c32b9a9 15] = 0.17368589564800074
[16 e02bd533 15 0364c8ad 17] = 0.17447893149410759
[16 603a32a7 15 5a522677 15] = 0.17514135907753242
[16 ac10d4eb 15 9d51b169 16] = 0.17676510450127819
[15 f15f5959 14 7db29359 16] = 0.18103205436627479
[16 83747333 14 aa256573 16] = 0.18105722344231542
[16 be8b6ca7 14 6dd624b5 16] = 0.18223928664971270
[17 7186cd35 15 fe6bba73 15] = 0.18312741727971640
[16 93f2552b 15 959b4a4d 15] = 0.18360629205797341
[16 df892d4b 15 3c2da6b3 16] = 0.18368195486921446
[15 49c34cd3 13 e7418ca7 16] = 0.18400092964673831
[15 4811acab 15 5591acd7 16] = 0.18522661033580071
[16 dc85aaa7 15 6658a5cb 15] = 0.18577280285788791
[16 1ec9b4db 15 3224d38d 17] = 0.18631684392389897
[16 8ee0d535 15 5dc6b5af 15] = 0.18664478683752250
[16 462daaad 15 0a36c95d 16] = 0.18674876992866513
[16 17cdd657 15 a426cb25 15] = 0.18995262675473334
[16 ab39aacb 15 a1b5d19b 15] = 0.19045785238099658
[17 cd8512ad 15 b95c5a73 15] = 0.19050717016846502
[16 aecc96b5 15 f64dcd47 15] = 0.19077817816874504
[15 2548acd5 15 0b39d397 16] = 0.19121161052714156
[15 7f19c559 15 b356358d 16] = 0.19198007174447981
[16 4ffcab35 15 e98db28b 16] = 0.19423994132339928
[15 1216ccb5 15 3abcdca9 15] = 0.19426091938816648
[16 97219aad 15 ab46b735 15] = 0.19536391240344408
[16 c845a997 15 f214db9b 17] = 0.19553179377831409
[15 3a7ba96b 13 5e919299 16] = 0.19563436462680908
[16 c3d9a965 16 362e4b47 15] = 0.19575424692659107
[17 179cd515 15 4c495d47 15] = 0.19608530402798924
[16 5dce3553 15 a655d8e9 15] = 0.19621753012889542
[17 88a5ad35 16 96338b27 16] = 0.19653922266398804
[17 0364d657 15 ac2a34c5 15] = 0.19665754791333651
[16 3c9aa9ab 16 051369d7 16] = 0.19687211117412906
[17 0ee6d967 15 9c8a4a33 16] = 0.19722490309575344
[16 b921a6cb 14 30b5a6d1 16] = 0.19745192295417058
[18 a136aaad 16 9f6d62d7 17] = 0.19768193144773874
[16 0ae84d3b 15 3b9d4e5b 17] = 0.19776257374279985
[17 24f4d2cd 15 1ba3b969 16] = 0.19789489706453650
[16 418fb5b3 15 8cf3539b 16] = 0.19817117175199098
[16 f0ae2ad7 15 8965d939 16] = 0.19881758420284917
[17 9bde596b 16 1c9e9647 16] = 0.19882570872036193
[16 bd10754b 14 35a29b0d 16] = 0.19885203058591913
[17 78d31553 15 c547ac65 15] = 0.19918133404528665
[15 81aab34d 15 18e746a3 15] = 0.19938572052445763
[16 054335ab 15 146da68b 16] = 0.19943843016872725
[17 a1c76a55 16 5ca46b97 16] = 0.19959562213253398
[15 c62f4d53 14 62b8a46b 16] = 0.19973996656987172
[16 6872cd2d 15 f4a0d975 17] = 0.19992260539370590

This next function was discovered using only the prospector. It has a bit more bias than the previous function.

// exact bias: 0.34968228323361017
uint32_t
prospector32(uint32_t x)
{
    x ^= x >> 15;
    x *= 0x2c1b3c6d;
    x ^= x >> 12;
    x *= 0x297a2d39;
    x ^= x >> 15;
    return x;
}

To use the prospector search randomly for alternative multiplication constants, run it like so:

$ ./prospector -p xorr:15,mul,xorr:12,mul,xorr:15

Three round functions

Another round of multiply-xorshift in this construction allows functions with carefully chosen parameters to reach the theoretical bias limit (bias = ~0.021). For example, this hash function is indistinguishable from a perfect PRF (e.g. a random permutation of all 32-bit integers):

// exact bias: 0.020888578919738908
uint32_t
triple32(uint32_t x)
{
    x ^= x >> 17;
    x *= 0xed5ad4bb;
    x ^= x >> 11;
    x *= 0xac4c1b51;
    x ^= x >> 15;
    x *= 0x31848bab;
    x ^= x >> 14;
    return x;
}

// inverse
uint32_t
triple32_r(uint32_t x)
{
    x ^= x >> 14 ^ x >> 28;
    x *= 0x32b21703;
    x ^= x >> 15 ^ x >> 30;
    x *= 0x469e0db1;
    x ^= x >> 11 ^ x >> 22;
    x *= 0x79a85073;
    x ^= x >> 17;
    return x;
}

More 3-round constants with low bias:

[17 ed5ad4bb 11 ac4c1b51 15 31848bab 14] = 0.020888578919738908
[16 aeccedab 14 ac613e37 16 19c89935 17] = 0.021246568167078764
[16 236f7153 12 33cd8663 15 3e06b66b 16] = 0.021280991798512679
[18 4260bb47 13 27e8e1ed 15 9d48a33b 15] = 0.021576730651802156
[17 3f6cde45 12 51d608ef 16 6e93639d 17] = 0.021772288363808408
[15 5dfa224b 14 4bee7e4b 17 930ee371 15] = 0.02184521628884813
[17 3964f363 14 9ac3751d 16 4e8772cb 17] = 0.021883292578109576
[16 66046c65 14 d3f0865b 16 f9999193 16] = 0.0219446068365007
[16 b1a89b33 14 09136aaf 16 5f2a44a7 15] = 0.021998624107282542
[16 24767aad 12 daa18229 16 e9e53beb 16] = 0.022043911220395354
[15 42f91d8d 14 61355a85 15 dcf2a949 14] = 0.022052539152635078
[15 4df8395b 15 466b428b 16 b4b2868b 16] = 0.022140187420461286
[16 2bbed51b 14 cd09896b 16 38d4c587 15] = 0.022159936298777144
[16 0ab694cd 14 4c139e47 16 11a42c3b 16] = 0.02220928191220355
[17 7f1e072b 12 8750a507 16 ecbb5b5f 16] = 0.022283743052847804
[16 f1be7bad 14 73a54099 15 3b85b963 15] = 0.022316544125749647
[16 66e756d5 14 b5f5a9cd 16 84e56b11 16] = 0.022372957847491555
[15 233354bb 15 ce1247bd 16 855089bb 17] = 0.022406591070966285
[16 eb6805ab 15 d2c7b7a7 16 7645a32b 16] = 0.022427060650927547
[16 8288ab57 14 0d1bfe57 16 131631e5 16] = 0.022431656871313443
[16 45109e55 14 3b94759d 16 adf31ea5 17] = 0.022436433678417977
[15 26cd1933 14 e3da1d59 16 5a17445d 16] = 0.022460520416491526
[16 7001e6eb 14 bb8e7313 16 3aa8c523 15] = 0.022491767264054854
[16 49ed0a13 14 83588f29 15 658f258d 15] = 0.022500668856510898
[16 6cdb9705 14 4d58d2ed 14 c8642b37 16] = 0.022504626537729222
[16 a986846b 14 bdd5372d 15 ad44de6b 17] = 0.022528238323120016
[16 c9575725 15 9448f4c5 16 3b7a5443 16] = 0.022586511310042686
[15 fc54c453 13 08213789 15 669f96eb 16] = 0.022591114646032095
[16 d47ef17b 14 642fa58f 16 a8b65b9b 16] = 0.022600633971701509
[15 00bfaa73 14 8799c69b 16 731985b1 16] = 0.022645866629596379
[16 953a55e9 15 8523822b 17 56e7aa63 15] = 0.022667180032713324
[16 a3d7345b 15 7f41c9c7 16 308bd62d 17] = 0.022688845770122031
[16 195565c7 14 16064d6f 16 0f9ec575 15] = 0.022697810688752193
[16 13566dbb 14 59369a03 15 990f9d1b 16] = 0.022712430070797596
[16 8430cc4b 15 a7831cbd 15 c6ccbd33 15] = 0.022734765033419774
[16 699f272b 14 09c01023 16 39bd48c3 15] = 0.022854175321846512
[15 336536c3 13 4f0e38b1 16 15d229f7 16] = 0.022884125170795171
[16 221f686d 12 d8948a07 16 ed8a8345 16] = 0.022902500408830236
[16 d7ca8cbb 13 eb4e259f 15 34ab1143 16] = 0.022905955538176669
[16 7cb04f65 14 9b96da73 16 83625687 15] = 0.022906573700088178
[15 5156196b 14 940d8869 15 0086f473 17] = 0.022984943828687553

Prepending an increment to triple32 breaks the hash(0) = 0 issue while also lowering the bias a tiny bit further:

// exact bias: 0.020829410544597495
uint32_t
triple32inc(uint32_t x)
{
    x++;
    x ^= x >> 17;
    x *= 0xed5ad4bb;
    x ^= x >> 11;
    x *= 0xac4c1b51;
    x ^= x >> 15;
    x *= 0x31848bab;
    x ^= x >> 14;
    return x;
}

// inverse
uint32_t
triple32inc_r(uint32_t x)
{
    x ^= x >> 14 ^ x >> 28;
    x *= 0x32b21703;
    x ^= x >> 15 ^ x >> 30;
    x *= 0x469e0db1;
    x ^= x >> 11 ^ x >> 22;
    x *= 0x79a85073;
    x ^= x >> 17;
    x--;
    return x;
}

Measuring exact bias

The -E mode evaluates the bias of a given hash function (-p or -l). By default the prospector uses an estimate to quickly evaluate a function's bias, but it's non-deterministic and there's a lot of noise in the result. To exhaustively measure the exact bias, use the -e option.

The function to be checked can be defined using -p and a pattern or -l and a shared library containing a function named hash(). For example, to measure the exact bias of the best hash function above:

$ ./prospector -Eep xorr:16,mul:e2d0d4cb,xorr:15,mul:3c6ad939,xorr:15

Or drop the function in a C file named hash.c, and name the function hash(). This lets you test hash functions that can't be represented using the prospector's limited notion of hash functions.

$ cc -O3 -shared -fPIC -l hash.so hash.c
$ ./prospector -Eel ./hash.so

By default it treats its input as a 32-bit hash function. Use the -8 switch to test (by estimation) 64-bit functions. There is no exact, exhaustive test for 64-bit hash functions since that would take far too long.

Reversible operation selection

x  = ~x;
x ^= constant;
x *= constant | 1; // e.g. only odd constants
x += constant;
x ^= x >> constant;
x ^= x << constant;
x += x << constant;
x -= x << constant;
x <<<= constant; // left rotation

Technically x = ~x is covered by x ^= constant. However, ~x is uniquely special and particularly useful. The generator is very unlikely to generate the one correct constant for the XOR operator that achieves the same effect.

16-bit hashes

Because the constraints are different for 16-bit hashes there's a separate tool for generating these hashes: hp16. Unlike the 32-bit / 64-bit prospector, this implementation is fully portable and will run on just about any system. It's also capable of generating and evaluating 128KiB s-boxes.

Since 16-bit hashes are more likely to be needed on machines that, say, lack fast multiplication instructions, certain operations can be omitted during exploration (-m, -r).

Some interesting results so far:

// 2-round xorshift-multiply (-Xn2)
// bias = 0.0085905051336723701
uint16_t hash16_xm2(uint16_t x)
{
    x ^= x >> 8; x *= 0x88b5U;
    x ^= x >> 7; x *= 0xdb2dU;
    x ^= x >> 9;
    return x;
}

// 3-round xorshift-multiply (-Xn3)
// bias = 0.0045976709018820602
uint16_t hash16_xm3(uint16_t x)
{
    x ^= x >>  7; x *= 0x2993U;
    x ^= x >>  5; x *= 0xe877U;
    x ^= x >>  9; x *= 0x0235U;
    x ^= x >> 10;
    return x;
}

// No multiplication (-Imn6)
// bias = 0.023840118344741465
uint16_t hash16_s6(uint16_t x)
{
    x += x << 7; x ^= x >> 8;
    x += x << 3; x ^= x >> 2;
    x += x << 4; x ^= x >> 8;
    return x;
}

// Which is identical to this xorshift-multiply
uint16_t hash16_s6(uint16_t x)
{
    x *= 0x0081U; x ^= x >> 8;
    x *= 0x0009U; x ^= x >> 2;
    x *= 0x0011U; x ^= x >> 8;
    return x;
}

A good 3-round xorshift hash (a short search via hp16 -Xn3) is a close approximation of a good s-box (i.e. hp16 -S).

Be mindful of C integer promotion rules when doing 16-bit operations. For instance, on 32-bit implementations unsigned 16-bit operands will be promoted to signed 32-bit integers, leading to incorrect results in certain cases. The C programs printed by this program are careful to promote 16-bit operations to "unsigned int" where needed.

More Repositories

1

endlessh

SSH tarpit that slowly sends an endless banner
C
7,113
star
2

w64devkit

Portable C and C++ Development Kit for x64 (and x86) Windows
C
2,871
star
3

elfeed

An Emacs web feeds client
Emacs Lisp
1,371
star
4

skewer-mode

Live web development in Emacs
Emacs Lisp
1,066
star
5

enchive

Encrypted personal archives
C
630
star
6

branchless-utf8

Branchless UTF-8 decoder
C
589
star
7

scratch

Personal scratch code
C
366
star
8

pixelcity

Shamus Young's procedural city project
C++
359
star
9

optparse

Portable, reentrant, getopt-like option parser
C
326
star
10

pdjson

C JSON parser library that doesn't suck
C
277
star
11

interactive-c-demo

Demonstration of interactive C programming
C
249
star
12

emacs-aio

async/await for Emacs Lisp
Emacs Lisp
215
star
13

webgl-particles

WebGL particle system demo
JavaScript
203
star
14

fantasyname

Fantasy name generator
C
180
star
15

lstack

C11 Lock-free Stack
C
172
star
16

resurrect-js

JavaScript serialization that preserves behavior and reference circularity.
JavaScript
169
star
17

passphrase2pgp

Generate a PGP key from a passphrase
Go
168
star
18

pure-linux-threads-demo

Pthreads-free Linux threading demo
Assembly
147
star
19

ptrace-examples

Examples for Linux ptrace(2)
C
134
star
20

memdig

Memory cheat tool for Windows and Linux games
C
131
star
21

dosdefender-ld31

DOS Defender (Ludum Dare #31)
C
130
star
22

dotfiles

My personal dotfiles
Shell
124
star
23

Prelude-of-the-Chambered

Notch's Prelude of the Chambered 48-hour game
Java
124
star
24

sort-circle

Colorful sorting animations
C
124
star
25

.emacs.d

My personal .emacs.d
Emacs Lisp
119
star
26

growable-buf

Growable Memory Buffer for C99
C
116
star
27

youtube-dl-emacs

Emacs youtube-dl download manager
Emacs Lisp
104
star
28

getopt

POSIX getopt() as a portable header library
C
102
star
29

trie

C99 trie library
C
98
star
30

opengl-demo

Minimal OpenGL 3.3 core profile demo
C
97
star
31

Minicraft

Notch's Ludum Dare 22 entry.
Java
95
star
32

igloojs

Low-level, fluent, OOP WebGL wrapper
JavaScript
89
star
33

hastyhex

A blazing fast hex dumper
C
88
star
34

webgl-game-of-life

WebGL Game of Life
JavaScript
88
star
35

u-config

a smaller, simpler, portable pkg-config clone
C
84
star
36

bmp

24-bit BMP (Bitmap) ANSI C header library
C
84
star
37

elisp-ffi

Emacs Lisp Foreign Function Interface
C++
83
star
38

rng-js

JavaScript seedable random number generation tools.
JavaScript
82
star
39

mandel-simd

Mandelbrot set in SIMD (SSE, AVX)
C
81
star
40

sample-java-project

Example Ant-based Java project
Java
78
star
41

nasm-mode

Major mode for editing NASM assembly programs
Emacs Lisp
78
star
42

vulkan-test

Test if your system supports Vulkan
C
77
star
43

gap-buffer-animator

Gap buffer animation creator
C
72
star
44

at-el

Prototype-based Emacs Lisp object system
Emacs Lisp
71
star
45

skeeto.github.com

Personal website/blog
HTML
64
star
46

ulid-c

ULID Library for C
C
62
star
47

xf8

8-bit Xor Filter in C99
C
61
star
48

race64

World's fastest Base64 encoder / decoder
C
58
star
49

devdocs-lookup

Quick Emacs API lookup on devdocs.io
Emacs Lisp
58
star
50

webgl-path-solver

WebGL shortest path solver
JavaScript
57
star
51

x86-lookup

Quickly jump to x86 documentation from Emacs
Emacs Lisp
57
star
52

javadoc-lookup

Quickly lookup Javadoc pages from Emacs
Emacs Lisp
55
star
53

am-i-shadowbanned

Online reddit shadowban test
JavaScript
53
star
54

fun-liquid

Physics engine liquid in Java.
Java
53
star
55

minimail

Embeddable POP3 + SMTP server.
C
50
star
56

emacs-memoize

Elisp memoization functions
Emacs Lisp
48
star
57

webgl-fire

WebGL fire effect
JavaScript
47
star
58

simplegpg

Simplified, signify-like interface to GnuPG signatures
Shell
47
star
59

autotetris-mode

Automatically play Emacs Tetris
Emacs Lisp
45
star
60

fiber-await

Win32 Fiber async/await demo
C
45
star
61

lorenz-webgl

Lorenz System WebGL
JavaScript
42
star
62

asteroids-demo

Asteroids Clone for Windows
C
40
star
63

elisp-json-rpc

JSON-RPC library for Emacs Lisp
Emacs Lisp
39
star
64

hashtab

Simple C hash table
C
37
star
65

pgp-poisoner

PGP key poisoner
Go
36
star
66

bf-x86

x86_64 brainfuck compiler
C
36
star
67

wisp

Wisp, a lisp programming language
C
33
star
68

binitools

Bini file translator for the game Freelancer
C
32
star
69

double-pendulum

JavaScript double pendulum simulation with RK4 integration
JavaScript
30
star
70

predd

Multimethods for Emacs Lisp
Emacs Lisp
29
star
71

atomkv

In-memory, JSON, key-value service with compare-and-swap updates and event streams
Go
29
star
72

purgeable

Purgeable memory allocations for Linux
C
28
star
73

lqueue

C11 + Pthreads Atomic Bounded Work Queue
C
28
star
74

uuid

UUID generator for Go
Go
27
star
75

goblin-com

Goblin-COM roguelike game for 7DRL 2015
C
27
star
76

jekyll-deck

Template for Jekyll / deck.js presentations
27
star
77

rlhk

Roguelike Header Kit
C
26
star
78

voronoi-toy

WebGL interactive Voronoi diagram
JavaScript
26
star
79

transcription-mode

Emacs mode for editing transcripts.
Emacs Lisp
25
star
80

october-chess-engine

Java Chess Engine
Java
25
star
81

boids-js

HTML5 boids (skewer-mode demo)
JavaScript
25
star
82

optparse-go

GNU style long options for Go
Go
24
star
83

geohash

Fast, lean, efficient geohash C library
C
24
star
84

bitpack

Emacs Lisp structure packing
Emacs Lisp
23
star
85

lean-static-gpg

Lean, static GnuPG build for Linux
Shell
23
star
86

connect4

Connect Four AI and Engine
C
22
star
87

blowpipe

Authenticated Blowfish-encrypted pipe
C
22
star
88

markov-text

Markov chain text generation in Emacs Lisp
Emacs Lisp
22
star
89

joymacs

Joystick support for Emacs
C
21
star
90

emacs-rsa

RSA cryptography in Emacs Lisp
Emacs Lisp
21
star
91

live-dev-env

A live CD of my personal development environment
Shell
20
star
92

dynamic-function-benchmark

Benchmark for three different kinds of dynamic function calls
C
20
star
93

utf-7

UTF-7 encoder and decoder in ANSI C
C
19
star
94

elisp-fakespace

Emacs Lisp namespaces (defpackage)
Emacs Lisp
18
star
95

siphash

Incremental SipHash in C
C
18
star
96

bencode-c

Bencode decoder in ANSI C
C
17
star
97

british-square

British Square Engine (Analysis and Perfect AI Player)
C
17
star
98

pokerware

Pokerware Secure Passphrase Generation
Makefile
17
star
99

xxtea

100% XXTEA authenticated, chunked file encryption
C
17
star
100

gnupg-windows-build

Cross-compile GnuPG for Windows using Docker
Dockerfile
17
star