  • Language
    C
  • License
    Apache License 2.0
  • Created over 10 years ago
  • Updated over 1 year ago

Repository Details

Python Non-cryptographic Hash Library

Introduction

pyhash is a Python non-cryptographic hash library.

It provides several common hash algorithms, implemented in C/C++ for performance and compatibility.

>>> import pyhash
>>> hasher = pyhash.fnv1_32()

>>> hasher('hello world')
2805756500L

>>> hasher('hello', ' ', 'world')
2805756500L

>>> hasher('world', seed=hasher('hello '))
2805756500L
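
The seeded call above works because FNV keeps no state beyond the running hash value, so an earlier result can serve as the starting state for the next chunk. The following is a minimal pure-Python sketch of that idea. `fnv1_32` here is an illustrative stand-in written for this example, not pyhash's C implementation, and its default seed (the standard FNV offset basis) is an assumption, so its outputs need not match the values shown above.

```python
FNV_32_PRIME = 16777619
FNV_32_OFFSET_BASIS = 2166136261  # standard FNV-1 start value (assumed default here)


def fnv1_32(data, seed=FNV_32_OFFSET_BASIS):
    """Illustrative FNV-1 (multiply, then XOR) over bytes, starting from `seed`."""
    h = seed
    for byte in bytearray(data):
        h = (h * FNV_32_PRIME) & 0xFFFFFFFF  # keep the state at 32 bits
        h ^= byte
    return h


whole = fnv1_32(b"hello world")
chained = fnv1_32(b"world", seed=fnv1_32(b"hello "))
assert whole == chained  # hashing in chunks reaches the same final state
```

Because the seed is simply the initial state, splitting the input at any byte boundary gives the same result in this sketch.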

It can also be used to generate fingerprints, which do not take a seed.

>>> import pyhash
>>> fp = pyhash.farm_fingerprint_64()

>>> fp('hello')
13009744463427800296L

>>> fp('hello', 'world')
[13009744463427800296L, 16436542438370751598L]

Notes

hasher('hello', ' ', 'world') is syntactic sugar for hasher('world', seed=hasher(' ', seed=hasher('hello'))). The result may not equal hasher('hello world'), because in some hash algorithms the seed and the hash value have different sizes.

For example, MetroHash always uses a 32-bit seed, even for 64-bit and 128-bit hash values.

>>> import pyhash
>>> hasher = pyhash.metro_64()

>>> hasher('hello world')
5622782129197849471L

>>> hasher('hello', ' ', 'world')
16402988188088019159L

>>> hasher('world', seed=hasher(' ', seed=hasher('hello')))
16402988188088019159L
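
The effect of a seed that is narrower than the hash value can be reproduced with a toy model. The sketch below uses a 64-bit FNV-1a as a stand-in hash (it is not MetroHash and not pyhash code), and `seed_bits` is a hypothetical parameter introduced only to model the seed width. With a full-width seed the chained and one-shot results agree; with a 32-bit seed the upper half of each intermediate result is dropped, so the results generally diverge.

```python
FNV_64_PRIME = 0x100000001B3
FNV_64_OFFSET_BASIS = 0xCBF29CE484222325
MASK_64 = 0xFFFFFFFFFFFFFFFF


def toy_hash_64(data, seed=FNV_64_OFFSET_BASIS):
    """Toy FNV-1a 64-bit hash (XOR, then multiply), starting from `seed`."""
    h = seed
    for byte in bytearray(data):
        h ^= byte
        h = (h * FNV_64_PRIME) & MASK_64
    return h


def chained(parts, seed_bits=64):
    """Hash the parts in order, passing each result on as the next seed.

    Only the low `seed_bits` bits of each result survive as the seed,
    which models an algorithm whose seed is narrower than its output.
    """
    seed_mask = (1 << seed_bits) - 1
    result = toy_hash_64(parts[0])
    for part in parts[1:]:
        result = toy_hash_64(part, seed=result & seed_mask)
    return result


parts = [b"hello", b" ", b"world"]
whole = toy_hash_64(b"hello world")

full_seed = chained(parts, seed_bits=64)    # seed as wide as the hash value
narrow_seed = chained(parts, seed_bits=32)  # seed truncated to 32 bits

assert full_seed == whole
print("narrow seed matches one-shot hash:", narrow_seed == whole)
```

If input may arrive in pieces and the result must equal the one-shot hash, it is safer to concatenate the pieces first than to rely on chaining.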

Installation

$ pip install pyhash

Notes

If pip install fails with errors similar to the following (see #27):

/usr/lib/gcc/x86_64-linux-gnu/6/include/smmintrin.h:846:1: error: inlining failed in call to always_inline 'long long unsigned int _mm_crc32_u64(long long unsigned int, long long unsigned int)': target specific option mismatch
 _mm_crc32_u64 (unsigned long long __C, unsigned long long __V)
 ^~~~~~~~~~~~~
src/smhasher/metrohash64crc.cpp:52:34: note: called from here
             v[0] ^= _mm_crc32_u64(v[0], read_u64(ptr)); ptr += 8;
                     ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~

Please upgrade pip and setuptools to the latest version and try again:

$ pip install --upgrade pip setuptools

Notes

If pip install fails on macOS with errors similar to the following (see #28):

   creating build/temp.macosx-10.6-intel-3.6
   ...
   /usr/bin/clang -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -arch i386 -arch x86_64 -g -c src/smhasher/metrohash64crc.cpp -o build/temp.macosx-10.6-intel-3.6/src/smhasher/metrohash64crc.o -msse4.2 -maes -mavx -mavx2
    src/smhasher/metrohash64crc.cpp:52:21: error: use of undeclared identifier '_mm_crc32_u64'
                v[0] ^= _mm_crc32_u64(v[0], read_u64(ptr)); ptr += 8;
                        ^

You can try setting the minimum macOS version through CFLAGS:

$ CFLAGS="-mmacosx-version-min=10.13" pip install pyhash

Notes

pyhash only supports PyPy v6.0 or newer; please download and install the latest PyPy.
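
One way to check this requirement before installing is to inspect the running interpreter. The sketch below relies on `platform.python_implementation()` and on `sys.pypy_version_info`, which exists only on PyPy; the helper name `pypy_is_supported` and the constant `MIN_PYPY` are made up for this example.

```python
import platform
import sys

MIN_PYPY = (6, 0)  # minimum PyPy version stated in the note above


def pypy_is_supported(implementation, pypy_version):
    """Return True unless the interpreter is a PyPy older than MIN_PYPY."""
    if implementation != "PyPy":
        return True  # the PyPy requirement does not apply to other interpreters
    return tuple(pypy_version[:2]) >= MIN_PYPY


# sys.pypy_version_info is missing outside PyPy, so fall back to an empty tuple.
satisfied = pypy_is_supported(
    platform.python_implementation(),
    getattr(sys, "pypy_version_info", ()),
)
print("PyPy requirement satisfied:", satisfied)
```

The check only covers the PyPy constraint; it says nothing about which CPython versions are supported.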

Algorithms

pyhash supports the following hash algorithms:

  • FNV (Fowler-Noll-Vo) hash
    • fnv1_32
    • fnv1a_32
    • fnv1_64
    • fnv1a_64
  • MurmurHash
    • murmur1_32
    • murmur1_aligned_32
    • murmur2_32
    • murmur2a_32
    • murmur2_aligned_32
    • murmur2_neutral_32
    • murmur2_x64_64a
    • murmur2_x86_64b
    • murmur3_32
    • murmur3_x86_128
    • murmur3_x64_128
  • lookup3
    • lookup3
    • lookup3_little
    • lookup3_big
  • SuperFastHash
    • super_fast_hash
  • City Hash
    • city_32
    • city_64
    • city_128
    • city_crc_128
    • city_fingerprint_256
  • Spooky Hash
    • spooky_32
    • spooky_64
    • spooky_128
  • FarmHash
    • farm_32
    • farm_64
    • farm_128
    • farm_fingerprint_32
    • farm_fingerprint_64
    • farm_fingerprint_128
  • MetroHash
    • metro_64
    • metro_128
    • metro_crc_64
    • metro_crc_128
  • MumHash
    • mum_64
  • T1Ha
    • t1ha2 (64-bit little-endian)
    • t1ha2_128 (128-bit little-endian)
    • t1ha1 (64-bit native-endian)
    • t1ha1_le (64-bit little-endian)
    • t1ha1_be (64-bit big-endian)
    • t1ha0 (64-bit, chooses the fastest function at runtime)
    • t1_32
    • t1_32_be
    • t1_64
    • t1_64_be
  • XXHash
    • xx_32
    • xx_64
    • xxh3_64 NEW
    • xxh3_128 NEW
  • Highway Hash
    • highway_64 NEW
    • highway_128 NEW
    • highway_256 NEW
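
When the algorithm is chosen at runtime, for example from a configuration value, the hasher can be looked up by name. The sketch below assumes that every name in the list is exposed as a callable attribute of the `pyhash` module, as `pyhash.fnv1_32()` and `pyhash.metro_64()` are in the examples above; `make_hasher` is a hypothetical helper, not part of pyhash.

```python
def make_hasher(module, name):
    """Look up `name` on `module` and instantiate it, or raise a clear error."""
    factory = getattr(module, name, None)
    if not callable(factory):
        raise ValueError("unknown hash algorithm: %r" % (name,))
    return factory()


try:
    import pyhash  # optional: only needed for the demonstration below
except ImportError:
    pyhash = None

if pyhash is not None:
    hasher = make_hasher(pyhash, "murmur3_32")
    print(hasher(b"foo"))
```

Taking the module as a parameter keeps the helper testable without the compiled extension installed.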

String and Bytes literals

Python has two types that can be used to represent string literals, and the hash values of the two types are different.

  • In Python 2.x, string literals are str by default; unicode can be used with the u prefix.
  • In Python 3.x, string literals are unicode by default; bytes can be used with the b prefix.

For example,

$ python2
Python 2.7.15 (default, Jun 17 2018, 12:46:58)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyhash
>>> hasher = pyhash.murmur3_32()
>>> hasher('foo')
4138058784L
>>> hasher(u'foo')
2085578581L
>>> hasher(b'foo')
4138058784L
$ python3
Python 3.7.0 (default, Jun 29 2018, 20:13:13)
[Clang 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyhash
>>> hasher = pyhash.murmur3_32()
>>> hasher('foo')
2085578581
>>> hasher(u'foo')
2085578581
>>> hasher(b'foo')
4138058784

You can also import unicode_literals to use unicode literals in Python 2.x:

from __future__ import unicode_literals

In general, it is more compelling to use unicode_literals when back-porting new or existing Python 3 code to Python 2/3 than when porting existing Python 2 code to 2/3. In the latter case, explicitly marking up all unicode string literals with u'' prefixes would help to avoid unintentionally changing the existing Python 2 API. However, if changing the existing Python 2 API is not a concern, using unicode_literals may speed up the porting process.
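
Another way to get the same hash value on both Python 2 and Python 3 is to convert every input to bytes before hashing. The sketch below shows one possible helper; `as_bytes` is a name made up for this example, and UTF-8 is an assumed encoding that should be replaced by whatever encoding the data actually uses.

```python
def as_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, encoding text with `encoding` (UTF-8 assumed)."""
    if isinstance(value, bytes):
        return value  # already bytes (this is also plain str on Python 2)
    return value.encode(encoding)


assert as_bytes(u"foo") == b"foo"
assert as_bytes(b"foo") == b"foo"
```

With such a helper, hasher(as_bytes(u'foo')) receives exactly the same input as hasher(b'foo'), regardless of the interpreter version.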
