Self describing hashes - for future proofing

multihash

Self identifying hashes

Multihash is a protocol for differentiating outputs from various well-established cryptographic hash functions, addressing size + encoding considerations.

It is useful to write applications that future-proof their use of hashes, and allow multiple hash functions to coexist. See jbenet/random-ideas#1 for a longer discussion.

Example

Outputs of <encoding>.encode(multihash(<digest>, <function>)):

# sha1 - 0x11 - sha1("multihash")
111488c2f11fb2ce392acb5b2986e640211c4690073e # sha1 in hex
CEKIRQXRD6ZM4OJKZNNSTBXGIAQRYRUQA47A==== # sha1 in base32
5dsgvJGnvAfiR3K6HCBc4hcokSfmjj # sha1 in base58
ERSIwvEfss45KstbKYbmQCEcRpAHPg== # sha1 in base64

# sha2-256 - 0x12 - sha2-256("multihash")
12209cbc07c3f991725836a3aa2a581ca2029198aa420b9d99bc0e131d9f3e2cbe47 # sha2-256 in hex
CIQJZPAHYP4ZC4SYG2R2UKSYDSRAFEMYVJBAXHMZXQHBGHM7HYWL4RY= # sha2-256 in base32
QmYtUc4iTCbbfVSDNKvtQqrfyezPPnFvE33wFmutw9PBBk # sha2-256 in base58
EiCcvAfD+ZFyWDajqipYHKICkZiqQgudmbwOEx2fPiy+Rw== # sha2-256 in base64

Note: You should consider using multibase to base-encode these hashes instead of base-encoding them directly.
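
As a rough illustration of how such values could be produced, here is a minimal Python sketch using only the standard library. The `multihash` helper and the `CODES` and `HASHLIB_NAMES` tables are hypothetical names introduced for this example, not part of any multihash library. The sketch assumes the function code and digest size each fit in a single varint byte, which holds for the two functions shown above. Base58 is left out because the standard library does not provide it.

```python
import base64
import hashlib

# Function codes taken from the example above; a real implementation
# would look them up in the shared multicodec table.
CODES = {"sha1": 0x11, "sha2-256": 0x12}
# Mapping from multihash function names to hashlib algorithm names.
HASHLIB_NAMES = {"sha1": "sha1", "sha2-256": "sha256"}


def multihash(data: bytes, function: str) -> bytes:
    """Return <code><digest size><digest> for `data` (hypothetical helper)."""
    digest = hashlib.new(HASHLIB_NAMES[function], data).digest()
    code = CODES[function]
    # Assumption: both values are below 128, so each varint is one byte.
    assert code < 0x80 and len(digest) < 0x80
    return bytes([code, len(digest)]) + digest


mh = multihash(b"multihash", "sha2-256")
print(mh.hex())                        # hex encoding
print(base64.b32encode(mh).decode())   # base32, upper case with padding
print(base64.b64encode(mh).decode())   # base64 with padding
```

The printed strings can be compared against the sha2-256 lines in the example above.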

Format

<varint hash function code><varint digest size in bytes><hash function output>

Binary example (only 4 bytes for simplicity):

fn code  dig size hash digest
-------- -------- -----------------------------------
00010001 00000100 10110110 11111000 01011100 10110101
sha1     4 bytes  4 byte sha1 digest
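
The layout above can be parsed with a short routine. The following Python sketch is one possible reading of the format, not a reference implementation; `read_varint` and `decode_multihash` are hypothetical names chosen for this example. It decodes the six bytes of the binary example.

```python
from typing import Tuple


def read_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Read one unsigned varint; return (value, offset just past it)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry data
        if byte & 0x80 == 0:              # high bit clear: last byte
            return value, offset
        shift += 7


def decode_multihash(buf: bytes) -> Tuple[int, int, bytes]:
    """Split a multihash into (function code, digest size, digest)."""
    code, pos = read_varint(buf)
    size, pos = read_varint(buf, pos)
    digest = buf[pos:pos + size]
    if len(digest) != size:
        raise ValueError("digest shorter than declared size")
    return code, size, digest


# The binary example above, byte for byte.
example = bytes([0b00010001, 0b00000100,
                 0b10110110, 0b11111000, 0b01011100, 0b10110101])
code, size, digest = decode_multihash(example)
print(hex(code), size, digest.hex())  # 0x11 4 b6f85cb5
```

This sketch ignores any bytes that follow the digest; a stricter parser might reject them.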

Why have digest size as a separate number?

Because otherwise you end up with a function code really meaning "function-and-digest-size-code". Makes using custom digest sizes annoying, and is less flexible.

Why isn't the size first?

Because aesthetically I prefer the code first. You already have to write your stream parsing code to understand that a single byte already means "a length in bytes more to skip". Reversing these doesn't buy you much.

Why varints?

So that we have no limitation on functions or lengths.

What kind of varints?

Most Significant Bit (MSB) unsigned varints (also called base-128 varints), as defined by multiformats/unsigned-varint.
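
To make the encoding concrete, here is a small Python sketch of a base-128 unsigned varint encoder. The function name `encode_varint` is an assumption made for this example. Each output byte carries 7 bits of the number, least significant group first, and the high bit is set on every byte except the last.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 unsigned varint."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # final byte, high bit clear
            return bytes(out)


for n in (0x11, 127, 128, 300):
    print(n, encode_varint(n).hex())
# 17 11
# 127 7f
# 128 8001
# 300 ac02
```

Values below 128, such as the sha1 code 0x11, encode to themselves in a single byte, which is why the examples in this document have a two-byte prefix.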

Don't we have to agree on a table of functions?

Yes, but we already have to agree on functions, so this is not hard. The table even leaves some room for custom function codes.

Implementations

Table for Multihash

We use a single Multicodec table across all of our multiformat projects. The shared namespace reduces the chances of accidentally interpreting a code in the wrong context. Multihash entries are identified with a multihash value in the tag column.

The current table lives here

Other Tables

Cannot find a good standard on this. Found some different IANA ones:

They disagree. :(

Notes

Multihash and randomness

Obviously multihash values bias the first two bytes. Do not expect them to be uniformly distributed. The entropy size is len(multihash) - 2, assuming the function code and the digest size each fit in a single varint byte. Skip the first two bytes when using them with bloom filters, etc.

Why not append instead of prepend? Because when reading a stream of hashes, you can know the length of the whole value up front, and allocate the right amount of memory, skip it, or discard it.
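
Both points can be sketched in Python: because the prefix comes first, a reader can walk a stream of concatenated multihashes without knowing any hash function, and the bare digest it yields is the part suitable for a bloom filter. The names `read_varint` and `iter_digests` are hypothetical, and the second value in the sample stream is a made-up 2-byte digest used only to keep the example short.

```python
import io
from typing import BinaryIO, Iterator, Optional, Tuple


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Return the next unsigned varint, or None at a clean end of stream."""
    value, shift = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        value |= (chunk[0] & 0x7F) << shift
        if chunk[0] & 0x80 == 0:
            return value
        shift += 7


def iter_digests(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (function code, digest without prefix) for each multihash."""
    while True:
        code = read_varint(stream)
        if code is None:
            return
        size = read_varint(stream)
        if size is None:
            raise EOFError("missing digest size")
        # The size is known before the digest is read, so the caller
        # could equally allocate, skip, or discard exactly `size` bytes.
        digest = stream.read(size)
        if len(digest) != size:
            raise EOFError("stream ended inside a digest")
        yield code, digest


# Two concatenated multihashes: the 4-byte sha1 example from the Format
# section, then a made-up 2-byte digest tagged with code 0x12.
stream = io.BytesIO(bytes.fromhex("1104b6f85cb5" "1202abcd"))
for code, digest in iter_digests(stream):
    print(hex(code), digest.hex())
# 0x11 b6f85cb5
# 0x12 abcd
```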

Insecure / obsolete hash functions

Obsolete and deprecated hash functions are included in this list. MD4, MD5 and SHA-1 should no longer be used for cryptographic purposes, but since many such hashes already exist they are included in this specification and may be implemented in multihash libraries.

Non-cryptographic hash functions

Multihash is intended for "well-established cryptographic hash functions" as non-cryptographic hash functions are not suitable for content addressing systems. However, there may be use-cases where it is desirable to identify non-cryptographic hash functions or their digests by use of a multihash. Non-cryptographic hash functions are identified in the Multicodec table with a hash value in the tag column.

Visual Examples

These are visual aids that help tell the story of why Multihash matters.

Consider these 4 different hashes of the same input

Same length: 256 bits

Different hash functions

Idea: self-describe the values to distinguish

Multihash: fn code + length prefix

Multihash: a pretty good multiformat

Multihash: has a bunch of implementations already

Maintainers

Captain: @jbenet.

Contribute

Contributions welcome. Please check out the issues.

Check out our contributing document for more information on how we work, and about contributing in general. Please be aware that all interactions related to multiformats are subject to the IPFS Code of Conduct.

Small note: If editing the README, please conform to the standard-readme specification.

License

This repository is only for documents. All of these are licensed under the CC-BY-SA 3.0 license © 2016 Protocol Labs Inc. Any code is under the MIT license © 2016 Protocol Labs Inc.
