multibase

Self identifying base encodings

Multibase is a protocol for disambiguating the encoding of base-encoded (e.g., base32, base36, base64, base58, etc.) binary appearing in text.

When text is encoded as bytes, we can usually use a one-size-fits-all encoding (UTF-8) because we're always encoding to the same set of 256 bytes (+/- the NUL byte). When that doesn't work, usually for historical or performance reasons, we can usually infer the encoding from the context.

However, when bytes are encoded as text (using a base encoding), the choice of base encoding is often restricted by the context. Worse, these restrictions can change based on where the data appears in the text. In some cases, we can only use [a-z0-9]. In others, we can use a larger set of characters but need a compact encoding. This has led to a large set of "base encodings", one for every use case. Unlike when encoding text to bytes, we can't just standardize around a single base encoding because there is no optimal encoding for all cases.

Unfortunately, it's not always clear what base encoding is used; that's where multibase comes in. It answers the question:

Given data d encoded into text s, what base is it encoded with?

Format

The format is:

<base-encoding-character><base-encoded-data>

Where <base-encoding-character> is used according to the multibase table.
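
As a minimal sketch of the format above, the prefix is the first character of the string and everything after it is the base-encoded data. The helper name `split_multibase` is hypothetical and not part of any multibase library.

```python
from typing import Tuple


def split_multibase(text: str) -> Tuple[str, str]:
    """Split a multibase string into (prefix character, base-encoded data)."""
    if not text:
        raise ValueError("an empty string has no multibase prefix")
    # Python strings index by code point, so a non-ASCII prefix
    # (such as the base256emoji code) is still a single character.
    return text[0], text[1:]


prefix, data = split_multibase("F4D756C7469")
print(prefix)  # F
print(data)    # 4D756C7469
```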

Multibase Table

The current multibase table is here:

encoding,          code, description,                                                  status
identity,          0x00, 8-bit binary (encoder and decoder keeps data unmodified),     default
base2,             0,    binary (01010101),                                            candidate
base8,             7,    octal,                                                        draft
base10,            9,    decimal,                                                      draft
base16,            f,    hexadecimal,                                                  default
base16upper,       F,    hexadecimal,                                                  default
base32hex,         v,    rfc4648 case-insensitive - no padding - highest char,         candidate
base32hexupper,    V,    rfc4648 case-insensitive - no padding - highest char,         candidate
base32hexpad,      t,    rfc4648 case-insensitive - with padding,                      candidate
base32hexpadupper, T,    rfc4648 case-insensitive - with padding,                      candidate
base32,            b,    rfc4648 case-insensitive - no padding,                        default
base32upper,       B,    rfc4648 case-insensitive - no padding,                        default
base32pad,         c,    rfc4648 case-insensitive - with padding,                      candidate
base32padupper,    C,    rfc4648 case-insensitive - with padding,                      candidate
base32z,           h,    z-base-32 (used by Tahoe-LAFS),                               draft
base36,            k,    base36 [0-9a-z] case-insensitive - no padding,                draft
base36upper,       K,    base36 [0-9a-z] case-insensitive - no padding,                draft
base58btc,         z,    base58 bitcoin,                                               default
base58flickr,      Z,    base58 flickr,                                                candidate
base64,            m,    rfc4648 no padding,                                           default
base64pad,         M,    rfc4648 with padding - MIME encoding,                         candidate
base64url,         u,    rfc4648 no padding,                                           default
base64urlpad,      U,    rfc4648 with padding,                                         default
proquint,          p,    PRO-QUINT https://arxiv.org/html/0901.4016,                   draft
base256emoji,      🚀,    base256 with custom alphabet using variable-sized-codepoints, draft

NOTE: Multibase-prefixes are encoding agnostic. "z" is "z", not 0x7a ("z" encoded as ASCII/UTF-8). For example, in UTF-32, "z" would be [0x7a, 0x00, 0x00, 0x00].
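
To illustrate the note above, this small Python sketch shows the same prefix character mapping to different bytes depending on how the surrounding text is encoded. The helper `prefix_bytes` is a made-up name for illustration, and little-endian byte order is assumed for the UTF-16 and UTF-32 cases.

```python
def prefix_bytes(prefix: str, text_encoding: str) -> bytes:
    """Return the bytes a prefix character occupies in a given text encoding."""
    return prefix.encode(text_encoding)


# The multibase prefix is the character "z", whatever bytes represent it.
print(prefix_bytes("z", "utf-8"))      # b'z'
print(prefix_bytes("z", "utf-16-le"))  # b'z\x00'
print(prefix_bytes("z", "utf-32-le"))  # b'z\x00\x00\x00'
```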

Reserved

The following codes are reserved for (backwards) compatibility with existing systems.

  • / - Separator used by multiaddr.
  • 1 - Base58 encoded identity multihashes used by libp2p peer IDs.
  • Q - Base58 encoded sha2-256 multihashes used by libp2p/ipfs for peer IDs and CIDv0.

If you'd like to switch a project over to multibase and would also like to reserve a prefix for compatibility, please file an issue.
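
A decoder may want to recognize the reserved codes before consulting the multibase table. The following is only a sketch under that assumption: `RESERVED` and `reserved_reason` are hypothetical names, and the descriptions are copied from the list above.

```python
from typing import Optional

# Reserved leading characters and why they are reserved (from the list above).
RESERVED = {
    "/": "separator used by multiaddr",
    "1": "base58 encoded identity multihashes used by libp2p peer IDs",
    "Q": "base58 encoded sha2-256 multihashes (libp2p/ipfs peer IDs, CIDv0)",
}


def reserved_reason(text: str) -> Optional[str]:
    """Return why the leading character is reserved, or None if it is not."""
    if not text:
        return None
    return RESERVED.get(text[0])


print(reserved_reason("/ip4/127.0.0.1/tcp/4001"))  # separator used by multiaddr
print(reserved_reason("f00ff"))                    # None
```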

Status

Each multibase encoding has a status:

  • draft - these encodings have been proposed but are not widely implemented and may be removed.
  • candidate - these encodings are mature and widely implemented but may not be implemented by all implementations.
  • default - these encodings should be implemented by all implementations and are widely used.
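
Because the table is comma-separated, an implementation could read it directly to find which encodings it is expected to support. This is a sketch only: `TABLE_EXCERPT` holds a few rows copied from the table above, and `encodings_with_status` is a hypothetical helper.

```python
import csv
import io

# A few rows copied from the multibase table, for illustration only.
TABLE_EXCERPT = """\
encoding,          code, description,                          status
base2,             0,    binary (01010101),                    candidate
base8,             7,    octal,                                draft
base16,            f,    hexadecimal,                          default
base58btc,         z,    base58 bitcoin,                       default
base64pad,         M,    rfc4648 with padding - MIME encoding, candidate
"""


def encodings_with_status(table_text: str, status: str) -> list:
    """Return the names of all encodings in the table with the given status."""
    # skipinitialspace drops the alignment padding that follows each comma.
    reader = csv.DictReader(io.StringIO(table_text), skipinitialspace=True)
    return [row["encoding"] for row in reader if row["status"].strip() == status]


print(encodings_with_status(TABLE_EXCERPT, "default"))  # ['base16', 'base58btc']
```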

Multibase By Example

Consider the following encodings of the same binary string:

4D756C74696261736520697320617765736F6D6521205C6F2F # base16 (hex)
JV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP           # base32
3IY8QKL64VUGCX009XWUHKF6GBBTS3TVRXFRA5R            # base36
YAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt                 # base58
TXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw==               # base64

And consider the same encodings with their multibase prefix:

F4D756C74696261736520697320617765736F6D6521205C6F2F # base16 F
BJV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP           # base32 B
K3IY8QKL64VUGCX009XWUHKF6GBBTS3TVRXFRA5R            # base36 K
zYAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt                 # base58 z
MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw==               # base64 M

The base prefixes used are: F, B, K, z, M.
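
Three of the prefixed examples above (F, B and M) can be reproduced with Python's standard `base64` module. This is a sketch rather than a full implementation: `CODECS`, `multibase_encode` and `multibase_decode` are hypothetical names, and only base16upper, base32upper and base64pad are covered.

```python
import base64


def _b32_nopad_decode(text: str) -> bytes:
    # Restore the "=" padding that the unpadded base32 variant omits.
    return base64.b32decode(text + "=" * (-len(text) % 8))


# prefix -> (encode bytes to text, decode text to bytes)
CODECS = {
    "F": (lambda b: base64.b16encode(b).decode("ascii"), base64.b16decode),
    "B": (lambda b: base64.b32encode(b).decode("ascii").rstrip("="), _b32_nopad_decode),
    "M": (lambda b: base64.b64encode(b).decode("ascii"), base64.b64decode),
}


def multibase_encode(prefix: str, data: bytes) -> str:
    """Encode data and prepend the multibase prefix."""
    encode, _ = CODECS[prefix]
    return prefix + encode(data)


def multibase_decode(text: str) -> bytes:
    """Read the prefix, strip it, and decode the remainder."""
    prefix, body = text[0], text[1:]
    if prefix not in CODECS:
        raise ValueError(f"unsupported multibase prefix: {prefix!r}")
    _, decode = CODECS[prefix]
    return decode(body)


message = b"Multibase is awesome! \\o/"
for prefix in "FBM":
    encoded = multibase_encode(prefix, message)
    assert multibase_decode(encoded) == message  # round trip
    print(encoded)
```

The decoder never has to guess the base: the first character selects the codec, and only the remainder is passed to the underlying base decoder.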

FAQ

Is this a real problem?

Yes. If I give you "1214314321432165", is that decimal? Or hex? Or something else?
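
A short Python sketch makes the ambiguity concrete: the same digits parse without error under both interpretations but yield different values. With the codes from the multibase table, a leading 9 would mark the string as base10 and a leading f as base16. The variable names here are illustrative.

```python
s = "1214314321432165"

as_decimal = int(s, 10)  # read the digits as base 10
as_hex = int(s, 16)      # read the same digits as base 16

# Both parses succeed, so the string alone cannot tell us which was meant.
print(as_decimal)
print(as_hex)
print(as_decimal == as_hex)  # False
```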

Why the strange selection of codes / characters?

The code values are selected such that they are included in the alphabets of the base they represent. For example, f is the base code for base16 (hex), because f is in hex's 16-character alphabet. Note that the alphabets can be encoded in UTF-8, and most can be encoded in ASCII. We have not found a case needing something else.

Don't we have to agree on a table of base encodings?

Yes, but we already have to agree on base encodings, so this is not hard. The table even leaves some room for custom encodings.

Disclaimers

Warning: multibase changes the first character of the encoded string depending on the encoding, so a multibase-prefixed value is not identical to the plain base-encoded value. Remove the multibase prefix before using the value.

Contribute

Contributions welcome. Please check out the issues.

Check out our contributing document for more information on how we work, and about contributing in general. Please be aware that all interactions related to multiformats are subject to the IPFS Code of Conduct.

Small note: If editing the README, please conform to the standard-readme specification.

License

This repository is only for documents. All of these are licensed under the CC-BY-SA 3.0 license © 2016 Protocol Labs Inc. Any code is under an MIT license © 2016 Protocol Labs Inc.
