• Stars
    star
    573
  • Rank 77,865 (Top 2 %)
  • Language
    Erlang
  • Created about 13 years ago
  • Updated over 6 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A decentralized, k-ordered id generation service in Erlang

Flake: A decentralized, k-ordered id generation service in Erlang

Flake produces 128-bit, k-ordered ids (read time-ordered lexically). Run one on each node in your infrastructure and they will generate conflict-free ids on-demand without coordination.

Read the original post on the Boundary blog.

To get started

git clone git://github.com/boundary/flake.git

Then edit rel/files/sys.config to fit your environment.

  • interface - set to an available network interface from which to pull a 48-bit mac address as the worker id.
  • timestamp_path - set to a location where flake can periodically save the current time. If flake detects on startup that this file contains timestamps in the future or the distant past, it will refuse to startup. This is to prevent problematic ids from being distributed.
  • allowable_downtime - an added safeguard to prevent flake from starting up if it sees it hasn't run in a long period of time according to the system clock since this might be an indication that the clock has been skewed far into the future.

Example configuration:

	[
	 {flake, [
	    {interface, "en0"},
	    {timestamp_path, "/srv/flake/timestamp-dets"},
	    {allowable_downtime, 2592000000}
	  ]}
	]

Then simply run the server inline

./start.sh

And use the embedded test harness to ensure that you are able to generate ids.

Generate 1 id and receive the erlang binary (flake@localhost)1> flake_harness:generate(1).

[<<0,0,1,52,212,33,45,67,16,154,221,94,14,143,0,0>>]

Generate 10 base-62 encoded ids

(flake@localhost)2> flake_harness:generate(10,62).

["8HFaR8qWtRlGDHnO57","8HFaR8qWtRlGDHnO56",
 "8HFaR8qWtRlGDHnO55","8HFaR8qWtRlGDHnO54",
 "8HFaR8qWtRlGDHnO53","8HFaR8qWtRlGDHnO52",
 "8HFaR8qAulTgCBd6Wp","8HFaR8qAulTgCBd6Wo",
 "8HFaR8qAulTgCBd6Wn","8HFaR8qAulTgCBd6Wm"]

Time how long it takes to generate 100,000 ids

(flake@localhost)3> flake_harness:timed_generate(100000).

src/flake_harness.erl:33: generating ids: 0.402 s

These last steps simple ensure that a flake application is up and running. Next we'll talk more about operational use.

Deployment

Flake is a standalone application. Request ids with a gen_server:call from another Erlang VM (or application that speaks Erlang distribution a la Scalang).

Example usage from your application.

	flake() ->
	    Node = {flake, flake@localhost},
	    {ok, FlakeId} = gen_server:call(Node, get),
	    {ok, FlakeIdBase62} = gen_server:call(Node, {get,62}),
		%% example id decomposition for demonstration only
    	<<_Time:64/integer,_WorkerId:48/integer,_Sequence:16/integer>> = FlakeId,
   	 	FlakeId.

Anatomy

Flake ids are 128-bits wide described here from most significant to least significant bits.

  • 64-bit timestamp - milliseconds since the epoch (Jan 1 1970)
  • 48-bit worker id - MAC address from a configurable device
  • 16-bit sequence # - usually 0, incremented when more than one id is requested in the same millisecond and reset to 0 when the clock ticks forward

Roadmap

  • Bulk id generation
  • HTTP interface
  • Client library (Erlang, possibly others)
    • Provide an abstraction for the messaging format to avoid using gen_server calls as the public API.
    • Will generate a notional flake id (with an empty worker id) given a timestamp. This is useful for generating a lexical range of values that could have been generated in a span of time. At Boundary we store data in Riak keyed by flake ids and use keyfilters to page out blocks of data using this technique.

Frequently Asked Questions

What if I'm not using Erlang?

An HTTP interface is on the roadmap. For applications written in Scala, Scalang is an option.

How does this differ from snowflake developed at Twitter?

The differences stem primarily from the fact that Twitter snowflake ids are 64-bits wide. This means that additional coordination is required to pick a worker id (twitter does this via a ZooKeeper ensemble). Their scheme works great when your ids must fit into 64-bits. However this comes at the cost of additional coordination among nodes and a system that is generally a little more difficult to reason about. It is a fine system though and we were able to learn from it in our efforts.

How is flake different from rearranging the bits of a UUID-1 id?

First, successive UUID versions aim to make ids increasingly opaque in nature through various means. We have actually found a great deal of utility in structurally transparent unique ids and that has motivated much of this work. The most transparent variant is UUID-1 (for which it has received a fair bit of criticism) and thus the nearest relative to a flake id. There are some important differences though.

UUID-1 is an odd beast. First, the timestamp is based on the number of 100 nanosecond intervals since October 15, 1582. This is not how most of us familiar with Unix timestamps reason about time. If that isn't bad enough, the timestamp is an odd 60-bits in length with the most significant bits shifted to the least significant bits of the UUID. This property makes lexical ordering essentially meaningless. The remaining bits contain a clock id (initially set to a random number) and a node id (usually the MAC address).

The first problem is the timestamp. We could rearrange the bits to get some k-ordering love, but reasoning on timestamps of this nature makes reasoning about the resulting ids more complex than it needs to be. This is why flake uses a standard 64-bit Unix timestamp, unaltered, as the most significant bits.

The next problem is clock skew and protection against replaying ids for a time in the past. The UUID-1 spec dictates that the clock id be incremented in such an event, but this behavior is implementation-specific and we aren't aware of any Erlang implementations that met our safety goals. Flake durably writes a timestamp to a dets table periodically while running. Following a restart, flake will refuse to startup if the timestamp written there is from the future. Furthermore, flake will refuse to generate ids if it detects that the system clock is running backwards.

When are flake ids not appropriate?

Flake ids are predictable by design. Don't use use flake to generate ids that you'd rather be unpredictable. Don't use flake to generate passwords, security tokens, or anything else you wouldn't want someone to be able to guess.

Flake ids expose the identity of the machine which generated the id (by way of its MAC address) and the time at which it did so. This could be a problem for some security-sensitive applications.

Don't do modulo 2 arithmetic on flake ids with the expectation of random distribution. The least significant 16-bits are usually going to be 0 on a machine that is generating an average of one or fewer ids per millisecond.

More Repositories

1

folsom

Expose Erlang Events and Metrics
Erlang
587
star
2

high-scale-lib

A fork of Cliff Click's High Scale Library. Improved with bug fixes and a real build system.
Java
408
star
3

ordasity

Ordasity is Boundary's library for building stateful clustered services on the JVM.
Scala
343
star
4

scalang

Scalang is a scala wrapper that makes it easy to write services that interface with erlang.
Scala
203
star
5

wireshark

wireshark + boundary IPFIX decode patches
C
165
star
6

fasttuple

Java
142
star
7

firespray

Blazingly fast streaming charts
JavaScript
106
star
8

html5-node-diagram

JavaScript
86
star
9

bear

a set of statistics functions for erlang
Erlang
68
star
10

overlock

Boundary's suite of concurrent scala utilities.
Scala
67
star
11

gen_lb

A generic library to load balance communication between Erlang nodes
Erlang
64
star
12

zoocreeper

A ZooKeeper backup tool.
Java
49
star
13

folsom_cowboy

A Cowboy based Folsom HTTP Wrapper.
Erlang
39
star
14

libdnet

updated fork of libdnet from https://code.google.com/p/libdnet/
C
39
star
15

folsom_webmachine

folsom based metrics via HTTP and JSON
Erlang
37
star
16

small_wonder

A Deployment Tool
Ruby
25
star
17

knife-plugins

Ruby
21
star
18

khial

a fake network driver to test network applications
C
19
star
19

atomicmap_challenge

A multithreaded scala programming challenge.
Scala
16
star
20

winpcap-installer

fork of the NMAP's silent WinPCAP installer
NSIS
13
star
21

labs-graphs

JavaScript
13
star
22

boundary_scripts

Shell
12
star
23

archaius-consul

A consul-backed configuration source for Archaius
Java
10
star
24

childspec_validator

Verify your Erlang childspecs before you use them.
Erlang
7
star
25

bprobe_cookbook

The Boundary bprobe cookbook.
Ruby
7
star
26

pulse-api-cli

Command line tools for Boundary APIs
Python
6
star
27

chef-boundary-annotations-handler

Chef exception handler for adding annotations to Boundary.
Ruby
5
star
28

boundary-event-plugins

Python
4
star
29

meter-plugin-rabbitmq

Metrics Plugin for RabbitMQ
Java
4
star
30

dw-consul

Tools for integrating consul with dropwizard applications.
Java
4
star
31

dropwizard-kafka

Kotlin
3
star
32

meter-plugin-sdk-lua

This is the framework for making the creation and maintenance of plugins easy.
Lua
3
star
33

boundary_splunk_app

A Splunk App for integration with Boundary
Python
3
star
34

meter-plugin-sdk-java

Java framework for creating plugins
Java
3
star
35

boundary-client-js

JavaScript
3
star
36

ibrowse

Erlang
3
star
37

boundary-vmware

Boundary Enterprise Integration with VMWare
Java
3
star
38

boundary-plugin-disk-summary

Disk Summary Plugin
Lua
2
star
39

boundary-plugin-lua-test

Repo for testing the LUA plugin framework
Lua
2
star
40

jenkins-boundary-annotations-plugin

Java
2
star
41

boundary_puppet

Boundary Meter puppet module
Puppet
2
star
42

java-streaming-client-example

A Java WebSocket client example for the Boundary Streaming API
Java
2
star
43

boundary-vagrant-snmp

Environment for testing SNMP integrations and plugins
Puppet
2
star
44

meter-plugin-zookeeper

Lua
2
star
45

meter-plugin-docker

Lua
2
star
46

chef-boundary-events-handler

Ruby
2
star
47

dataCommander

Data manager using Backbone: query, cache, merge
JavaScript
2
star
48

boundary-meter_cookbook

Boundary Meter Chef Cookbook
Ruby
2
star
49

boundary-plugin-aws-elb

Plugin that extracts metrics from AWS Cloud Watch on ELBs(Elastic Load Balancers)
Python
2
star
50

boundary-event-sdk

Boundary Event SDK
Java
2
star
51

meter-plugin-developer-guide

Documentation on how to development and deploy Boundary metric plugins
Makefile
2
star
52

boundary-plugin-aws-rds

Collects metrics from the Amazon Relational Database Service (RDS)
Python
2
star
53

meter-plugin-nginx-plus

This plugin is for Nginx+ Users to Monitor Nginx
Lua
2
star
54

jenkins-boundary-event-plugin

Java
1
star
55

boundary-plugin-syslog

Boundary Plugin for Syslog
1
star
56

meter-plugin-activemq

Collect metrics from an active MQ instance
Lua
1
star
57

meter-plugin-snmp

Boundary Plugin for SNMP
1
star
58

boundary-plugin-consul-healthchecks

Queries Consul health checks and returns events to Boundary on a status change of a given health check
Lua
1
star
59

boundary-plugin-diskrw_summary

Lua
1
star
60

webhook-action-example

Provides an example of the handling of the webhook call from a Boundary action
1
star
61

vagrant-plugin-dev-env

Vagrant environment for developing plugins
Shell
1
star
62

meter-plugin-wmi

Lua
1
star
63

ZenPacks.boundary.EventAdapter

Zenoss 4.x event adapter for Boundary
Python
1
star
64

meter-plugin-apache-tomcat

Java
1
star
65

libstatsite

C
1
star
66

meter-plugin-solr

Lua
1
star
67

plugin-flask

boundary meter flask plugin
Lua
1
star
68

meter-plugin-cassandra

Collects metrics from a cassandra instance
Java
1
star
69

boundary-plugin-elasticsearch

Collects metrics from a elasticsearch
Lua
1
star
70

boundary-plugin-riak

Plugin for extracting metrics from a Riak instance.
Lua
1
star
71

boundary-plugin-aws-redshift

Collects metrics from Amazon Redshift
Python
1
star
72

meter-plugin-windows-process

PowerShell
1
star
73

tsi-lab

Virtual machine for examples on getting data into TrueSight Intelligence
Python
1
star
74

boundary-action-handler

Endpoint to handle actions from Boundary Premimum
Java
1
star
75

meter-plugin-hello-world-lua

Example plugin in "Hello World" fashion.
Lua
1
star
76

meter-plugin-http

Meter plugin to measure http request page load
Python
1
star