  • Language: Ruby
  • License: MIT License

Off-heap large object storage

Hammerspace

Hash-like interface to persistent, concurrent, off-heap storage

What is Hammerspace?

Hammerspace ... is a fan-envisioned extradimensional, instantly accessible storage area in fiction, which is used to explain how animated, comic, and game characters can produce objects out of thin air.

This gem provides persistent, concurrently-accessible off-heap storage of strings with a familiar hash-like interface. It is optimized for bulk writes and random reads.

Motivation

Applications often use data that never changes or changes very infrequently. In many cases, some latency is acceptable when accessing this data. For example, a user's profile may be loaded from a web service, a database, or an external shared cache like memcache. In other cases, the application is much more sensitive to latency. For example, translations may be used many times, and incurring even a ~2ms delay to access them from an external cache would be prohibitively slow.

To work around the performance issue, this type of data is often loaded into the application at startup. Unfortunately, this means the data is stored on the heap, where the garbage collector must scan over the objects on every run (at least in the case of Ruby MRI). Further, for application servers that utilize multiple processes, each process has its own copy of the data, which is an inefficient use of memory.

Hammerspace solves these problems by moving the data off the heap onto disk. Leveraging libraries and data structures optimized for bulk writes and random reads allows an acceptable level of performance to be maintained. Because the data is persistent, it does not need to be reloaded from an external cache or service on application startup unless the data has changed.

Unfortunately, these low-level libraries don't always support concurrent writers. Hammerspace adds concurrency control to allow multiple processes to update and read from a single shared copy of the data safely. Finally, hammerspace's interface is designed to mimic Ruby's Hash to make integrating with existing applications simple and straightforward. Different low-level libraries can be used by implementing a new backend that uses the library. (Currently, only Sparkey is supported.) Backends only need to implement a small set of methods ([], []=, close, delete, each, uid), but can override the default implementation of other methods if the underlying library supports more efficient implementations.
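
The idea of deriving the rest of the Hash interface from a small set of primitives can be sketched as follows. This is only an illustration: DerivedHashMethods and ToyMemoryBackend are hypothetical names, the class keeps its data in memory, and none of it is hammerspace's actual backend code. The uid format merely mimics the shape of the uid values shown in this README.

```ruby
require "securerandom"

# Hypothetical sketch, NOT hammerspace's real backend base class.
# Methods here are derived purely from each, so a backend that implements
# the six primitives gets them without extra work.
module DerivedHashMethods
  include Enumerable  # map, select, count, ... are built on each

  def has_key?(key)
    any? { |k, _| k == key }
  end

  def keys
    map { |k, _| k }
  end

  def values
    map { |_, v| v }
  end

  def size
    count
  end
end

# Toy backend that stores everything in a plain Hash (assumption for
# illustration; a real backend would talk to an on-disk library).
class ToyMemoryBackend
  include DerivedHashMethods

  def initialize
    @data = {}
    @uid  = nil
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end

  def delete(key)
    @data.delete(key)
    nil
  end

  def each(&block)
    @data.each(&block)
  end

  # "Flushing" here just stamps a new version identifier.
  def close
    @uid = "#{Process.pid}_#{SecureRandom.uuid}"
  end

  def uid
    @uid
  end
end
```

A backend whose library can answer size or has_key? directly would override those methods rather than iterate over every entry.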

Installation

Requirements

  • Gnista, Ruby bindings for Sparkey
  • Sparkey, constant key/value storage library
  • Snappy, compression/decompression library (unused, but required to compile Sparkey)
  • A filesystem that supports flock(2) and unlinking files/directories with outstanding file descriptors (ext3/4 will do just fine)
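
If you are unsure whether a target filesystem meets the last requirement, a quick probe along these lines may help. It is a rough sketch using only Ruby's standard library; check_filesystem is a hypothetical helper and not part of hammerspace, and it only exercises file (not directory) unlinking.

```ruby
require "tmpdir"

# Returns true if the filesystem holding `dir` lets us take an exclusive
# flock(2) and keep reading a file after its name has been unlinked.
def check_filesystem(dir)
  path = File.join(dir, "probe")
  File.write(path, "still here")

  File.open(path, "r") do |f|
    # flock returns 0 on success, false if the lock is held elsewhere
    locked = f.flock(File::LOCK_EX | File::LOCK_NB)
    File.unlink(path)                     # remove the name; descriptor stays open
    readable = (f.read == "still here")   # data must still be reachable
    f.flock(File::LOCK_UN)
    locked == 0 && readable && !File.exist?(path)
  end
end

Dir.mktmpdir { |dir| puts check_filesystem(dir) }
```

Pass the directory where you plan to keep the hammerspace files instead of a temporary directory to test that specific mount.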

Installation

Add the following line to your Gemfile:

gem 'hammerspace'

Then run:

bundle

Vagrant

To make development easier, the source tree contains a Vagrantfile and a small cookbook to install all the prerequisites. The vagrant environment also serves as a consistent environment to run the test suite.

To use it, make sure you have vagrant installed, then:

vagrant up
vagrant ssh
bundle exec rspec

Usage

Getting Started

For the most part, hammerspace acts like a Ruby hash. But since it's a hash that persists on disk, you have to tell it where to store the files. The enclosing directory and any parent directories are created if they don't already exist.

h = Hammerspace.new("/tmp/hammerspace")

h["cartoons"] = "mallets"
h["games"]    = "inventory"
h["rubyists"] = "data"

h.size          #=> 3
h["cartoons"]   #=> "mallets"

h.map { |k,v| "#{k.capitalize} use hammerspace to store #{v}." }

h.close

You should call close on the hammerspace object when you're done with it. This flushes any pending writes to disk and closes any open file handles.

Options

The constructor takes a hash of options as an optional second argument. Currently the only option supported is :backend which specifies which backend class to use. Since there is only one backend supported at this time, there is currently no reason to pass this argument.

h = Hammerspace.new("/tmp/hammerspace", {:backend => Hammerspace::Backend::Sparkey})

Default Values

The constructor takes a default value as an optional third argument. This works the same way as the default value in Ruby's Hash, except that Hash takes it as the first argument.

h = Hammerspace.new("/tmp/hammerspace", {}, "default")
h["foo"] = "bar"
h["foo"]  #=> "bar"
h["new"]  #=> "default"
h.close

The constructor also takes a block to specify a default Proc, which works the same way as Ruby's Hash. As with Hash, it is the block's responsibility to store the value in the hash if required.

h = Hammerspace.new("/tmp/hammerspace") { |hash, key| hash[key] = "#{key} (default)" }
h["new"]  #=> "new (default)"
h.has_key?("new")  #=> true
h.close

Supported Data Types

Only string keys and values are supported.

h = Hammerspace.new("/tmp/hammerspace")
h[1] = "foo"     #=> TypeError
h["fixnum"] = 8  #=> TypeError
h["nil"] = nil   #=> TypeError
h.close
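
One way to work within the string-only restriction is to serialize values yourself before storing them. The sketch below uses JSON from the standard library; dump_value and load_value are hypothetical helpers, not part of hammerspace, and a plain Hash stands in for the hammerspace object so the example runs on its own.

```ruby
require "json"

# Hypothetical helpers: convert values to and from strings so they can be
# kept in a store that only accepts strings.
def dump_value(value)
  JSON.generate(value)
end

def load_value(string)
  JSON.parse(string)
end

store = {}  # plain Hash standing in for a hammerspace object

store["fixnum"] = dump_value(8)
store["list"]   = dump_value(["a", "b"])
store["nil"]    = dump_value(nil)

load_value(store["fixnum"])  #=> 8
load_value(store["list"])    #=> ["a", "b"]
load_value(store["nil"])     #=> nil
```

Non-string keys can be handled the same way, for example by calling to_s on them before use.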

Ruby hashes store references to objects, but hammerspace stores raw bytes. A new Ruby String object is created from those bytes when a key is accessed.

value = "bar"

hash = {"foo" => value}
hash["foo"] == value       #=> true
hash["foo"].equal?(value)  #=> true

hammerspace = Hammerspace.new("/tmp/hammerspace")
hammerspace["foo"] = value
hammerspace["foo"] == value       #=> true
hammerspace["foo"].equal?(value)  #=> false
hammerspace.close

Since every access results in a new String object, mutating values doesn't work unless you create an explicit reference to the string.

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"

# This doesn't work like Ruby's Hash because every access creates a new object
h["foo"].upcase!
h["foo"]  #=> "bar"

# An explicit reference is required
value = h["foo"]
value.upcase!
value  #=> "BAR"

# Another access, another new object
h["foo"]  #=> "bar"

h.close

This also implies that strings "lose" their encoding when retrieved from hammerspace.

value = "bar"
value.encoding  #=> #<Encoding:UTF-8>

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = value
h["foo"].encoding  #=> #<Encoding:ASCII-8BIT>
h.close

If you require strings in UTF-8, make sure strings are encoded as UTF-8 when storing the key, then force the encoding to be UTF-8 when accessing the key.

h[key] = value.encode('utf-8')
value = h[key].force_encoding('utf-8')
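
The effect of force_encoding can be seen without hammerspace at all. In this sketch, String#b produces a binary (ASCII-8BIT) copy that stands in for a value read back from disk; restore_utf8 is a hypothetical helper name.

```ruby
# Hypothetical helper: reinterpret raw bytes as UTF-8 without changing them.
def restore_utf8(bytes)
  bytes.dup.force_encoding("utf-8")
end

original = "caf\u00e9"   # UTF-8 string containing a non-ASCII character
raw      = original.b    # binary copy, standing in for a value read from disk

raw.encoding == Encoding::BINARY   #=> true
raw == original                    #=> false (same bytes, incompatible encodings)

restored = restore_utf8(raw)
restored == original               #=> true
restored.valid_encoding?           #=> true
```

force_encoding only relabels the bytes, so it is correct only if the value really was UTF-8 when it was stored. That is why the store side calls encode('utf-8') first.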

Persistence

Hammerspace objects are backed by files on disk, so even a new object may already have data in it.

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"
h.close

h = Hammerspace.new("/tmp/hammerspace")
h["foo"]  #=> "bar"
h.close

Calling clear deletes the data files on disk. The parent directory is not removed, nor is it guaranteed to be empty. Some files containing metadata may still be present, e.g., lock files.

Concurrency

Multiple concurrent readers are supported. Readers are isolated from writers, i.e., reads are consistent as of the time the reader opened its files. Note that the reader opens its files lazily on first read, not when the hammerspace object is created.

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"
h.close

reader1 = Hammerspace.new("/tmp/hammerspace")
reader1["foo"]  #=> "bar"

writer = Hammerspace.new("/tmp/hammerspace")
writer["foo"] = "updated"
writer.close

# Still "bar" because reader1 opened its files before the write
reader1["foo"]  #=> "bar"

# Updated key is visible because reader2 opened its files after the write
reader2 = Hammerspace.new("/tmp/hammerspace")
reader2["foo"]  #=> "updated"
reader2.close

reader1.close

A new hammerspace object does not necessarily need to be created. Calling close will close the files, then the reader will open them lazily again on the next read.

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"
h.close

reader = Hammerspace.new("/tmp/hammerspace")
reader["foo"]  #=> "bar"

writer = Hammerspace.new("/tmp/hammerspace")
writer["foo"] = "updated"
writer.close

reader["foo"]  #=> "bar"

# Close files now, re-open lazily on next read
reader.close

reader["foo"]  #=> "updated"
reader.close

If no hammerspace files exist on disk yet, the reader will fail to open the files. It will try again on next read.

reader = Hammerspace.new("/tmp/hammerspace")
reader.has_key?("foo")  #=> false

writer = Hammerspace.new("/tmp/hammerspace")
writer["foo"] = "bar"
writer.close

# Files are opened here
reader.has_key?("foo")  #=> true
reader.close

You can call uid to get a unique id that identifies the version of the files being read. uid will be nil if no hammerspace files exist on disk yet.

reader = Hammerspace.new("/tmp/hammerspace")
reader.uid  #=> nil

writer = Hammerspace.new("/tmp/hammerspace")
writer["foo"] = "bar"
writer.close

reader.close
reader.uid  #=> "24913_53943df0-e784-4873-ade6-d1cccc848a70"

# The uid changes on every write, even if the content is the same, i.e., it's
# an identifier, not a checksum
writer["foo"] = "bar"
writer.close

reader.close
reader.uid  #=> "24913_9371024e-8c80-477b-8558-7c292bfcbfc1"

reader.close
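
Because the uid changes whenever new data is flushed, it can serve as a cheap signal for invalidating anything derived from the stored strings. The wrapper below is a sketch under that assumption; UidCache is a hypothetical class, not part of hammerspace, and it works with any object that responds to [], uid, and close.

```ruby
# Hypothetical wrapper: memoizes work derived from stored strings and
# discards it when the store's uid changes.
class UidCache
  def initialize(store)
    @store = store  # anything responding to [], uid and close
    @uid   = nil
    @memo  = {}
  end

  # Closes the store so the next access re-opens the latest files, then
  # drops memoized values if the version changed. Returns true on change.
  def refresh!
    @store.close
    current = @store.uid
    return false if current == @uid

    @uid = current
    @memo.clear
    true
  end

  # Returns the memoized result of the block for this key, computing it
  # from the stored string on first use.
  def fetch(key)
    @memo.fetch(key) { @memo[key] = yield(@store[key]) }
  end
end
```

A long-running process might call refresh! periodically (for example, once per request) and use fetch to avoid re-parsing the same value on every access.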

Multiple concurrent writers are also supported. When a writer flushes its changes it will overwrite any previous versions of the hammerspace.

In practice, this works because hammerspace is designed to hold data that is bulk-loaded from some authoritative external source. Rather than block writers to enforce consistency, it is simpler to allow writers to concurrently attempt to load the data. The last writer to finish loading the data and flush its writes will have its data persisted.

writer1 = Hammerspace.new("/tmp/hammerspace")
writer1["color"] = "red"

# Can start while writer1 is still open
writer2 = Hammerspace.new("/tmp/hammerspace")
writer2["color"] = "blue"
writer2["fruit"] = "banana"
writer2.close

# Reads at this point see writer2's data
reader1 = Hammerspace.new("/tmp/hammerspace")
reader1["color"]  #=> "blue"
reader1["fruit"]  #=> "banana"
reader1.close

# Replaces writer2's data
writer1.close

# Reads at this point see writer1's data; note that "fruit" key is absent
reader2 = Hammerspace.new("/tmp/hammerspace")
reader2["color"]  #=> "red"
reader2["fruit"]  #=> nil
reader2.close
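
A common way to get both behaviors shown above (readers pinned to the version they opened, and the last flush winning) is to write each version to a private file and rename it into place. The sketch below demonstrates that general technique with plain files; it is not taken from hammerspace's source and should not be read as a description of its internals. publish and demo_last_writer_wins are hypothetical names.

```ruby
require "tmpdir"

# Hypothetical helper: publish `content` as the current version by writing
# a private temp file and renaming it over `current_path`. On POSIX
# filesystems the rename is atomic, so readers see either the old file or
# the new one, never a partial write.
def publish(current_path, content)
  tmp = "#{current_path}.#{Process.pid}.#{rand(1_000_000)}.tmp"
  File.write(tmp, content)
  File.rename(tmp, current_path)
end

def demo_last_writer_wins
  Dir.mktmpdir do |dir|
    current = File.join(dir, "current")

    publish(current, "color=blue;fruit=banana")  # writer2 flushes first
    reader1 = File.open(current)                  # reader1 pins this version

    publish(current, "color=red")                 # writer1 flushes last and wins

    seen_by_reader1 = reader1.read                # still the old version
    reader1.close
    seen_by_reader2 = File.read(current)          # a new reader sees the new version

    [seen_by_reader1, seen_by_reader2]
  end
end

demo_last_writer_wins  #=> ["color=blue;fruit=banana", "color=red"]
```

This relies on the filesystem keeping a replaced file readable through descriptors that are already open, which matches the filesystem requirement listed under Requirements.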

Flushing Writes

Flushing a write incurs some overhead to build the on-disk hash structures that allow fast lookup later. To avoid the overhead of rebuilding the hash after every write, most write operations do not implicitly flush. Writes can be flushed explicitly by calling close.

Delaying flushing of writes has the side effect of allowing "transactions" -- all unflushed writes are private to the hammerspace object doing the writing.

One exception is the clear method, which deletes the files on disk immediately rather than waiting for a flush. If a reader attempts to open the files immediately after they are deleted, it will perceive the hammerspace to be empty.

h = Hammerspace.new("/tmp/hammerspace")
h["yesterday"] = "foo"
h["today"]     = "bar"
h.close

reader1 = Hammerspace.new("/tmp/hammerspace")
reader1.keys  #=> ["yesterday", "today"]
reader1.close

# Writer wants to remove everything except "today"
writer = Hammerspace.new("/tmp/hammerspace")
writer.clear

# Effect of clear is immediately visible to readers
reader2 = Hammerspace.new("/tmp/hammerspace")
reader2.keys  #=> []
reader2.close

writer["today"] = "bar"
writer.close

reader3 = Hammerspace.new("/tmp/hammerspace")
reader3.keys  #=> ["today"]
reader3.close

If you want to replace the existing data with new data without flushing in between (i.e., in a "transaction"), use replace instead.

h = Hammerspace.new("/tmp/hammerspace")
h["yesterday"] = "foo"
h["today"]     = "bar"
h.close

reader1 = Hammerspace.new("/tmp/hammerspace")
reader1.keys  #=> ["yesterday", "today"]
reader1.close

# Writer wants to remove everything except "today"
writer = Hammerspace.new("/tmp/hammerspace")
writer.replace({"today" => "bar"})

# Old keys still present because writer has not flushed yet
reader2 = Hammerspace.new("/tmp/hammerspace")
reader2.keys  #=> ["yesterday", "today"]
reader2.close

writer.close

reader3 = Hammerspace.new("/tmp/hammerspace")
reader3.keys  #=> ["today"]
reader3.close

Interleaving Reads and Writes

To ensure writes are available to subsequent reads, every read operation implicitly flushes any previous writes.

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"

# Implicitly flushes write (builds on-disk hash for fast lookup), then opens
# newly written on-disk hash for reading
h["foo"]  #=> "bar"

h.close

While batch reads or writes are relatively fast, interleaved reads and writes are slow because the hash is rebuilt very often.

# One flush, fast
h = Hammerspace.new("/tmp/hammerspace")
h["a"] = "100"
h["b"] = "200"
h["c"] = "300"
h["a"]  #=> "100"
h["b"]  #=> "200"
h["c"]  #=> "300"
h.close

# Three flushes, slow
h = Hammerspace.new("/tmp/hammerspace")
h["a"] = "100"
h["a"]  #=> "100"
h["b"] = "200"
h["b"]  #=> "200"
h["c"] = "300"
h["c"]  #=> "300"
h.close

To avoid this overhead, and to ensure consistency during iteration, the each method opens its own private reader for the duration of the iteration. This is also true for any method that uses each, including all methods provided by Enumerable.

h = Hammerspace.new("/tmp/hammerspace")
h["a"] = "100"
h["b"] = "200"
h["c"] = "300"

# Flushes the above writes, then opens a private reader for the each call
h.each do |key, value|
  # Writes are done in bulk without flushing in between
  h[key] = value[0]
end

# Flushes the above writes, then opens the reader
h.to_hash  #=> {"a"=>"1", "b"=>"2", "c"=>"3"}

h.close

Unsupported Methods

Besides the incompatibilities with Ruby's Hash discussed above, there are some Hash methods that are not supported.

  • Methods that return a copy of the hash: invert, merge, reject, select
  • rehash is not needed, since hammerspace only supports string keys, and keys are effectively dup'd
  • delete does not return the value deleted, and it does not support block usage
  • hash and to_s are not overridden, so the behavior is that of Object#hash and Object#to_s
  • compare_by_identity, compare_by_identity?
  • pretty_print, pretty_print_cycle
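
Some of these gaps can be worked around in application code. The sketch below shows two possibilities: reading a value before deleting it, and taking an in-memory copy with to_hash (used earlier in this README) before calling methods that return a new hash. delete_returning is a hypothetical helper, and a plain Hash stands in for the hammerspace object so the example runs on its own.

```ruby
# Hypothetical helper: return the old value while deleting, for stores
# whose delete does not return it.
def delete_returning(store, key)
  value = store[key]
  store.delete(key)
  value
end

store = { "a" => "100", "b" => "200" }  # stand-in for a hammerspace object

delete_returning(store, "a")             #=> "100"

# Copy into a regular Hash, then use the copy-returning methods on it.
snapshot = store.to_hash
snapshot.merge("c" => "300")             #=> {"b"=>"200", "c"=>"300"}
snapshot.select { |_, v| v.to_i > 150 }  #=> {"b"=>"200"}
```

Keep in mind that the read inside delete_returning flushes any pending writes, so calling it in a loop between writes has the interleaving cost described above, and to_hash loads every entry onto the heap.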
