  • Language: Ruby
  • License: MIT License
  • Created over 10 years ago
  • Updated over 3 years ago

Off-heap large object storage

Hammerspace

Hash-like interface to persistent, concurrent, off-heap storage

What is Hammerspace?

Hammerspace ... is a fan-envisioned extradimensional, instantly accessible storage area in fiction, which is used to explain how animated, comic, and game characters can produce objects out of thin air.

This gem provides persistent, concurrently-accessible off-heap storage of strings with a familiar hash-like interface. It is optimized for bulk writes and random reads.

Motivation

Applications often use data that never changes or changes very infrequently. In many cases, some latency is acceptable when accessing this data. For example, a user's profile may be loaded from a web service, a database, or an external shared cache like memcache. In other cases, the application is much more sensitive to latency. For example, translations may be used many times, and incurring even a ~2ms delay to access them from an external cache would be prohibitively slow.

To work around the performance issue, this type of data is often loaded into the application at startup. Unfortunately, this means the data is stored on the heap, where the garbage collector must scan over the objects on every run (at least in the case of Ruby MRI). Further, for application servers that utilize multiple processes, each process holds its own copy of the data, which is an inefficient use of memory.

Hammerspace solves these problems by moving the data off the heap onto disk. Leveraging libraries and data structures optimized for bulk writes and random reads allows an acceptable level of performance to be maintained. Because the data is persistent, it does not need to be reloaded from an external cache or service on application startup unless the data has changed.

Unfortunately, these low-level libraries don't always support concurrent writers. Hammerspace adds concurrency control to allow multiple processes to update and read from a single shared copy of the data safely. Finally, hammerspace's interface is designed to mimic Ruby's Hash to make integrating with existing applications simple and straightforward. Different low-level libraries can be used by implementing a new backend that uses the library. (Currently, only Sparkey is supported.) Backends only need to implement a small set of methods ([], []=, close, delete, each, uid), but can override the default implementation of other methods if the underlying library supports more efficient implementations.
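
To make the list of required backend methods concrete, here is a minimal sketch of an object that responds to those six methods. ToyMemoryBackend is a hypothetical name; it keeps everything in an ordinary in-memory Hash, does not subclass any hammerspace class, and is not how the Sparkey backend works. A real backend would presumably plug into the gem's own backend classes, whose exact interface is not shown here.

```ruby
require "securerandom"

# Hypothetical, in-memory stand-in that responds to the six methods named
# above. It is not part of the gem and is only meant to show their shape.
class ToyMemoryBackend
  def initialize
    @data = {}
    @uid  = nil   # nil until something has been written
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    unless key.is_a?(String) && value.is_a?(String)
      raise TypeError, "only string keys and values are supported"
    end
    @uid = SecureRandom.uuid      # assumption: a fresh identifier per write
    @data[key.dup] = value.dup    # store copies, not references
  end

  def delete(key)
    @data.delete(key)
    nil                           # the deleted value is not returned
  end

  def each(&block)
    return enum_for(:each) unless block
    @data.each(&block)
  end

  def close
    # Nothing to flush for an in-memory toy.
  end

  def uid
    @uid
  end
end
```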

Installation

Requirements

  • Gnista, Ruby bindings for Sparkey
  • Sparkey, constant key/value storage library
  • Snappy, compression/decompression library (unused, but required to compile Sparkey)
  • A filesystem that supports flock(2) and unlinking files/directories with outstanding file descriptors (ext3/4 will do just fine)
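
If you are unsure whether a filesystem meets the last requirement, a quick probe along these lines may help. This is only a sketch: probe_filesystem is a hypothetical helper, not part of the gem, and it checks just the two behaviours named above (advisory locking via flock and reading from a file after it has been unlinked).

```ruby
require "tmpdir"

# Hypothetical helper: checks flock(2) support and whether an open file
# descriptor stays readable after the file has been unlinked.
def probe_filesystem(dir)
  path = File.join(dir, "probe")
  File.write(path, "mallets")

  File.open(path, "r") do |f|
    # File#flock returns 0 on success, or false if LOCK_NB would block.
    locked = f.flock(File::LOCK_SH | File::LOCK_NB) == 0
    File.unlink(path)   # remove the directory entry while f is still open
    {
      locked:                locked,
      entry_gone:            !File.exist?(path),
      readable_after_unlink: f.read == "mallets"
    }
  end
end
```

For example, `Dir.mktmpdir { |dir| p probe_filesystem(dir) }` runs the probe in the system's temporary directory; pass a directory on the filesystem you actually intend to use.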

Installation

Add the following line to your Gemfile:

gem 'hammerspace'

Then run:

bundle

Vagrant

To make development easier, the source tree contains a Vagrantfile and a small cookbook to install all the prerequisites. The vagrant environment also serves as a consistent environment to run the test suite.

To use it, make sure you have vagrant installed, then:

vagrant up
vagrant ssh
bundle exec rspec

Usage

Getting Started

For the most part, hammerspace acts like a Ruby hash. But since it's a hash that persists on disk, you have to tell it where to store the files. The enclosing directory and any parent directories are created if they don't already exist.

h = Hammerspace.new("/tmp/hammerspace")

h["cartoons"] = "mallets"
h["games"]    = "inventory"
h["rubyists"] = "data"

h.size          #=> 3
h["cartoons"]   #=> "mallets"

h.map { |k,v| "#{k.capitalize} use hammerspace to store #{v}." }

h.close

You should call close on the hammerspace object when you're done with it. This flushes any pending writes to disk and closes any open file handles.
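
Since forgetting close can leave writes unflushed, one option is a small wrapper that closes the object even when the block raises. with_store is a hypothetical helper, not something the gem provides; it works with any object that responds to close.

```ruby
# Hypothetical helper, not provided by the gem: yields the store and always
# closes it afterwards, even if the block raises an exception.
def with_store(store)
  yield store
ensure
  store.close
end
```

With hammerspace this could be used as `with_store(Hammerspace.new("/tmp/hammerspace")) { |h| h["cartoons"] = "mallets" }`.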

Options

The constructor takes a hash of options as an optional second argument. Currently the only option supported is :backend which specifies which backend class to use. Since there is only one backend supported at this time, there is currently no reason to pass this argument.

h = Hammerspace.new("/tmp/hammerspace", {:backend => Hammerspace::Backend::Sparkey})

Default Values

The constructor takes a default value as an optional third argument. This works the same way as the default value of Ruby's Hash, except that Hash takes it as the first argument.

h = Hammerspace.new("/tmp/hammerspace", {}, "default")
h["foo"] = "bar"
h["foo"]  #=> "bar"
h["new"]  #=> "default"
h.close

The constructor also takes a block to specify a default Proc, which works the same way as Ruby's Hash. As with Hash, it is the block's responsibility to store the value in the hash if required.

h = Hammerspace.new("/tmp/hammerspace") { |hash, key| hash[key] = "#{key} (default)" }
h["new"]  #=> "new (default)"
h.has_key?("new")  #=> true
h.close

Supported Data Types

Only string keys and values are supported.

h = Hammerspace.new("/tmp/hammerspace")
h[1] = "foo"     #=> TypeError
h["fixnum"] = 8  #=> TypeError
h["nil"] = nil   #=> TypeError
h.close

Ruby hashes store references to objects, but hammerspace stores raw bytes. A new Ruby String object is created from those bytes when a key is accessed.

value = "bar"

hash = {"foo" => value}
hash["foo"] == value       #=> true
hash["foo"].equal?(value)  #=> true

hammerspace = Hammerspace.new("/tmp/hammerspace")
hammerspace["foo"] = value
hammerspace["foo"] == value       #=> true
hammerspace["foo"].equal?(value)  #=> false
hammerspace.close

Since every access results in a new String object, mutating values doesn't work unless you create an explicit reference to the string.

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"

# This doesn't work like Ruby's Hash because every access creates a new object
h["foo"].upcase!
h["foo"]  #=> "bar"

# An explicit reference is required
value = h["foo"]
value.upcase!
value  #=> "BAR"

# Another access, another new object
h["foo"]  #=> "bar"

h.close

This also implies that strings "lose" their encoding when retrieved from hammerspace.

value = "bar"
value.encoding  #=> #<Encoding:UTF-8>

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = value
h["foo"].encoding  #=> #<Encoding:ASCII-8BIT>
h.close

If you require strings in UTF-8, make sure strings are encoded as UTF-8 when storing the key, then force the encoding to be UTF-8 when accessing the key.

h[key] = value.encode('utf-8')
value = h[key].force_encoding('utf-8')
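
The effect can be reproduced in plain Ruby without hammerspace. In the sketch below, String#b stands in for "a value read back as raw bytes"; that substitution is an assumption made for illustration, not a description of what the gem does internally. It shows why the encoding matters for strings containing non-ASCII characters.

```ruby
original = "mallets \u2692"   # UTF-8 string containing a non-ASCII character

# String#b returns a copy of the same bytes tagged as ASCII-8BIT (binary).
# It is used here only to imitate a value retrieved as raw bytes.
retrieved = original.b

binary_tagged = retrieved.encoding == Encoding::ASCII_8BIT

# Same bytes, but the strings do not compare equal because their encodings
# are incompatible once non-ASCII bytes are involved.
same_before = (retrieved == original)

# force_encoding relabels the bytes without changing them.
restored   = retrieved.dup.force_encoding("UTF-8")
same_after = (restored == original)
```

Here same_before is false and same_after is true, which is why forcing the encoding on read is needed when values are compared with or concatenated to UTF-8 strings.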

Persistence

Hammerspace objects are backed by files on disk, so even a new object may already have data in it.

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"
h.close

h = Hammerspace.new("/tmp/hammerspace")
h["foo"]  #=> "bar"
h.close

Calling clear deletes the data files on disk. The parent directory is not removed, nor is it guaranteed to be empty. Some files containing metadata may still be present, e.g., lock files.

Concurrency

Multiple concurrent readers are supported. Readers are isolated from writers, i.e., reads are consistent to the time that the reader was opened. Note that the reader opens its files lazily on first read, not when the hammerspace object is created.

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"
h.close

reader1 = Hammerspace.new("/tmp/hammerspace")
reader1["foo"]  #=> "bar"

writer = Hammerspace.new("/tmp/hammerspace")
writer["foo"] = "updated"
writer.close

# Still "bar" because reader1 opened its files before the write
reader1["foo"]  #=> "bar"

# Updated key is visible because reader2 opened its files after the write
reader2 = Hammerspace.new("/tmp/hammerspace")
reader2["foo"]  #=> "updated"
reader2.close

reader1.close

A new hammerspace object does not necessarily need to be created. Calling close will close the files, then the reader will open them lazily again on the next read.

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"
h.close

reader = Hammerspace.new("/tmp/hammerspace")
reader["foo"]  #=> "bar"

writer = Hammerspace.new("/tmp/hammerspace")
writer["foo"] = "updated"
writer.close

reader["foo"]  #=> "bar"

# Close files now, re-open lazily on next read
reader.close

reader["foo"]  #=> "updated"
reader.close

If no hammerspace files exist on disk yet, the reader will fail to open the files. It will try again on next read.

reader = Hammerspace.new("/tmp/hammerspace")
reader.has_key?("foo")  #=> false

writer = Hammerspace.new("/tmp/hammerspace")
writer["foo"] = "bar"
writer.close

# Files are opened here
reader.has_key?("foo")  #=> true
reader.close

You can call uid to get a unique id that identifies the version of the files being read. uid will be nil if no hammerspace files exist on disk yet.

reader = Hammerspace.new("/tmp/hammerspace")
reader.uid  #=> nil

writer = Hammerspace.new("/tmp/hammerspace")
writer["foo"] = "bar"
writer.close

reader.close
reader.uid  #=> "24913_53943df0-e784-4873-ade6-d1cccc848a70"

# The uid changes on every write, even if the content is the same, i.e., it's
# an identifier, not a checksum
writer["foo"] = "bar"
writer.close

reader.close
reader.uid  #=> "24913_9371024e-8c80-477b-8558-7c292bfcbfc1"

reader.close
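
One possible use of uid is to avoid recomputing derived data when nothing has changed. The sketch below is an illustration under assumptions: UidCache is a hypothetical class, not part of the gem, and it only relies on the wrapped object responding to uid.

```ruby
# Hypothetical wrapper: recomputes a derived value only when the uid of the
# underlying store differs from the uid seen at the last computation.
class UidCache
  def initialize(store, &loader)
    @store  = store
    @loader = loader
    @loaded = false
    @uid    = nil
    @value  = nil
  end

  def value
    current = @store.uid
    if !@loaded || current != @uid
      @value  = @loader.call(@store)   # rebuild the derived data
      @uid    = current
      @loaded = true
    end
    @value
  end
end
```

Note that, as shown above, a reader only reports a new uid after close has been called and the files are reopened, so the caller still decides when to call close on the reader to pick up new versions.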

Multiple concurrent writers are also supported. When a writer flushes its changes it will overwrite any previous versions of the hammerspace.

In practice, this works because hammerspace is designed to hold data that is bulk-loaded from some authoritative external source. Rather than block writers to enforce consistency, it is simpler to allow writers to concurrently attempt to load the data. The last writer to finish loading the data and flush its writes will have its data persisted.

writer1 = Hammerspace.new("/tmp/hammerspace")
writer1["color"] = "red"

# Can start while writer1 is still open
writer2 = Hammerspace.new("/tmp/hammerspace")
writer2["color"] = "blue"
writer2["fruit"] = "banana"
writer2.close

# Reads at this point see writer2's data
reader1 = Hammerspace.new("/tmp/hammerspace")
reader1["color"]  #=> "blue"
reader1["fruit"]  #=> "banana"
reader1.close

# Replaces writer2's data
writer1.close

# Reads at this point see writer1's data; note that "fruit" key is absent
reader2 = Hammerspace.new("/tmp/hammerspace")
reader2["color"]  #=> "red"
reader2["fruit"]  #=> nil
reader2.close
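
The "last writer to flush wins" behaviour can be imitated with a common pattern: each writer builds its data privately, then atomically renames it over the shared location on close. The sketch below illustrates that general technique only; ToyWriter and toy_read are hypothetical names, and this is not a claim about how hammerspace or Sparkey lay out their files.

```ruby
require "securerandom"
require "tmpdir"

# Hypothetical toy writer: buffers writes in memory, then publishes them on
# close by renaming a private temp file over the shared "current" file.
class ToyWriter
  def initialize(dir)
    @final   = File.join(dir, "current")
    @tmp     = File.join(dir, "tmp-#{SecureRandom.hex(4)}")
    @pending = {}
  end

  def []=(key, value)
    @pending[key] = value
  end

  def close
    File.write(@tmp, @pending.map { |k, v| "#{k}=#{v}\n" }.join)
    File.rename(@tmp, @final)   # atomic on POSIX within one filesystem
  end
end

# Hypothetical toy reader: loads whatever version is currently published.
def toy_read(dir)
  File.readlines(File.join(dir, "current"), chomp: true)
      .to_h { |line| line.split("=", 2) }
end
```

Replaying the scenario above with two ToyWriter objects gives the same outcome: after the second writer closes, readers see its data; after the first writer closes, its data replaces everything, including keys it never wrote.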

Flushing Writes

Flushing a write incurs some overhead to build the on-disk hash structures that allow fast lookup later. To avoid the overhead of rebuilding the hash after every write, most write operations do not implicitly flush. Writes can be flushed explicitly by calling close.

Delaying flushing of writes has the side effect of allowing "transactions" -- all unflushed writes are private to the hammerspace object doing the writing.

One exception is the clear method which deletes the files on disk. If a reader attempts to open the files immediately after they are deleted, it will perceive the hammerspace to be empty.

h = Hammerspace.new("/tmp/hammerspace")
h["yesterday"] = "foo"
h["today"]     = "bar"
h.close

reader1 = Hammerspace.new("/tmp/hammerspace")
reader1.keys  #=> ["yesterday", "today"]
reader1.close

# Writer wants to remove everything except "today"
writer = Hammerspace.new("/tmp/hammerspace")
writer.clear

# Effect of clear is immediately visible to readers
reader2 = Hammerspace.new("/tmp/hammerspace")
reader2.keys  #=> []
reader2.close

writer["today"] = "bar"
writer.close

reader3 = Hammerspace.new("/tmp/hammerspace")
reader3.keys  #=> ["today"]
reader3.close

If you want to replace the existing data with new data without flushing in between (i.e., in a "transaction"), use replace instead.

h = Hammerspace.new("/tmp/hammerspace")
h["yesterday"] = "foo"
h["today"]     = "bar"
h.close

reader1 = Hammerspace.new("/tmp/hammerspace")
reader1.keys  #=> ["yesterday", "today"]
reader1.close

# Writer wants to remove everything except "today"
writer = Hammerspace.new("/tmp/hammerspace")
writer.replace({"today" => "bar"})

# Old keys still present because writer has not flushed yet
reader2 = Hammerspace.new("/tmp/hammerspace")
reader2.keys  #=> ["yesterday", "today"]
reader2.close

writer.close

reader3 = Hammerspace.new("/tmp/hammerspace")
reader3.keys  #=> ["today"]
reader3.close

Interleaving Reads and Writes

To ensure writes are available to subsequent reads, every read operation implicitly flushes any previous writes.

h = Hammerspace.new("/tmp/hammerspace")
h["foo"] = "bar"

# Implicitly flushes write (builds on-disk hash for fast lookup), then opens
# newly written on-disk hash for reading
h["foo"]  #=> "bar"

h.close

While batch reads or writes are relatively fast, interleaved reads and writes are slow because the hash is rebuilt very often.

# One flush, fast
h = Hammerspace.new("/tmp/hammerspace")
h["a"] = "100"
h["b"] = "200"
h["c"] = "300"
h["a"]  #=> "100"
h["b"]  #=> "200"
h["c"]  #=> "300"
h.close

# Three flushes, slow
h = Hammerspace.new("/tmp/hammerspace")
h["a"] = "100"
h["a"]  #=> "100"
h["b"] = "200"
h["b"]  #=> "200"
h["c"] = "300"
h["c"]  #=> "300"
h.close
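
The difference in flush counts can be made visible with a toy model. FlushCounter below is hypothetical and keeps everything in memory; it only mimics the rule described above (a read flushes any pending writes) and counts how often that happens.

```ruby
# Hypothetical toy model: writes are buffered, and a read flushes pending
# writes first. The flush counter stands in for rebuilding the on-disk hash.
class FlushCounter
  attr_reader :flushes

  def initialize
    @flushed = {}
    @pending = {}
    @flushes = 0
  end

  def []=(key, value)
    @pending[key] = value
  end

  def [](key)
    flush unless @pending.empty?
    @flushed[key]
  end

  private

  def flush
    @flushed.merge!(@pending)
    @pending.clear
    @flushes += 1
  end
end
```

Writing three keys and then reading them triggers a single flush in this model, while alternating write and read for each key triggers three.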

To avoid this overhead, and to ensure consistency during iteration, the each method opens its own private reader for the duration of the iteration. This is also true for any method that uses each, including all methods provided by Enumerable.

h = Hammerspace.new("/tmp/hammerspace")
h["a"] = "100"
h["b"] = "200"
h["c"] = "300"

# Flushes the above writes, then opens a private reader for the each call
h.each do |key, value|
  # Writes are done in bulk without flushing in between
  h[key] = value[0]
end

# Flushes the above writes, then opens the reader
h.to_hash  #=> {"a"=>"1", "b"=>"2", "c"=>"3"}

h.close

Unsupported Methods

Besides the incompatibilities with Ruby's Hash discussed above, there are some Hash methods that are not supported.

  • Methods that return a copy of the hash: invert, merge, reject, select
  • rehash is not needed, since hammerspace only supports string keys, and keys are effectively dup'd
  • delete does not return the value deleted, and it does not support block usage
  • hash and to_s are not overridden, so the behavior is that of Object#hash and Object#to_s
  • compare_by_identity, compare_by_identity?
  • pretty_print, pretty_print_cycle
