• This repository has been archived on 29/Jul/2020
  • Language
    Ruby
  • License
    MIT License
  • Created about 11 years ago
  • Updated over 4 years ago

Repository Details

The chef recipes for running and testing Airbnb's SmartStack

Description

This cookbook configures Airbnb's SmartStack. SmartStack is our service registration, discovery and monitoring platform. It allows you to quickly and reliably connect to other services that you need, and for others to connect to your service.

Getting started with this cookbook

This cookbook contains everything you need to get SmartStack up and running, both in development and in production.

Production Use

Set up zookeeper

Before installing SmartStack on your machines, you will need to do a bit of prep. First, you will need Zookeeper running in your infrastructure. We recommend using an existing cookbook. For now, you can just set up a single machine, but for production use we recommend an ensemble of at least 3 nodes managed with exhibitor.

Configure chef

In your role, environment file, or infrastructure repo:

  • set node.zookeeper.smartstack_cluster to a list of the zookeeper machines you'll be using for smartstack.
  • create a services hash in smartstack/attributes/services.rb and ports.rb describing how you want your services configured; more information is below
  • enable the services you want:
    • where the service is running, add it to node.nerve.enabled_services
    • where it is being consumed, add it to node.synapse.enabled_services

That's all! See the more extensive documentation below if you need additional help.
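
As a rough illustration of the three settings above, the attributes might be laid out like the hash below. This is a sketch, not taken from the cookbook: the host names, the service names, and the `unset_smartstack_settings` helper are hypothetical, and the exact format expected for the Zookeeper host entries is an assumption.

```ruby
# Hypothetical attribute layout mirroring the three steps above.
# Host and service names are placeholders, not values from the cookbook.
ATTRIBUTES = {
  'zookeeper' => {
    'smartstack_cluster' => ['zk1.example.com', 'zk2.example.com', 'zk3.example.com']
  },
  'nerve'   => { 'enabled_services' => ['myservice'] },           # services this box provides
  'synapse' => { 'enabled_services' => ['service1', 'service2'] } # services this box consumes
}.freeze

# Illustrative helper (not part of the cookbook): reports which of the
# settings are absent or empty, so a typo shows up before a converge.
def unset_smartstack_settings(attrs)
  paths = [
    %w[zookeeper smartstack_cluster],
    %w[nerve enabled_services],
    %w[synapse enabled_services]
  ]
  paths.reject { |path| Array(attrs.dig(*path)).any? }
       .map { |path| path.join('.') }
end

puts unset_smartstack_settings(ATTRIBUTES).inspect
```

A box that only consumes services would legitimately leave the nerve list unset, so treat the helper's output as a prompt to double-check rather than as an error.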

Dev and Testing

This cookbook is configured to be easy to run in dev using vagrant. To get started:

  • Install Virtualbox; it's free!
  • Install Vagrant; this cookbook has been tested with v1.3.5
  • Install the berkshelf plugin for vagrant: vagrant plugin install vagrant-berkshelf
  • Bring up SmartStack in a VM: vagrant up

This will bring up an Ubuntu VM configured with Zookeeper, SmartStack, and a few sample services. The SmartStack integration tests will automatically run inside the Vagrant VM.

How SmartStack Works

Synapse

Synapse is a service discovery platform. It lets you reliably connect to an available worker for a given service. You don't have to worry about discovery within your application, and you can easily do the same thing in dev as in prod.

How to use synapse

Using synapse to talk to a service is easy. Just specify that you would like to do so in your role file. You'll need to add a 'synapse' => {'enabled_services' => ['desired_service']} section to your default_attributes section:

name 'myrole'
description 'my role file'

default_attributes({
  'synapse' => { 'enabled_services' => [ 'service1', 'service2' ] }
})

run_list(
  'recipe[smartstack]',
  'recipe[myrole]'
)

Once you've done this and reconverged your boxes, the service will be available to you on localhost at its synapse port. If you are writing out a config file in chef and need to specify the port to use, just use node.smartstack.services.desired_service.local_port in your config. You can manually look up your synapse port in attributes/ports.rb in this cookbook.
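
For instance, a config value that points at a consumed service could be built from that attribute. The sketch below uses a plain hash standing in for Chef's node object; the service name, the port number, and the `local_service_url` helper are illustrative assumptions.

```ruby
# A plain hash standing in for the Chef node object (assumption for illustration).
NODE = {
  'smartstack' => {
    'services' => {
      'service1' => { 'local_port' => 3260 } # placeholder port
    }
  }
}.freeze

# Hypothetical helper: builds the localhost URL a consumer would use.
def local_service_url(node, service)
  port = node.dig('smartstack', 'services', service, 'local_port')
  raise ArgumentError, "no local_port configured for #{service}" if port.nil?

  "http://localhost:#{port}"
end

puts local_service_url(NODE, 'service1')
```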

How synapse works

For every enabled service, synapse looks up a list of available servers which run the service in Zookeeper. It then configures a local haproxy to forward requests for localhost:synapse_port to one of those backends (by default, in a round-robin fashion). Whenever the list of servers for the service changes in zookeeper, synapse reconfigures haproxy to reflect the latest information.
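
To make the idea concrete, here is a minimal sketch of turning a discovered server list into a haproxy section. It is not synapse's actual implementation: the `render_listen_section` helper, the backend addresses, and the server naming scheme are all assumptions made for illustration.

```ruby
# Illustrative only: renders a haproxy "listen" section for one service
# from a list of discovered backends.
def render_listen_section(name:, local_port:, backends:, server_options:, listen: [])
  lines = ["listen #{name}", "  bind localhost:#{local_port}"]
  # Extra haproxy directives for this service, e.g. "mode http".
  lines += listen.map { |directive| "  #{directive}" }
  backends.each do |backend|
    server_name = "#{backend['host']}_#{backend['port']}"
    address = "#{backend['host']}:#{backend['port']}"
    lines << "  server #{server_name} #{address} #{server_options}".rstrip
  end
  lines.join("\n") + "\n"
end

# Placeholder backends, as if they had just been read from Zookeeper.
example_backends = [
  { 'host' => '10.0.0.11', 'port' => 3260 },
  { 'host' => '10.0.0.12', 'port' => 3260 }
]

puts render_listen_section(
  name: 'ssspy',
  local_port: 3260,
  backends: example_backends,
  server_options: 'check inter 30s downinter 2s fastinter 2s rise 3 fall 1',
  listen: ['mode http', 'option httpchk GET /ping']
)
```

Re-running the function whenever the server list changes and then reloading haproxy gives the behaviour described above.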

If synapse stops running, haproxy keeps running with the last set of servers it was given. So, even with synapse or zookeeper broken, the list of servers remains reasonably current unless there's massive change.

How to troubleshoot synapse

The immediate course of action is to visit the haproxy stats page. This is accessible at your.box:3212 -- just hit it in your web browser. The stats page will show you all of your enabled services and the backends for those services. You'll be able to see many per-service and per-backend stats, including the current status and insight into processed requests and how they are doing.

You can restart synapse via the usual way with runit: sv restart synapse. You can also safely reload haproxy if you suspect issues there -- existing connections will be unaffected.

Nerve

Nerve is the registration component for synapse. It takes care of creating entries for your services in Zookeeper. Your service will be published in zookeeper only when it passes the configured health checks. When your service stops passing health checks, it will be removed from zookeeper and placed in maintenance mode in all of its synapse consumers.

Using Nerve

Using nerve is as simple as using synapse. You just add a 'nerve' => {'enabled_services' => ['your_service']} section to your default_attributes in your role file:

name 'myservice'
description 'sets up myservice'

default_attributes({
  'nerve' => { 'enabled_services' => [ 'myservice' ] }
})

run_list(
  'recipe[smartstack]',
  'recipe[myservice]'
)

However, you would normally do this if you are writing a role file for your service. This probably means that you wrote the service as well. In this case, you'll need to write the nerve/synapse configuration for the service. You'll also want to make sure that your service has the correct endpoints for health and connectivity checks.

Once nerve is configured to check your service on your boxes, it will start making health checks. You can see the health checks being made in nerve's log, in /etc/service/nerve/log.

Configuring Smartstack

Smartstack configuration lives in two files in this cookbook. The first file is attributes/ports.rb. This just contains a port reservation for your service.

The second, more important file is attributes/services.rb. Let's take a look at an example:

  'ssspy' => {
    'synapse' => {
      'server_options' => 'check inter 30s downinter 2s fastinter 2s rise 3 fall 1',
      'discovery' => { 'method' => 'zookeeper', },
      'listen' => [
        'mode http',
        'option httpchk GET /ping',
      ],
    },
    'nerve' => {
      'port' => 3260,
      'check_interval' => 2,
      'checks' => [
        { 'type' => 'http', 'uri' => '/health', 'timeout' => 0.5, 'rise' => 2, 'fall' => 1 },
      ]
    },
  },

As you can see, there are several sections here. Let's start with the nerve config:

    'nerve' => {
      'port' => 3260,
      'check_interval' => 2,
      'checks' => [
        { 'type' => 'http', 'uri' => '/health', 'timeout' => 0.5, 'rise' => 2, 'fall' => 1 },
      ]
    },

Nerve here is configured to make its health checks on port 3260. This means that ssspy is properly running on its own synapse port locally. The checks happen every 2 seconds, and there's only one check -- an http check to the /health endpoint.
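
The rise and fall values in the check can be read as thresholds on consecutive results. The sketch below assumes that interpretation (it matches how haproxy treats its own rise and fall options); nerve's exact behaviour may differ, and the `CheckState` class is hypothetical.

```ruby
# Sketch of rise/fall thresholds, assuming they count consecutive results.
class CheckState
  attr_reader :up

  def initialize(rise:, fall:)
    @rise = rise
    @fall = fall
    @up = false # a service starts out unregistered
    @streak = 0 # consecutive results that disagree with the current state
  end

  # Record one check result (true = passed) and return the resulting state.
  def record(passed)
    if passed == @up
      @streak = 0
    else
      @streak += 1
      threshold = @up ? @fall : @rise
      if @streak >= threshold
        @up = passed
        @streak = 0
      end
    end
    @up
  end
end

state = CheckState.new(rise: 2, fall: 1)
puts [true, true, false].map { |result| state.record(result) }.inspect
```

With rise 2 and fall 1, as in the ssspy config, a service needs two passing checks in a row to be considered up, but a single failure takes it down again.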

This is the most usual configuration. However, sometimes you might see multiple checks defined per service. For instance, here is the config for flog_thrift:

    'nerve' => {
      'port' => 4567,
      'check_interval' => 1,
      'checks' => [
        { 'type' => 'tcp', 'timeout' => 1, 'rise' => 5, 'fall' => 2 },
        { 'type' => 'http', 'port' => 8422, 'uri' => '/health', 'timeout' => 1, 'rise' => 5, 'fall' => 2 },
      ]
    },

For flog_thrift to be up, it has to be listening on its thrift port via TCP and also pass its http health check.
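
A minimal sketch of this "every check must pass" rule, using only Ruby's standard library. It is not nerve's code: the helper names are hypothetical, and treating only a 200 response as a passing http check is an assumption.

```ruby
require 'socket'
require 'net/http'

# TCP check: passes if a connection can be opened within the timeout.
def tcp_check(host, port, timeout)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError
  false
end

# HTTP check: passes only on a 200 response (an assumption, see above).
def http_check(host, port, uri, timeout)
  http = Net::HTTP.new(host, port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.get(uri).code == '200'
rescue StandardError
  false
end

# A service counts as up only when every configured check passes.
def all_checks_pass?(checks)
  checks.all?(&:call)
end

if __FILE__ == $PROGRAM_NAME
  # Ports mirror the flog_thrift example; the host is a placeholder.
  checks = [
    -> { tcp_check('127.0.0.1', 4567, 1) },
    -> { http_check('127.0.0.1', 8422, '/health', 1) }
  ]
  puts all_checks_pass?(checks)
end
```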

Let's look at ssspy's synapse config:

    'synapse' => {
      'server_options' => 'check inter 30s downinter 2s fastinter 2s rise 3 fall 1',
      'discovery' => {
        'method' => 'zookeeper',
        'hosts' => []
      },
      'listen' => [
        'mode http',
        'option httpchk GET /ping',
      ],
    },

The server_options directive tells haproxy to run checks on each backend with proper check intervals. You can read more about the haproxy check options. The discovery section tells us how synapse will find ssspy; in this case, via zookeeper.

Finally, the listen section contains additional haproxy configuration. It specifies how haproxy will conduct its own health checks. SSSPy follows convention by properly implementing a /ping endpoint for connectivity checks.

Health Checks

Nobody wants your service to receive traffic when it's not actually functional. Your consumers do not want that, because they want their service calls to work. And you don't want that, because you also want your service to work.

You can make sure that a broken service instance won't receive traffic by making your /health checks fail when your service is broken. Simply return a non-200 status code. Here is an example from optica, a simple Sinatra service:

  get '/health' do
    if settings.store.healthy?
      content_type 'text/plain', :charset => 'utf-8'
      return "OK"
    else
      halt(503)
    end
  end

The healthy? function does real work to make sure the service actually functions. Only nerve will ever hit that endpoint, so you can and should feel free to make it take some time.
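
As a sketch of what "real work" could look like, the hypothetical store below performs a write and a read-back instead of merely reporting that the process is alive. The `ExampleStore` class and its hash backend are assumptions for illustration; optica's real store is different.

```ruby
require 'securerandom'

# Hypothetical store used only for illustration.
class ExampleStore
  def initialize(backend = {})
    @backend = backend # anything responding to []= and [] (a Hash here)
  end

  # Exercises a real write and read-back; any error counts as unhealthy.
  def healthy?
    token = SecureRandom.hex(8)
    @backend['healthcheck'] = token
    @backend['healthcheck'] == token
  rescue StandardError
    false
  end
end

puts ExampleStore.new.healthy?
```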

Connectivity Checks

If a particular backend for your service passes its health checks, it might still be unavailable to consumers. One example is a network partition -- synapse has discovered your service, but can't actually reach it. To prevent such problems, we configure the haproxy on the consumer end to do connectivity checks when possible.

We do this by utilizing haproxy's built-in checking mechanism. To distinguish between health checks made by nerve and connectivity checks made by haproxy on the synapse end, we define a /ping endpoint. This endpoint should always return 200 with a conventional text body of PONG.

Because the number of machines making connectivity checks may be large, you should strive to make the /ping check as lightweight as possible.
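
The split between the two endpoints can be sketched as a Rack-style callable, which needs no gems to run. This is an illustration rather than code from any Airbnb service; `build_app` and the injected `health_probe` are hypothetical names.

```ruby
# Rack-style callable: takes an env hash and returns [status, headers, body].
# /ping does no work at all; /health delegates to the injected probe.
def build_app(health_probe)
  text = { 'content-type' => 'text/plain' }
  lambda do |env|
    case env['PATH_INFO']
    when '/ping'
      [200, text, ['PONG']]
    when '/health'
      if health_probe.call
        [200, text, ['OK']]
      else
        [503, text, ['unhealthy']]
      end
    else
      [404, text, ['not found']]
    end
  end
end

app = build_app(-> { false }) # a probe that reports the service as broken
puts app.call({ 'PATH_INFO' => '/ping' }).first
puts app.call({ 'PATH_INFO' => '/health' }).first
```

Even with a failing health probe, /ping still answers 200, which is the point: it only proves that the consumer can reach the box.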

Zookeeper and Smartstack

Smartstack cannot function without zookeeper. This shared file-like store provides the correct semantics for ensuring that service information is correct and distributed across our infrastructure. We use zookeeper because it provides the ephemeral nodes nerve uses to register services. Its distributed nature prevents it from becoming a scaling choke point or a single point of failure in our infrastructure.
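
The reason ephemeral nodes suit registration can be shown with a toy in-memory model: an entry lives only as long as the session that created it. This is not a ZooKeeper client and says nothing about how nerve lays out its paths; the `ToyRegistry` class and the paths used are hypothetical.

```ruby
# In-memory toy model of ephemeral-node semantics.
class ToyRegistry
  def initialize
    @nodes = {} # path => id of the session that owns it
  end

  # Create an entry tied to a session, like an ephemeral node.
  def register(session_id, path)
    @nodes[path] = session_id
  end

  # When a session ends (process died, connection lost), its entries vanish.
  def close_session(session_id)
    @nodes.delete_if { |_path, owner| owner == session_id }
  end

  # List the entries under a service prefix, as a consumer would.
  def children(prefix)
    @nodes.keys.select { |path| path.start_with?("#{prefix}/") }.sort
  end
end

registry = ToyRegistry.new
registry.register(:session_a, '/services/ssspy/host1')
registry.register(:session_b, '/services/ssspy/host2')
registry.close_session(:session_a) # host1's registering process goes away
puts registry.children('/services/ssspy').inspect
```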

Debugging Smartstack

You would like to use your service from another service, but something is not working. These instructions will tell you how to debug the situation.

First, on a consumer box (a box which has the_service in its 'synapse' => { 'enabled_services' => [...] } list), go to port 3212 in your browser. You'll see the haproxy stats page. There should be a section for the_service containing the boxes providing the_service.

If the section exists and contains some boxes, but they are all in red, those boxes are failing connectivity checks. You should double-check your security group settings with SRE. If the section is not there at all, or is missing some boxes, then there could be two reasons:

  1. the service is not properly discovered
  2. the service is not properly registered

To check if it's (1), check synapse on the consumer box.

  1. It should be running; check with sv s synapse
  2. Try restarting it with sv restart synapse
  3. Check the synapse logs in /etc/service/synapse/log/current for anything unusual

If it looks like synapse is working, then the problem is probably (2) -- no registration. To debug, follow these steps:

  1. Check the service on one of its instances
    • Is it running? Is it insta-crashing? watch sv s the_service
  2. If it's insta-crashing, figure out why
    • Check /etc/service/the_service/logs/current
    • Run it live; sv down the_service; cd /etc/service/the_service; ./run
  3. If it's running, is it passing health checks?
    • curl -D - localhost:32xx/health and ensure you get a 200
  4. Is it passing health checks from a remote box?
    • this happens if you accidentally only bind to lo in your service
    • run the health check curl from another box
  5. Is nerve running?
    • sv s nerve; if something is wrong with nerve, alert SRE

You can also debug smartstack by directly looking in zookeeper for registered services, and watching how that list changes over time. You can do this via an exhibitor UI. Another way is to use a zkCli client and connect directly to one of the machines in the cluster.
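
The first triage step of this section, reading the haproxy stats page on a consumer box, can be summarised as a small decision function. This is only a restatement of the prose in code form; the `triage` helper and its return values are hypothetical.

```ruby
# Inputs are what you observe on the haproxy stats page of a consumer box.
def triage(section_present:, all_boxes_listed:, all_red:)
  # Missing section or missing boxes: check discovery (synapse) first,
  # then registration (the service itself and nerve).
  return :check_discovery_then_registration unless section_present && all_boxes_listed
  # Everything is listed but red: the boxes fail connectivity checks.
  return :failing_connectivity_checks if all_red

  :looks_healthy
end

puts triage(section_present: true, all_boxes_listed: true, all_red: true)
```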
