  • Stars: 4,793
  • Rank 8,261 (Top 0.2 %)
  • Language: Scala
  • License: Apache License 2.0
  • Created almost 9 years ago
  • Updated over 2 years ago

Repository Details

A machine learning package built for humans.

aerosolve

Machine learning for humans.

What is it?

A machine learning library designed from the ground up to be human friendly. It is different from other machine learning libraries in the following ways:

This library is meant to be used with sparse, interpretable features such as those that commonly occur in search (search keywords, filters) or pricing (number of rooms, location, price). It is less interpretable on problems with very dense features that humans cannot readily interpret, such as raw pixels or audio samples.

There are a few reasons to focus on interpretability:

  • Your corpus is new and not fully defined, and you want more insight into it.
  • Interpretable models let you iterate quickly: you can find where the model disagrees most and gain insight into what kind of new features are needed.
  • Debugging noisy features. By plotting the feature weights you can discover buggy features, or fit them to splines and discover features that are unexpectedly complex (which usually indicates overfitting).
  • You can discover relationships between different variables and your target prediction. For example, for the Airbnb demand model, graphs of reviews and 3-star reviews are more interpretable than many nested if-then-else rules.

Figure: graph of reviews and 3-star reviews and feature weight.

How to get started?

The artifacts for aerosolve are hosted on bintray. If you use Maven, SBT or Gradle you can just point to bintray as a repository and automatically fetch the artifacts.

Check out the Image Impressionism Demo, where you can learn how to teach the algorithm to paint in the pointillist style.

There is also an Income Prediction Demo based on a popular machine learning benchmark.

Feature Representation

This section dives into the Thrift-based feature representation.

Features are grouped into logical groups called feature families. This lets us express transformations on an entire feature family at once, or interact two different feature families to create a new feature family.

There are three kinds of features per FeatureVector:

  • stringFeatures - this is a map of feature family to binary feature strings. For example "GEO" -> { "San Francisco", "CA", "USA" }
  • floatFeatures - this is a map of feature family to feature name and value. For example "LOC" -> { "Latitude" : 37.75, "Longitude" : -122.43 }
  • denseFeatures - this is a map of feature family to a dense array of floats. Not really used except for the image content analysis code.
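The three maps above can be pictured with a small Python sketch. This is only an illustration of the shape of the data: the `FeatureVector` dataclass and its snake_case field names are stand-ins invented here, not the Thrift-generated class that aerosolve ships.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FeatureVector:
    """Illustrative stand-in for the Thrift FeatureVector struct (not the real class)."""
    # family -> set of binary feature strings
    string_features: Dict[str, Set[str]] = field(default_factory=dict)
    # family -> {feature name -> value}
    float_features: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # family -> dense array of floats
    dense_features: Dict[str, List[float]] = field(default_factory=dict)


listing = FeatureVector(
    string_features={"GEO": {"San Francisco", "CA", "USA"}},
    float_features={"LOC": {"Latitude": 37.75, "Longitude": -122.43}},
)

# A binary string feature is "on" simply by being present in its family.
print("CA" in listing.string_features["GEO"])       # True
print(listing.float_features["LOC"]["Latitude"])    # 37.75
```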

Example Representation

Examples are the basic unit of creating training data and scoring. A single example is composed of:

  • context - this is a FeatureVector that occurs once in the example. It could be the features representing a search session, e.g. "Keyword" -> "Free parking"
  • example(0..N) - this is a repeated list of FeatureVectors that represent the items being scored. These can correspond to documents in a search session. e.g. "LISTING CITY" -> "San Francisco"

The reasons for having this structure are:

  • having one context for hundreds of items saves a lot of space during RPCs or even on disk
  • you can compute the transforms for the context once, then apply the transformed context repeatedly in conjunction with each item
  • having a list of items allows the use of list-based loss functions, such as pairwise ranking loss or domination loss, where we evaluate multiple items at once
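As a rough sketch of this layout, the Python below stores one context alongside a list of items and computes a pairwise hinge ranking loss over the items of a single example. The dictionary layout, the function names, the "Oakland" item, and the scores are all made up for illustration; the loss is a generic textbook pairwise hinge loss, not necessarily the exact form aerosolve uses.

```python
from typing import Dict, List, Set

# Feature vectors are reduced to their stringFeatures for brevity:
# family -> set of binary feature strings.
FeatureVector = Dict[str, Set[str]]


def make_example(context: FeatureVector, items: List[FeatureVector]) -> dict:
    """One shared context plus a repeated list of items (hypothetical layout)."""
    return {"context": context, "example": items}


def pairwise_hinge_loss(scores: List[float], labels: List[int], margin: float = 1.0) -> float:
    """Sum of hinge penalties over every (positive, negative) item pair in one example."""
    loss = 0.0
    for s_pos, y_pos in zip(scores, labels):
        for s_neg, y_neg in zip(scores, labels):
            if y_pos > y_neg:
                loss += max(0.0, margin - (s_pos - s_neg))
    return loss


example = make_example(
    context={"Keyword": {"Free parking"}},
    items=[
        {"LISTING CITY": {"San Francisco"}},
        {"LISTING CITY": {"Oakland"}},
    ],
)

# The context is stored once, however many items the example holds.
print(len(example["example"]))                    # 2
print(pairwise_hinge_loss([2.0, 0.5], [1, 0]))    # 0.0 (margin satisfied)
print(pairwise_hinge_loss([0.5, 2.0], [1, 0]))    # 2.5 (items in the wrong order)
```

Because the loss needs the scores of several items from the same session at once, keeping them together in one example avoids having to regroup rows at training time.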

Feature Transform language

This section dives into the feature transform language.

Feature transforms are applied with a separate transformer module that is decoupled from the model. This allows the user to break apart transforms or to transform data ahead of scoring time. For example, in an application the items in a corpus may be transformed ahead of time and stored, while the context is not known until runtime. Then at runtime, one can transform the context and combine it with each transformed item to get the final feature vector that is then fed to the models.
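The split between offline item transforms and runtime context transforms might look like the following Python sketch. Everything here is hypothetical: the function names, the lower-casing used as a placeholder transform, and the sample corpus are assumptions for illustration and do not reflect the aerosolve transformer API.

```python
from typing import Dict, List, Set

FeatureVector = Dict[str, Set[str]]


def transform_item(item: FeatureVector) -> FeatureVector:
    """Placeholder item transform: lower-case every feature string."""
    return {family: {s.lower() for s in feats} for family, feats in item.items()}


def transform_context(context: FeatureVector) -> FeatureVector:
    """Placeholder context transform: same lower-casing, applied at request time."""
    return {family: {s.lower() for s in feats} for family, feats in context.items()}


def combine(context: FeatureVector, item: FeatureVector) -> FeatureVector:
    """Merge a transformed context with one transformed item into the final vector."""
    combined = {family: set(feats) for family, feats in context.items()}
    for family, feats in item.items():
        combined.setdefault(family, set()).update(feats)
    return combined


# Offline: transform the corpus once and store the result.
corpus = [{"LISTING CITY": {"San Francisco"}}, {"LISTING CITY": {"Oakland"}}]
stored_items = [transform_item(item) for item in corpus]

# Runtime: the context arrives, is transformed once, and is reused for every item.
runtime_context = transform_context({"Keyword": {"Free parking"}})
final_vectors = [combine(runtime_context, item) for item in stored_items]

print(final_vectors[0])
```

The context transform runs once per request regardless of how many stored items it is combined with.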

Feature transforms allow us to modify FeatureVectors on the fly. This lets engineers iterate rapidly on feature engineering in a controlled way.

Here are some examples of feature transforms that are commonly used:

  • List transform. A meta transform that specifies other transforms to be applied.
  • Cross transform. Operates only on stringFeatures. Allows interactions between two different string feature families. e.g. "Keyword" cross "LISTING CITY" creates the new feature family "Keyword_x_city" -> "Free parking^San Francisco"
  • Multiscale grid transform. Constructs multiple nested grids for 2D coordinates. Useful for modelling geography.
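To make the cross transform concrete, here is a minimal Python sketch that reproduces the "Keyword" cross "LISTING CITY" example. The `cross` function and its parameter names are assumptions for illustration; the real transform is driven by a config, as the unit tests show.

```python
from typing import Dict, Set

FeatureVector = Dict[str, Set[str]]


def cross(fv: FeatureVector, family1: str, family2: str, output: str) -> FeatureVector:
    """Return a copy of fv with a new family holding every pairing a^b."""
    result = {family: set(feats) for family, feats in fv.items()}
    left = fv.get(family1, set())
    right = fv.get(family2, set())
    if left and right:
        result[output] = {f"{a}^{b}" for a in left for b in right}
    return result


fv = {"Keyword": {"Free parking"}, "LISTING CITY": {"San Francisco"}}
crossed = cross(fv, "Keyword", "LISTING CITY", "Keyword_x_city")
print(crossed["Keyword_x_city"])   # {'Free parking^San Francisco'}
```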

Please see the corresponding unit tests for what these transforms do, what kind of features they operate on, and what kind of config they expect.
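The idea behind a multiscale grid can be sketched in a few lines of Python: the same 2D point is snapped to a cell at several resolutions, and each cell becomes a binary string feature. The function name, the `bucket_sizes` parameter, and the feature-name format are all invented for this sketch and are not the format the actual transform emits.

```python
import math
from typing import List, Set


def multiscale_grid(lat: float, lng: float, bucket_sizes: List[float]) -> Set[str]:
    """Emit one binary string feature per grid resolution."""
    features = set()
    for size in bucket_sizes:
        # Snap the point to the index of its cell at this resolution.
        row = math.floor(lat / size)
        col = math.floor(lng / size)
        features.add(f"[{size}]=({row},{col})")
    return features


cells = multiscale_grid(37.75, -122.43, bucket_sizes=[1.0, 0.1])
print(sorted(cells))
```

Coarse cells let the model generalize across a region, while fine cells let it learn local exceptions.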

Models

This section covers debuggable models.

Although there are several models in the model directory, only two are the main debuggable models. The rest are experimental or sub-models that create transforms for the interpretable models.

Linear model. Supports hinge, logistic, epsilon-insensitive regression, and ranking loss functions. Only operates on stringFeatures. The label for the task is stored in a special feature family and specified by rank_key in the config. See the linear model unit tests for how to set up the models. Note that in conjunction with quantization and crosses you can get incredible amounts of complexity from the "linear" model, so it is not a regular linear model; it can be thought of as a bushy, very wide decision tree with millions of branches.
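A minimal sketch of how such a model scores string features, assuming one weight per (family, feature) pair and a plain SGD step on logistic loss. The function names, learning rate, and training loop are illustrative assumptions, not aerosolve's trainer.

```python
import math
from typing import Dict, Set, Tuple

FeatureVector = Dict[str, Set[str]]
Weights = Dict[Tuple[str, str], float]


def score(weights: Weights, fv: FeatureVector) -> float:
    """Sum the weights of every (family, feature) pair present in the vector."""
    return sum(weights.get((family, feat), 0.0)
               for family, feats in fv.items() for feat in feats)


def logistic_sgd_step(weights: Weights, fv: FeatureVector, label: int, lr: float = 0.1) -> None:
    """One gradient step on logistic loss for a label in {-1, +1}."""
    margin = label * score(weights, fv)
    # d/dw log(1 + exp(-margin)) = -label * sigmoid(-margin) for each active feature.
    grad_scale = -label / (1.0 + math.exp(margin))
    for family, feats in fv.items():
        for feat in feats:
            key = (family, feat)
            weights[key] = weights.get(key, 0.0) - lr * grad_scale


weights: Weights = {}
fv = {"Keyword_x_city": {"Free parking^San Francisco"}, "GEO": {"CA"}}
for _ in range(20):
    logistic_sgd_step(weights, fv, label=+1)

# Each weight can be read off directly, which is what makes the model debuggable.
print(score(weights, fv) > 0)   # True
```

Since crossed features such as "Free parking^San Francisco" each get their own weight, every cross behaves like one leaf of a very wide tree.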

Spline model. A general additive piecewise linear spline model. The training is done at a higher resolution specified by num_buckets between the min and max of a feature's range. At the end of each iteration we attempt to project the piecewise linear spline into a lower dimensional function such as a polynomial spline with Dirac delta endpoints. If the RMSE of the projection is above a threshold, we leave the spline alone in the high resolution piecewise linear mode. This allows us to debug the spline model for features that are buggy or unexpectedly complex (e.g. jumping up and down when we expect some kind of smoothness).
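The evaluate-then-project logic can be sketched in Python as below. One important substitution: the text describes projecting onto a polynomial spline with Dirac delta endpoints, whereas this sketch uses a straight line between the endpoint weights as a much simpler stand-in for the low-dimensional function. All function names and the threshold value are assumptions for illustration.

```python
import math
from typing import List


def evaluate(weights: List[float], min_val: float, max_val: float, x: float) -> float:
    """Linearly interpolate between the two bucket weights surrounding x."""
    num_buckets = len(weights)
    # Clamp x into the feature's range, then map it to a fractional bucket position.
    x = min(max(x, min_val), max_val)
    pos = (x - min_val) / (max_val - min_val) * (num_buckets - 1)
    lo = min(int(pos), num_buckets - 2)
    frac = pos - lo
    return weights[lo] * (1.0 - frac) + weights[lo + 1] * frac


def rmse(a: List[float], b: List[float]) -> float:
    """Root mean squared error between two equally long weight lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def line_fit(weights: List[float]) -> List[float]:
    """Stand-in low-dimensional projection: the straight line between the endpoints."""
    n = len(weights)
    return [weights[0] + (weights[-1] - weights[0]) * i / (n - 1) for i in range(n)]


def maybe_project(weights: List[float], threshold: float) -> List[float]:
    """Keep the smooth projection only if it stays within threshold RMSE."""
    projected = line_fit(weights)
    return projected if rmse(weights, projected) <= threshold else weights


smooth = [0.0, 0.1, 0.2, 0.3, 0.4]
jumpy = [0.0, 1.0, -1.0, 1.0, 0.4]

print(evaluate(smooth, 0.0, 10.0, 5.0))
print(maybe_project(jumpy, threshold=0.05) is jumpy)   # True: left in high-resolution form
```

A feature whose weights survive in high-resolution form, like `jumpy` here, is the kind that deserves a closer look when debugging.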

  • Boosted stumps model - small compact model. Not very interpretable but at small sizes useful for feature selection.
  • Decision tree model - in memory only. Mostly used to generate transforms for the linear or spline model.
  • Maxout neural network model. Experimental and mostly used as a comparison baseline.

IDE

If you use IntelliJ, build the project first so that the Thrift classes are available. To fix the Spark compile error inside IntelliJ, press Command+;, open the dependencies, and change the scope of the related entries, such as org.apache.spark and org.apache.hadoop:hadoop-common, from test to compile. We keep the Gradle config as testCompile to reduce the jar file size.

Support

Hackpad

Dev group

User group

In the wild

Organizations and projects using aerosolve can list themselves here.
