  • Language: Scala
  • License: Apache License 2.0
  • Created over 9 years ago
  • Updated about 2 months ago


A machine learning package built for humans.

aerosolve

Machine learning for humans.


What is it?

A machine learning library designed from the ground up to be human-friendly. It differs from other machine learning libraries in the following ways:

This library is meant to be used with sparse, interpretable features such as those that commonly occur in search (search keywords, filters) or pricing (number of rooms, location, price). It is less interpretable on problems with very dense, non-human-interpretable features such as raw pixels or audio samples.

There are a few reasons to focus on interpretability:

  • Your corpus is new and not fully defined, and you want more insight into it.
  • Interpretable models let you iterate quickly. You can figure out where the model disagrees most and gain insight into what kind of new features are needed.
  • Debugging noisy features. By plotting the feature weights you can discover buggy features, or fit them to splines and discover features that are unexpectedly complex (which usually indicates overfitting).
  • You can discover relationships between different variables and your target prediction. For example, for the Airbnb demand model, plotting graphs of reviews and 3-star reviews is more interpretable than many nested if-then-else rules.

[Figure: graph of reviews and 3-star reviews versus feature weight]

How to get started?

The artifacts for aerosolve are hosted on Bintray. If you use Maven, SBT or Gradle, you can point to Bintray as a repository and fetch the artifacts automatically.

Check out the Image Impressionism Demo, where you can learn how to teach the algorithm to paint in the pointillist style.

There is also an Income Prediction Demo based on a popular machine learning benchmark.

Feature Representation

This section dives into the Thrift-based feature representation.

Features are grouped into logical groups called feature families. This lets us express a transformation on an entire feature family at once, or interact two different feature families to create a new one.

There are three kinds of features per FeatureVector:

  • stringFeatures - this is a map of feature family to binary feature strings. For example "GEO" -> { "San Francisco", "CA", "USA" }
  • floatFeatures - this is a map of feature family to feature name and value. For example "LOC" -> { "Latitude" : 37.75, "Longitude" : -122.43 }
  • denseFeatures - this is a map of feature family to a dense array of floats. It is rarely used outside the image content analysis code.
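As a rough illustration, the three kinds of features can be mirrored in Scala as nested maps. This is a hypothetical, simplified case class, not the Thrift-generated FeatureVector class; the field names follow the list above, and the helper `numSparseFeatures` is made up for this example.

```scala
// Hypothetical, simplified mirror of the Thrift FeatureVector; not the generated class.
final case class FeatureVector(
    // feature family -> set of binary feature strings (present means "on")
    stringFeatures: Map[String, Set[String]] = Map.empty,
    // feature family -> (feature name -> value)
    floatFeatures: Map[String, Map[String, Double]] = Map.empty,
    // feature family -> dense array of floats
    denseFeatures: Map[String, Vector[Double]] = Map.empty
)

object FeatureVectorExample {
  // The "GEO" and "LOC" examples from the list above, in one vector.
  val listing: FeatureVector = FeatureVector(
    stringFeatures = Map("GEO" -> Set("San Francisco", "CA", "USA")),
    floatFeatures = Map("LOC" -> Map("Latitude" -> 37.75, "Longitude" -> -122.43))
  )

  // Number of sparse features: each string counts once, each named float counts once.
  def numSparseFeatures(fv: FeatureVector): Int =
    fv.stringFeatures.values.map(_.size).sum + fv.floatFeatures.values.map(_.size).sum
}
```

Grouping by family first is what lets a transform target, say, everything under "LOC" without listing individual feature names.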

Example Representation

Examples are the basic unit of both training data and scoring. A single example is composed of:

  • context - a FeatureVector that occurs once per example. For example, it could hold the features of a search session, e.g. "Keyword" -> "Free parking"
  • example(0..N) - a repeated list of FeatureVectors representing the items being scored. These can correspond to the documents in a search session, e.g. "LISTING CITY" -> "San Francisco"
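A minimal sketch of the Example structure, assuming only string features for brevity. The `Example` and string-only `FeatureVector` case classes here are hypothetical stand-ins for the Thrift types. Comparing `storedSize` with `denormalizedSize` shows how many fewer features are kept when the context is stored once instead of being copied into every item.

```scala
// Hypothetical, simplified types: only string features are modelled.
final case class FeatureVector(stringFeatures: Map[String, Set[String]]) {
  // Number of binary string features in this vector.
  def size: Int = stringFeatures.values.map(_.size).sum
}

// One context shared by N items.
final case class Example(context: FeatureVector, items: Seq[FeatureVector]) {
  // Features stored when the context is kept once alongside the items.
  def storedSize: Int = context.size + items.map(_.size).sum
  // Features stored if the context were copied into every item instead.
  def denormalizedSize: Int = items.map(item => context.size + item.size).sum
}

object ExampleStructure {
  val context: FeatureVector = FeatureVector(Map("Keyword" -> Set("Free parking")))
  val items: Seq[FeatureVector] = Seq("San Francisco", "Oakland", "Berkeley").map { city =>
    FeatureVector(Map("LISTING CITY" -> Set(city)))
  }
  val session: Example = Example(context, items)
}
```

With three items and a one-feature context the difference is small (4 versus 6 features), but it grows with both the number of items and the size of the context.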

The reasons for having this structure are:

  • having one context for hundreds of items saves a lot of space in RPCs and on disk
  • you can compute the transforms for the context once, then apply the transformed context repeatedly in conjunction with each item
  • having a list of items allows list-based loss functions, such as pairwise ranking loss and domination loss, that evaluate multiple items at once
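One way to see why a list of items is useful: a pairwise hinge ranking loss needs every item of a session at once. The sketch below is a generic textbook formulation with an assumed default margin of 1.0, not aerosolve's own loss implementation; `PairwiseRankingLoss` is a hypothetical name.

```scala
object PairwiseRankingLoss {
  // Pairwise hinge ranking loss over the items of one example.
  // scores(i) is the model score of item i and labels(i) its relevance label.
  // Every pair where item i should rank above item j adds max(0, margin - (s_i - s_j)).
  def loss(scores: Seq[Double], labels: Seq[Double], margin: Double = 1.0): Double = {
    require(scores.length == labels.length, "need exactly one label per score")
    val pairLosses = for {
      i <- scores.indices
      j <- scores.indices
      if labels(i) > labels(j)
    } yield math.max(0.0, margin - (scores(i) - scores(j)))
    pairLosses.sum
  }
}
```

For scores (2.0, 0.5, 1.8) and labels (1, 0, 0), the positive item beats the second item by more than the margin but beats the third by only 0.2, so only that pair contributes, with a loss of about 0.8.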

Feature Transform language

This section dives into the feature transform language.

Feature transforms are applied by a separate transformer module that is decoupled from the model. This lets the user split transforms apart or transform data ahead of scoring. For example, in an application the items in a corpus may be transformed ahead of time and stored, while the context is not known until runtime. At runtime, one can transform the context and combine it with each transformed item to get the final feature vector that is fed to the model.
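The transform-once, combine-per-item flow can be sketched as below, assuming string features only and a caller-supplied `contextTransform` function. `ContextMerge`, `merge` and `combine` are hypothetical names for illustration, not part of aerosolve's transformer API.

```scala
object ContextMerge {
  // Hypothetical simplified representation: feature family -> set of string features.
  type StringFeatures = Map[String, Set[String]]

  // Union two feature maps family by family.
  def merge(a: StringFeatures, b: StringFeatures): StringFeatures =
    (a.keySet ++ b.keySet).map { family =>
      family -> (a.getOrElse(family, Set.empty[String]) ++ b.getOrElse(family, Set.empty[String]))
    }.toMap

  // Transform the runtime context once, then combine it with each pre-transformed item.
  def combine(
      rawContext: StringFeatures,
      transformedItems: Seq[StringFeatures],
      contextTransform: StringFeatures => StringFeatures
  ): Seq[StringFeatures] = {
    val transformedContext = contextTransform(rawContext) // runs once per request
    transformedItems.map(item => merge(transformedContext, item))
  }
}
```

However many items are scored, the context transform is invoked only once; the per-item cost is just the merge.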

Feature transforms allow us to modify FeatureVectors on the fly. This lets engineers iterate on feature engineering quickly and in a controlled way.

Here are some examples of feature transforms that are commonly used:

  • List transform. A meta transform that specifies other transforms to be applied.
  • Cross transform. Operates only on stringFeatures. Allows interactions between two different string feature families. For example, "Keyword" cross "LISTING CITY" creates the new feature family "Keyword_x_city" -> "Free parking^San Francisco"
  • Multiscale grid transform. Constructs multiple nested grids for 2D coordinates. Useful for modelling geography.
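A minimal sketch of the cross transform on a single merged vector, assuming the context's "Keyword" family and the item's "LISTING CITY" family have already been combined. The function signature is illustrative only; aerosolve drives its transforms from config, and the real config keys are not reproduced here.

```scala
object CrossTransform {
  // Hypothetical simplified representation: feature family -> set of string features.
  type StringFeatures = Map[String, Set[String]]

  // Pairs every feature of family1 with every feature of family2, joined by '^',
  // and adds the results to outputFamily. Input families are left unchanged.
  def cross(fv: StringFeatures, family1: String, family2: String, outputFamily: String): StringFeatures = {
    val crossed = for {
      f1 <- fv.getOrElse(family1, Set.empty[String])
      f2 <- fv.getOrElse(family2, Set.empty[String])
    } yield s"$f1^$f2"
    if (crossed.isEmpty) fv
    else fv.updated(outputFamily, fv.getOrElse(outputFamily, Set.empty[String]) ++ crossed)
  }
}
```

Because the cross is a Cartesian product, two keywords and three cities would yield six crossed features, each of which can later receive its own weight.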

See the corresponding unit tests for what these transforms do, what kind of features they operate on and what kind of config they expect.
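The multiscale grid transform listed above can be sketched in the same spirit. Assuming a list of bucket sizes, each (x, y) pair is snapped to one cell per scale and emitted as a string feature, so a linear model can learn weights for coarse and fine regions at the same time. `MultiscaleGridTransform` and the output string format `[size]=(ix,iy)` are assumptions made for this example.

```scala
object MultiscaleGridTransform {
  // For each bucket size, snap (x, y) to the grid cell that contains it and emit one
  // string feature per scale. The "[size]=(ix,iy)" format is an assumption for this sketch.
  def grid(x: Double, y: Double, bucketSizes: Seq[Double]): Set[String] =
    bucketSizes.map { size =>
      val ix = math.floor(x / size).toLong
      val iy = math.floor(y / size).toLong
      s"[$size]=($ix,$iy)"
    }.toSet
}
```

With bucket sizes 1.0 and 10.0, the San Francisco coordinates (37.75, -122.43) land in cell (37,-123) at the fine scale and (3,-13) at the coarse scale; nearby points that share the coarse cell also share its learned weight.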

Models

This section covers debuggable models.

Although there are several models in the model directory, only two are the main debuggable models. The rest are experimental, or are sub-models that create transforms for the interpretable models.

Linear model. Supports hinge, logistic, epsilon-insensitive regression and ranking loss functions. Only operates on stringFeatures. The label for the task is stored in a special feature family, specified by rank_key in the config. See the linear model unit tests for how to set up the models. Note that in conjunction with quantization and crosses you can get a great deal of complexity from the "linear" model, so it is not really a regular linear model; it can be thought of as a bushy, very wide decision tree with millions of branches.
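To make the "linear model over string features" idea concrete, here is a minimal sketch that keeps one weight per (family, feature) pair and scores an item by summing the weights of its active features. The label is passed in directly rather than read from the rank_key family, and the hinge-loss SGD update is a generic textbook choice, not aerosolve's training code; `SparseLinearModel` is a hypothetical class.

```scala
import scala.collection.mutable

// Sketch of a sparse linear model over string features: one weight per (family, feature) pair.
final class SparseLinearModel(learningRate: Double) {
  private val weights = mutable.Map.empty[(String, String), Double]

  // Flatten family -> features into (family, feature) keys; a Seq keeps duplicate weights in the sum.
  private def active(fv: Map[String, Set[String]]): Seq[(String, String)] =
    fv.toSeq.flatMap { case (family, features) => features.toSeq.map(feature => (family, feature)) }

  def weight(family: String, feature: String): Double =
    weights.getOrElse((family, feature), 0.0)

  // Score is the sum of the weights of all active features.
  def score(fv: Map[String, Set[String]]): Double =
    active(fv).map(key => weights.getOrElse(key, 0.0)).sum

  // One SGD step on the hinge loss max(0, 1 - label * score), with label in {-1, +1}.
  // The gradient is zero when the margin is already satisfied, so no update is made then.
  def update(fv: Map[String, Set[String]], label: Double): Unit =
    if (label * score(fv) < 1.0) {
      active(fv).foreach { key =>
        weights(key) = weights.getOrElse(key, 0.0) + learningRate * label
      }
    }
}
```

Since crossed and quantized features are just more strings, every combination gets its own weight, which is how a model of this shape can behave like a very wide decision tree.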

Spline model. A general additive piecewise linear spline model. Training is done at a higher resolution, specified by num_buckets, between the min and max of a feature's range. At the end of each iteration we attempt to project the piecewise linear spline onto a lower dimensional function, such as a polynomial spline with Dirac delta endpoints. If the RMSE of the projection is above a threshold, we leave the spline alone in the high resolution piecewise linear mode. This allows us to debug the spline model for features that are buggy or unexpectedly complex (e.g. jumping up and down when we expect some kind of smoothness).
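A simplified sketch of the high-resolution spline: knots spaced evenly between a feature's min and max (so `knots.length - 1` plays the role of num_buckets), evaluated by linear interpolation. As a stand-in for the projection step, `linearProjectionRmse` fits a single straight line to the knots by least squares and reports the RMSE. aerosolve's actual projection targets a polynomial spline with Dirac delta endpoints, which this sketch does not implement, and `PiecewiseLinearSpline` is a hypothetical class.

```scala
// Piecewise linear spline on [minVal, maxVal]; knots.length - 1 plays the role of num_buckets.
final case class PiecewiseLinearSpline(minVal: Double, maxVal: Double, knots: Vector[Double]) {
  require(knots.length >= 2 && maxVal > minVal, "need at least one bucket and a non-empty range")
  private val step = (maxVal - minVal) / (knots.length - 1)

  // x position of knot i.
  def knotX(i: Int): Double = minVal + i * step

  // Clamp x to the range, then linearly interpolate between the two surrounding knots.
  def evaluate(x: Double): Double = {
    val clamped = math.min(math.max(x, minVal), maxVal)
    val pos = (clamped - minVal) / step
    val i = math.min(pos.toInt, knots.length - 2)
    val t = pos - i
    knots(i) * (1 - t) + knots(i + 1) * t
  }

  // Simplified stand-in for the projection test: RMSE of the least-squares line through the knots.
  def linearProjectionRmse: Double = {
    val xs = knots.indices.map(i => knotX(i))
    val n = knots.length.toDouble
    val meanX = xs.sum / n
    val meanY = knots.sum / n
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    val sxy = xs.zip(knots).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val slope = sxy / sxx
    val intercept = meanY - slope * meanX
    val residuals = xs.zip(knots).map { case (x, y) => y - (intercept + slope * x) }
    math.sqrt(residuals.map(e => e * e).sum / n)
  }
}
```

In this sketch a smooth, monotone feature projects cleanly (RMSE near 0), while knots that zigzag between 0 and 1 give an RMSE of about 0.49, which would exceed a small threshold and keep the spline in high-resolution mode for inspection.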

  • Boosted stumps model - small compact model. Not very interpretable but at small sizes useful for feature selection.
  • Decision tree model - in memory only. Mostly used to generate transforms for the linear or spline model.
  • Maxout neural network model. Experimental and mostly used as a comparison baseline.
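For the boosted stumps model, here is a sketch assuming each stump tests one float feature against a threshold and contributes a fixed weight when it fires. The `Stump` fields, the >= comparison and the `BoostedStumps` helpers are assumptions for illustration. Collecting the features the stumps use is one simple way such a small model can help with feature selection.

```scala
// Hypothetical stump: fires when a float feature is >= threshold and then adds weight.
final case class Stump(family: String, name: String, threshold: Double, weight: Double)

object BoostedStumps {
  type FloatFeatures = Map[String, Map[String, Double]]

  // Sum of the weights of the stumps that fire; a missing feature never fires.
  def score(stumps: Seq[Stump], fv: FloatFeatures): Double =
    stumps.collect {
      case s if fv.get(s.family).flatMap(_.get(s.name)).exists(_ >= s.threshold) => s.weight
    }.sum

  // The distinct (family, name) features the ensemble relies on: a simple feature-selection signal.
  def selectedFeatures(stumps: Seq[Stump]): Set[(String, String)] =
    stumps.map(s => (s.family, s.name)).toSet
}
```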

IDE

If you use IntelliJ, build the project first so that the Thrift classes are available. To fix the Spark compilation errors inside IntelliJ, press Command+; to open Project Structure, open the dependencies list, and change the scope of the related libraries, such as org.apache.spark and org.apache.hadoop:hadoop-common, from Test to Compile. We keep these as testCompile in the Gradle config to reduce the jar file size.

Support

Hackpad

Dev group

User group

In the wild

Organizations and projects using aerosolve can list themselves here.
