  • Language
    Ruby
  • License
    MIT License
  • Created almost 6 years ago
  • Updated 12 months ago

Eps

Machine learning for Ruby

  • Build predictive models quickly and easily
  • Serve models built in Ruby, Python, R, and more

Check out this post for more info on machine learning with Rails

Installation

Add this line to your application’s Gemfile:

gem "eps"

On Mac, also install OpenMP:

brew install libomp

Getting Started

Create a model

data = [
  {bedrooms: 1, bathrooms: 1, price: 100000},
  {bedrooms: 2, bathrooms: 1, price: 125000},
  {bedrooms: 2, bathrooms: 2, price: 135000},
  {bedrooms: 3, bathrooms: 2, price: 162000}
]
model = Eps::Model.new(data, target: :price)
puts model.summary

Make a prediction

model.predict(bedrooms: 2, bathrooms: 1)

Store the model

File.write("model.pmml", model.to_pmml)

Load the model

pmml = File.read("model.pmml")
model = Eps::Model.load_pmml(pmml)

A few notes:

  • The target can be numeric (regression) or categorical (classification)
  • Pass an array of hashes to predict to make multiple predictions at once
  • Models are stored in PMML, a standard for model storage

Building Models

Goal

Often, the goal of building a model is to make good predictions on future data. To help achieve this, Eps splits the data into training and validation sets if you have 30+ data points. It uses the training set to build the model and the validation set to evaluate the performance.

If your data has a time associated with it, it’s highly recommended to use that field for the split.

Eps::Model.new(data, target: :price, split: :listed_at)

Otherwise, the split is random. See Validation Options for other ways to control the split.

Performance is reported in the summary.

  • For regression, it reports validation RMSE (root mean squared error) - lower is better
  • For classification, it reports validation accuracy - higher is better
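To make the two metrics concrete, here is a minimal plain-Ruby sketch of how they can be computed by hand. The helper names `rmse` and `accuracy` are hypothetical and are not part of the Eps API; Eps reports these values for you in the summary.

```ruby
# Root mean squared error for regression: lower is better.
def rmse(actual, predicted)
  squared_errors = actual.zip(predicted).map { |act, pred| (act - pred)**2 }
  Math.sqrt(squared_errors.sum / squared_errors.size.to_f)
end

# Fraction of correctly predicted labels for classification: higher is better.
def accuracy(actual, predicted)
  correct = actual.zip(predicted).count { |act, pred| act == pred }
  correct / actual.size.to_f
end

puts rmse([100_000, 125_000], [110_000, 120_000])
puts accuracy(["condo", "house", "condo", "house"], ["condo", "house", "house", "house"])
```

RMSE is in the same units as the target (dollars in the housing example), which makes it easy to judge whether an error is acceptable.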

Typically, the best way to improve performance is feature engineering.

Feature Engineering

Features are extremely important for model performance. Features can be:

  1. numeric
  2. categorical
  3. text

Numeric

For numeric features, use any numeric type.

{bedrooms: 4, bathrooms: 2.5}

Categorical

For categorical features, use strings or booleans.

{state: "CA", basement: true}

Convert any ids to strings so they’re treated as categorical features.

{city_id: city_id.to_s}

For dates, create features like day of week and month.

{weekday: sold_on.strftime("%a"), month: sold_on.strftime("%b")}

For times, create features like day of week and hour of day.

{weekday: listed_at.strftime("%a"), hour: listed_at.hour.to_s}
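As a quick illustration of what these expressions produce, the sketch below runs them on fixed values. The helper names and the sample date and time are made up for the example. Note that every value is a string, so it is treated as a categorical feature.

```ruby
require "date"

# Features derived from a date: day of week and month, both as strings.
def date_features(date)
  {weekday: date.strftime("%a"), month: date.strftime("%b")}
end

# Features derived from a time: day of week and hour of day, both as strings.
def time_features(time)
  {weekday: time.strftime("%a"), hour: time.hour.to_s}
end

p date_features(Date.new(2019, 3, 15))
p time_features(Time.utc(2019, 3, 1, 14, 30))
```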

Text

For text features, use strings with multiple words.

{description: "a beautiful house on top of a hill"}

This creates features based on word count.

You can specify text features explicitly with:

Eps::Model.new(data, target: :price, text_features: [:description])

You can set advanced options with:

text_features: {
  description: {
    min_occurences: 5,          # min times a word must appear to be included in the model
    max_features: 1000,         # max number of words to include in the model
    min_length: 1,              # min length of words to be included
    case_sensitive: true,       # how to treat words with different case
    tokenizer: /\s+/,           # how to tokenize the text, defaults to whitespace
    stop_words: ["and", "the"]  # words to exclude from the model
  }
}
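To give a feel for how options like these interact, here is a rough plain-Ruby sketch of word counting with a tokenizer, case handling, a minimum length, and stop words. The `word_counts` helper is hypothetical and is not how Eps implements text features internally.

```ruby
# Counts words in a text, loosely mirroring a few of the options above.
# Illustration only: this is not Eps's internal implementation.
def word_counts(text, tokenizer: /\s+/, case_sensitive: false, min_length: 1, stop_words: [])
  text = text.downcase unless case_sensitive
  words = text.split(tokenizer)
  # Drop short tokens (including empty strings from leading whitespace) and stop words.
  words = words.select { |word| word.length >= min_length && !stop_words.include?(word) }
  words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

p word_counts("A beautiful house on top of a hill", stop_words: ["on", "of"])
```

Options such as a minimum occurrence count or a maximum number of features would then be applied across all rows, once the counts for the whole dataset are known.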

Full Example

We recommend putting all the model code in a single file. This makes it easy to rebuild the model as needed.

In Rails, we recommend creating an app/ml_models directory. Be sure to restart Spring after creating the directory so files are autoloaded.

bin/spring stop

Here’s what a complete model in app/ml_models/price_model.rb may look like:

class PriceModel < Eps::Base
  def build
    houses = House.all

    # train
    data = houses.map { |v| features(v) }
    model = Eps::Model.new(data, target: :price, split: :listed_at)
    puts model.summary

    # save to file
    File.write(model_file, model.to_pmml)

    # ensure reloads from file
    @model = nil
  end

  def predict(house)
    model.predict(features(house))
  end

  private

  def features(house)
    {
      bedrooms: house.bedrooms,
      city_id: house.city_id.to_s,
      month: house.listed_at.strftime("%b"),
      listed_at: house.listed_at,
      price: house.price
    }
  end

  def model
    @model ||= Eps::Model.load_pmml(File.read(model_file))
  end

  def model_file
    File.join(__dir__, "price_model.pmml")
  end
end

Build the model with:

PriceModel.build

This saves the model to price_model.pmml. Check this into source control or use a tool like Trove to store it.

Predict with:

PriceModel.predict(house)

Monitoring

We recommend monitoring how well your models perform over time. To do this, save your predictions to the database. Then, compare them with:

actual = houses.map(&:price)
predicted = houses.map(&:predicted_price)
Eps.metrics(actual, predicted)

For RMSE and MAE, alert if they rise above a certain threshold. For ME, alert if it moves too far away from 0. For accuracy, alert if it drops below a certain threshold.
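A minimal sketch of such checks in plain Ruby follows. The helper names, the thresholds, and the sign convention for ME (predicted minus actual) are assumptions made for the example; in an application you would take the values from `Eps.metrics` and pick thresholds that suit your data.

```ruby
# Computes RMSE, MAE, and ME by hand.
# Errors are taken as predicted - actual (an assumed sign convention).
def error_metrics(actual, predicted)
  errors = predicted.zip(actual).map { |pred, act| pred - act }
  n = errors.size.to_f
  {
    rmse: Math.sqrt(errors.sum { |e| e**2 } / n),
    mae: errors.sum(&:abs) / n,
    me: errors.sum / n
  }
end

# Returns the names of the metrics that breach the (hypothetical) thresholds.
def alerts(metrics, max_rmse:, max_mae:, max_abs_me:)
  breached = []
  breached << :rmse if metrics[:rmse] > max_rmse
  breached << :mae if metrics[:mae] > max_mae
  breached << :me if metrics[:me].abs > max_abs_me
  breached
end

metrics = error_metrics([100, 200, 300], [110, 190, 330])
p alerts(metrics, max_rmse: 25, max_mae: 20, max_abs_me: 5)
```

ME is compared by absolute value because a drift in either direction signals that predictions have become biased.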

Other Languages

Eps makes it easy to serve models from other languages. You can build models in Python, R, and others and serve them in Ruby without having to worry about how to deploy or run another language.

Eps can serve LightGBM, linear regression, and naive Bayes models. Check out ONNX Runtime and Scoruby to serve other models.

Python

To create a model in Python, install the sklearn2pmml package:

pip install sklearn2pmml

And check out the examples:

R

To create a model in R, install the pmml package:

install.packages("pmml")

And check out the examples:

Verifying

It’s important for features to be implemented consistently when serving models created in other languages. We highly recommend verifying this programmatically. Create a CSV file with ids and predictions from the original model.

house_id,prediction
1,145000
2,123000
3,250000

Once the model is implemented in Ruby, confirm the predictions match.

require "csv"

# load_pmml expects the PMML contents, so read the file first
model = Eps::Model.load_pmml(File.read("model.pmml"))

# preload houses to prevent n+1
houses = House.all.index_by(&:id)

CSV.foreach("predictions.csv", headers: true, converters: :numeric) do |row|
  house = houses[row["house_id"]]
  expected = row["prediction"]

  actual = model.predict(bedrooms: house.bedrooms, bathrooms: house.bathrooms)

  success = actual.is_a?(String) ? actual == expected : (actual - expected).abs < 0.001
  raise "Bad prediction for house #{house.id} (exp: #{expected}, act: #{actual})" unless success

  putc "✓"
end

Data

A number of data formats are supported. You can pass the target variable separately.

x = [{x: 1}, {x: 2}, {x: 3}]
y = [1, 2, 3]
Eps::Model.new(x, y)

Data can be an array of arrays

x = [[1, 2], [2, 0], [3, 1]]
y = [1, 2, 3]
Eps::Model.new(x, y)

Or Numo arrays

x = Numo::NArray.cast([[1, 2], [2, 0], [3, 1]])
y = Numo::NArray.cast([1, 2, 3])
Eps::Model.new(x, y)

Or a Rover data frame

df = Rover.read_csv("houses.csv")
Eps::Model.new(df, target: "price")

Or a Daru data frame

df = Daru::DataFrame.from_csv("houses.csv")
Eps::Model.new(df, target: "price")

When reading CSV files directly, be sure to convert numeric fields. The table method does this automatically.

CSV.table("data.csv").map { |row| row.to_h }
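The sketch below shows why the conversion matters, using only Ruby's standard csv library. The helper names and sample data are made up for the example. Without converters every field is a string, which would be treated as a categorical feature rather than a numeric one.

```ruby
require "csv"

# Parses CSV text without any conversion: keys and values are all strings.
def rows_without_conversion(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

# Parses CSV text with numeric conversion and symbol headers,
# the same options that CSV.table applies.
def rows_with_conversion(csv_text)
  CSV.parse(csv_text, headers: true, converters: :numeric, header_converters: :symbol).map(&:to_h)
end

csv_text = "bedrooms,bathrooms,price\n2,1.5,125000\n"
p rows_without_conversion(csv_text)
p rows_with_conversion(csv_text)
```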

Algorithms

Pass an algorithm with:

Eps::Model.new(data, algorithm: :linear_regression)

Eps supports:

  • LightGBM (default)
  • Linear Regression
  • Naive Bayes

LightGBM

Pass the learning rate with:

Eps::Model.new(data, learning_rate: 0.01)

Linear Regression

By default, an intercept is included. Disable this with:

Eps::Model.new(data, intercept: false)

To speed up training on large datasets with linear regression, install GSL. With Homebrew, you can use:

brew install gsl

Then, add this line to your application’s Gemfile:

gem "gslr", group: :development

It only needs to be available in environments used to build the model.

Probability

To get the probability of each category for predictions with classification, use:

model.predict_probability(data)

Naive Bayes is known to produce poor probability estimates, so stick with LightGBM if you need this.

Validation Options

Pass your own validation set with:

Eps::Model.new(data, validation_set: validation_set)

Split on a specific value

Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})

Specify the validation set size (the default is 0.25, which is 25%)

Eps::Model.new(data, split: {validation_size: 0.2})

Disable the validation set completely with:

Eps::Model.new(data, split: false)
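To illustrate what a time-based split does conceptually, here is a plain-Ruby sketch. The helpers `time_split` and `value_split` are hypothetical, and the boundary handling (rows before the value go to training) is an assumption; Eps's own splitting logic may differ in its details.

```ruby
require "date"

# Splits rows chronologically: the most recent rows become the validation set.
# Illustration only; not Eps's internal implementation.
def time_split(data, column:, validation_size: 0.25)
  sorted = data.sort_by { |row| row[column] }
  split_index = (sorted.size * (1 - validation_size)).round
  [sorted.first(split_index), sorted.drop(split_index)]
end

# Splits rows on a specific value: earlier rows train, the rest validate (assumed).
def value_split(data, column:, value:)
  data.partition { |row| row[column] < value }
end

data = (1..8).map { |day| {listed_at: Date.new(2019, 1, day), price: 100_000 + day} }
train, validation = time_split(data.shuffle, column: :listed_at)
puts "#{train.size} training rows, #{validation.size} validation rows"
```

Validating on the most recent rows mimics how the model is used in practice: it is trained on the past and asked to predict the future.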

Database Storage

The database is another place you can store models. It’s good if you retrain models automatically.

We recommend adding monitoring and guardrails as well if you retrain automatically.

Create an Active Record model to store the predictive model.

rails generate model Model key:string:uniq data:text

Store the model with:

store = Model.where(key: "price").first_or_initialize
store.update(data: model.to_pmml)

Load the model with:

data = Model.find_by!(key: "price").data
model = Eps::Model.load_pmml(data)

Jupyter & IRuby

You can use IRuby to run Eps in Jupyter notebooks. Here’s how to get IRuby working with Rails.

Weights

Specify a weight for each data point

Eps::Model.new(data, weight: :weight)

You can also pass an array

Eps::Model.new(data, weight: [1, 2, 3])

Weights are supported for metrics as well

Eps.metrics(actual, predicted, weight: weight)

Reweighing is one method to mitigate bias in training data
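To show how weights can change a metric, here is a sketch of a weighted RMSE in plain Ruby. The `weighted_rmse` helper is hypothetical and uses one common definition (weighted squared errors divided by the total weight); it is not necessarily the exact formula Eps uses.

```ruby
# Weighted RMSE: each squared error counts in proportion to its weight.
# One common definition, shown for illustration only.
def weighted_rmse(actual, predicted, weight)
  weighted_squares = actual.zip(predicted, weight).sum do |act, pred, w|
    w * (pred - act)**2
  end
  Math.sqrt(weighted_squares / weight.sum.to_f)
end

actual = [1, 2, 3]
predicted = [1, 2, 5]
puts weighted_rmse(actual, predicted, [1, 1, 1]) # equal weights match plain RMSE
puts weighted_rmse(actual, predicted, [1, 1, 2]) # the last error counts double
```

With equal weights the result is the ordinary RMSE, so weighting only changes the outcome when some data points matter more than others.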

Upgrading

0.3.0

Eps 0.3.0 brings a number of improvements, including support for LightGBM and cross-validation. There are a number of breaking changes to be aware of:

  • LightGBM is now the default for new models. On Mac, run:

    brew install libomp

    Pass the algorithm option to use linear regression or naive Bayes.

    Eps::Model.new(data, algorithm: :linear_regression) # or :naive_bayes
  • Cross-validation happens automatically by default. You no longer need to create training and test sets manually. If you were splitting on a time, use:

    Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})

    Or randomly, use:

    Eps::Model.new(data, split: {validation_size: 0.3})

    To continue splitting manually, use:

    Eps::Model.new(data, validation_set: test_set)
  • It’s no longer possible to load models in JSON or PFA formats. Retrain models and save them as PMML.

0.2.0

Eps 0.2.0 brings a number of improvements, including support for classification.

We recommend:

  1. Changing Eps::Regressor to Eps::Model
  2. Converting models from JSON to PMML

     model = Eps::Model.load_json("model.json")
     File.write("model.pmml", model.to_pmml)

  3. Renaming app/stats_models to app/ml_models

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/eps.git
cd eps
bundle install
bundle exec rake test
