• Stars
    383
  • Rank 111,995 (Top 3 %)
  • Language
    Ruby
  • License
    MIT License
  • Created over 15 years ago
  • Updated over 7 years ago

Repository Details

Flexible and Extensible Machine Learning in Ruby

Decider

Yet Another Ruby Machine Learning Library

Manifesto

There are other Ruby machine learning libraries out there.

So why another one?

  • You can install it and try it in irb right away. You don’t need to learn how a half-dozen classes work to get started:

    c = Decider.classifier(:spam, :ham)
    c.spam << "some spammy text"
    c.ham << "some hammy goodness"
    c.spam?("more spammy text")
    # => true
    

The default configuration is about 96% accurate as an email spam classifier.

  • You can easily control how it processes its input. Decider has built-in support for plain text and URIs, word stemming, stop word removal, and n-grams. You can combine any of these as you see fit (see “Getting Started” below for a quick example). Additional tokenization strategies or support for non-text document types can be added with a minimum of hassle.

  • Persist (Save) with Moneta. Pretty much any storage mechanism that’s available in Ruby is supported. Save to a database and implement distributed classification if you like.

  • Clustering Analysis. Useful for recommendation algorithms. (In Progress)
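
To make the tokenization options above more concrete, here is a rough sketch of stop word removal followed by n-gram generation in plain Ruby. The `STOP_WORDS` list and the `tokenize` and `ngrams` helpers are hypothetical names invented for this illustration; they are not Decider's API and say nothing about how Decider implements these steps internally.

```ruby
# Hypothetical helpers for illustration; these are not part of Decider's API.
STOP_WORDS = %w[a an and the of for to].freeze

# Lowercase the text, split it into word tokens, and drop stop words.
def tokenize(text)
  text.downcase.scan(/[a-z0-9']+/).reject { |word| STOP_WORDS.include?(word) }
end

# Build every n-gram whose length falls in `sizes` (for example 2..3).
def ngrams(tokens, sizes)
  sizes.flat_map do |n|
    tokens.each_cons(n).map { |gram| gram.join(" ") }
  end
end

tokens = tokenize("Get the enormous hot dog")
p tokens
# => ["get", "enormous", "hot", "dog"]
p ngrams(tokens, 2..3)
# => ["get enormous", "enormous hot", "hot dog", "get enormous hot", "enormous hot dog"]
```

N-grams let a classifier see short phrases such as "hot dog" as single features instead of treating each word independently.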

Getting Started

    c = Decider.classifier(:spam, :ham) do |doc|
      doc.plain_text
      doc.ngrams(2..3)
      doc.stem
    end

    c.spam << "buy viagra, jerk" << "get enormous hot dog for make women happy"
    c.ham << "check out my code on github homie" << "let's go out for beers after work"

    p c.spam?("viagra for huge hot dog")
    # => true
    puts "term frequencies:"
    puts "spam: #{c.spam.term_frequency.inspect}"
    puts "ham:  #{c.ham.term_frequency.inspect}"
    puts ""
    p c.scores("let's write code and drink some beers")
    # => {:spam=>0.0, :ham=>1.0}
    p c.classify("let's write code and drink some beers")
    # => :ham
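
For readers curious how per-class term frequencies can turn into scores like those above, here is a toy sketch of one common approach: multinomial naive Bayes with add-one smoothing and equal class priors. This is an assumption made for illustration, not a description of Decider's internals, and `ToyClassifier` and its methods are hypothetical names.

```ruby
# A toy multinomial naive Bayes classifier. Illustrative only: the class and
# method names are hypothetical and this is not Decider's implementation.
class ToyClassifier
  def initialize(*labels)
    # One term-frequency table per label, defaulting to zero counts.
    @counts = labels.to_h { |label| [label, Hash.new(0)] }
  end

  def train(label, text)
    tokenize(text).each { |token| @counts.fetch(label)[token] += 1 }
  end

  # Returns a hash of label => probability, normalized to sum to 1.0.
  # Class priors are assumed equal.
  def scores(text)
    vocab_size = [@counts.values.flat_map(&:keys).uniq.size, 1].max
    log_likelihoods = @counts.transform_values do |freq|
      total = freq.values.sum
      tokenize(text).sum(0.0) do |token|
        # Add-one smoothing keeps unseen tokens from zeroing out a label.
        Math.log((freq[token] + 1.0) / (total + vocab_size))
      end
    end
    # Subtract the maximum before exponentiating to avoid underflow.
    max = log_likelihoods.values.max
    weights = log_likelihoods.transform_values { |ll| Math.exp(ll - max) }
    norm = weights.values.sum
    weights.transform_values { |w| w / norm }
  end

  def classify(text)
    scores(text).max_by { |_label, score| score }.first
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

c = ToyClassifier.new(:spam, :ham)
c.train(:spam, "buy viagra, jerk")
c.train(:ham, "let's go out for beers after work")
p c.classify("beers after work")
# => :ham
```

Working in log space and normalizing at the end keeps the arithmetic stable even for long documents, where multiplying many small probabilities directly would underflow.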

Performance

Decider has several benchmarks that also double as integration tests. These are run regularly and used to pinpoint CPU and RAM bottlenecks.
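
As a rough idea of what a benchmark that doubles as an integration test can look like, the sketch below times a classification pass with Ruby's standard `Benchmark` module and then fails if accuracy drops below a threshold. The `classify` lambda, the four-document corpus, and the 0.9 threshold are all hypothetical stand-ins; this is not taken from Decider's actual benchmark suite.

```ruby
require "benchmark"

# Hypothetical stand-in for a trained classifier: flags text containing "viagra".
classify = ->(text) { text.include?("viagra") ? :spam : :ham }

# Tiny labeled corpus; a real benchmark would load a much larger dataset.
corpus = [
  ["buy viagra now", :spam],
  ["cheap viagra deals", :spam],
  ["lunch at noon?", :ham],
  ["code review tomorrow", :ham]
]

# The benchmark half: measure wall-clock time for one pass over the corpus.
correct = 0
elapsed = Benchmark.realtime do
  correct = corpus.count { |text, label| classify.call(text) == label }
end

accuracy = correct.to_f / corpus.size
puts format("accuracy: %.2f, elapsed: %.6fs", accuracy, elapsed)

# The integration-test half: fail loudly if accuracy regresses.
raise "accuracy regression" if accuracy < 0.9
```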

Decider does a lot of math and is fairly computationally intensive, so you want all the extra speed you can get. It is regularly tested with Ruby 1.9 and JRuby. I highly recommend using one of these Ruby implementations if you plan on doing anything serious with Decider.

Also keep in mind that your dataset should reside entirely in memory or else you’ll hit a brick wall.
