• Stars
    star
    392
  • Rank 107,652 (Top 3 %)
  • Language
    Elixir
  • Created almost 8 years ago
  • Updated almost 7 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A Naive Bayes machine learning implementation in Elixir.

Simple Bayes Travis Coverage Hex.pm

A Naive Bayes machine learning implementation in Elixir.

In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.

Naive Bayes has been studied extensively since the 1950s. It was introduced under a different name into the text retrieval community in the early 1960s, and remains a popular (baseline) method for text categorization, the problem of judging documents as belonging to one category or the other (such as spam or legitimate, sports or politics, etc.) with word frequencies as the features. With appropriate preprocessing, it is competitive in this domain with more advanced methods including support vector machines. It also finds application in automatic medical diagnosis.

Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem. Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by expensive iterative approximation as used for many other types of classifiers. - Wikipedia

Features

  • Naive Bayes algorithm with different models
    • Multinomial
    • Binarized (boolean) multinomial
    • Bernoulli
  • Multiple storage options
    • In-memory (default)
    • File system
    • Dets (Disk-based Erlang Term Storage)
  • Ignores stop words
  • Additive smoothing
  • TF-IDF
  • Optional keywords weighting
  • Optional word stemming via Stemmer

Feature Matrix

Multinomial Binarized multinomial Bernoulli
Stop words βœ… βœ… βœ…
Additive smoothing βœ… βœ…
TF-IDF βœ…
Keywords weighting βœ…
Stemming βœ… βœ… βœ…

Usage

Install by adding :simple_bayes and optionally :stemmer (for the default stemming functionality) to deps in your mix.exs:

defp deps do
  [
    {:simple_bayes, "~> 0.12"},
    {:stemmer,      "~> 1.0"}
  ]
end

If you're on Elixir 1.3 or below, ensure :simple_bayes and optionally :stemmer are started before your application:

def application do
  [applications: [:logger, :simple_bayes, :stemmer]]
end
bayes = SimpleBayes.init()
        |> SimpleBayes.train(:apple, "red sweet")
        |> SimpleBayes.train(:apple, "green", weight: 0.5)
        |> SimpleBayes.train(:apple, "round", weight: 2)
        |> SimpleBayes.train(:banana, "sweet")
        |> SimpleBayes.train(:banana, "green", weight: 0.5)
        |> SimpleBayes.train(:banana, "yellow long", weight: 2)
        |> SimpleBayes.train(:orange, "red")
        |> SimpleBayes.train(:orange, "yellow sweet", weight: 0.5)
        |> SimpleBayes.train(:orange, "round", weight: 2)

bayes |> SimpleBayes.classify_one("Maybe green maybe red but definitely round and sweet.")
# => :apple

bayes |> SimpleBayes.classify("Maybe green maybe red but definitely round and sweet.")
# => [
#   apple:  0.18519202529366116,
#   orange: 0.14447781772131096,
#   banana: 0.10123406763124557
# ]

bayes |> SimpleBayes.classify("Maybe green maybe red but definitely round and sweet.", top: 2)
# => [
#   apple:  0.18519202529366116,
#   orange: 0.14447781772131096,
# ]

With and without word stemming (requires a stem function, we recommend Stemmer):

SimpleBayes.init()
|> SimpleBayes.train(:apple, "buying apple")
|> SimpleBayes.train(:banana, "buy banana")
|> SimpleBayes.classify("buy apple")
# => [
#   banana: 0.05719389206673358,
#   apple: 0.05719389206673358
# ]

SimpleBayes.init(stem: &Stemmer.stem/1) # or any other stemming function
|> SimpleBayes.train(:apple, "buying apple")
|> SimpleBayes.train(:banana, "buy banana")
|> SimpleBayes.classify("buy apple")
# => [
#   apple: 0.18096114003107086,
#   banana: 0.15054767928902865
# ]

Configuration (Optional)

For application wide configuration, in your application's config/config.exs:

config :simple_bayes, model: :multinomial
config :simple_bayes, storage: :memory
config :simple_bayes, default_weight: 1
config :simple_bayes, smoothing: 0
config :simple_bayes, stem: false
config :simple_bayes, top: nil
config :simple_bayes, stop_words: ~w(
  a about above after again against all am an and any are aren't as at be
  because been before being below between both but by can't cannot could
  couldn't did didn't do does doesn't doing don't down during each few for from
  further had hadn't has hasn't have haven't having he he'd he'll he's her here
  here's hers herself him himself his how how's i i'd i'll i'm i've if in into
  is isn't it it's its itself let's me more most mustn't my myself no nor not of
  off on once only or other ought our ours ourselves out over own same shan't
  she she'd she'll she's should shouldn't so some such than that that's the
  their theirs them themselves then there there's these they they'd they'll
  they're they've this those through to too under until up very was wasn't we
  we'd we'll we're we've were weren't what what's when when's where where's
  which while who who's whom why why's with won't would wouldn't you you'd
  you'll you're you've your yours yourself yourselves
)

Alternatively, you may pass in the configuration options when you initialise:

SimpleBayes.init(
  model:          :multinomial,
  storage:        :memory,
  default_weight: 1,
  smoothing:      0,
  stem:           false,
  top:            nil,
  stop_words:     []
)

Available options for :model are:

  • :multinomial (default)
  • :binarized_multinomial
  • :bernoulli

Available options for :storage are:

  • :memory (default, can also be used by any database, see below for more details)
  • :file_system
  • :dets

Storage options have extra configurations:

Memory

  • :namespace - optional, it's only useful when you want to load by the namespace

File System

  • :file_path

Dets

  • :file_path

File System vs Dets

File system encodes and decodes data using base64, whereas Dets is a native Erlang library. Performance wise file system with base64 tends to be faster with less data, and Dets faster with more data. YMMV, please do your own comparison.

Configuration Examples

# application-wide configuration:
config :simple_bayes, storage: :file_system
config :simple_bayes, file_path: "path/to/the/file.txt"

# per-initialization configuration:
SimpleBayes.init(
  storage: :file_system,
  file_path: "path/to/the/file.txt"
)

Storage Usage

opts = [
  storage:   :file_system,
  file_path: "test/temp/file_sysmte_test.txt"
]

SimpleBayes.init(opts)
|> SimpleBayes.train(:apple, "red sweet")
|> SimpleBayes.train(:apple, "green", weight: 0.5)
|> SimpleBayes.train(:apple, "round", weight: 2)
|> SimpleBayes.train(:banana, "sweet")
|> SimpleBayes.save()

SimpleBayes.load(opts)
|> SimpleBayes.train(:banana, "green", weight: 0.5)
|> SimpleBayes.train(:banana, "yellow long", weight: 2)
|> SimpleBayes.train(:orange, "red")
|> SimpleBayes.train(:orange, "yellow sweet", weight: 0.5)
|> SimpleBayes.train(:orange, "round", weight: 2)
|> SimpleBayes.save()

SimpleBayes.load(opts)
|> SimpleBayes.classify("Maybe green maybe red but definitely round and sweet")

In-memory save/2, load/1 and the encoded_data option

Calling SimpleBayes.save/2 is unnecessary for :memory storage. However, when using the in-memory storage, you are able to get the encoded data - this is useful if you would like to store the encoded data in your persistence of choice. For example:

{:ok, _pid, encoded_data} = SimpleBayes.init()
|> SimpleBayes.train(:apple, "red sweet")
|> SimpleBayes.train(:apple, "green", weight: 0.5)
|> SimpleBayes.train(:apple, "round", weight: 2)
|> SimpleBayes.train(:banana, "sweet")
|> SimpleBayes.save()

# now store `encoded_data` in your database of choice
# once `encoded_data` is fetched again from the database, you are then able to:

SimpleBayes.load(encoded_data: encoded_data)
|> SimpleBayes.train(:banana, "green", weight: 0.5)
|> SimpleBayes.train(:banana, "yellow long", weight: 2)
|> SimpleBayes.train(:orange, "red")
|> SimpleBayes.train(:orange, "yellow sweet", weight: 0.5)
|> SimpleBayes.train(:orange, "round", weight: 2)
|> SimpleBayes.classify("Maybe green maybe red but definitely round and sweet")

Changelog

Please see CHANGELOG.md.

License

Licensed under MIT.

More Repositories

1

crawler

A high performance web crawler / scraper in Elixir.
Elixir
917
star
2

jquery-endless-scroll

Endless/infinite scrolling/pagination.
CoffeeScript
838
star
3

angel_nest

Project code name: Angel Nest. :)
Ruby
775
star
4

api_taster

A quick and easy way to visually test your Rails application's API.
Ruby
729
star
5

datamappify

Compose, decouple and manage domain logic and data persistence separately. Works particularly great for composing form objects!
Ruby
332
star
6

opq

Elixir queue! A simple, in-memory queue with worker pooling and rate limiting in Elixir.
Elixir
255
star
7

stemmer

An English (Porter2) stemming implementation in Elixir.
Elixir
149
star
8

bustle

Activities recording and retrieving using a simple Pub/Sub-like interface.
Ruby
93
star
9

ruby_decorators

Ruby method decorators inspired by Python.
Ruby
63
star
10

inherited_resources_views

Share and DRY up views between resources. Use with Inherited Resources.
Ruby
60
star
11

jquery-inline-confirmation

Inline Confirmation plugin for jQuery. One of the less obtrusive ways of implementing confirmation dialogues.
JavaScript
53
star
12

toy-robot-elixir

The infamous Toy Robot code test done in Elixir.
Elixir
45
star
13

skinny-coffee-machine

A simple JavaScript state machine with observers, for browsers and Node.js.
JavaScript
42
star
14

kohana-phamlp

This module is a bridge between the Kohana PHP framework (http://kohanaframework.org/) and the PHamlP library (http://code.google.com/p/phamlp/).
PHP
25
star
15

authlite

Authlite, an auth module for Kohana PHP framework, it offers greater flexibility than the official Auth module.
PHP
23
star
16

dotfiles

My dotfiles
Shell
18
star
17

amaze_hands

Amaze Hands is an amazing tool developed for analysing Kanban board cards.
Ruby
15
star
18

kthrottler

A Kohana port of Action Throtller (for Rails): http://github.com/fredwu/action_throttler
PHP
14
star
19

jquery-slideshow-lite

An extremely lightweight slideshow plugin for jQuery.
JavaScript
14
star
20

code-test-2016-cultureamp

Ruby
13
star
21

README-xplor

Fred @ Xplor - how to work with me.
10
star
22

code-test-2016-myob

Ruby
8
star
23

code-test-2016-trunkplatform

Ruby
6
star
24

action_throttler

An easy to use Rails plugin to quickly throttle application actions based on configurable duration and limit.
Ruby
6
star
25

app_reset

Resets (and if available, seeds) your databases.
Ruby
6
star
26

yield.rb

Aggregated token amounts and values. Supports ApeBoard, YieldWatch, Binance, CoinGecko and more.
Ruby
5
star
27

layerful

Layerful PHP framework.
4
star
28

advent_of_code_2018

https://adventofcode.com/2018/about
Elixir
4
star
29

security_guard

A collection of useful tools for auditing data and performing security checks.
Ruby
3
star
30

ruby-slim-tmbundle

https://github.com/slim-template/ruby-slim.tmbundle
3
star
31

fredwu.me-v3

JavaScript
3
star
32

flower

Playground to test out the Lotus framework.
Ruby
2
star
33

kata-poker-hands-elixir

A coding kata for comparing poker hands - Elixir version.
Elixir
2
star
34

jqstub

A simple stub library for jQuery / Zepto objects.
JavaScript
1
star
35

project_retard

One sale a day e-commerce platform built on Ruby on Rails.
JavaScript
1
star
36

spiky_xml

Just a spike on XML parsing in different environments.
JavaScript
1
star
37

reacraft

Ruby
1
star
38

toy-robot-lolz

It's art. And it's beautiful.
Ruby
1
star
39

kata-poker-hands-ruby

A coding kata for comparing poker hands - Ruby version.
Ruby
1
star
40

code-test-2016-adslot

CoffeeScript
1
star