  • Stars: 799
  • Rank 57,011 (Top 2 %)
  • Language: Go
  • License: Other
  • Created about 13 years ago
  • Updated about 1 year ago

Repository Details

Naive Bayesian Classification for Golang.

Naive Bayesian Classification

Perform naive Bayesian classification into an arbitrary number of classes on sets of strings. bayesian also supports term frequency-inverse document frequency calculations (TF-IDF).

Copyright (c) 2011-2017. Jake Brukhman. All rights reserved. See the LICENSE file for the BSD-style license.


Background

This is meant to be a low-barrier-to-entry Go library for basic Bayesian classification. See the code comments for a refresher on naive Bayesian classifiers, and please take some time to understand the underflow edge cases, as ignoring them may result in inaccurate classifications.


Installation

Using the go command:

go get github.com/jbrukh/bayesian
go install !$

Documentation

See the GoPkgDoc documentation here.


Features

  • Conditional probability and "log-likelihood"-like scoring.
  • Underflow detection.
  • Simple persistence of classifiers.
  • Statistics.
  • TF-IDF support.

Example 1 (Simple Classification)

To use the classifier, first you must create some classes and train it:

import "github.com/jbrukh/bayesian"

const (
    Good bayesian.Class = "Good"
    Bad  bayesian.Class = "Bad"
)

classifier := bayesian.NewClassifier(Good, Bad)
goodStuff := []string{"tall", "rich", "handsome"}
badStuff  := []string{"poor", "smelly", "ugly"}
classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff,  Bad)

Then you can ascertain the scores of each class and the most likely class your data belongs to:

scores, likely, _ := classifier.LogScores(
    []string{"tall", "girl"},
)

The relative size of the scores indicates likelihood: the class with the highest score is the most likely. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:

probs, likely, _ := classifier.ProbScores(
    []string{"tall", "girl"},
)

Example 2 (TF-IDF Support)

To use the TF-IDF classifier, first create some classes and train it. You must call ConvertTermsFreqToTfIdf() AFTER training and before calling classification methods such as LogScores, SafeProbScores, and ProbScores:

import "github.com/jbrukh/bayesian"

const (
    Good bayesian.Class = "Good"
    Bad  bayesian.Class = "Bad"
)

// Create a classifier with TF-IDF support.
classifier := bayesian.NewClassifierTfIdf(Good, Bad)

goodStuff := []string{"tall", "rich", "handsome"}
badStuff  := []string{"poor", "smelly", "ugly"}

classifier.Learn(goodStuff, Good)
classifier.Learn(badStuff,  Bad)

// Required
classifier.ConvertTermsFreqToTfIdf()

Then you can ascertain the scores of each class and the most likely class your data belongs to:

scores, likely, _ := classifier.LogScores(
    []string{"tall", "girl"},
)

The relative size of the scores indicates likelihood: the class with the highest score is the most likely. Alternatively (but with some risk of float underflow), you can obtain actual probabilities:

probs, likely, _ := classifier.ProbScores(
    []string{"tall", "girl"},
)

Use wisely.
