SwiftLearner: Scala machine learning library

These are some simply written machine learning algorithms. They are easier to follow than the optimized libraries, and easier to tweak if you want to experiment. They use plain Java types and have few or no dependencies. SwiftLearner is easy to fork; you can also copy-paste the individual methods.

Some of the methods are very short, thanks to the elegance of the classic algorithms and the expressive power of Scala. Others are optimized slightly, just enough to accommodate the test datasets. Those are not idiomatic Scala; they are closer to CS 101 while loops, which are longer but perform better. They are still easy to follow.
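To illustrate that trade-off, here is a minimal sketch of the same dot product written both ways. It is not taken from the SwiftLearner sources; the object and method names are made up for this example.

```scala
object DotProductStyles {
  // Idiomatic Scala: one line, but zip and map each allocate an intermediate array.
  def dotIdiomatic(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // CS 101 style: longer, but it only touches the two input arrays.
  def dotWhile(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }
}
```

Both versions return the same result; the while loop simply avoids the temporary collections, which tends to matter once the arrays get large.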

Use this project as a prototyping library, a cookbook, or a cheat sheet. For high performance and rich features, there are better options. Still, these methods are fully functional and work well for small datasets.

The name comes from Fallout, the greatest game ever. Fallout is a trademark of Bethesda Softworks LLC.

To make one ML enthusiast happy, please star or fork this project ;)

Contents

Perceptron (tests) A single layer, single node granddaddy of neural networks.
Backprop (tests) A neural network with one hidden layer, using backpropagation.
Genetic Algorithm (tests) Genetic Algorithm with elitist tournament selection.
Gaussian Naive Bayes (tests) Gaussian naive Bayes classifier for continuous parameters.
Bernoulli Naive Bayes (tests) Bernoulli naive Bayes classifier for binary parameters.
k-Nearest Neighbors (tests) k-Nearest Neighbors classifier.
k-Means (tests) k-Means clustering.
Softmax (tests) Softmax (multinomial logistic) regression with SGD and AdaGrad.
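
To give a feel for how short the classic algorithms can be, here is a minimal perceptron sketch in plain Scala. It is an illustration only, not the SwiftLearner implementation; `PerceptronSketch`, `predict`, and `train` are hypothetical names, and the step activation and update rule are the textbook ones.

```scala
object PerceptronSketch {
  // Predict 1 if the weighted sum plus bias is positive, otherwise 0.
  def predict(weights: Array[Double], bias: Double, x: Array[Double]): Int = {
    var sum = bias
    var i = 0
    while (i < x.length) { sum += weights(i) * x(i); i += 1 }
    if (sum > 0) 1 else 0
  }

  // Textbook perceptron rule: w += rate * (target - prediction) * x.
  // Returns the learned weights and bias.
  def train(examples: Seq[(Array[Double], Int)],
            rate: Double,
            epochs: Int): (Array[Double], Double) = {
    require(examples.nonEmpty, "need at least one training example")
    val weights = new Array[Double](examples.head._1.length)
    var bias = 0.0
    for (_ <- 1 to epochs; (x, target) <- examples) {
      val error = target - predict(weights, bias, x)
      var i = 0
      while (i < x.length) { weights(i) += rate * error * x(i); i += 1 }
      bias += rate * error
    }
    (weights, bias)
  }
}
```

Trained on the four rows of the logical AND truth table with `rate = 0.5` and `epochs = 20`, this sketch separates the two classes, since AND is linearly separable.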

Examples

Most of the examples I have written so far are small enough to fit in the tests, so take a look there.

Fisher iris flowers dataset

One example is classifying the classic Fisher Iris flower dataset with different algorithms:

BackpropClassifier: 96% accuracy
GeneticIris: 94% accuracy
GaussianNaiveBayes: 94% accuracy
KNearestNeighbors: 94% accuracy
KMeans: semi-supervised clustering, 87% accuracy
SoftmaxClassifier: 90% accuracy

The accuracy for backprop and the genetic algorithm goes higher with longer training; these figures are for the quick settings in the automated tests.
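
Assuming accuracy here means the fraction of test samples classified correctly, the measurement can be sketched with a tiny k-nearest-neighbors classifier. This is a hedged illustration, not the library's API: `KnnSketch`, `classify`, and `accuracy` are hypothetical names, and ties in the vote are broken arbitrarily.

```scala
object KnnSketch {
  // A sample is a feature vector paired with its class label.
  type Sample = (Array[Double], String)

  // Squared Euclidean distance; the square root is not needed for ranking.
  def distSq(a: Array[Double], b: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) { val d = a(i) - b(i); sum += d * d; i += 1 }
    sum
  }

  // Majority vote among the k training samples closest to x.
  def classify(train: Seq[Sample], k: Int, x: Array[Double]): String =
    train
      .sortBy { case (features, _) => distSq(features, x) }
      .take(k)
      .groupBy(_._2)
      .maxBy(_._2.size)
      ._1

  // Fraction of test samples whose predicted label matches the true label.
  def accuracy(train: Seq[Sample], test: Seq[Sample], k: Int): Double =
    test.count { case (x, label) => classify(train, k, x) == label }.toDouble / test.size
}
```

Sorting the whole training set for every query is the simplest approach and is fine for a dataset the size of Iris; larger datasets would call for a partial selection or a spatial index.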

Hotel recommendation

This is based on the Expedia hotel recommendations competition on Kaggle.

I have extracted a subset of the fields and data rows to test with NN/Backprop. This is not a full solution, only a small technical demo:

SwiftLearner hotels example

The SwiftLearner backprop classifier scales fine to thousands of inputs and millions of examples. The prediction accuracy achieved so far is 0.058, which is nothing spectacular, but certainly evidence of some learning, compared to a random guess at 0.01.

MNIST handwritten digits

Another classic example is classifying the handwritten digits from the MNIST database:

SoftmaxClassifier: 92% accuracy
BackpropClassifier: 95% accuracy
KNearestNeighbors: 89% accuracy
BernoulliNaiveBayes: 84% accuracy

Setup

Add the following line to your build.sbt:

libraryDependencies += "com.danylchuk" %% "swiftlearner" % "0.2.0"
