These are some simply written machine learning algorithms. They are easier to follow than the optimized libraries, and easier to tweak if you want to experiment. They use plain Java types and have few or no dependencies. SwiftLearner is easy to fork; you can also copy-paste the individual methods.
Some of the methods are very short, thanks to the elegance of the classic algorithms and the expressive power of Scala. Others are slightly optimized, just enough to handle the test datasets; those are written not in idiomatic Scala but with plain CS 101 while loops, which are longer but perform better. They are still easy to follow.
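For a feel of the trade-off, here is a toy example (not taken from the library) of the same dot product written both ways:

```scala
// Illustration only: the same dot product in idiomatic Scala and as a plain while loop.

// Idiomatic Scala: short and clear.
def dotIdiomatic(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => x * y }.sum

// "CS 101" style: longer, but avoids the intermediate collections, so it runs faster.
def dotWhile(a: Array[Double], b: Array[Double]): Double = {
  var sum = 0.0
  var i = 0
  while (i < a.length) {
    sum += a(i) * b(i)
    i += 1
  }
  sum
}
```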
Use this project as a prototyping library, a cookbook, or a cheat sheet. For high performance and rich features, there are better options. Still, these methods are fully functional and work well for small datasets.
The name comes from Fallout, the greatest game ever. Fallout is a trademark of Bethesda Softworks LLC.
To make one ML enthusiast happy, please star or fork this project ;)
- Perceptron (tests) A single-layer, single-node granddaddy of neural networks.
- Backprop (tests) A neural network with one hidden layer, using backpropagation.
- Genetic Algorithm (tests) A genetic algorithm with elitist tournament selection.
- Gaussian Naive Bayes (tests) Gaussian naive Bayes classifier for continuous features.
- Bernoulli Naive Bayes (tests) Bernoulli naive Bayes classifier for binary features.
- k-Nearest Neighbors (tests) k-Nearest Neighbors classifier.
- k-Means (tests) k-Means clustering.
- Softmax (tests) Softmax (multinomial logistic) regression with SGD and AdaGrad (a sketch of the update step follows this list).
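Here is a minimal sketch of the kind of update the softmax classifier performs: one SGD step with AdaGrad scaling on a single example. It illustrates the technique only; the names and constants are made up and do not match the library's API.

```scala
object SoftmaxSketch {
  // Numerically stable softmax: subtract the max before exponentiating.
  def softmax(z: Array[Double]): Array[Double] = {
    val m = z.max
    val e = z.map(v => math.exp(v - m))
    val s = e.sum
    e.map(_ / s)
  }

  /** One SGD step on a single example (x, label), for weights w(class)(feature)
    * and accumulated squared gradients g2 of the same shape (the AdaGrad state). */
  def step(w: Array[Array[Double]], g2: Array[Array[Double]],
           x: Array[Double], label: Int, lr: Double = 0.1): Unit = {
    val scores = w.map(row => row.zip(x).map { case (wi, xi) => wi * xi }.sum)
    val p = softmax(scores)
    for (k <- w.indices; j <- x.indices) {
      val grad = (p(k) - (if (k == label) 1.0 else 0.0)) * x(j)  // cross-entropy gradient
      g2(k)(j) += grad * grad                                    // AdaGrad: accumulate squared gradients
      w(k)(j) -= lr * grad / (math.sqrt(g2(k)(j)) + 1e-8)        // per-weight adaptive step
    }
  }
}
```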
Most of the examples I have written so far are small enough to fit in the tests, so take a look there.
One example is classifying the classic Fisher Iris flower dataset with different algorithms:
- BackpropClassifier: 96% accuracy
- GeneticIris: 94% accuracy
- GaussianNaiveBayes: 94% accuracy (sketched below)
- KNearestNeighbors: 94% accuracy
- KMeans: semi-supervised clustering, 87% accuracy
- SoftmaxClassifier: 90% accuracy
The accuracy for backprop and the genetic algorithm goes higher with longer training; these figures are for the quick settings in the automated tests.
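Gaussian Naive Bayes is short enough to sketch in full, which gives an idea of how the Iris example above works: fit a per-class mean and variance for every feature, then pick the class with the highest log-posterior. This is a standalone illustration with made-up names, not the library's code.

```scala
// Standalone Gaussian Naive Bayes for continuous features (e.g. the four Iris measurements).
// train: (classLabel, featureVector) pairs.
class GaussianNB(train: Seq[(Int, Array[Double])]) {
  private val byClass = train.groupBy(_._1)
  private val priors  = byClass.map { case (c, xs) => c -> math.log(xs.size.toDouble / train.size) }
  // Per-class mean and variance of every feature (a tiny variance floor avoids division by zero).
  private val stats = byClass.map { case (c, xs) =>
    val dims  = xs.head._2.indices
    val means = dims.map(j => xs.map(_._2(j)).sum / xs.size)
    val vars  = dims.map(j => xs.map(r => math.pow(r._2(j) - means(j), 2)).sum / xs.size + 1e-9)
    c -> (means, vars)
  }

  // Log density of a univariate Gaussian.
  private def logGauss(x: Double, mean: Double, variance: Double): Double =
    -0.5 * math.log(2 * math.Pi * variance) - math.pow(x - mean, 2) / (2 * variance)

  def predict(x: Array[Double]): Int =
    stats.map { case (c, (means, vars)) =>
      c -> (priors(c) + x.indices.map(j => logGauss(x(j), means(j), vars(j))).sum)
    }.maxBy(_._2)._1
}
```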
This is based on the Expedia hotel recommendations competition on Kaggle.
I have extracted a subset of the fields and data rows to test with NN/Backprop. This is not a full solution, only a small technical demo:
The SwiftLearner backprop classifier scales fine to thousands of inputs and millions of examples. The prediction accuracy achieved so far is 0.058, which is nothing spectacular, but certainly evidence of some learning, compared to the 0.01 expected from a random guess.
Another classic example is classifying the handwritten digits from the MNIST database:
- SoftmaxClassifier: 92% accuracy
- BackpropClassifier: 95% accuracy
- KNearestNeighbors: 89% accuracy
- BernoulliNaiveBayes: 84% accuracy (see the note on binarization below)
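Bernoulli Naive Bayes expects binary features, so the grayscale MNIST pixels have to be binarized first. A possible preprocessing step looks like this (the threshold is an assumption for illustration, not necessarily what the test uses):

```scala
// Turn 0-255 grayscale pixels into 0/1 features for Bernoulli Naive Bayes.
// The threshold value is illustrative; any fixed cutoff works the same way.
def binarize(pixels: Array[Int], threshold: Int = 127): Array[Int] =
  pixels.map(p => if (p > threshold) 1 else 0)
```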
Add the following line to your build.sbt:
libraryDependencies += "com.danylchuk" %% "swiftlearner" % "0.2.0"
This is free software under a BSD-style license. Copyright (c) 2016 Valentyn Danylchuk. See LICENSE for details.