  • Stars: 716
  • Rank 63,241 (Top 2 %)
  • Language: Ruby
  • License: BSD 3-Clause "New...
  • Created about 7 years ago
  • Updated about 1 year ago

Repository Details

Rumale is a machine learning library in Ruby

Rumale

Rumale (Ruby machine learning) is a machine learning library in Ruby. Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python. Rumale supports Support Vector Machine, Logistic Regression, Ridge, Lasso, Multi-layer Perceptron, Naive Bayes, Decision Tree, Gradient Tree Boosting, Random Forest, K-Means, Gaussian Mixture Model, DBSCAN, Spectral Clustering, Multidimensional Scaling, t-SNE, Fisher Discriminant Analysis, Neighbourhood Component Analysis, Principal Component Analysis, Non-negative Matrix Factorization, and many other algorithms.

Installation

Add this line to your application's Gemfile:

gem 'rumale'

And then execute:

$ bundle

Or install it yourself as:

$ gem install rumale

Usage

Example 1. Pendigits dataset classification

Rumale provides a function for loading dataset files in the libsvm format. We start by downloading the pendigits dataset from the LIBSVM Data website.

$ wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/pendigits
$ wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/pendigits.t
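
For orientation, each line of a libsvm-format file holds a label followed by sparse index:value pairs, with feature indices starting at 1. The sketch below parses one such line in plain Ruby. The helper parse_libsvm_line and the sample line are hypothetical and written only for illustration; this is not how Rumale's loader is implemented.

```ruby
# Hypothetical helper (not part of Rumale): parse one line of a
# libsvm-format file into a label and a dense feature array.
def parse_libsvm_line(line, n_features)
  tokens = line.split
  label = Integer(tokens.shift)
  features = Array.new(n_features, 0.0)
  tokens.each do |token|
    index, value = token.split(':')
    # Feature indices in the libsvm format start at 1.
    features[Integer(index) - 1] = Float(value)
  end
  [label, features]
end

# Illustrative input line, not taken from the pendigits file.
label, features = parse_libsvm_line('8 1:47 2:100 4:81.5', 5)
puts label            # the class label
puts features.inspect # dense features, zeros where an index was omitted
```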

The following code trains a classifier using a linear SVM with an RBF kernel feature map.

require 'rumale'

# Load the training dataset.
samples, labels = Rumale::Dataset.load_libsvm_file('pendigits')

# Map training data to RBF kernel feature space.
transformer = Rumale::KernelApproximation::RBF.new(gamma: 0.0001, n_components: 1024, random_seed: 1)
transformed = transformer.fit_transform(samples)

# Train linear SVM classifier.
classifier = Rumale::LinearModel::SVC.new(reg_param: 0.0001)
classifier.fit(transformed, labels)

# Save the model.
File.open('transformer.dat', 'wb') { |f| f.write(Marshal.dump(transformer)) }
File.open('classifier.dat', 'wb') { |f| f.write(Marshal.dump(classifier)) }
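
To give an intuition for what a kernel feature map does, here is a minimal plain-Ruby sketch of random Fourier features, a common way to approximate the RBF kernel. It assumes this technique for illustration and is not Rumale's code; the helpers gaussian, fit_rbf_map, rbf_transform, and dot are hypothetical names. The inner product of two mapped vectors approximates exp(-gamma * ||x - y||^2), which is why a linear model on the mapped data can behave like a kernel model.

```ruby
# Illustrative sketch of random Fourier features (assumed technique,
# not Rumale's implementation).

# Draw one sample from the standard normal distribution (Box-Muller).
def gaussian(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# Build the random projection: weights ~ N(0, 2 * gamma), offsets ~ U[0, 2 * pi).
def fit_rbf_map(n_features, gamma:, n_components:, seed:)
  rng = Random.new(seed)
  scale = Math.sqrt(2.0 * gamma)
  weights = Array.new(n_components) { Array.new(n_features) { scale * gaussian(rng) } }
  offsets = Array.new(n_components) { 2.0 * Math::PI * rng.rand }
  { weights: weights, offsets: offsets }
end

# Inner product of two plain arrays.
def dot(a, b)
  a.zip(b).sum { |ai, bi| ai * bi }
end

# Map a sample x to the approximate feature space.
def rbf_transform(feature_map, x)
  norm = Math.sqrt(2.0 / feature_map[:weights].size)
  feature_map[:weights].zip(feature_map[:offsets]).map do |w, b|
    norm * Math.cos(dot(w, x) + b)
  end
end

feature_map = fit_rbf_map(2, gamma: 0.5, n_components: 2000, seed: 1)
x = [0.2, -0.4]
y = [0.5, 0.1]
approx = dot(rbf_transform(feature_map, x), rbf_transform(feature_map, y))
exact = Math.exp(-0.5 * x.zip(y).sum { |xi, yi| (xi - yi)**2 })
puts format('approx: %.3f, exact: %.3f', approx, exact)
```

More components give a closer approximation at the cost of a wider transformed matrix.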

The following code classifies the testing data with the trained classifier.

require 'rumale'

# Load the testing dataset.
samples, labels = Rumale::Dataset.load_libsvm_file('pendigits.t')

# Load the model.
transformer = Marshal.load(File.binread('transformer.dat'))
classifier = Marshal.load(File.binread('classifier.dat'))

# Map testing data to RBF kernel feature space.
transformed = transformer.transform(samples)

# Classify the testing data and evaluate prediction results.
puts("Accuracy: %.1f%%" % (100.0 * classifier.score(transformed, labels)))

# Alternative evaluation approach:
# results = classifier.predict(transformed)
# evaluator = Rumale::EvaluationMeasure::Accuracy.new
# puts("Accuracy: %.1f%%" % (100.0 * evaluator.score(results, labels)))
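
The save and load steps in the two scripts rely only on Ruby's Marshal module, so the round trip can be tried without a trained model. The sketch below uses a plain Hash as a stand-in for an estimator; save_model and load_model are hypothetical helpers, and the stored values are placeholders.

```ruby
require 'tmpdir'

# Write any Marshal-compatible object to a file in binary mode.
def save_model(path, model)
  File.binwrite(path, Marshal.dump(model))
end

# Read the object back. Marshal.load should only be given trusted files.
def load_model(path)
  Marshal.load(File.binread(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'model.dat')
  # A stand-in for a trained estimator (placeholder values).
  model = { weights: [0.5, -1.25], bias: 0.1 }
  save_model(path, model)
  restored = load_model(path)
  puts restored == model
end
```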

Saving the two scripts above as train.rb and test.rb and running them produces the following output.

$ ruby train.rb
$ ruby test.rb
Accuracy: 98.5%
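
The scripts multiply the score by 100 before printing, which suggests the score is a fraction between 0 and 1. As a rough illustration of that measure, the hypothetical helper below computes the fraction of predictions that match the true labels on toy arrays; it is a sketch of the idea, not Rumale's evaluator.

```ruby
# Hypothetical helper: fraction of predictions that equal the true label.
def accuracy(y_true, y_pred)
  raise ArgumentError, 'length mismatch' unless y_true.size == y_pred.size

  y_true.zip(y_pred).count { |truth, pred| truth == pred }.fdiv(y_true.size)
end

# Toy labels for illustration only.
score = accuracy([0, 1, 2, 2], [0, 1, 1, 2])
puts format('Accuracy: %.1f%%', 100.0 * score)
```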

Example 2. Cross-validation

require 'rumale'

# Load dataset.
samples, labels = Rumale::Dataset.load_libsvm_file('pendigits')

# Define the estimator to be evaluated.
lr = Rumale::LinearModel::LogisticRegression.new

# Define the evaluation measure, splitting strategy, and cross validation.
ev = Rumale::EvaluationMeasure::Accuracy.new
kf = Rumale::ModelSelection::StratifiedKFold.new(n_splits: 5, shuffle: true, random_seed: 1)
cv = Rumale::ModelSelection::CrossValidation.new(estimator: lr, splitter: kf, evaluator: ev)

# Perform 5-fold cross-validation.
report = cv.perform(samples, labels)

# Output result.
mean_accuracy = report[:test_score].sum / kf.n_splits
puts "5-CV mean accuracy: %.1f%%" % (100.0 * mean_accuracy)

Saving the script above as cross_validation.rb and running it produces the following output.

$ ruby cross_validation.rb
5-CV mean accuracy: 95.5%
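
To make the splitting strategy concrete, here is a minimal plain-Ruby sketch of stratified k-fold splitting: indices are grouped by class, shuffled with a seeded generator, and dealt out so that each fold keeps roughly the class proportions of the whole dataset. The helper stratified_folds and the toy labels are hypothetical; Rumale's StratifiedKFold may differ in its details.

```ruby
# Illustrative sketch (not Rumale's implementation): assign sample indices
# to n_splits folds so that each fold keeps roughly the class proportions.
def stratified_folds(labels, n_splits:, seed:)
  rng = Random.new(seed)
  folds = Array.new(n_splits) { [] }
  labels.each_index.group_by { |i| labels[i] }.each_value do |indices|
    # Shuffle within a class, then deal the indices out round-robin.
    indices.shuffle(random: rng).each_with_index do |index, k|
      folds[k % n_splits] << index
    end
  end
  folds
end

# Toy labels: ten samples of class 0 and five of class 1.
labels = [0] * 10 + [1] * 5
folds = stratified_folds(labels, n_splits: 5, seed: 1)
folds.each_with_index do |test_ids, k|
  # Every fold serves once as the test set; the rest form the training set.
  train_ids = (folds[0...k] + folds[(k + 1)..-1]).flatten
  puts "fold #{k}: #{train_ids.size} train, #{test_ids.size} test"
end
```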

Speedup

Numo::Linalg

Rumale uses Numo::NArray for typed arrays. Loading Numo::Linalg allows matrix and vector products of Numo::NArray to be computed with BLAS libraries. Because some machine learning algorithms compute matrix and vector products frequently, they can be expected to run faster.

Install the Numo::Linalg gem.

$ gem install numo-linalg

In a Ruby script, load Numo::Linalg along with Rumale.

require 'numo/linalg/autoloader'
require 'rumale'

Numo::Linalg lets the user choose the backend library for BLAS/LAPACK. Numo::OpenBLAS and Numo::BLIS fix the backend library (OpenBLAS and BLIS, respectively) in exchange for a simpler installation.
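
Since Numo::Linalg is presented here as a speedup, a script may want to treat it as optional. The sketch below is one possible pattern under that assumption: linalg_available? is a hypothetical helper that attempts the require shown above and reports whether it succeeded, so the script can continue when the gem or a usable backend is missing.

```ruby
# Hypothetical helper: try to load Numo::Linalg and report the outcome.
# Returns false when the gem or a usable BLAS/LAPACK backend is missing.
def linalg_available?
  require 'numo/linalg/autoloader'
  true
rescue LoadError, StandardError
  false
end

if linalg_available?
  puts 'Numo::Linalg loaded; matrix products can use BLAS.'
else
  puts 'Numo::Linalg not available; continuing without it.'
end
```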

Numo::TinyLinalg

Numo::TinyLinalg is a subset of Numo::Linalg that consists only of the methods used in machine learning algorithms. It supports only OpenBLAS as the backend library for BLAS and LAPACK. If the OpenBLAS library is not found during installation, Numo::TinyLinalg downloads and builds it.

$ gem install numo-tiny_linalg

Load Numo::TinyLinalg instead of Numo::Linalg.

require 'numo/tiny_linalg'

Numo::Linalg = Numo::TinyLinalg

require 'rumale'

Parallel

Several estimators in Rumale support parallel processing. Rumale implements parallel processing with the Parallel gem, so install and load it.

$ gem install parallel

require 'parallel'
require 'rumale'

Estimators that support parallel processing have an n_jobs parameter. When n_jobs is set to -1, all processors are used.

estimator = Rumale::Ensemble::RandomForestClassifier.new(n_jobs: -1, random_seed: 1)
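
As a rough picture of what an n_jobs setting means, the sketch below splits a list of work items across workers using only the standard library. Both resolve_n_jobs and parallel_map are hypothetical helpers, and how Rumale interprets n_jobs internally is an assumption here. Threads are used only to keep the sketch free of extra gems; they stand in for the workers that the Parallel gem manages.

```ruby
require 'etc'

# Hypothetical helper: interpret n_jobs, treating -1 as "all processors".
def resolve_n_jobs(n_jobs)
  n_jobs == -1 ? Etc.nprocessors : n_jobs
end

# Map a block over items using up to n_jobs threads, preserving order.
def parallel_map(items, n_jobs:)
  workers = [resolve_n_jobs(n_jobs), 1].max
  slice_size = [(items.size.to_f / workers).ceil, 1].max
  threads = items.each_slice(slice_size).map do |slice|
    Thread.new { slice.map { |item| yield item } }
  end
  # Thread#value waits for each thread and returns its result.
  threads.flat_map(&:value)
end

puts parallel_map((1..8).to_a, n_jobs: -1) { |x| x * x }.inspect
```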

Related Projects

  • Rumale::SVM provides the support vector machine algorithms of LIBSVM and LIBLINEAR through the Rumale interface.
  • Rumale::Torch provides learning and inference with neural networks defined in torch.rb through the Rumale interface.

License

The gem is available as open source under the terms of the BSD-3-Clause License.
