sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
It's recommended for limited embedded systems and critical applications where performance matters most.

Navigation: Estimators • Installation • Usage • Known Issues • Development • Citation • License

Estimators

This table gives an overview of all supported combinations of estimators, programming languages and templates.

Programming language
C Go Java JS PHP Ruby
svm.SVC ✓ × ✓ ✓ × ✓ ✓ × ✓ ✓ × ✓ ✓ × ✓ ✓ ×
svm.NuSVC ✓ × ✓ ✓ × ✓ ✓ × ✓ ✓ × ✓ ✓ × ✓ ✓ ×
svm.LinearSVC ✓ × ✓ ✓ × ✓ ✓ × ✓ ✓ × ✓ ✓ × ✓ ✓ ×
tree.DecisionTreeClassifier ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ
ensemble.RandomForestClassifier × ✓ᴾ × × ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ×
ensemble.ExtraTreesClassifier × ✓ᴾ × × ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ✓ᴾ ×
ensemble.AdaBoostClassifier × ✓ᴾ × ✓ᴾ ✓ᴾ ✓ᴾ
neighbors.KNeighborsClassifier ✓ᴾ ✓ᴾ × ✓ᴾ ✓ᴾ × ✓ᴾ ✓ᴾ × ✓ᴾ ✓ᴾ × ✓ᴾ ✓ᴾ ×
naive_bayes.BernoulliNB ✓ᴾ ✓ᴾ × ✓ᴾ ✓ᴾ ×
naive_bayes.GaussianNB ✓ᴾ ✓ᴾ × ✓ᴾ ✓ᴾ ×
neural_network.MLPClassifier ✓ᴾ ✓ᴾ × ✓ᴾ ✓ᴾ ×
neural_network.MLPRegressor ✓ ✓ ×
ᴀ ᴇ ᴄ ᴀ ᴇ ᴄ ᴀ ᴇ ᴄ ᴀ ᴇ ᴄ ᴀ ᴇ ᴄ ᴀ ᴇ ᴄ
Template

✓ = support of predict, ᴾ = support of predict_proba, × = not supported or feasible
ᴀ = attached model data, ᴇ = exported model data (JSON), ᴄ = combined model data
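
The legend can also be read as a small decoding rule. The following sketch turns a single table cell into two flags; the helper name `decode_cell` is hypothetical and not part of sklearn-porter, and it only covers the three symbols explained above.

```python
def decode_cell(cell: str) -> dict:
    """Translate one table cell into support flags, following the legend.

    "✓"  -> predict is supported
    "✓ᴾ" -> predict and predict_proba are supported
    "×"  -> not supported or not feasible
    """
    cell = cell.strip()
    if cell.startswith("✓"):
        return {"predict": True, "predict_proba": "ᴾ" in cell}
    if cell == "×":
        return {"predict": False, "predict_proba": False}
    raise ValueError(f"unknown table symbol: {cell!r}")


if __name__ == "__main__":
    for symbol in ("✓", "✓ᴾ", "×"):
        print(symbol, decode_cell(symbol))
```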

Installation

Purpose      Version  Branch  Command
Production   v0.7.4   stable  pip install sklearn-porter
Development  v1.0.0   main    pip install https://github.com/nok/sklearn-porter/zipball/main

In both environments the only prerequisite is scikit-learn >= 0.17, <= 0.22.
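
One way to check this range before installing is to compare the major and minor components of the installed scikit-learn version. The sketch below uses only the standard library; the function name `is_supported_sklearn` is hypothetical, and it assumes that every patch release from 0.17 through 0.22 counts as supported, which the requirement above does not state explicitly.

```python
import re

# Matches the leading "major.minor" part of a version string such as "0.21.3".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)")


def is_supported_sklearn(version: str) -> bool:
    """Return True if `version` lies in the range 0.17 to 0.22 (inclusive).

    Assumption: only major and minor are compared, so 0.22.2 counts as 0.22.
    """
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return (0, 17) <= major_minor <= (0, 22)


if __name__ == "__main__":
    for candidate in ("0.16.1", "0.17", "0.21.3", "0.22.2", "0.23.0"):
        print(candidate, is_supported_sklearn(candidate))
```

In practice the argument would typically be `sklearn.__version__`.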

Usage

Binder

Try it out yourself by starting an interactive notebook with Binder.

Basics

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

from sklearn_porter import port, save, make, test

# 1. Load data and train a dummy classifier:
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier()
clf.fit(X, y)

# 2. Port or transpile an estimator:
output = port(clf, language='js', template='attached')
print(output)

# 3. Save the ported estimator:
src_path, json_path = save(clf, language='js', template='exported', directory='/tmp')
print(src_path, json_path)

# 4. Make predictions with the ported estimator:
y_classes, y_probas = make(clf, X[:10], language='js', template='exported')
print(y_classes, y_probas)

# 5. Always test the ported estimator with an integrity check:
score = test(clf, X[:10], language='js', template='exported')
print(score)

OOP

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

from sklearn_porter import Estimator

# 1. Load data and train a dummy classifier:
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier()
clf.fit(X, y)

# 2. Port or transpile an estimator:
est = Estimator(clf, language='js', template='attached')
output = est.port()
print(output)

# 3. Save the ported estimator:
est.template = 'exported'
src_path, json_path = est.save(directory='/tmp')
print(src_path, json_path)

# 4. Make predictions with the ported estimator:
y_classes, y_probas = est.make(X[:10])
print(y_classes, y_probas)

# 5. Always test the ported estimator with an integrity check:
score = est.test(X[:10])
print(score)
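
The integrity check in step 5 compares what the original estimator predicts with what the ported code predicts. The exact metric behind the returned score is not spelled out here, so the following is only a sketch of the idea, assuming the score is the fraction of matching predictions. The helper name `agreement_score` is hypothetical and not part of sklearn-porter.

```python
from typing import Sequence


def agreement_score(original: Sequence[int], ported: Sequence[int]) -> float:
    """Fraction of samples on which both prediction lists agree (0.0 to 1.0)."""
    if len(original) != len(ported):
        raise ValueError("both prediction lists must have the same length")
    if len(original) == 0:
        raise ValueError("at least one prediction is required")
    matches = sum(1 for a, b in zip(original, ported) if a == b)
    return matches / len(original)


if __name__ == "__main__":
    # Three of four predictions agree.
    print(agreement_score([0, 1, 2, 1], [0, 1, 2, 2]))
```

A score below 1.0 under this assumption would mean that the ported estimator disagrees with the original on at least one sample and should be inspected before deployment.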

CLI

In addition, you can use sklearn-porter on the command line. The command is called porter and is available after installation.

porter {show,port,save} [-h] [-v]

porter show [-l {c,go,java,js,php,ruby}] [-h]

porter port <estimator> [-l {c,go,java,js,php,ruby}]
                        [-t {attached,combined,exported}]
                        [--skip-warnings] [-h]

porter save <estimator> [-l {c,go,java,js,php,ruby}]
                        [-t {attached,combined,exported}]
                        [--directory DIRECTORY]
                        [--skip-warnings] [-h]

You can serialize an estimator and save it locally. For more details, see the instructions on model persistence.

from joblib import dump

dump(clf, 'estimator.joblib', compress=0)

After that the estimator can be transpiled by using the subcommand port:

porter port estimator.joblib -l js -t attached > estimator.js

For further processing you can pass the result to another application, e.g. UglifyJS.

porter port estimator.joblib -l js -t attached | uglifyjs --compress -o estimator.min.js
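
When the transpilation is driven from a Python build script rather than a shell, the same call can be assembled as an argument list. The sketch below only uses the flags listed in the usage summary above (`-l`, `-t`, `--skip-warnings`). The helper names `build_port_command` and `port_to_file` are hypothetical, and actually running the command assumes that `porter` is on the PATH.

```python
import subprocess
from typing import List

# Choices copied from the CLI usage summary.
LANGUAGES = ("c", "go", "java", "js", "php", "ruby")
TEMPLATES = ("attached", "combined", "exported")


def build_port_command(estimator_path: str, language: str = "js",
                       template: str = "attached",
                       skip_warnings: bool = False) -> List[str]:
    """Build the argument list for `porter port` without running it."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if template not in TEMPLATES:
        raise ValueError(f"unsupported template: {template!r}")
    command = ["porter", "port", estimator_path, "-l", language, "-t", template]
    if skip_warnings:
        command.append("--skip-warnings")
    return command


def port_to_file(estimator_path: str, output_path: str, **options) -> None:
    """Run the command and write its stdout to a file, like `> estimator.js`."""
    command = build_port_command(estimator_path, **options)
    with open(output_path, "w", encoding="utf-8") as handle:
        subprocess.run(command, stdout=handle, check=True)


if __name__ == "__main__":
    print(build_port_command("estimator.joblib"))
```

Passing a list to `subprocess.run` avoids shell quoting issues with file names that contain spaces.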

Known Issues

  • In some rare cases the regression tests of the support vector machines, SVC and NuSVC, fail since scikit-learn>=0.22. Because of that, a QualityWarning is raised, which should remind you to evaluate the result by using the test method.
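
Such a warning can be collected programmatically with the standard `warnings` module so that a build script can react to it. The sketch below is self-contained and therefore defines its own stand-in `QualityWarning` class and a placeholder `risky_port` function. Both are hypothetical: the import path and base class of the library's real warning are not shown here and may differ.

```python
import warnings


class QualityWarning(UserWarning):
    """Stand-in for the library's warning class (assumption, see lead-in)."""


def risky_port() -> str:
    """Hypothetical placeholder for a port call that emits the warning."""
    warnings.warn("verify the ported SVC with the test method", QualityWarning)
    return "ported source"


def port_and_collect(func):
    """Call `func` and return its result plus all recorded QualityWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        result = func()
    quality = [w for w in caught if issubclass(w.category, QualityWarning)]
    return result, quality


if __name__ == "__main__":
    source, issues = port_and_collect(risky_port)
    for issue in issues:
        print("quality warning:", issue.message)
```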

Development

Aliases

The following commands are useful time savers in daily development:

# Install a Python environment with `conda`:
make setup

# Start a Jupyter notebook with examples:
make notebook

# Start tests on the host or in a separate docker container:
make tests
make tests-docker

# Lint the source code with `pylint`:
make lint

# Generate notebooks with `jupytext`:
make examples

# Deploy a new version with `twine`:
make deploy

Dependencies

The prerequisite is Python 3.6, which you can install with conda:

conda create -n sklearn-porter_3.6 python=3.6
conda activate sklearn-porter_3.6

After that you have to install all required packages:

pip install --no-cache-dir -e ".[development,examples]"

Environment

All tests run against these combinations of scikit-learn and Python versions:

scikit-learn 0.17:
  • Python 3.5 and 3.6: cython 0.27.3, numpy 1.9.3, scipy 0.16.0
  • Python 3.7 and 3.8: not supported by scikit-learn

scikit-learn 0.18:
  • Python 3.5 and 3.6: cython 0.27.3, numpy 1.9.3, scipy 0.16.0
  • Python 3.7 and 3.8: not supported by scikit-learn

scikit-learn 0.19:
  • Python 3.5 and 3.6: cython 0.27.3, numpy 1.14.5, scipy 1.1.0
  • Python 3.7 and 3.8: not supported by scikit-learn

scikit-learn 0.20:
  • Python 3.5, 3.6 and 3.7: cython 0.27.3, numpy, scipy
  • Python 3.8: not supported by joblib

scikit-learn 0.21:
  • Python 3.5, 3.6, 3.7 and 3.8: cython, numpy, scipy

scikit-learn 0.22:
  • Python 3.5, 3.6, 3.7 and 3.8: cython, numpy, scipy
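
For scripting, for example to decide which CI jobs to schedule, the matrix can be encoded as data. The following sketch records only whether a combination is tested, as read from the matrix above; the names `UNSUPPORTED` and `is_tested` are hypothetical and not part of the project.

```python
SKLEARN_VERSIONS = ("0.17", "0.18", "0.19", "0.20", "0.21", "0.22")
PYTHON_VERSIONS = ("3.5", "3.6", "3.7", "3.8")

# Combinations marked as not supported, with the package named as the reason.
UNSUPPORTED = {
    ("0.17", "3.7"): "scikit-learn",
    ("0.17", "3.8"): "scikit-learn",
    ("0.18", "3.7"): "scikit-learn",
    ("0.18", "3.8"): "scikit-learn",
    ("0.19", "3.7"): "scikit-learn",
    ("0.19", "3.8"): "scikit-learn",
    ("0.20", "3.8"): "joblib",
}


def is_tested(sklearn_version: str, python_version: str) -> bool:
    """Return True if the pair appears in the matrix and is not excluded."""
    if sklearn_version not in SKLEARN_VERSIONS:
        return False
    if python_version not in PYTHON_VERSIONS:
        return False
    return (sklearn_version, python_version) not in UNSUPPORTED


if __name__ == "__main__":
    for sk in SKLEARN_VERSIONS:
        tested = [py for py in PYTHON_VERSIONS if is_tested(sk, py)]
        print(f"scikit-learn {sk}: Python {', '.join(tested)}")
```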

For the regression tests we have to use specific compilers and interpreters:

Name Source Version
GCC https://gcc.gnu.org 10.2.1
Go https://golang.org 1.15.15
Java (OpenJDK) https://openjdk.java.net 1.8.0
Node.js https://nodejs.org 12.22.5
PHP https://www.php.net 7.4.28
Ruby https://www.ruby-lang.org 2.7.4

Please note that, in general, you can use older compilers and interpreters with the generated source code. For instance, you can use Java 1.6 to compile and run models.
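
Before running the regression tests locally, it can help to confirm that each toolchain is reachable. The sketch below looks the tools up on the PATH with `shutil.which`. The executable names in `TOOLCHAIN` are assumptions (they may differ per platform or distribution), and the helper name `locate_toolchain` is hypothetical. It does not check versions.

```python
import shutil
from typing import Dict, Mapping, Optional

# Assumed executable name for each entry of the table above.
TOOLCHAIN = {
    "GCC": "gcc",
    "Go": "go",
    "Java (OpenJDK)": "java",
    "Node.js": "node",
    "PHP": "php",
    "Ruby": "ruby",
}


def locate_toolchain(
    executables: Mapping[str, str] = TOOLCHAIN,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not found."""
    return {name: shutil.which(exe) for name, exe in executables.items()}


if __name__ == "__main__":
    for name, path in locate_toolchain().items():
        print(f"{name}: {path or 'not found'}")
```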

Logging

You can activate logging by changing the option logging.level.

from sklearn_porter import options

from logging import DEBUG

options['logging.level'] = DEBUG

Testing

You can run the unit and regression tests either on your local machine (host) or in a separate running Docker container.

pytest tests -v \
  --cov=sklearn_porter \
  --disable-warnings \
  --numprocesses=auto \
  -p no:doctest \
  -o python_files="EstimatorTest.py" \
  -o python_functions="test_*"

docker build \
  -t sklearn-porter \
  --build-arg PYTHON_VER=${PYTHON_VER:-python=3.6} \
  --build-arg SKLEARN_VER=${SKLEARN_VER:-scikit-learn=0.21} \
  .

docker run \
  -v $(pwd):/home/abc/repo \
  --detach \
  --entrypoint=/bin/bash \
  --name test \
  -t sklearn-porter

docker exec -it test ./docker-entrypoint.sh \
  pytest tests -v \
    --cov=sklearn_porter \
    --disable-warnings \
    --numprocesses=auto \
    -p no:doctest \
    -o python_files="EstimatorTest.py" \
    -o python_functions="test_*"

docker rm -f $(docker ps --all --filter name=test -q)
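
The `docker build` call above relies on the shell expansion `${NAME:-default}`, which falls back to the default when the variable is unset or empty. A minimal Python sketch of the same resolution is shown below, for cases where the build is started from a script. The helper names `resolve_build_arg` and `docker_build_args` are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def resolve_build_arg(name: str, default: str,
                      env: Optional[Mapping[str, str]] = None) -> str:
    """Mimic `${NAME:-default}`: use the default if unset or empty."""
    env = os.environ if env is None else env
    value = env.get(name)
    return value if value else default


def docker_build_args(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the two --build-arg options with the defaults shown above."""
    python_ver = resolve_build_arg("PYTHON_VER", "python=3.6", env)
    sklearn_ver = resolve_build_arg("SKLEARN_VER", "scikit-learn=0.21", env)
    return [
        "--build-arg", f"PYTHON_VER={python_ver}",
        "--build-arg", f"SKLEARN_VER={sklearn_ver}",
    ]


if __name__ == "__main__":
    # An explicit mapping keeps the example independent of the real environment.
    print(docker_build_args({"PYTHON_VER": "python=3.7"}))
```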

Citation

If you use this implementation in your work, please add a reference/citation to the paper. You can use the following BibTeX entry:

@unpublished{sklearn_porter,
  author = {Darius Morawiec},
  title = {sklearn-porter},
  note = {Transpile trained scikit-learn estimators to C, Java, JavaScript and others},
  url = {https://github.com/nok/sklearn-porter}
}

License

The package is Open Source Software released under the BSD 3-Clause license.
