SkLearn2PMML

Python package for converting Scikit-Learn pipelines to PMML.

Features

This package is a thin Python wrapper around the JPMML-SkLearn library.

Prerequisites

  • Java 1.8 or newer. The Java executable must be available on the system path.
  • Python 2.7, 3.4 or newer.
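
The prerequisites above can be checked from Python before installing anything. The following is a minimal sketch, not part of SkLearn2PMML: the helper name check_prerequisites is hypothetical, it relies on shutil.which (so the check itself needs Python 3.3 or newer), and it only confirms that a java executable is discoverable, not which Java version it is.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 4)):
    """Report whether a Java executable is on the path and Python is recent enough.

    Hypothetical helper for illustration; it does not inspect the Java version.
    """
    java_path = shutil.which("java")  # None when no "java" executable is on the system path
    return {
        "java_on_path": java_path is not None,
        "java_path": java_path,
        "python_ok": sys.version_info[:2] >= min_python,
    }


if __name__ == "__main__":
    print(check_prerequisites())
```

If java_on_path comes back False, the conversion step is likely to fail, because the package delegates the actual work to a Java library.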

Installation

Installing a release version from PyPI:

pip install sklearn2pmml

Alternatively, installing the latest snapshot version from GitHub:

pip install --upgrade git+https://github.com/jpmml/sklearn2pmml.git
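
To confirm which version ended up installed, the standard library can be queried directly. This is a small sketch assuming Python 3.8 or newer (for importlib.metadata); the helper name installed_version is an illustrative choice, not something the package provides.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None when the package is not installed.
    print(installed_version("sklearn2pmml"))
```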

Usage

A typical workflow can be summarized as follows:

  1. Create a PMMLPipeline object, and populate it with pipeline steps as usual. Class sklearn2pmml.pipeline.PMMLPipeline extends class sklearn.pipeline.Pipeline with the following functionality:
     • If the PMMLPipeline.fit(X, y) method is invoked with a pandas.DataFrame or pandas.Series object as the X argument, then its column names are used as feature names. Otherwise, feature names default to "x1", "x2", ..., "x{number_of_features}".
     • If the PMMLPipeline.fit(X, y) method is invoked with a pandas.Series object as the y argument, then its name is used as the target name (for supervised models). Otherwise, the target name defaults to "y".
  2. Fit and validate the pipeline as usual.
  3. Optionally, compute and embed verification data into the PMMLPipeline object by invoking the PMMLPipeline.verify(X) method with a small but representative subset of training data.
  4. Convert the PMMLPipeline object to a PMML file in the local filesystem by invoking the utility method sklearn2pmml.sklearn2pmml(pipeline, pmml_destination_path).
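
The naming rules from step 1 can be illustrated with plain Python. This is only a sketch of the documented behavior, not the library's actual implementation: the function names resolve_feature_names and resolve_target_name are hypothetical, the number of features is passed in explicitly, and only DataFrame-like inputs (anything with a columns attribute) are treated as carrying feature names.

```python
def resolve_feature_names(X, n_features):
    """Mimic the documented rule: use column names when present, else x1..xN."""
    columns = getattr(X, "columns", None)  # a pandas.DataFrame exposes .columns
    if columns is not None:
        return [str(column) for column in columns]
    return ["x{}".format(i) for i in range(1, n_features + 1)]


def resolve_target_name(y):
    """Mimic the documented rule: use the Series name when present, else "y"."""
    name = getattr(y, "name", None)  # a pandas.Series exposes .name
    return str(name) if name is not None else "y"
```

With a plain nested list as X and a plain list as y, these helpers fall back to the defaults, which is why fitting on pandas objects tends to give more readable PMML field names.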

Developing a simple decision tree model for the classification of iris species:

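import pandas

# Load the dataset; "Iris.csv" is expected in the working directory
iris_df = pandas.read_csv("Iris.csv")

# Every column except "Species" is a feature; "Species" is the target
iris_X = iris_df[iris_df.columns.difference(["Species"])]
iris_y = iris_df["Species"]

from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml.pipeline import PMMLPipeline

# A single-step pipeline; fitting on pandas objects preserves column names
pipeline = PMMLPipeline([
	("classifier", DecisionTreeClassifier())
])
pipeline.fit(iris_X, iris_y)

from sklearn2pmml import sklearn2pmml

# Convert the fitted pipeline to a PMML file
sklearn2pmml(pipeline, "DecisionTreeIris.pmml", with_repr = True)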

Developing a more elaborate logistic regression model for the same:

import pandas

# Load the dataset; "Iris.csv" is expected in the working directory
iris_df = pandas.read_csv("Iris.csv")

# Every column except "Species" is a feature; "Species" is the target
iris_X = iris_df[iris_df.columns.difference(["Species"])]
iris_y = iris_df["Species"]

from sklearn_pandas import DataFrameMapper
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn2pmml.decoration import ContinuousDomain
from sklearn2pmml.pipeline import PMMLPipeline

# Decorate and impute the four measurements, reduce them with PCA,
# keep the two best components, then fit a one-vs-rest classifier
pipeline = PMMLPipeline([
	("mapper", DataFrameMapper([
		(["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), SimpleImputer()])
	])),
	("pca", PCA(n_components = 3)),
	("selector", SelectKBest(k = 2)),
	("classifier", LogisticRegression(multi_class = "ovr"))
])
pipeline.fit(iris_X, iris_y)
# Embed verification data computed from a small sample of the training set
pipeline.verify(iris_X.sample(n = 15))

from sklearn2pmml import sklearn2pmml

# Convert the fitted pipeline to a PMML file
sklearn2pmml(pipeline, "LogisticRegressionIris.pmml", with_repr = True)
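
After a conversion like the ones above, it can be useful to check which fields ended up in the generated file. The sketch below uses only the standard library and assumes the output is a regular PMML document whose DataDictionary contains DataField elements with name, optype and dataType attributes. The helper name list_data_fields is hypothetical and is not part of SkLearn2PMML.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def list_data_fields(pmml_source):
    """Return (name, optype, dataType) tuples for every DataField element.

    pmml_source may be a file path or a binary file-like object.
    """
    root = ET.parse(pmml_source).getroot()
    fields = []
    for element in root.iter():
        if _local(element.tag) == "DataField":
            fields.append((element.get("name"), element.get("optype"), element.get("dataType")))
    return fields
```

Calling list_data_fields("LogisticRegressionIris.pmml") on the file produced above should list the target and the input features. Matching on the local tag name keeps the helper independent of the PMML schema version declared in the file's namespace.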

De-installation

Uninstalling:

pip uninstall sklearn2pmml

License

SkLearn2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use SkLearn2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes SkLearn2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

SkLearn2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact [email protected]
