JPMML-XGBoost

Java library and command-line application for converting XGBoost models to PMML.

Prerequisites

  • Java 1.8 or newer.

Features

Supports all XGBoost versions 0.4 through 1.7(.2).

  • Functionality:
    • Model data formats:
      • Binary (XGBoost 0.4 and newer)
      • JSON (XGBoost 1.0 and newer)
      • Universal Binary JSON (UBJSON) (XGBoost 1.6 and newer)
    • Gradient boosters:
      • GBTree
      • DART
    • Feature maps
    • Split types:
      • Numeric (XGBoost 0.4 and newer)
      • Categorical, One-Hot-Encoding (OHE)-based (XGBoost 1.3 and newer)
      • Categorical, Set-based (XGBoost 1.6 and newer)
      • Missing values (XGBoost 0.4 and newer)
    • Objective functions:
      • Regression
      • Binary- and multi-class classification
      • Ranking
      • Survival Analysis
  • Conversion options:
    • Truncation (the ntree_limit parameter, also known as iteration_range)
    • Elimination of empty and constant trees
    • Tree rearrangements:
      • Compaction and flattening (reshaping deep binary trees into shallow multi-way trees)
      • Pruning
  • Production quality:
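
The truncation option listed among the conversion options can be pictured with a small sketch. It assumes a binary:logistic model whose raw score for a data record is the sum of per-tree leaf values, and it leaves out DART tree weighting for brevity. The function name truncated_probability and the toy leaf values are hypothetical and are not part of JPMML-XGBoost.

```python
import math

def truncated_probability(leaf_values, ntree_limit=None, base_margin=0.0):
    """Probability of the positive class using only the first ntree_limit trees."""
    # leaf_values[i] is the leaf score that tree i assigns to one data record
    kept = leaf_values if ntree_limit is None else leaf_values[:ntree_limit]
    margin = base_margin + sum(kept)
    # binary:logistic applies the sigmoid function to the summed margin
    return 1.0 / (1.0 + math.exp(-margin))

# Hypothetical leaf values of a four-tree ensemble for a single data record
leaf_values = [0.4, -0.1, 0.3, 0.2]
print(truncated_probability(leaf_values))                 # all four trees
print(truncated_probability(leaf_values, ntree_limit=2))  # first two trees only
```

Truncating at conversion time would leave the PMML document with only the retained trees, so the scoring side would not need to know about the limit.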

Installation

Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces a library JAR file pmml-xgboost/target/pmml-xgboost-1.7-SNAPSHOT.jar, and an executable uber-JAR file pmml-xgboost-example/target/pmml-xgboost-example-executable-1.7-SNAPSHOT.jar.

Usage

A typical workflow can be summarized as follows:

  1. Use XGBoost to train a model.
  2. Save the model and the associated feature map to files in a local filesystem.
  3. Use the JPMML-XGBoost command-line converter application to turn those two files into a PMML file.

The XGBoost side of operations

Training a binary classification model using the Audit.csv dataset.

R language

library("r2pmml")
library("xgboost")

df = read.csv("Audit.csv", stringsAsFactors = TRUE)

# Three continuous features, followed by five categorical features
X = df[c("Age", "Hours", "Income", "Education", "Employment", "Gender", "Marital", "Occupation")]
y = df["Adjusted"]

audit.formula = formula("~ . - 1")
audit.frame = model.frame(audit.formula, data = X, na.action = na.pass)
# Define rules for binarizing categorical features into binary indicator features
audit.contrasts = lapply(X[sapply(X, is.factor)], contrasts, contrasts = FALSE)
# Perform binarization
audit.matrix = model.matrix(audit.formula, data = audit.frame, contrasts.arg = audit.contrasts)

# Generate the feature map from audit.frame (not audit.matrix), because a data.frame holds richer column metadata than a matrix
audit.fmap = r2pmml::as.fmap(audit.frame)
r2pmml::write.fmap(audit.fmap, "Audit.fmap")

audit.xgb = xgboost(data = audit.matrix, label = as.matrix(y), objective = "binary:logistic", nrounds = 131)
xgb.save(audit.xgb, "XGBoostAudit.model")

Python language - Learning API

Using an Audit.fmap feature map file (works with any XGBoost version):

from sklearn2pmml.xgboost import make_feature_map
from xgboost import DMatrix

import pandas
import xgboost

df = pandas.read_csv("Audit.csv")

# Three continuous features, followed by five categorical features
X = df[["Age", "Hours", "Income", "Education", "Employment", "Gender", "Marital", "Occupation"]]
y = df["Adjusted"]

# Convert categorical features into binary indicator features
X = pandas.get_dummies(data = X, prefix_sep = "=", dtype = bool)

audit_fmap = make_feature_map(X, enable_categorical = False)
audit_fmap.save("Audit.fmap")

audit_dmatrix = DMatrix(data = X, label = y)

audit_xgb = xgboost.train(params = {"objective" : "binary:logistic"}, dtrain = audit_dmatrix, num_boost_round = 131)
audit_xgb.save_model("XGBoostAudit.model")
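
For readers who want to see what a feature map file contains, the following sketch renders one in the plain-text layout that XGBoost describes for featmap.txt files: one line per feature, holding a zero-based index, the feature name and a type code (q for quantitative, int for integer, i for binary indicator), separated here by tabs. The helper name format_feature_map and the sample feature names are made up for illustration. This is not how make_feature_map is implemented, and its output may differ in details.

```python
def format_feature_map(features):
    """Render (name, type_code) pairs as feature map text, one feature per line."""
    allowed = {"q", "int", "i"}
    lines = []
    for index, (name, type_code) in enumerate(features):
        if type_code not in allowed:
            raise ValueError(type_code)
        # Columns: zero-based feature index, feature name, type code
        lines.append("{}\t{}\t{}".format(index, name, type_code))
    return "\n".join(lines) + "\n"

# Illustrative features; the "=" separator mirrors prefix_sep = "=" in get_dummies
features = [("Age", "int"), ("Income", "q"), ("Gender=Male", "i")]
print(format_feature_map(features), end = "")
```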

The same, but using an embedded feature map (works with XGBoost 1.4 and newer):

from xgboost import DMatrix

import pandas
import xgboost

def to_fmap_type(dtype):
    # Continuous integers
    if dtype == "int":
        return "int"
    # Continuous floats
    elif dtype == "float":
        return "float"
    # Binary indicators (i.e. 0/1 values) generated by pandas.get_dummies(X)
    elif dtype == "bool":
        return "i"
    else:
        raise ValueError(dtype)

df = pandas.read_csv("Audit.csv")

# Three continuous features, followed by five categorical features
X = df[["Age", "Hours", "Income", "Education", "Employment", "Gender", "Marital", "Occupation"]]
y = df["Adjusted"]

# Convert categorical features into binary indicator features
X = pandas.get_dummies(data = X, prefix_sep = "=", dtype = bool)

feature_names = X.columns.values
feature_types = [to_fmap_type(dtype) for dtype in X.dtypes]

# Constructing a DMatrix with explicit feature names and feature types
audit_dmatrix = DMatrix(data = X, label = y, feature_names = feature_names, feature_types = feature_types)

audit_xgb = xgboost.train(params = {"objective" : "binary:logistic"}, dtrain = audit_dmatrix, num_boost_round = 131)
audit_xgb.save_model("XGBoostAudit.model")
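
Whether a saved model carries an embedded feature map can be checked before conversion. The sketch below assumes that the model was saved in the JSON data format (for example by passing a file name ending in .json to save_model) and that feature names and types are stored under the learner object as feature_names and feature_types, which is how recent XGBoost versions lay out their JSON models. The helper name has_embedded_feature_map is hypothetical.

```python
import json

def has_embedded_feature_map(model):
    """Check a parsed XGBoost JSON model for non-empty feature names and types."""
    learner = model.get("learner", {})
    names = learner.get("feature_names") or []
    types = learner.get("feature_types") or []
    # Assumption: a usable embedded feature map has one type per feature name
    return len(names) > 0 and len(names) == len(types)

# A hand-written stand-in for the relevant part of a JSON model file
model_json = '{"learner": {"feature_names": ["Age", "Gender=Male"], "feature_types": ["int", "i"]}}'
print(has_embedded_feature_map(json.loads(model_json)))
```

For a real model file, the JSON text would be read with json.load on the opened file instead of the inline string used here.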

Python language - Scikit-Learn API

Using an Audit.fmap feature map file (works with any XGBoost version):

from sklearn.preprocessing import LabelEncoder
from sklearn2pmml.xgboost import make_feature_map
from xgboost.sklearn import XGBClassifier

import pandas

df = pandas.read_csv("Audit.csv")

# Three continuous features, followed by five categorical features
X = df[["Age", "Hours", "Income", "Education", "Employment", "Gender", "Marital", "Occupation"]]
y = df["Adjusted"]

# Convert categorical features into binary indicator features
X = pandas.get_dummies(data = X, prefix_sep = "=", dtype = bool)

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

audit_fmap = make_feature_map(X, enable_categorical = False)
audit_fmap.save("Audit.fmap")

classifier = XGBClassifier(objective = "binary:logistic", n_estimators = 131)
classifier.fit(X, y)

audit_xgb = classifier.get_booster()
audit_xgb.save_model("XGBoostAudit.model")

The JPMML-XGBoost side of operations

Converting the model file XGBoostAudit.model (binary data format) together with the associated feature map file Audit.fmap to a PMML file XGBoostAudit.pmml:

java -jar pmml-xgboost-example/target/pmml-xgboost-example-executable-1.7-SNAPSHOT.jar --model-input XGBoostAudit.model --fmap-input Audit.fmap --target-name Adjusted --pmml-output XGBoostAudit.pmml

If the XGBoost model contains an embedded feature map, then the --fmap-input command-line option may be omitted.
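
When the conversion is driven from a Python script, the command line can be assembled programmatically. The sketch below uses only the command-line options shown in this section and adds --fmap-input only when a feature map file is supplied. The function name build_converter_command is hypothetical.

```python
def build_converter_command(jar_path, model_input, target_name, pmml_output, fmap_input = None):
    """Assemble the argument list for the JPMML-XGBoost command-line converter."""
    command = ["java", "-jar", jar_path, "--model-input", model_input]
    # The feature map option is only needed when the model has no embedded feature map
    if fmap_input is not None:
        command += ["--fmap-input", fmap_input]
    command += ["--target-name", target_name, "--pmml-output", pmml_output]
    return command

command = build_converter_command(
    "pmml-xgboost-example/target/pmml-xgboost-example-executable-1.7-SNAPSHOT.jar",
    "XGBoostAudit.model", "Adjusted", "XGBoostAudit.pmml", fmap_input = "Audit.fmap")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check = True) to launch the converter, provided that Java and the uber-JAR file are available.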

Getting help:

java -jar pmml-xgboost-example/target/pmml-xgboost-example-executable-1.7-SNAPSHOT.jar --help

Documentation

License

JPMML-XGBoost is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use JPMML-XGBoost in a proprietary software project, then it is possible to enter into a licensing agreement which makes JPMML-XGBoost available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

JPMML-XGBoost is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact [email protected]
