  • Stars: 193
  • Rank: 199,963 (Top 4 %)
  • Language: Python
  • Created almost 5 years ago
  • Updated 5 months ago

Repository Details

Basic and advanced MLflow examples for many ML flavors

MLflow Examples

MLflow examples - basic and advanced.

This repo consists of two sets of code artifacts:

Last updated: 2023-07-12

Examples

Python examples

  • sklearn - Scikit-learn model - train and score.
    • Canonical example that shows multiple ways to train and score.
    • Options to log ONNX model, autolog and save model signature.
    • Train locally or against a Databricks cluster.
    • Score real-time against a local web server or Docker container.
    • Score batch with mlflow.load_model or a Spark UDF.
  • sparkml - Spark ML model - train and score. ONNX too.
  • Keras/TensorFlow - train and score. ONNX working too.
  • xgboost - XGBoost (sklearn wrapper) model - train and score.
  • catboost - CatBoost (using sklearn) model - train and score. ONNX working too.
  • pytorch - PyTorch - train and score. ONNX too.
  • onnx_sklearn - ONNX - Sklearn to ONNX train and score.
  • h2o - H2O model - train and score - with AutoML. ONNX too.
  • model_registry - Jupyter notebook sampling the Model Registry API.
  • e2e-ml-pipeline - End-to-end ML pipeline - training to real-time scoring.
  • reproduce - Reproduce an existing run.
  • scoring_server_benchmarks - Scoring server performance benchmarks.
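For the real-time scoring mentioned in the sklearn example, a request to a local MLflow scoring server can be sketched as follows. This is a minimal illustration, not code from the repo: the function name `build_scoring_request`, the port 5001 and the column names are assumptions. It relies on the scoring server's `/invocations` endpoint; the body layout differs between MLflow 1.x (bare split-orientation object) and MLflow 2.x (wrapped in `dataframe_split`), so check the version you run.

```python
import json
import urllib.request


def build_scoring_request(base_url, columns, rows, mlflow_2x=True):
    """Build (but do not send) a POST request for an MLflow scoring server.

    base_url is assumed to point at a server started with `mlflow models serve`
    or an equivalent Docker container.
    """
    frame = {"columns": columns, "data": rows}
    if mlflow_2x:
        # MLflow 2.x expects the split-orientation frame under "dataframe_split".
        body = {"dataframe_split": frame}
        content_type = "application/json"
    else:
        # MLflow 1.x accepted the bare split-orientation object.
        body = frame
        content_type = "application/json; format=pandas-split"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/invocations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical URL and a two-column record, for illustration only.
    request = build_scoring_request(
        "http://localhost:5001", ["alcohol", "pH"], [[8.8, 3.0]]
    )
    print(request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it makes it easy to inspect the payload before pointing it at a live server.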

The sklearn and Spark ML examples also demonstrate:

  • Different ways to run a project with the mlflow CLI
  • Real-time server scoring with docker containers
  • Running a project against a Databricks cluster

Scala examples - uses the MLflow Java client

  • hello_world - Hello World - no training or scoring.
  • sparkml - Scala train and score - Spark ML and XGBoost4j
  • mleap - Score an MLeap model with MLeap runtime (no Spark dependencies).
  • onnx - Score an ONNX model (that was created in Scikit-learn) in Java.

Databricks

Docker

Setup

Use Python 3.8.

Miniconda

  • Install miniconda3: https://conda.io/miniconda.html
  • Create the environment: conda env create --file conda.yaml
  • Activate the environment: conda activate mlflow-examples (on older conda versions: source activate mlflow-examples)

Virtual Environment

Create a virtual environment.

python -m venv mlflow-examples
source mlflow-examples/bin/activate

pip install the libraries in conda.yaml.

MLflow Server

You can either run the MLflow tracking server directly on your laptop or with Docker.

Docker

See docker/docker-server/README.

Laptop Tracking Server

You can either use the local file store or a database-backed store. See MLflow Storage documentation.

Note that the Model Registry, introduced in MLflow 1.4.0, appears to work only with a database-backed store.

First activate the virtual environment.

cd $HOME/mlflow-server
source $HOME/virtualenvs/mlflow-examples/bin/activate

File Store

Start the MLflow tracking server.

mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri $PWD/mlruns --default-artifact-root $PWD/mlruns

Database-backed store - MySQL

  • Install MySQL
  • Create an mlflow user with password.
  • Create a database mlflow

Start the MLflow Tracking Server

mlflow server --host 0.0.0.0 --port 5000 \
  --backend-store-uri mysql://MLFLOW_USER:MLFLOW_PASSWORD@localhost:3306/mlflow \
  --default-artifact-root $PWD/mlruns  
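The MLFLOW_USER and MLFLOW_PASSWORD placeholders in the URI above must be percent-encoded if they contain characters such as `@` or `/`, otherwise the URI is parsed incorrectly. A minimal sketch of building the value for `--backend-store-uri`, assuming a hypothetical helper named `mysql_backend_store_uri` and the default host, port and database shown above:

```python
from urllib.parse import quote_plus


def mysql_backend_store_uri(user, password, host="localhost", port=3306,
                            database="mlflow"):
    """Return a MySQL backend store URI with user and password percent-encoded."""
    return (
        f"mysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(mysql_backend_store_uri("mlflow", "p@ss/word"))
```

The printed string can be passed unchanged as the `--backend-store-uri` argument.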

Database-backed store - SQLite

mlflow server --host 0.0.0.0 --port 5000 \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root $PWD/mlruns  
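Once the server has started against the SQLite store, the mlflow.db file can be inspected with Python's built-in sqlite3 module to confirm that the schema was created. The sketch below only lists table names and makes no assumption about what they are called; the helper name `list_tables` is hypothetical.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file, sorted.

    Note: sqlite3.connect creates an empty file if db_path does not exist.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Assumes the server was started in the current directory.
    for table in list_tables("mlflow.db"):
        print(table)
```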

Examples

Most of the examples use a DecisionTreeRegressor model with the wine quality data set.

As such, the python/sparkml and scala/sparkml examples are equivalent: they are language variants of the same Spark ML algorithm.

Setup

Before running an experiment, set the tracking URI:

export MLFLOW_TRACKING_URI=http://localhost:5000
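Before launching a run it can help to confirm that the tracking server behind MLFLOW_TRACKING_URI is reachable. The following is a minimal sketch using only the standard library. It assumes the server exposes a `/health` endpoint, which recent MLflow servers do; verify this for the version you run. The function name `tracking_server_is_up` is hypothetical.

```python
import os
import urllib.request


def tracking_server_is_up(tracking_uri=None, timeout=2.0):
    """Return True if GET <tracking_uri>/health answers with HTTP 200."""
    uri = tracking_uri or os.environ.get(
        "MLFLOW_TRACKING_URI", "http://localhost:5000"
    )
    try:
        with urllib.request.urlopen(
            uri.rstrip("/") + "/health", timeout=timeout
        ) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts;
        # ValueError covers malformed URIs.
        return False


if __name__ == "__main__":
    print("tracking server reachable:", tracking_server_is_up())
```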

Data

Data is in the data folder.

wine-quality-white.csv contains the training data.
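As an illustration of how the training data can be read, here is a small standard-library sketch that separates the features from the label. It assumes a comma-delimited file with a header row and a label column named `quality`. Both are assumptions to check against the actual file, since the original UCI distribution of this data set is semicolon-delimited. The function name is hypothetical.

```python
import csv


def load_features_and_labels(lines, label_column="quality", delimiter=","):
    """Read CSV rows into (features, labels).

    lines is any iterable of text lines, for example an open file.
    features is a list of {column name: float}; labels is a list of floats.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    features, labels = [], []
    for row in reader:
        labels.append(float(row.pop(label_column)))
        features.append({name: float(value) for name, value in row.items()})
    return features, labels


if __name__ == "__main__":
    # Path assumed relative to the repo root.
    with open("data/wine-quality-white.csv", newline="") as handle:
        features, labels = load_features_and_labels(handle)
    print(len(features), "rows loaded")
```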

Real-time scoring prediction data

  • The prediction files contain the first three records of wine-quality-white.csv.
  • The format is the standard MLflow JSON serialization of a pandas DataFrame in split orientation.
  • Data in predict-wine-quality.json is directly derived from wine-quality-white.csv.
    • The values are a mix of integers and doubles.
  • Scoring predict-wine-quality.json against an MLeap SageMaker container apparently fails because the server cannot handle integers (bug).
  • Hence predict-wine-quality-float.json, whose data is all doubles.
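The integer-to-double conversion described above can be sketched as follows. The payload uses the split-orientation layout (a `columns` list plus a `data` list of rows); the column names and values in the usage example are illustrative rather than copied from the repo's files, and the function name `to_all_doubles` is hypothetical.

```python
import json


def to_all_doubles(payload):
    """Return a copy of a split-orientation payload with every value as a float."""
    return {
        "columns": list(payload["columns"]),
        "data": [[float(value) for value in row] for row in payload["data"]],
    }


if __name__ == "__main__":
    # Illustrative record mixing integers and doubles.
    mixed = {"columns": ["fixed acidity", "pH"], "data": [[7, 3]]}
    print(json.dumps(to_all_doubles(mixed)))
```

Because Python's json module writes floats with a decimal point, an integer such as 7 is serialized as 7.0 after the conversion, which is what a server that rejects integers needs.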
