• Stars
    star
    127
  • Rank 282,790 (Top 6 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created almost 4 years ago
  • Updated about 1 month ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

rubicon-ml

Test Package Publish Package Publish Docs edgetest

Conda Version PyPi Version Binder

Purpose

rubicon-ml is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a repeatable and searchable way. Its git integration associates these inputs and outputs directly with the model code that produced them to ensure full auditability and reproducibility for both developers and stakeholders alike. While experimenting, the dashboard makes it easy to explore, filter, visualize, and share recorded work.

p.s. If you're looking for Rubicon, the Java/ObjC Python bridge, visit this instead.


Components

rubicon-ml is composed of three parts:

  • A Python library for storing and retrieving model inputs, outputs, and analyses to filesystems that’s powered by fsspec
  • A dashboard for exploring, comparing, and visualizing logged data built with dash
  • And a process for sharing a selected subset of logged data with collaborators or reviewers that leverages intake

Workflow

Use rubicon_ml to capture model inputs and outputs over time. It can be easily integrated into existing Python models or pipelines and supports both concurrent logging (so multiple experiments can be logged in parallel) and asynchronous communication with S3 (so network reads and writes won’t block).

Meanwhile, periodically review the logged data within the Rubicon dashboard to steer the model tweaking process in the right direction. The dashboard lets you quickly spot trends by exploring and filtering your logged results and visualizes how the model inputs impacted the model outputs.

When the model is ready for review, Rubicon makes it easy to share specific subsets of the data with model reviewers and stakeholders, giving them the context necessary for a complete model review and approval.

Use

Check out the interactive notebooks in this Binder to try rubicon_ml for yourself.

Here's a simple example:

from rubicon_ml import Rubicon

rubicon = Rubicon(
    persistence="filesystem", root_dir="/rubicon-root", auto_git_enabled=True
)

project = rubicon.create_project(
    "Hello World", description="Using rubicon to track model results over time."
)

experiment = project.log_experiment(
    training_metadata=[SklearnTrainingMetadata("sklearn.datasets", "my-data-set")],
    model_name="My Model Name",
    tags=["my_model_name"],
)

experiment.log_parameter("n_estimators", n_estimators)
experiment.log_parameter("n_features", n_features)
experiment.log_parameter("random_state", random_state)

accuracy = rfc.score(X_test, y_test)
experiment.log_metric("accuracy", accuracy)

Then explore the project by running the dashboard:

rubicon_ml ui --root-dir /rubicon-root

Documentation

For a full overview, visit the docs. If you have suggestions or find a bug, please open an issue.

Install

The Python library is available on Conda Forge via conda and PyPi via pip.

conda config --add channels conda-forge
conda install rubicon-ml

or

pip install rubicon-ml

Develop

The project uses conda to manage environments. First, install conda. Then use conda to setup a development environment:

conda env create -f environment.yml
conda activate rubicon-ml-dev

Finally, install rubicon_ml locally into the newly created environment.

pip install -e ".[all]"

Testing

The tests are separated into unit and integration tests. They can be run directly in the activated dev environment via pytest tests/unit or pytest tests/integration. Or by simply running pytest to execute all of them.

Note: some integration tests are intentionally marked to control when they are run (i.e. not during CICD). These tests include:

  • Integration tests that write to physical filesystems - local and S3. Local files will be written to ./test-rubicon relative to where the tests are run. An S3 path must also be provided to run these tests. By default, these tests are disabled. To enable them, run:

    pytest -m "write_files" --s3-path "s3://my-bucket/my-key"
    
  • Integration tests that run Jupyter notebooks. These tests are a bit slower than the rest of the tests in the suite as they need to launch Jupyter servers. By default, they are enabled. To disable them, run:

    pytest -m "not run_notebooks and not write_files"
    

    Note: When simply running pytest, -m "not write_files" is the default. So, we need to also apply it when disabling notebook tests.

Code Formatting

Install and configure pre-commit to automatically run black, flake8, and isort during commits:

Now pre-commit will run automatically on git commit and will ensure consistent code format throughout the project. You can format without committing via pre-commit run or skip these checks with git commit --no-verify.

More Repositories

1

DataProfiler

What's in your data? Extract schema, statistics and entities from datasets
Python
1,426
star
2

react-native-pathjs-charts

Android and iOS charts based on react-native-svg and paths-js
JavaScript
878
star
3

datacompy

Pandas, Polars, and Spark DataFrame comparison for humans and more!
Python
480
star
4

cqrs-manager-for-distributed-reactive-services

Experimental CQRS and Event Sourcing service
Java
304
star
5

SWHttpTrafficRecorder

A simple library empowering you to record/capture HTTP(s) traffic of an iOS app for mocking/stubbing later.
Objective-C
205
star
6

fpe

A format-preserving encryption implementation in Go
Go
202
star
7

giraffez

User-friendly Teradata client for Python
Python
108
star
8

locopy

locopy: Loading/Unloading to Redshift and Snowflake using Python.
Python
104
star
9

checks-out

Checks-Out pull request approval system
Go
76
star
10

dataCompareR

dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
R
75
star
11

stack-deployment-tool

Go
66
star
12

bash_shell_mock

A shell script mocking utility/framework for the BASH shell
Shell
66
star
13

architecture-viewer

Visualize your PlantUML sequence diagrams as interactive architecture diagrams!
JavaScript
60
star
14

go-future-context

A simple Future (Promise) library for Go.
Go
54
star
15

AI_Dictionary_English_Spanish

TeX
49
star
16

acronym-decoder

Acronym Decoder
TypeScript
43
star
17

synthetic-data

Generating complex, nonlinear datasets appropriate for use with deep learning/black box models which 'need' nonlinearity

Python
43
star
18

Particle-Cloud-Framework

Python
36
star
19

slackbot-destroyer

📣 ❌ Slack integration that can destroy all incoming messages from Slackbot.
Python
34
star
20

global-attribution-mapping

GAM (Global Attribution Mapping) explains the landscape of neural network predictions across subpopulations
Python
33
star
21

federated-model-aggregation

The Federated Model Aggregation (FMA) Service is a collection of installable python components that make up the generic workflow/infrastructure needed for federated learning.
Python
30
star
22

oas-nodegen

A library for generating completely customizable code from the Open API Specification (FKA Swagger) RESTful API documentation using the scripting power of Node.js.
JavaScript
28
star
23

easy-screenshots

Android Instrumentation Test Screenshots made Easy.
Java
21
star
24

edgetest

edgetest is a tox-inspired python library that will loop through your project's dependencies, and check if your project is compatible with the latest version of each dependency
Python
19
star
25

ablation

Evaluating XAI methods through ablation studies.
Python
15
star
26

serverless-shell

⚡️🐚 Serverless Shell with environment variables plugin
JavaScript
14
star
27

OAuthClient

Awesome OAuth Client for Java.
Java
13
star
28

otvPlots

ovtPlots: An R Package for Variable Level Monitoring
R
13
star
29

json-syntax

Generates functions to convert Python classes to and from JSON friendly objects.
Python
12
star
30

screen-object

screen-object (ruby gem for mobile app automation)
Ruby
12
star
31

jwt-security

JavaScript
11
star
32

BankAccountStarter-API-reference-app

CSS
10
star
33

CreditOffers-API-reference-app

JavaScript
10
star
34

Rewards-API-reference-app

JavaScript
10
star
35

local-crontab

🗺️⏰ Convert local crontabs to UTC crontabs
JavaScript
8
star
36

modtracker

JSON unmarshaling in Go that includes detection of modified fields
Go
7
star
37

grpc-cucumber-js

JavaScript
7
star
38

edgetest-hub

hub plugin for edgetest
Python
2
star
39

oas-nodegen-example

Example project that shows how to customize generated code to fit a specific design pattern using oas-nodegen
Java
2
star
40

edgetest-conda

Conda plugin for edgetest
Python
1
star
41

edgetest-pip-tools

pip-tools plugin for edgetest
Python
1
star