• This repository has been archived on 09/Dec/2022
  • Stars: 208
  • Rank: 189,015 (Top 4 %)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created about 5 years ago
  • Updated about 2 years ago

Repository Details

A Collection of GitHub Actions That Facilitate MLOps

Materials that accompany the talk MLOps with GitHub Actions & Kubernetes

A Collection Of GitHub Actions That Enable MLOps and CI/CD For Machine Learning:

Below is a collection of GitHub Actions, curated or built by us, that facilitate machine learning workflows:

1. ChatOps

2. Submitting Argo workflows

Argo allows you to orchestrate machine learning pipelines that run on Kubernetes.

3. Query Experiment Tracking Results

4. Publish Docker Images

5. Compile and Push Pipeline to Kubeflow
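
To make item 2 more concrete, here is a minimal sketch of what an Argo Workflow for a two-step training pipeline might look like, built as a Python dictionary. The step names, script names, image name and `COMMIT_SHA` variable are assumptions made for illustration and are not taken from this repository; only the manifest structure (`apiVersion`, `kind`, `entrypoint`, `dag` tasks with `dependencies`) follows the Argo Workflows format.

```python
import json


def build_training_workflow(image: str, commit_sha: str) -> dict:
    """Return a two-step Argo Workflow manifest: preprocess, then train."""

    def step_template(name: str, script: str) -> dict:
        # Each step runs one (hypothetical) script in the same container image.
        return {
            "name": name,
            "container": {
                "image": image,
                "command": ["python", script],
                # Pass the commit SHA so results can be tied back to the code.
                "env": [{"name": "COMMIT_SHA", "value": commit_sha}],
            },
        }

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "ml-train-"},
        "spec": {
            "entrypoint": "pipeline",
            "templates": [
                {
                    "name": "pipeline",
                    "dag": {
                        "tasks": [
                            {"name": "preprocess", "template": "preprocess"},
                            {
                                "name": "train",
                                "template": "train",
                                # train only starts after preprocess succeeds
                                "dependencies": ["preprocess"],
                            },
                        ]
                    },
                },
                step_template("preprocess", "preprocess.py"),
                step_template("train", "train.py"),
            ],
        },
    }


# Hypothetical image name and commit SHA, for illustration only.
manifest = build_training_workflow("example.registry/ml-image:latest", "abc1234")
print(json.dumps(manifest, indent=2))
```

Because JSON is also valid YAML, the printed manifest could be saved to a file and submitted like any other workflow definition.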

What is MLOps?

See this demo explaining this project and more background on what MLOps is and why it is needed.

Example Of What We Are Trying To Solve With MLOps:

Code review for machine learning projects often involves deciding whether to merge or deploy code when critical information about model performance and statistics is not readily available. This is due to the friction involved in including logs and statistics from model training runs in pull requests. For example, consider this excerpt from a real pull request concerning a machine learning project:

In an ideal world, the participants in the above code review should be provided with all of the context necessary to evaluate the PR, including:

  • Model performance metrics and statistics
  • Comparison with baselines and other models on a holdout dataset
  • Verification that the metrics and statistics correspond to the code changed in the PR, by tying the results to a commit SHA.
  • Data versioning
  • etc.
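
As a rough illustration of the list above, the following sketch formats model metrics and a baseline comparison into a comment body that is tied to a commit SHA. The function name, metric names and values are hypothetical, and the Markdown table layout is just one possible presentation; it is not the format used by this repository.

```python
def format_metrics_comment(commit_sha: str, metrics: dict, baseline: dict) -> str:
    """Build a Markdown comment comparing a run's metrics with a baseline."""
    lines = [
        # Tie the results to the commit that produced them.
        f"Model evaluation for commit `{commit_sha[:7]}`",
        "",
        "| Metric | This PR | Baseline | Delta |",
        "| --- | --- | --- | --- |",
    ]
    for name in sorted(metrics):
        value = metrics[name]
        base = baseline.get(name)
        if base is None:
            # No baseline recorded for this metric.
            lines.append(f"| {name} | {value:.4f} | n/a | n/a |")
        else:
            lines.append(
                f"| {name} | {value:.4f} | {base:.4f} | {value - base:+.4f} |"
            )
    return "\n".join(lines)


# Made-up numbers, for illustration only.
comment = format_metrics_comment(
    "0123456789abcdef",
    {"accuracy": 0.91, "f1": 0.88},
    {"accuracy": 0.89},
)
print(comment)
```

A workflow step could post a string like this back to the pull request, so reviewers see the numbers next to the code that produced them.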

How We Can Solve This With GitHub Actions:

GitHub Actions lets you compose a set of pre-built CI/CD tools, or make your own, into a workflow that enables MLOps from GitHub. The example below composes the following Actions into a useful pipeline:

ChatOps → Deploy Argo ML Workflows → Weights & Biases Experiment Tracking → Deploy Model:

View the demo pull request here. What is shown above is only the tip of the iceberg!

Explanation of Files In This Repo:

  • .github/workflows/
    • chatops.yaml: This workflow file handles two different scenarios: (1) executing a full model run with the command /run-full-test and (2) deploying a model with the chatops command /deploy <run_id>. Note that you do not need to use chatops for your workflow; this was just the author's preferred way of triggering items. You can use one of the many other events that can trigger Actions. Furthermore, these chatops commands use a pre-defined action, machine-learning-apps/actions-chatops@master, that performs an Action by authenticating as another GitHub App. The steps taken in this workflow trigger either the workflow defined in ml-cicd.yaml or the one defined in deploy.yaml.
    • ml-cicd.yaml: This workflow is triggered by the chatops command /run-full-test from events that occur in the chatops.yaml file. It executes the full training run of the model.
    • deploy.yaml: This workflow is triggered by the chatops command /deploy <run_id>. It fetches the appropriate model artifacts associated with the <run_id> from the experiment tracking system (which is Weights & Biases in this case), and deploys this model using Google Cloud Functions.
    • repo-dispatch.yaml: This workflow is triggered at the end of the Argo Workflow created in the step Submit Argo Deployment in ml-cicd.yaml. The terminal nodes of the Argo workflow create a repository dispatch event, which triggers this workflow.
    • see-payload.yaml & see_token.yaml - these files were used for debugging and can be safely ignored.
  • /action_files: these are a collection of shell scripts and python files that are run at various steps in the workflow files mentioned above.
  • /src - these are the files that define the pre-processing and training of the model. These files are copied into the appropriate Docker container images in the workflow when the workflow is triggered.
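
The two triggers described in the file list above can be sketched in a few lines. The first function shows one way to recognise the /run-full-test and /deploy <run_id> commands in a pull request comment; the second builds the JSON body for a repository dispatch event, which uses the `event_type` and `client_payload` fields of the GitHub REST API. Both function names and the keys inside `client_payload` are assumptions for illustration. The repository itself relies on the machine-learning-apps/actions-chatops action and on scripts in /action_files, whose actual logic may differ.

```python
from typing import Optional


def parse_chatops_command(comment_body: str) -> Optional[dict]:
    """Return the first chatops command found in a comment, or None."""
    # splitlines() handles both "\n" and "\r\n" line endings.
    for line in comment_body.splitlines():
        parts = line.strip().split()
        if not parts:
            continue
        if parts == ["/run-full-test"]:
            return {"command": "run-full-test"}
        # /deploy must be followed by exactly one argument, the run id.
        if parts[0] == "/deploy" and len(parts) == 2:
            return {"command": "deploy", "run_id": parts[1]}
    return None


def build_dispatch_payload(event_type: str, run_id: str, commit_sha: str) -> dict:
    """Build the JSON body for a repository dispatch event."""
    return {
        "event_type": event_type,
        # client_payload is free-form; these keys are hypothetical.
        "client_payload": {"run_id": run_id, "sha": commit_sha},
    }


print(parse_chatops_command("Looks good to me\n/run-full-test"))
print(parse_chatops_command("/deploy abc123"))
print(build_dispatch_payload("argo-finished", "abc123", "deadbeef"))
```

Commands are only recognised when they stand alone on a line, so a sentence that merely mentions /deploy does not trigger a deployment.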

Recommended Way Of Getting Started With GitHub Actions and MLOps

The example in this repo is end-to-end and requires familiarity with Kubernetes and GitHub Actions to fully understand. When starting out, we recommend automating one part of your workflow, such as deploying models. As you learn more about the syntax of GitHub Actions you can increase the scope of your workflow as appropriate.

We also encourage you to make GitHub Actions for others to use to accommodate other tools.

For any questions, please open an issue in this repo.

More Repositories

1. Issue-Label-Bot (SCSS, 326 stars): Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"

2. ml-template-azure (Python, 126 stars): Template for getting started with automated ML Ops on Azure Machine Learning

3. actions-chatops (Python, 74 stars): Actions That Enables ChatOps In a PR Through a GitHub App

4. actions-app-token (Python, 61 stars): Impersonate a GitHub App Token inside Actions

5. wandb-action (Python, 60 stars): GitHub Action That Retrieves Model Runs From Weights & Biases

6. gpr-docker-publish (Shell, 53 stars): GitHub Action That Publishes Docker Images to GPR

7. actions-argo (Shell, 38 stars): GitHub Action That Submits Argo Workflows From GitHub

8. mlops-dashboard (Jupyter Notebook, 23 stars): Prototype MLOps Dashboard Using GitHub Pages

9. pr-comment (Ruby, 20 stars): Action that makes a pr comment from a file artifact

10. hands-on-ml2 (Jupyter Notebook, 18 stars): Notebooks from https://github.com/ageron/handson-ml2 as blog posts using fastpages

11. self-hosted-k8s-runner (Shell, 17 stars): Run a self-hosted Actions runner on Kubernetes.

12. gke-argo (Shell, 16 stars): GitHub Action That Submits Argo Workflows For Execution on Your GKE Cluster

13. gke-kubeconfig (Shell, 13 stars): GitHub Action that Fetches Kubeconfig From GKE and saves to `$GITHUB_WORKSPACE/.kube/config`

14. mystify (Python, 12 stars): Jupyter backend for textual notebooks in MyST format

15. website-docs (Ruby, 8 stars): Docs for ml-ops.github.com

16. IssuesLanguageModel (Jupyter Notebook, 7 stars): A Language model trained on a large corpus of GitHub Issues

17. mdparse (Python, 6 stars): Parsing of markdown files for deep learning

18. actions-chatops-deprecated (Shell, 6 stars): GitHub Action For ChatOps in Pull Requests

19. MLOps (CSS, 5 stars): Managing machine learning operations using GitHub and other cloud providers.

20. great-expectations-render (Jupyter Notebook, 4 stars)

21. actions-impersonate (Python, 2 stars): Have Your Action Impersonate a GitHub App In Order To Overcome Restrictions

22. dvc-example (1 star)

23. demo-videos (1 star)

24. actions (1 star): Repository for Discussing and Building GitHub Actions for Machine Learning