  • Stars: 3,907
  • Rank 10,707 (Top 0.3 %)
  • Language: JavaScript
  • License: Apache License 2.0
  • Created about 4 years ago
  • Updated 3 months ago


♾️ CML - Continuous Machine Learning | CI/CD for ML


What is CML? Continuous Machine Learning (CML) is an open-source CLI tool for implementing continuous integration & delivery (CI/CD) with a focus on MLOps. Use it to automate development workflows — including machine provisioning, model training and evaluation, comparing ML experiments across project history, and monitoring changing datasets.

CML can help train and evaluate models — and then generate a visual report with results and metrics — automatically on every pull request.

An example report for a neural style transfer model.

CML principles:

  • GitFlow for data science. Use GitLab or GitHub to manage ML experiments, track who trained ML models or modified data and when. Codify data and models with DVC instead of pushing to a Git repo.
  • Auto reports for ML experiments. Auto-generate reports with metrics and plots in each Git pull request. Rigorous engineering practices help your team make informed, data-driven decisions.
  • No additional services. Build your own ML platform using GitLab, Bitbucket, or GitHub. Optionally, use cloud storage as well as either self-hosted or cloud runners (such as AWS EC2 or Azure). No databases, services or complex setup needed.

Need help? Just want to chat about continuous integration for ML? Visit our Discord channel!

⏯️ Check out our YouTube video series for hands-on MLOps tutorials using CML!

Table of Contents

  1. Setup (GitLab, GitHub, Bitbucket)
  2. Usage
  3. Getting started (tutorial)
  4. Using CML with DVC
  5. Advanced Setup (Self-hosted, local package)
  6. Example projects

Setup

You'll need a GitLab, GitHub, or Bitbucket account to begin. Users may wish to familiarize themselves with GitHub Actions or GitLab CI/CD. Here, we will discuss the GitHub use case.

GitLab

Please see our docs on CML with GitLab CI/CD and in particular the personal access token requirement.

Bitbucket

Please see our docs on CML with Bitbucket Cloud.

GitHub

The key file in any CML project is .github/workflows/cml.yaml:

name: your-workflow-name
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    # optionally use a convenient Ubuntu LTS + DVC + CML image
    # container: ghcr.io/iterative/cml:0-dvc2-base1
    steps:
      - uses: actions/checkout@v3
      # may need to setup NodeJS & Python3 on e.g. self-hosted
      # - uses: actions/setup-node@v3
      #   with:
      #     node-version: '16'
      # - uses: actions/setup-python@v4
      #   with:
      #     python-version: '3.x'
      - uses: iterative/setup-cml@v1
      - name: Train model
        run: |
          # Your ML workflow goes here
          pip install -r requirements.txt
          python train.py
      - name: Write CML report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Post reports as comments in GitHub PRs
          cat results.txt >> report.md
          cml comment create report.md

Usage

We provide CML and other useful libraries pre-installed on our custom Docker images. In the above example, uncommenting the field container: ghcr.io/iterative/cml:0-dvc2-base1 will make the runner pull the CML Docker image. The image already has NodeJS, Python 3, DVC and CML set up on an Ubuntu LTS base for convenience.

CML Functions

CML provides a number of functions to help package the outputs of ML workflows (including numeric data and visualizations about model performance) into a CML report.

Below is a table of CML functions for writing markdown reports and delivering those reports to your CI system.

| Function | Description | Example Inputs |
| --- | --- | --- |
| cml runner launch | Launch a runner locally or hosted by a cloud provider | See Arguments |
| cml comment create | Return CML report as a comment in your GitLab/GitHub workflow | <path to report> --head-sha <sha> |
| cml check create | Return CML report as a check in GitHub | <path to report> --head-sha <sha> |
| cml pr create | Commit the given files to a new branch and create a pull request | <path>... |
| cml tensorboard connect | Return a link to a Tensorboard.dev page | --logdir <path to logs> --title <experiment title> --md |

CML Reports

The cml comment create command can be used to post reports. CML reports are written in markdown (GitHub, GitLab, or Bitbucket flavors). That means they can contain images, tables, formatted text, HTML blocks, code snippets and more — really, what you put in a CML report is up to you. Some examples:

🗒️ Text Write to your report using whatever method you prefer. For example, copy the contents of a text file containing the results of ML model training:

cat results.txt >> report.md
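
As a sketch of a slightly richer text report, the snippet below turns a file of "name value" pairs into a markdown table. The file name metrics.txt and its two sample values are assumptions made for illustration; in a real workflow your training script would write them.

```shell
#!/bin/sh
# Hypothetical metrics file; in a real workflow, train.py would write this.
printf 'accuracy 0.91\nloss 0.27\n' > metrics.txt

# Start a fresh report with a heading and a markdown table header.
{
  echo "## Training results"
  echo
  echo "| Metric | Value |"
  echo "| --- | --- |"
  # Turn each "name value" line into one table row.
  while read -r name value; do
    echo "| $name | $value |"
  done < metrics.txt
} > report.md

cat report.md
```

The resulting report.md can then be posted with cml comment create report.md, exactly like the plain-text report above.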

🖼️ Images Display images using markdown or HTML. Note that if an image is an output of your ML workflow (i.e., it is produced by your workflow), it can be uploaded and included automatically in your CML report. For example, if graph.png is output by python train.py, run:

echo "![](./graph.png)" >> report.md
cml comment create report.md

Getting Started

  1. Fork our example project repository.

⚠️ Note that if you are using GitLab, you will need to create a Personal Access Token for this example to work.

⚠️ The following steps can all be done in the GitHub browser interface. However, to follow along with the commands, we recommend cloning your fork to your local workstation:

git clone https://github.com/<your-username>/example_cml
  2. To create a CML workflow, copy the following into a new file, .github/workflows/cml.yaml:
name: model-training
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
      - uses: iterative/setup-cml@v1
      - name: Train model
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          pip install -r requirements.txt
          python train.py

          cat metrics.txt >> report.md
          echo "![](./plot.png)" >> report.md
          cml comment create report.md
  3. In your text editor of choice, edit line 16 of train.py to depth = 5.

  4. Commit and push the changes:

git checkout -b experiment
git add . && git commit -m "modify forest depth"
git push origin experiment
  5. In GitHub, open up a pull request to compare the experiment branch to master.

Shortly, you should see a comment from github-actions appear in the pull request with your CML report. This is a result of the cml comment create command in your workflow.

This is the outline of the CML workflow:

  • you push changes to your GitHub repository,
  • the workflow in your .github/workflows/cml.yaml file gets run, and
  • a report is generated and posted to GitHub.

CML functions let you display relevant results from the workflow — such as model performance metrics and visualizations — in GitHub checks and comments. What kind of workflow you want to run, and want to put in your CML report, is up to you.

Using CML with DVC

In many ML projects, data isn't stored in a Git repository, but needs to be downloaded from external sources. DVC is a common way to bring data to your CML runner. DVC also lets you visualize how metrics differ between commits to make reports like this:

The .github/workflows/cml.yaml file used to create this report is:

name: model-training
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    container: ghcr.io/iterative/cml:0-dvc2-base1
    steps:
      - uses: actions/checkout@v3
      - name: Train model
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          # Install requirements
          pip install -r requirements.txt

          # Pull data & run-cache from S3 and reproduce pipeline
          dvc pull data --run-cache
          dvc repro

          # Report metrics
          echo "## Metrics" >> report.md
          git fetch --prune
          dvc metrics diff master --show-md >> report.md

          # Publish confusion matrix diff
          echo "## Plots" >> report.md
          echo "### Class confusions" >> report.md
          dvc plots diff --target classes.csv --template confusion -x actual -y predicted --show-vega master > vega.json
          vl2png vega.json -s 1.5 > confusion_plot.png
          echo "![](./confusion_plot.png)" >> report.md

          # Publish regularization function diff
          echo "### Effects of regularization" >> report.md
          dvc plots diff --target estimators.csv -x Regularization --show-vega master > vega.json
          vl2png vega.json -s 1.5 > plot.png
          echo "![](./plot.png)" >> report.md

          cml comment create report.md

⚠️ If you're using DVC with cloud storage, take note of environment variables for your storage format.
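
A missing storage credential usually only shows up when dvc pull fails partway through a job. The following is a minimal sketch of an early check that could run at the top of the "Train model" step. The helper name require_env is hypothetical and is not part of CML or DVC.

```shell
#!/bin/sh
# require_env NAME...: report every unset or empty variable; return 1 if any are missing.
# (Hypothetical helper written for this example.)
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: the S3 credentials used by the workflow above.
if require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; then
  echo "credentials present"
else
  echo "credentials missing; dvc pull would likely fail"
fi
```

The same helper can be called with the variable names of whichever storage provider you use.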

Configuring Cloud Storage Providers

There are many supported cloud storage providers. Here are a few examples for some of the most frequently used providers:

S3 and S3-compatible storage (Minio, DigitalOcean Spaces, IBM Cloud Object Storage...)
# GitHub
env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_SESSION_TOKEN: ${{ secrets.AWS_SESSION_TOKEN }}

👉 AWS_SESSION_TOKEN is optional.

👉 AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY can also be used by cml runner to launch EC2 instances. See Environment Variables below.

Azure
env:
  AZURE_STORAGE_CONNECTION_STRING:
    ${{ secrets.AZURE_STORAGE_CONNECTION_STRING }}
  AZURE_STORAGE_CONTAINER_NAME: ${{ secrets.AZURE_STORAGE_CONTAINER_NAME }}
Aliyun
env:
  OSS_BUCKET: ${{ secrets.OSS_BUCKET }}
  OSS_ACCESS_KEY_ID: ${{ secrets.OSS_ACCESS_KEY_ID }}
  OSS_ACCESS_KEY_SECRET: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
  OSS_ENDPOINT: ${{ secrets.OSS_ENDPOINT }}
Google Storage

⚠️ Normally, GOOGLE_APPLICATION_CREDENTIALS is the path of the json file containing the credentials. However in the action this secret variable is the contents of the file. Copy the json contents and add it as a secret.

env:
  GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
Google Drive

⚠️ After configuring your Google Drive credentials you will find a json file at your_project_path/.dvc/tmp/gdrive-user-credentials.json. Copy its contents and add it as a secret variable.

env:
  GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}

Advanced Setup

Self-hosted (On-premise or Cloud) Runners

GitHub Actions are run on GitHub-hosted runners by default. However, there are many great reasons to use your own runners: to take advantage of GPUs, orchestrate your team's shared computing resources, or train in the cloud.

☝️ Tip! Check out the official GitHub documentation to get started setting up your own self-hosted runner.

Allocating Cloud Compute Resources with CML

When a workflow requires computational resources (such as GPUs), CML can automatically allocate cloud instances using cml runner. You can spin up instances on AWS, Azure, GCP, or Kubernetes.

For example, the following workflow deploys a g4dn.xlarge instance on AWS EC2 and trains a model on the instance. After the job runs, the instance automatically shuts down.

You might notice that this workflow is quite similar to the basic use case above. The only addition is cml runner and a few environment variables for passing your cloud service credentials to the workflow.

Note that cml runner will also automatically restart your jobs (whether from a GitHub Actions 35-day workflow timeout or an AWS EC2 spot instance interruption).

name: Train-in-the-cloud
on: [push]
jobs:
  deploy-runner:
    runs-on: ubuntu-latest
    steps:
      - uses: iterative/setup-cml@v1
      - uses: actions/checkout@v3
      - name: Deploy runner on EC2
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          cml runner launch \
            --cloud=aws \
            --cloud-region=us-west \
            --cloud-type=g4dn.xlarge \
            --labels=cml-gpu
  train-model:
    needs: deploy-runner
    runs-on: [self-hosted, cml-gpu]
    timeout-minutes: 50400 # 35 days
    container:
      image: ghcr.io/iterative/cml:0-dvc2-base1-gpu
      options: --gpus all
    steps:
      - uses: actions/checkout@v3
      - name: Train model
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          pip install -r requirements.txt
          python train.py

          cat metrics.txt > report.md
          cml comment create report.md

In the workflow above, the deploy-runner job launches an EC2 g4dn.xlarge instance in the us-west region. The train-model job then runs on the newly launched instance. See Environment Variables below for details on the secrets required.
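
Before letting a workflow create billable instances, it can help to print the exact command first. The sketch below builds the same cml runner launch call as the workflow above, adds the --cloud-spot flag from the Arguments list, and only prints the command by default. The DRY_RUN and CML_COMMAND variables are assumptions introduced for this example, not CML features.

```shell
#!/bin/sh
# Build the argument list once so it can be inspected before launching.
set -- runner launch \
  --cloud=aws \
  --cloud-region=us-west \
  --cloud-type=g4dn.xlarge \
  --cloud-spot \
  --labels=cml-gpu

# DRY_RUN (hypothetical variable) defaults to 1: print the command instead of running it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  CML_COMMAND="cml $*"
  echo "$CML_COMMAND"
else
  cml "$@"
fi
```

Running the script with DRY_RUN=0 would launch the runner, provided cml is installed and the credentials described under Environment Variables are set.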

🎉 Note that jobs can use any Docker container! To use functions such as cml comment create from a job, the only requirement is to have CML installed.

Docker Images

The CML Docker image (ghcr.io/iterative/cml or iterativeai/cml) comes loaded with Python, CUDA, git, node and other essentials for full-stack data science. Different versions of these essentials are available from different image tags. The tag convention is {CML_VER}-dvc{DVC_VER}-base{BASE_VER}{-gpu}:

| {BASE_VER} | Software included (-gpu) |
| --- | --- |
| 0 | Ubuntu 18.04, Python 2.7 (CUDA 10.1, CuDNN 7) |
| 1 | Ubuntu 20.04, Python 3.8 (CUDA 11.2, CuDNN 8) |

For example, iterativeai/cml:0-dvc2-base1-gpu, or ghcr.io/iterative/cml:0-dvc2-base1.
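
To make the tag convention concrete, here is a small sketch that assembles an image reference from its parts. The function name cml_image_tag is hypothetical; the registry prefix and the example values come from the tags shown above.

```shell
#!/bin/sh
# cml_image_tag CML_VER DVC_VER BASE_VER [gpu]
# Prints an image reference following {CML_VER}-dvc{DVC_VER}-base{BASE_VER}{-gpu}.
# (Hypothetical helper written for this example.)
cml_image_tag() {
  suffix=""
  if [ "${4:-}" = "gpu" ]; then
    suffix="-gpu"
  fi
  echo "ghcr.io/iterative/cml:$1-dvc$2-base$3$suffix"
}

cml_image_tag 0 2 1        # CPU image
cml_image_tag 0 2 1 gpu    # GPU image
```

The helper only formats a string; it does not check that the resulting tag has actually been published.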

Arguments

The cml runner launch function accepts the following arguments:

  --labels                                  One or more user-defined labels for
                                            this runner (delimited with commas)
                                                       [string] [default: "cml"]
  --idle-timeout                            Time to wait for jobs before
                                            shutting down (e.g. "5min"). Use
                                            "never" to disable
                                                 [string] [default: "5 minutes"]
  --name                                    Name displayed in the repository
                                            once registered
                                                    [string] [default: cml-{ID}]
  --no-retry                                Do not restart workflow terminated
                                            due to instance disposal or GitHub
                                            Actions timeout            [boolean]
  --single                                  Exit after running a single job
                                                                       [boolean]
  --reuse                                   Don't launch a new runner if an
                                            existing one has the same name or
                                            overlapping labels         [boolean]
  --reuse-idle                              Creates a new runner only if the
                                            matching labels don't exist or are
                                            already busy               [boolean]
  --docker-volumes                          Docker volumes, only supported in
                                            GitLab         [array] [default: []]
  --cloud                                   Cloud to deploy the runner
                         [string] [choices: "aws", "azure", "gcp", "kubernetes"]
  --cloud-region                            Region where the instance is
                                            deployed. Choices: [us-east,
                                            us-west, eu-west, eu-north]. Also
                                            accepts native cloud regions
                                                   [string] [default: "us-west"]
  --cloud-type                              Instance type. Choices: [m, l, xl].
                                            Also supports native types like i.e.
                                            t2.micro                    [string]
  --cloud-permission-set                    Specifies the instance profile in
                                            AWS or instance service account in
                                            GCP           [string] [default: ""]
  --cloud-metadata                          Key Value pairs to associate
                                            cml-runner instance on the provider
                                            i.e. tags/labels "key=value"
                                                           [array] [default: []]
  --cloud-gpu                               GPU type. Choices: k80, v100, or
                                            native types e.g. nvidia-tesla-t4
                                                                        [string]
  --cloud-hdd-size                          HDD size in GB              [number]
  --cloud-ssh-private                       Custom private RSA SSH key. If not
                                            provided an automatically generated
                                            throwaway key will be used  [string]
  --cloud-spot                              Request a spot instance    [boolean]
  --cloud-spot-price                        Maximum spot instance bidding price
                                            in USD. Defaults to the current spot
                                            bidding price [number] [default: -1]
  --cloud-startup-script                    Run the provided Base64-encoded
                                            Linux shell script during the
                                            instance initialization     [string]
  --cloud-aws-security-group                Specifies the security group in AWS
                                                          [string] [default: ""]
  --cloud-aws-subnet,                       Specifies the subnet to use within
  --cloud-aws-subnet-id                     AWS           [string] [default: ""]
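
The --cloud-startup-script argument expects a Base64-encoded script rather than a file path. The sketch below shows one way to produce that value with standard tools. The file name startup.sh, its contents, and the STARTUP_B64 variable are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical startup script; replace the body with your own setup commands.
cat > startup.sh <<'EOF'
#!/bin/sh
echo "instance initialising"
EOF

# Encode to a single Base64 line (tr strips the line wrapping some base64 tools add).
STARTUP_B64=$(base64 < startup.sh | tr -d '\n')
echo "$STARTUP_B64"

# The encoded value can then be passed on, for example:
#   cml runner launch --cloud=aws --cloud-startup-script="$STARTUP_B64"
```

Decoding the value again (for example with base64 -d on GNU systems) is a quick way to confirm that the script survived the round trip.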

Environment Variables

⚠️ You will need to create a personal access token (PAT) with repository read/write access and workflow privileges. In the example workflow, this token is stored as PERSONAL_ACCESS_TOKEN.

ℹ️ If using the --cloud option, you will also need to provide access credentials of your cloud compute resources as secrets. In the above example, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (with privileges to create & destroy EC2 instances) are required.

For AWS, the same credentials can also be used for configuring cloud storage.

Proxy support

CML supports proxies via the standard environment variables http_proxy and https_proxy.
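
A minimal sketch of setting both variables in the shell that later runs cml is shown below. The proxy address is a placeholder, not a real endpoint, and PROXY_URL is a name chosen for this example.

```shell
#!/bin/sh
# Hypothetical proxy address; substitute your organisation's proxy.
PROXY_URL="http://proxy.example.com:3128"

# Export both variables so that cml, started from this shell, inherits them.
export http_proxy="$PROXY_URL"
export https_proxy="$PROXY_URL"

echo "http_proxy=$http_proxy"
echo "https_proxy=$https_proxy"
```

In a GitHub workflow, the same two variables could instead be set in the step's env block, next to REPO_TOKEN.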

On-premise (Local) Runners

This means using on-premise machines as self-hosted runners. The cml runner launch function is used to set up a local self-hosted runner. On a local machine or on-premise GPU cluster, install CML as a package and then run:

cml runner launch \
  --repo=$your_project_repository_url \
  --token=$PERSONAL_ACCESS_TOKEN \
  --labels="local,runner" \
  --idle-timeout=180

The machine will listen for workflows from your project repository.

Local Package

In the examples above, CML is installed by the setup-cml action, or comes pre-installed in a custom Docker image pulled by a CI runner. You can also install CML as a package:

npm install --location=global @dvcorg/cml

You can use cml without node by downloading the correct standalone binary for your system from the asset section of the releases.

You may need to install additional dependencies to use DVC plots and Vega-Lite CLI commands:

sudo apt-get install -y libcairo2-dev libpango1.0-dev libjpeg-dev libgif-dev \
                        librsvg2-dev libfontconfig-dev
npm install -g vega-cli vega-lite

CML and Vega-Lite package installation require the NodeJS package manager (npm) which ships with NodeJS. Installation instructions are below.

Install NodeJS

  • GitHub: This is probably not necessary when using GitHub's default containers or one of CML's Docker containers. Self-hosted runners may need to use a setup action to install NodeJS:
- uses: actions/setup-node@v3
  with:
    node-version: '16'
  • GitLab: Requires direct installation.
curl -sL https://deb.nodesource.com/setup_16.x | bash
apt-get update
apt-get install -y nodejs
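
After installing, a quick check of the NodeJS version can catch an outdated system package. The sketch below is one way to do it. The node_major helper is hypothetical, and the threshold of 16 is an assumption taken from the version used in this document's examples rather than a documented minimum.

```shell
#!/bin/sh
# node_major VERSION: print the major number of a version string such as "v16.20.2".
# (Hypothetical helper written for this example.)
node_major() {
  version=${1#v}         # drop the leading "v"
  echo "${version%%.*}"  # keep everything before the first dot
}

if command -v node >/dev/null 2>&1; then
  major=$(node_major "$(node --version)")
  if [ "$major" -ge 16 ]; then
    echo "NodeJS $major found"
  else
    echo "NodeJS $major is older than the 16.x used in this document" >&2
  fi
else
  echo "NodeJS is not installed" >&2
fi
```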

See Also

These are some example projects using CML.

🔑 needs a PAT.

⚠️ Maintenance ⚠️

  • ~2023-07 Nvidia has dropped container CUDA images with 10.x/cudnn7 and 11.2.1; CML images will be updated accordingly.
