  • Stars: 479
  • Rank: 91,155 (Top 2%)
  • Language: Python
  • License: Apache License 2.0
  • Created over 2 years ago
  • Updated 6 months ago

Repository Details

Code review for data in dbt

PipeRider automatically compares your data to highlight the differences in impacted downstream dbt models, so you can merge your pull requests with confidence.

How it works:

  • Easy to connect your data source: PipeRider uses the connection profiles in your dbt project to connect to the data warehouse
  • Generate profiling statistics of your models to get a high-level overview of your data
  • Compare target branch changes with the main branch in an HTML report
  • Post a quick summary of the data changes to your PR, so others can be confident too
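To make the comparison step concrete, here is a minimal Python sketch. It is not PipeRider's implementation: it assumes each branch's profiling results are available as a plain dict keyed by "model.column" (a hypothetical layout), and the helper diff_profiles and the toy values are illustrative only.

```python
from typing import Any

# Hypothetical profile layout: {"model.column": {"statistic_name": value, ...}}
Profile = dict[str, dict[str, Any]]


def diff_profiles(base: Profile, target: Profile) -> dict[str, dict[str, tuple[Any, Any]]]:
    """Return {column: {stat: (base_value, target_value)}} for every statistic that changed.

    A column or statistic present on only one side is reported with None on the other side.
    """
    changes: dict[str, dict[str, tuple[Any, Any]]] = {}
    for column in sorted(base.keys() | target.keys()):
        base_stats = base.get(column, {})
        target_stats = target.get(column, {})
        column_changes = {
            stat: (base_stats.get(stat), target_stats.get(stat))
            for stat in sorted(base_stats.keys() | target_stats.keys())
            if base_stats.get(stat) != target_stats.get(stat)
        }
        if column_changes:
            changes[column] = column_changes
    return changes


# Toy profiles for the main branch and a PR branch (illustrative values only).
main_profile = {
    "stg_customers.customer_id": {"row_count": 100, "distinct": 100, "nulls": 0},
    "stg_customers.first_name": {"row_count": 100, "distinct": 79, "nulls": 0},
}
pr_profile = {
    "stg_customers.customer_id": {"row_count": 98, "distinct": 98, "nulls": 0},
    "stg_customers.first_name": {"row_count": 98, "distinct": 79, "nulls": 0},
}

print(diff_profiles(main_profile, pr_profile))
```

Only statistics that actually differ are kept, which is what makes a review summary short: unchanged columns simply drop out.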

Core concepts

  • Easy to install: because PipeRider reuses your dbt configuration settings, it can be installed within 2 minutes
  • Fast comparison: by collecting profiling statistics (e.g. uniqueness, averages, quantiles, histograms) and metric query results, PipeRider compares downstream data impact quickly, which shortens your team's review time
  • Valuable insights: the profiling statistics displayed in the HTML report give fast insights into your data
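The statistics named above can be illustrated with the Python standard library. This sketch works on an in-memory list of values and is not PipeRider's own profiler; the helper profile_column and its output keys are hypothetical, and the histogram uses simple equal-width bins.

```python
import statistics
from collections import Counter
from typing import Optional


def profile_column(values: list[Optional[float]], bins: int = 4) -> dict:
    """Compute a few profiling statistics for one numeric column (toy sketch)."""
    non_null = [v for v in values if v is not None]
    total = len(values)
    stats = {
        "total": total,
        "nulls": total - len(non_null),
        "distinct": len(set(non_null)),
        # Share of non-null values that are distinct (1.0 means every value is unique).
        "uniqueness": len(set(non_null)) / len(non_null) if non_null else 0.0,
    }
    if len(non_null) >= 2:
        stats["avg"] = statistics.fmean(non_null)
        # Quartile cut points (25th, 50th, 75th percentiles) using the "inclusive" method.
        stats["quantiles"] = statistics.quantiles(non_null, n=4, method="inclusive")
        # Equal-width histogram: count how many values fall into each of `bins` buckets.
        lo, hi = min(non_null), max(non_null)
        width = (hi - lo) / bins or 1.0  # avoid division by zero when all values are equal
        counts = Counter(min(int((v - lo) / width), bins - 1) for v in non_null)
        stats["histogram"] = [counts.get(i, 0) for i in range(bins)]
    return stats


print(profile_column([1, 2, 2, 3, 4, 5, 6, 7, 8, None]))
```

Because each statistic is a small, fixed-size summary, comparing two branches means comparing a handful of numbers per column rather than the rows themselves.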

Quickstart

  1. Install PipeRider

    pip install 'piperider[<connector>]'

    Replace <connector> with the connector for your data source. You can find all supported data source connectors in the PipeRider documentation.

  2. Add the PipeRider tag to your models: in your dbt project, add the piperider tag to each model you want to profile.

    -- models/staging/stg_customers.sql
    {{ config(
       tags=["piperider"]
    ) }}
    
    select ...

    Then check which models PipeRider will run by listing the tagged models:

    dbt list -s tag:piperider --resource-type model
    
  3. Run PipeRider

    piperider run
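As a rough illustration of what the tag:piperider selector in step 2 picks up, the Python sketch below scans model files for an in-file {{ config(tags=[...]) }} block. It is a simplified, hypothetical helper: dbt can also assign tags in dbt_project.yml or in YAML property files, which these regular expressions ignore, so dbt list remains the authoritative answer.

```python
import re
from pathlib import Path

# Captures the argument list of a {{ config(...) }} block (simplified pattern).
CONFIG_BLOCK = re.compile(r"\{\{\s*config\((.*?)\)\s*\}\}", re.DOTALL)
# Captures the contents of a tags=[...] argument.
TAGS_ARG = re.compile(r"tags\s*=\s*\[([^\]]*)\]")
# Captures each single- or double-quoted string inside the tags list.
QUOTED = re.compile(r"['\"]([^'\"]+)['\"]")


def tags_in_sql(sql: str) -> set[str]:
    """Return the tags declared as tags=[...] in a model's in-file config block."""
    tags: set[str] = set()
    for config_args in CONFIG_BLOCK.findall(sql):
        for tag_list in TAGS_ARG.findall(config_args):
            tags.update(QUOTED.findall(tag_list))
    return tags


def tagged_models(models_dir: str, tag: str = "piperider") -> list[str]:
    """List model names (SQL file stems) under models_dir whose config carries `tag`."""
    return sorted(
        path.stem
        for path in Path(models_dir).rglob("*.sql")
        if tag in tags_in_sql(path.read_text())
    )
```

For the stg_customers.sql example in step 2, tags_in_sql returns {"piperider"}, so tagged_models("models") would include stg_customers.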

For the full quickstart guide, refer to the PipeRider documentation.

Features

  • Model profiling: PipeRider can profile your dbt models and obtain information such as basic data composition, quantiles, histograms, text length, top categories, and more.
  • Metric queries: PipeRider can integrate with dbt metrics and present the time-series data of metrics in the report.
  • HTML report: PipeRider generates a static HTML report each time it runs, which can be viewed locally or shared.
  • Report comparison: You can compare two previously generated reports, or use a single command to compare the current branch with the main branch. The latter is designed specifically for code review: in a GitHub pull request, reviewers want to know not only which files changed but also how those changes affect the data, and PipeRider generates this comparison report with a single command.
  • CI integration: Automation is the key to CI, and automating the comparison is especially valuable during code review. PipeRider integrates easily into your CI process: each time new commits are pushed to your PR branch, reports can be generated automatically, giving reviewers more confidence in the changes they review.
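To illustrate the kind of time series a metric query produces, the sketch below sums a measure per day and compares two runs day by day. It is a toy stand-in in plain Python, not dbt's metrics API or PipeRider's query code; the function names daily_metric and compare_series and the daily grain are assumptions.

```python
from collections import defaultdict
from datetime import date


def daily_metric(rows: list[tuple[date, float]]) -> dict[date, float]:
    """Sum a measure per day, like a simple sum metric with a daily time grain."""
    series: dict[date, float] = defaultdict(float)
    for day, amount in rows:
        series[day] += amount
    return dict(sorted(series.items()))


def compare_series(base: dict[date, float], target: dict[date, float]) -> dict[date, float]:
    """Return target minus base for each day in either series (missing days count as 0)."""
    return {
        day: target.get(day, 0.0) - base.get(day, 0.0)
        for day in sorted(base.keys() | target.keys())
    }


# Toy order amounts computed on the main branch and on a PR branch.
main_rows = [(date(2023, 1, 1), 10.0), (date(2023, 1, 1), 5.0), (date(2023, 1, 2), 7.0)]
pr_rows = [(date(2023, 1, 1), 15.0), (date(2023, 1, 2), 4.0), (date(2023, 1, 3), 2.0)]

print(compare_series(daily_metric(main_rows), daily_metric(pr_rows)))
```

A non-zero delta on a given day points the reviewer to exactly where a model change altered the metric.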

Example Report Demo

We use the example project git-repo-analytics to demonstrate how to use PipeRider, dbt, and DuckDB to analyze the dbt-core repository. The generated results are updated daily:

  • Run Report
  • Comparison Report
  • Comparison Summary in a PR

PipeRider Cloud (beta)

PipeRider Cloud allows you to upload reports and share them with your team members. For information on pricing plans, please refer to the pricing page.

PipeRider Compare Action

PipeRider provides the PipeRider Compare Action for quick integration into your GitHub Actions workflow. It has the following features:

  • Automatically generates a report comparing the PR branch to the main branch
  • Uploads the report to GitHub artifacts or PipeRider cloud
  • Adds a comment to the pull request with a comparison summary and a link to the report.
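The comment posted by the action can be pictured as a small Markdown table. The sketch below is hypothetical and is not the action's actual template: it assumes the differences are already computed as {column: {statistic: (main_value, pr_value)}} and that a report URL is available (the URL shown is a placeholder).

```python
def summary_markdown(
    changes: dict[str, dict[str, tuple[object, object]]], report_url: str
) -> str:
    """Render changed statistics as a Markdown table followed by a link to the full report."""
    if not changes:
        return f"No data changes detected. [Full comparison report]({report_url})"
    lines = ["| Column | Statistic | Main | PR |", "| --- | --- | --- | --- |"]
    for column, stats in sorted(changes.items()):
        for stat, (main_value, pr_value) in sorted(stats.items()):
            lines.append(f"| {column} | {stat} | {main_value} | {pr_value} |")
    lines += ["", f"[Full comparison report]({report_url})"]
    return "\n".join(lines)


print(summary_markdown(
    {"stg_customers.customer_id": {"row_count": (100, 98), "distinct": (100, 98)}},
    "https://example.com/report.html",  # placeholder URL
))
```

Keeping the comment to changed statistics only, with a link to the full report, keeps the pull request thread readable.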

See the example workflow YAML and the example pull request.

Development

See the setup dev environment guide and the contributing guidelines to get started.

We love chatting with our users! Let us know if you have any questions, feedback, or need help trying out PipeRider! ❤️

More Repositories

  1. primehub: Open-source MLOps platform (Shell, 390 stars)
  2. colab-xterm: Open a terminal in Colab, including the free tier (Python, 345 stars)
  3. ArtiVC: A version control system to manage large files (Go, 290 stars)
  4. crane: Crane is an easy-to-use and beautiful desktop application that helps you build and manage your container images (TypeScript, 279 stars)
  5. awesome-public-dbt-projects: A curated list of awesome public dbt projects (62 stars)
  6. k8s-iperf: Run network performance tests in a Kubernetes cluster (Shell, 31 stars)
  7. primehub-aws-cdk: Life is short; don't waste time setting up a k8s environment. One-click CDK to set up AWS EKS with PrimeHub (TypeScript, 28 stars)
  8. taxi_rides_ny_duckdb: PipeRider dbt workshop for DataTalksClub DE Zoomcamp (16 stars)
  9. dimon (Python, 15 stars)
  10. primehub-python-sdk: PrimeHub Python SDK (Python, 13 stars)
  11. primehub-controller: 🎮 PrimeHub Controller (Go, 11 stars)
  12. primehub-console: PrimeHub Console UI (TypeScript, 11 stars)
  13. piperider-compare-action (Shell, 9 stars)
  14. git-repo-analytics (Python, 7 stars)
  15. model-deployment-examples (Shell, 5 stars)
  16. primehub-site: A static site of PrimeHub (JavaScript, 5 stars)
  17. showcase (Jupyter Notebook, 5 stars)
  18. dbt-nthu-kktv (4 stars)
  19. kube-notebooks: Jupyter Notebooks ❤️ Kubernetes (Jupyter Notebook, 4 stars)
  20. PrimeLM: A large language model service and project dialogue system platform built for enterprises, provided by InfuseAI (4 stars)
  21. awesome-primehub-apps: Collection of awesome PrimeHub Apps (Python, 3 stars)
  22. primehub-job (Python, 2 stars)
  23. primehub-seldon-servers (Python, 2 stars)
  24. primehub-install (Shell, 2 stars)
  25. auto-img-cls (Jupyter Notebook, 2 stars)
  26. piperider-action (JavaScript, 2 stars)
  27. piperider-blog: PipeRider blog built in Jekyll (HTML, 2 stars)
  28. WaysOfML (JavaScript, 2 stars)
  29. primehub-remote-deploy: An example of primehub-python-sdk that deploys a PrimeHub deployment to a remote cluster (Python, 2 stars)
  30. PipeRider-Documentation (1 star)
  31. Homebrew-ArtiVC: Homebrew formula for artiv (Ruby, 1 star)
  32. primehub-examples: When a PrimeHub Notebook starts, it gets primehub-example as its example folder; this repository provides the examples placed in that folder (Makefile, 1 star)
  33. TaoKanOperator: A Kubernetes operator for transferring PVC data to a remote cluster (Go, 1 star)
  34. dbt-infuse-finance (Python, 1 star)
  35. dbt-project-pull-request-visualizer: A tool to visualize the GitHub pull requests of a dbt project (Python, 1 star)
  36. one.primehub.io (JavaScript, 1 star)
  37. primehub-dataset-upload (Python, 1 star)
  38. piperider-getting-started (1 star)