  • Language: Python
  • License: MIT License
  • Created over 2 years ago
  • Updated 7 months ago

Repository Details

A CLI tool that reduces friction between data scientists by removing notebook metadata (which avoids many git conflicts) and by gracefully resolving the git conflicts that remain.


Maintained by dataroots.

databooks is a package that eases collaboration between data scientists using Jupyter notebooks. It reduces the number of git conflicts between different notebooks and resolves git conflicts when they are encountered.

The key features include:

  • CLI tool
    • Clear notebook metadata
    • Resolve git conflicts
  • Simple to use
  • Simple API for modelling and comparing notebooks, using Pydantic

Requirements

databooks is built on top of:

Installation

pip install databooks

Usage

Clear metadata

Simply specify the paths of the notebook files whose metadata should be removed. This alone avoids many of the conflicts.

$ databooks meta [OPTIONS] PATHS...

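To make the idea concrete, here is a minimal sketch of what clearing metadata amounts to on the raw notebook JSON (nbformat 4), using only the standard library. It is not databooks' implementation. `clear_metadata`, `clear_file` and the `rm_exec`/`rm_outs` switches are hypothetical names chosen for illustration. What databooks actually strips by default is controlled by its `[OPTIONS]` (see `databooks meta --help`).

```python
import copy
import json
from typing import Any, Dict

Notebook = Dict[str, Any]  # parsed .ipynb JSON (nbformat 4)


def clear_metadata(nb: Notebook, rm_exec: bool = True, rm_outs: bool = False) -> Notebook:
    """Hypothetical helper: return a copy of `nb` with notebook- and cell-level metadata emptied."""
    cleaned = copy.deepcopy(nb)  # never mutate the caller's notebook
    cleaned["metadata"] = {}  # e.g. kernelspec, language_info
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}  # e.g. tags, collapsed, scrolled
        if cell.get("cell_type") != "code":
            continue
        if rm_outs:
            cell["outputs"] = []
        if rm_exec:
            # Execution counts change on every run and are a classic conflict source
            cell["execution_count"] = None
            for out in cell.get("outputs", []):
                if "execution_count" in out:
                    out["execution_count"] = None
    return cleaned


def clear_file(path: str) -> None:
    """Hypothetical helper: clean a notebook file in place."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(clear_metadata(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```

The sketch empties the `metadata` dictionaries instead of deleting the keys. Both the notebook-level and the cell-level `metadata` fields are required by the nbformat 4 schema, so deleting them would make the file invalid.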

Fix git conflicts for notebooks

Specify the paths of the conflicted notebook files to be fixed. databooks then finds the source notebooks that caused the conflicts and compares them (so no JSON manipulation!).

$ databooks fix [OPTIONS] PATHS...

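The point of comparing parsed notebooks, rather than editing conflicted JSON text, can be illustrated with a small sketch. It rests on simplifying assumptions. It takes two already-parsed versions, called `head` and `base` here, and it does not show how databooks obtains them from git. It aligns their cells with `difflib.SequenceMatcher`, keeps the cells both sides agree on, and wraps differing runs in marker markdown cells. `merge_notebooks`, the marker text and `keep_head_metadata` are hypothetical names. This is a plain two-way comparison, not databooks' actual algorithm.

```python
import copy
import difflib
from typing import Any, Dict, List, Tuple

Notebook = Dict[str, Any]
Cell = Dict[str, Any]


def _cell_key(cell: Cell) -> Tuple[str, str]:
    """Hashable identity of a cell: its type plus its source text."""
    src = cell.get("source", "")
    return (cell.get("cell_type", ""), "".join(src) if isinstance(src, list) else src)


def _marker(text: str) -> Cell:
    """Markdown cell flagging a region where the versions differ (no nbformat 4.5 `id`, for brevity)."""
    return {"cell_type": "markdown", "metadata": {}, "source": [text]}


def merge_notebooks(head: Notebook, base: Notebook, keep_head_metadata: bool = True) -> Notebook:
    head_cells: List[Cell] = head.get("cells", [])
    base_cells: List[Cell] = base.get("cells", [])
    matcher = difflib.SequenceMatcher(
        a=[_cell_key(c) for c in head_cells],
        b=[_cell_key(c) for c in base_cells],
        autojunk=False,
    )
    merged: List[Cell] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Both versions agree on these cells: keep a single copy
            merged.extend(copy.deepcopy(head_cells[i1:i2]))
        else:
            # Cells were replaced, inserted or deleted: keep both sides, clearly marked
            merged.append(_marker("<<<<<<< head"))
            merged.extend(copy.deepcopy(head_cells[i1:i2]))
            merged.append(_marker("======="))
            merged.extend(copy.deepcopy(base_cells[j1:j2]))
            merged.append(_marker(">>>>>>> base"))
    # Notebook-level metadata is taken wholesale from one side
    meta_source = head if keep_head_metadata else base
    return {
        "nbformat": meta_source.get("nbformat", 4),
        "nbformat_minor": meta_source.get("nbformat_minor", 4),
        "metadata": copy.deepcopy(meta_source.get("metadata", {})),
        "cells": merged,
    }
```

In a real merge conflict, both versions can be read from the git index stages: `git show :2:<path>` gives "ours" and `git show :3:<path>` gives "theirs". Each can then be parsed with `json.loads` before being compared as above.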

Assert notebook metadata

Specify the paths of the notebooks to be checked and an expression or recipe describing what you'd like to enforce. databooks will run your checks and raise errors if any notebook does not comply with the desired metadata values. This advanced feature allows users to enforce cell tags, sequential cell execution, a maximum number of cells, and many other things!

Check out our docs for more!

$ databooks assert [OPTIONS] PATHS...

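databooks' own expression and recipe syntax is covered in its docs and is not reproduced here. As an illustration only, the sketch below writes the kinds of checks mentioned above as plain Python predicates over the notebook JSON. The checks covered are sequential execution, cell tags and a maximum number of cells. Every function name is hypothetical. "Sequential" is assumed here to mean executed code cells numbered 1, 2, 3, ... with no gaps, ignoring unexecuted cells.

```python
from typing import Any, Callable, Dict, List

Notebook = Dict[str, Any]
Check = Callable[[Notebook], bool]


def is_sequentially_executed(nb: Notebook) -> bool:
    """Executed code cells must read 1, 2, 3, ... from top to bottom."""
    counts = [
        c.get("execution_count")
        for c in nb.get("cells", [])
        if c.get("cell_type") == "code" and c.get("execution_count") is not None
    ]
    return counts == list(range(1, len(counts) + 1))


def all_cells_tagged(nb: Notebook) -> bool:
    """Every cell carries at least one entry in metadata["tags"]."""
    return all(c.get("metadata", {}).get("tags") for c in nb.get("cells", []))


def max_cells(limit: int) -> Check:
    """Build a check that fails when the notebook has more than `limit` cells."""
    return lambda nb: len(nb.get("cells", [])) <= limit


def failed_checks(nb: Notebook, checks: Dict[str, Check]) -> List[str]:
    """Names of the checks that `nb` does not satisfy (empty list means compliant)."""
    return [name for name, check in checks.items() if not check(nb)]
```

A command-line wrapper around such checks would report the failing names and exit with a non-zero status, which is what makes this kind of assertion usable in CI.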

Show rich notebook

Instead of launching Jupyter and opening the browser to inspect notebooks, have a quick look at them in the terminal. All you need is to specify the path(s) of the notebook(s).

$ databooks show [OPTIONS] PATHS...

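databooks renders notebooks with rich terminal formatting. The dependency-free sketch below only shows the underlying idea of turning notebook JSON into readable terminal text. `render_notebook` is a hypothetical name, and the "In [n]:" layout is an assumption loosely modelled on Jupyter's own prompts.

```python
from typing import Any, Dict, List, Union

Notebook = Dict[str, Any]


def _text(value: Union[str, List[str]]) -> str:
    """nbformat stores multiline strings either as one string or as a list of lines."""
    return "".join(value) if isinstance(value, list) else value


def render_notebook(nb: Notebook) -> str:
    blocks: List[str] = []
    for cell in nb.get("cells", []):
        source = _text(cell.get("source", ""))
        if cell.get("cell_type") != "code":
            blocks.append(f"[{cell.get('cell_type')}]\n{source}")
            continue
        count = cell.get("execution_count")
        blocks.append(f"In [{count if count is not None else ' '}]:\n{source}")
        for out in cell.get("outputs", []):
            kind = out.get("output_type")
            if kind == "stream":  # print() output
                blocks.append(_text(out.get("text", "")).rstrip("\n"))
            elif kind == "error":  # raised exception
                blocks.append(f"{out.get('ename')}: {out.get('evalue')}")
            elif "text/plain" in out.get("data", {}):  # execute_result / display_data
                blocks.append(_text(out["data"]["text/plain"]))
    return "\n\n".join(blocks)
```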

Show rich notebook diffs

Similar to git diff, but for notebooks! Show a rich diff of the notebooks in the terminal. It can compare the git index with the current working directory, or compare branches or blobs.

$ databooks diff [OPTIONS] [REF_BASE] [REF_REMOTE] [PATHS]...

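Diffing notebooks cell by cell, rather than as raw JSON, is what keeps the output readable. The sketch below shows a plain-text version of that idea. It flattens each notebook into cell sources separated by header lines and runs `difflib.unified_diff` on the result. `notebook_lines`, `diff_notebooks` and the cell header format are hypothetical, and databooks' own rich output looks different.

```python
import difflib
from typing import Any, Dict, List

Notebook = Dict[str, Any]


def notebook_lines(nb: Notebook) -> List[str]:
    """Flatten a notebook into source lines, with one header line per cell."""
    lines: List[str] = []
    for i, cell in enumerate(nb.get("cells", [])):
        src = cell.get("source", "")
        src = "".join(src) if isinstance(src, list) else src
        lines.append(f"# --- cell {i} ({cell.get('cell_type')}) ---")
        lines.extend(src.splitlines())
    return lines


def diff_notebooks(nb_a: Notebook, nb_b: Notebook, name_a: str = "a", name_b: str = "b") -> str:
    """Unified diff of cell sources only (metadata and outputs are ignored)."""
    return "\n".join(
        difflib.unified_diff(
            notebook_lines(nb_a), notebook_lines(nb_b),
            fromfile=name_a, tofile=name_b, lineterm="",
        )
    )
```

To compare against a branch or blob, the other version can be read with `git show <ref>:<path>` and parsed with `json.loads` before calling the function.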

License

This project is licensed under the terms of the MIT license.
