• Stars
    star
    120
  • Rank 295,983 (Top 6 %)
  • Language SQL
  • License
    Apache License 2.0
  • Created over 2 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

This package generates database constraints based on the tests in a dbt project

dbt Constraints Package

This package generates database constraints based on the tests in a dbt project. It is currently compatible with Snowflake, PostgreSQL, Oracle, Redshift, and Vertica only.

How the dbt Constraints Package differs from dbt's Model Contracts feature

This package focuses on automatically generating constraints based on the tests already in a user's dbt project. In most cases, merely adding the dbt Constraints package is all that is needed to generate constraints. dbt's recent model contracts feature allows users to explicitly document constraints for models in yml. This package and the core feature are 100% compatible with one another and the dbt Constraints package will skip generating constraints already created by a model contract. However, the dbt Constraints package will also generate constraints for any tests that are not documented as model contracts. As described in the next section, dbt Constraints is also designed to provide join elimination on Snowflake.

Why data engineers should add referential integrity constraints

The primary reason to add constraints to your database tables is that many tools including DBeaver and Oracle SQL Developer Data Modeler can correctly reverse-engineer data model diagrams if there are primary keys, unique keys, and foreign keys on tables. Most BI tools will also add joins automatically between tables when you import tables that have foreign keys. This can both save time and avoid mistakes.

In addition, although Snowflake doesn't enforce most constraints, the query optimizer can consider primary key, unique key, and foreign key constraints during query rewrite if the constraint is set to RELY. Since dbt can test that the data in the table complies with the constraints, this package creates constraints on Snowflake with the RELY property to improve query performance. Some database query optimizers also consider not null constraints when building an execution plan.

Many databases including Snowflake, PostgreSQL, Oracle, SQL Server, MySQL, and DB2 can use referential integrity constraints to perform "Join Elimination" to remove tables from an execution plan. This commonly occurs when you query a subset of columns from a view and some of the tables in the view are unnecessary. In addition, on databases that do not support join elimination, some BI and visualization tools will also rewrite their queries based on constraint information, producing the same effect.

Finally, although most columnar databases including Snowflake do not use or need indexes, most row-oriented databases including PostgreSQL and Oracle require indexes on their primary key columns in order to perform efficient joins between tables. A primary key or unique key constraint is typically enforced on databases using such indexes. Having dbt create the unique indexes automatically can slightly reduce the degree of performance tuning necessary for row-oriented databases. Row-oriented databases frequently also need indexes on foreign key columns but that is something best added manually.

Please note

When you add this package, dbt will automatically begin to create unique keys for all your existing unique and dbt_utils.unique_combination_of_columns tests, foreign keys for existing relationship tests, and not null constraints for not_null tests. The package also provides three new tests (primary_key, unique_key, and foreign_key) that are a bit more flexible than the standard dbt tests. These tests can be used inline, out-of-line, and can support multiple columns when used in the tests: section of a model. The primary_key test will also cause a not null constraint to be created on each column.

Disabling automatic constraint generation

The dbt_constraints_enabled variable can be set to false in your project to disable automatic constraint generation. By default dbt Constraints only creates constraints on models. To allow constraints on sources, you can set dbt_constraints_sources_enabled to true. The package will verify that you have sufficient database privileges to create constraints on sources.

vars:
  # The package can be temporarily disabled using this variable
  dbt_constraints_enabled: true

  # The package can also add constraints on sources if you have sufficient privileges
  dbt_constraints_sources_enabled: false

  # You can also be specific on which constraints are enabled for sources
  # You must also enable dbt_constraints_sources_enabled above
  dbt_constraints_sources_pk_enabled: true
  dbt_constraints_sources_uk_enabled: true
  dbt_constraints_sources_fk_enabled: true
  dbt_constraints_sources_nn_enabled: true

Installation

  1. Add this package to your packages.yml following these instructions. Please check this link for the latest released version.
packages:
  - package: Snowflake-Labs/dbt_constraints
    version: [">=0.6.0", "<0.7.0"]
# <see https://github.com/Snowflake-Labs/dbt_constraints/releases/latest> for the latest version tag.
# You can also pull the latest changes from Github with the following:
#  - git: "https://github.com/Snowflake-Labs/dbt_constraints.git"
#    revision: main
  1. Run dbt deps.

  2. Optionally add primary_key, unique_key, or foreign_key tests to your model like the following examples.

  - name: DIM_ORDER_LINES
    columns:
      # Single column inline constraints
      - name: OL_PK
        tests:
          - dbt_constraints.primary_key
      - name: OL_UK
        tests:
          - dbt_constraints.unique_key
      - name: OL_CUSTKEY
        tests:
          - dbt_constraints.foreign_key:
              pk_table_name: ref('DIM_CUSTOMERS')
              pk_column_name: C_CUSTKEY
    tests:
      # Single column constraints
      - dbt_constraints.primary_key:
          column_name: OL_PK
      - dbt_constraints.unique_key:
          column_name: OL_ORDERKEY
      - dbt_constraints.foreign_key:
          fk_column_name: OL_CUSTKEY
          pk_table_name: ref('DIM_CUSTOMERS')
          pk_column_name: C_CUSTKEY
      # Multiple column constraints
      - dbt_constraints.primary_key:
          column_names:
            - OL_PK_COLUMN_1
            - OL_PK_COLUMN_2
      - dbt_constraints.unique_key:
          column_names:
            - OL_UK_COLUMN_1
            - OL_UK_COLUMN_2
      - dbt_constraints.foreign_key:
          fk_column_names:
            - OL_FK_COLUMN_1
            - OL_FK_COLUMN_2
          pk_table_name: ref('DIM_CUSTOMERS')
          pk_column_names:
            - C_PK_COLUMN_1
            - C_PK_COLUMN_2

Dependencies and Requirements

  • The package's macros depend on the results and graph object schemas of dbt >=1.0.0

  • The package currently only includes macros for creating constraints in Snowflake, PostgreSQL, and Oracle. To add support for other databases, it is necessary to implement the following seven macros with the appropriate DDL & SQL for your database. Pull requests to contribute support for other databases are welcome. See the <ADAPTER_NAME>__create_constraints.sql files as examples.

<ADAPTER_NAME>__create_primary_key(table_model, column_names, verify_permissions, quote_columns=false, constraint_name=none, lookup_cache=none)
<ADAPTER_NAME>__create_unique_key(table_model, column_names, verify_permissions, quote_columns=false, constraint_name=none, lookup_cache=none)
<ADAPTER_NAME>__create_foreign_key(pk_model, pk_column_names, fk_model, fk_column_names, verify_permissions, quote_columns=false, constraint_name=none, lookup_cache=none)
<ADAPTER_NAME>__create_not_null(pk_model, pk_column_names, fk_model, fk_column_names, verify_permissions, quote_columns=false, lookup_cache=none)
<ADAPTER_NAME>__unique_constraint_exists(table_relation, column_names, lookup_cache=none)
<ADAPTER_NAME>__foreign_key_exists(table_relation, column_names, lookup_cache=none)
<ADAPTER_NAME>__have_references_priv(table_relation, verify_permissions, lookup_cache=none)
<ADAPTER_NAME>__have_ownership_priv(table_relation, verify_permissions, lookup_cache=none)

dbt_constraints Limitations

Generally, if you don't meet a requirement, tests are still executed but the constraint is skipped rather than producing an error.

  • All models involved in a constraint must be materialized as table, incremental, snapshot, or seed.

  • If source constraints are enabled, the source must be a table. You must also have the OWNERSHIP table privilege to add a constraint. For foreign keys you also need the REFERENCES privilege on the parent table with the primary or unique key. The package will identify when you lack these privileges on Snowflake and PostgreSQL. Oracle does not provide an easy way to look up your effective privileges so it has an exception handler and will display Oracle's error messages.

  • All columns on constraints must be individual column names, not expressions. You can reference columns on a model that come from an expression.

  • Constraints are not created for failed tests. See how to get around this using severity and config: always_create_constraint: true in the next section.

  • primary_key, unique_key, and foreign_key tests are considered first and duplicate constraints are skipped. One exception is that you will get an error if you add two different primary_key tests to the same model.

  • Foreign keys require that the parent table have a primary key or unique key on the referenced columns. Unique keys generated from standard unique tests are sufficient.

  • The order of columns on a foreign key test must match between the FK columns and PK columns

  • The foreign_key test will ignore any rows with a null column, even if only one of two columns in a compound key is null. If you also want to ensure FK columns are not null, you should add standard not_null tests to your model which will add not null constraints to the table.

  • Referential constraints must apply to all the rows in a table so any tests with a config: where: property will be skipped when creating constraints. See how to disable this rule using config: always_create_constraint: true in the next section.

Advanced: config: always_create_constraint: true property

There is an advanced option to force a constraint to be generated when there is a config: where: property or if the constraint has a threshold. The config: always_create_constraint: true property will override those exclusions. When this setting is in effect, you can create constraints even when you have excluded some records or have a number of failures below a threshold. If your test has a status of 'failed', it will still be skipped. Please see dbt's documentation on how to set a threshold for failures.

Caveat Emptor:

  • You will get an error if you try to force constraints to be generated that are enforced by your database. On Snowflake that is only a not_null constraint but on databases like Oracle, all the generated constraints are enforced.
  • This feature could cause unexpected query results on Snowflake due to join elimination.

This is an example using the feature:

  - name: dim_duplicate_orders
    description: "Test that we do not try to create PK/UK on failed tests"
    columns:
      - name: o_orderkey
        description: "The primary key for this table"
      - name: o_orderkey_seq
        description: "duplicate seq column to test UK"
    tests:
      # This constraint should be skipped because it has failures
      - dbt_constraints.primary_key:
          column_name: o_orderkey
          config:
            severity: warn
      # This constraint should be still generated because always_create_constraint=true
      - dbt_constraints.unique_key:
          column_name: o_orderkey
          config:
            warn_if: ">= 5000"
            error_if: ">= 10000"
            always_create_constraint: true
      # This constraint should be still generated because always_create_constraint=true
      - dbt_constraints.unique_key:
          column_name: o_orderkey_seq
          config:
            severity: warn
            always_create_constraint: true

Primary Maintainers

This is a community-developed package, not an official Snowflake offering. It comes with no support or warranty. However, feel free to raise a github issue if you find a bug or would like a new feature.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this package except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

More Repositories

1

snowflake-arctic

Python
514
star
2

schemachange

A Database Change Management tool for Snowflake
Python
502
star
3

snowpark-python-demos

This repository provides various demos/examples of using Snowpark for Python.
Jupyter Notebook
267
star
4

sfquickstarts

Follow along with our tutorials to get you up and running with the Snowflake Data Cloud.
HTML
237
star
5

awesome-snowflake

A curated list of resources about Snowflake
180
star
6

snowflake-demo-streamlit

This repo contains a collection of Streamlit in Snowflake demos, tutorials, and examples
Python
160
star
7

snowflake-demo-notebooks

Collection of Snowflake Notebook demos, tutorials, and examples
Jupyter Notebook
138
star
8

sf-samples

Sample files, code snippets and downloads for Snowflake labs and tutorials.
125
star
9

sfguide-frosty-llm-chatbot-on-streamlit-snowflake

An LLM-powered Streamlit chatbot for data exploration and question answering on Snowflake
Python
122
star
10

sansshell

A non-interactive daemon for host management
Go
89
star
11

sfguide-data-engineering-with-snowpark-python

Python
89
star
12

Excelerator

This is an Excel Addin for Windows that reads and writes data to Snowflake
VBA
76
star
13

sfgrantreport

Snowflake Grant Report offers a way of visualizing role hierarchy and rapid diagnosis of as-is permissions, giving customers insight without difficult discovery.
C#
72
star
14

sfguide-getting-started-dataengineering-ml-snowpark-python

Jupyter Notebook
64
star
15

semantic-model-generator

Python
62
star
16

django-snowflake

Python
59
star
17

sfsnowsightextensions

Snowflake Snowsight Extensions enables manipulation of Snowsight features from command-line
C#
56
star
18

snowpark-python-template

Python project template for Snowpark development
Python
54
star
19

sfguide-intro-to-machine-learning-with-snowflake-ml-for-python

Jupyter Notebook
54
star
20

roleout

HCL
50
star
21

arctic-embed

Jupyter Notebook
47
star
22

sfguide-getting-started-machine-learning

Jupyter Notebook
41
star
23

sqltools-snowflake-driver

A Snowflake driver for the SQLTools VSCode extension.
TypeScript
36
star
24

snowpark-containers-llama-2-sample

Python
32
star
25

sfguide-getting-started-snowpark-python

Quickstart: Getting Started with Snowpark Python
Jupyter Notebook
32
star
26

mlflow-snowflake

Python
31
star
27

sfguide-citibike-ml-snowpark-python

Jupyter Notebook
31
star
28

sfguide-recommender-pipeline

Snowflake Guide: Building a Recommendation Engine Using Snowflake & Amazon SageMaker
Python
30
star
29

snowpark-extensions-py

Python
26
star
30

snowflake-vcrpy

A PyTest plugin to speed up your tests which depend on Snowflake sessions
Python
25
star
31

terraform-snowflake-api-integration-with-geff-aws

Terraform module to create resources across the Snowflake and AWS providers and establish proper relationships within those resources.
HCL
23
star
32

snowsql-formatter

JavaScript
22
star
33

snowpark-devops

Python
22
star
34

snowflake-cli-action

Github Action enabling easy use of Snowflake CLI in your CI/CD workflows
Shell
22
star
35

sfguide-twitter-auto-ingest

Learn how to auto-ingest streaming data into Snowflake using Snowpipe.
Python
21
star
36

sfquickstart-data-clean-room

PLpgSQL
20
star
37

sfguide-external-functions-examples

Python
20
star
38

sfguide-getting-started-with-native-apps

Python
19
star
39

lezer-snowsql

JavaScript
19
star
40

sfguide-data-engineering-with-snowpark-python-intro

Python
19
star
41

sfguide-spcs-cortex-reactjs-flask-app

JavaScript
18
star
42

sfguide-snowflake-python-api

Guide for running a custom API Powered by Snowflake in Python
Python
18
star
43

terraform-snowflake-storage-integration-aws

To create the base infrastructure for storage only pipelines that load data from S3 to Snowflake.
HCL
18
star
44

geff

Python
16
star
45

sfguide_snowpark_on_jupyter

Jupyter Notebooks with Snowpark
Jupyter Notebook
15
star
46

sfguide-native-apps-chairlift

Snowflake Native Application sample demonstrating data sharing and analysis using a fictional Chairlift manufacturer.
Python
15
star
47

icetire

Data Science Sandbox for Snowflake
Jupyter Notebook
15
star
48

streamlit-examples

Python
14
star
49

sfguide-getting-started-weaviate-on-spcs

The code in this repository deploys Weaviate into Snowpark Container Services (SPCS), demonstrating how to run Weaviate in Snowflake.
Jupyter Notebook
14
star
50

sfguide-financial-asset-management

Snowflake demo for Financial Services
PLSQL
14
star
51

sfguide-getting-started-with-cortex-analyst

Python
13
star
52

sfguide-data-apps-demo

JavaScript
12
star
53

sfguide-getting-started-with-snowflake-devops

PLpgSQL
12
star
54

SC.DDLExportScripts

Collection of utility scripts to extract code so it can be upgraded to SnowFlake using the SnowConvert tool.
PLpgSQL
11
star
55

sfguide-data-crawler

Generate descriptions of Snowflake tables and views with LLMs
Python
11
star
56

Sentry

Python
11
star
57

builder-workshops

10
star
58

sfguide-blog-ai-assistant

Python
10
star
59

sfguide-snowpark-pytorch-streamlit-openai-image-rec

Python
10
star
60

EDMC-CDMC-v1-14-Control_Mapping

Controls that will be used in Snowflake to satisfy the EDMC's CDMC framework.
Python
10
star
61

OpenLineage-AccessHistory-Setup

Guideline to extract table lineage info in OpenLineage format from access history view
10
star
62

sfguide-getting-started-with-snowflake-arctic-and-snowflake-cortex

Jupyter Notebook
10
star
63

sfguide-tasty-bytes-snowpark-101-for-data-science

Jupyter Notebook
9
star
64

sfguide-snowpark-for-python-streamlit

Python
9
star
65

sfguide-snowpark-scikit-learn

Jupyter Notebook
9
star
66

aws-integrations-cloudops

Snowflake & AWS Service Catalog Integration
Python
9
star
67

spcs-updates

8
star
68

SFSimilarity

Similarity and Distance functions for Snowflake
Java
8
star
69

sfguide-getting-started-snowpark-scala

Snowpark Twitter Sentiment Analysis Example
Scala
8
star
70

Snowflake-HL7V2-Parsing

Java
8
star
71

sfguide-data-engineering-with-notebooks

Jupyter Notebook
8
star
72

sfguide-intro-to-snowpark-container-services

Python
8
star
73

cortex-search

7
star
74

sfguide-terraform-sample

Sample project for the guide Terraforming Snowflake
HCL
7
star
75

sfguide-vault-snowflakepasswords

vault-snowflakepasswords-sample is a sample Hashicorp Vault database plugin designed to work with the Snowflake Data Platform
Go
7
star
76

terraform-snowflake-snowalert

Terraformed version of Snowalert
HCL
6
star
77

sfguide-sporting-events-prediction-using-snowpark-ml

Jupyter Notebook
6
star
78

sfguide-getting-started-with-generative-ai-snowflake-external-functions-streamlit

Python
5
star
79

sfguide-marketplace-accelerator2.0

Snowflake Marketplace Accelerator 2.0 speeds up monteization and inter/intra org collaboration
PLpgSQL
5
star
80

snowpark-extensions

Useful Extensions to be used when modernizing projects from Spark to Snowpark
Scala
5
star
81

sfguide-build-and-deploy-snowpark-ml-models-using-streamlit-snowflake-notebooks

Python
5
star
82

sfguide-getting-started-with-pandas-on-snowflake

Jupyter Notebook
5
star
83

sfguide-data-engineering-pipelines-with-pandas-on-snowflake

Jupyter Notebook
4
star
84

sqlpack

Python
4
star
85

sfguide-text-embedding-snowpark-container-service

A pure-Python approach to packaging a text embedding model into a Snowpark Container Services service
Python
4
star
86

snowflake-maven-gradle-plugins

Maven and Gradle plugins to deploy your UDFs and stored procedures to Snowflake
Java
4
star
87

snowpark-scala-template

Scala project template for Snowpark development
Scala
4
star
88

sfguide_failover_scripts

Env setup for the replication/failover hands on lab
Python
4
star
89

devday2024-oss-demo

Jupyter Notebook
4
star
90

sfguide-cohort-builder

Python
3
star
91

modern-data-engineering-snowflake

Companion repository that goes along with Snowflake's "Introduction to Modern Data Engineering with Snowflake" course on Coursera
PLpgSQL
3
star
92

terraform-snowflake-snowpipe-aws

Terraform module for creating Snowpipe to ingest data from AWS S3 bucket
HCL
3
star
93

sfguide-getting-started-with-iceberg-tables

Jupyter Notebook
3
star
94

sfguide-prompt-engineering-and-llm-evaluation

PLpgSQL
3
star
95

sfguide-snowflake-java-api

Guide for running a custom API Powered by Snowflake in java
Java
3
star
96

sfguide-getting-started-snowpark-python-sagemaker

Jupyter Notebook
3
star
97

sfguide-aws-autopilot-integration

3
star
98

snowpark-java-template

Java project template for Snowpark development
Java
3
star
99

sfguide-getting-started-snowpark-python-feast

Jupyter Notebook
3
star
100

sfguide-snowpark-python-top-three-tips-for-optimal-performance

Jupyter Notebook
3
star