• Stars
    star
    285
  • Rank 142,147 (Top 3 %)
  • Language PLpgSQL
  • License
    Apache License 2.0
  • Created over 4 years ago
  • Updated about 1 month ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

dbt macros to stage external sources

External sources in dbt

dbt v0.15.0 added support for an external property within sources that can include information about location, partitions, and other database-specific properties.

This package provides:

  • Macros to create/replace external tables and refresh their partitions, using the metadata provided in your .yml file source definitions
  • Snowflake-specific macros to create, backfill, and refresh snowpipes, using the same metadata

Supported databases

  • Redshift (Spectrum)
  • Snowflake
  • BigQuery
  • Spark
  • Synapse
  • Azure SQL

sample docs

Installation

Follow the instructions at hub.getdbt.com on how to modify your packages.yml and run dbt deps.

Syntax

The stage_external_sources macro is the primary point of entry when using this package. It has two operational modes: standard and "full refresh."

# iterate through all source nodes, create if missing, refresh metadata
$ dbt run-operation stage_external_sources

# iterate through all source nodes, create or replace (+ refresh if necessary)
$ dbt run-operation stage_external_sources --vars "ext_full_refresh: true"

The stage_external_sources macro accepts a limited node selection syntax similar to snapshotting source freshness:

# stage all Snowplow and Logs external sources:
$ dbt run-operation stage_external_sources --args "select: snowplow logs"

# stage a particular external source table:
$ dbt run-operation stage_external_sources --args "select: snowplow.event"

Setup

The macros assume that you:

  1. Have already created your database's required scaffolding for external resources:
  • an external stage (Snowflake)
  • an external schema + S3 bucket (Redshift Spectrum)
  • an external data source and file format (Synapse)
  • an external data source and databse-scoped credential (Azure SQL)
  • a Google Cloud Storage bucket (BigQuery)
  • an accessible set of files (Spark)
  1. Have the appropriate permissions on to create tables using that scaffolding
  2. Have already created the database/project and/or schema/dataset in which dbt will create external tables (or snowpiped tables)

Spec

version: 2

sources:
  - name: snowplow
    tables:
      - name: event
        description: >
            This source table is actually a set of files in external storage.
            The dbt-external-tables package provides handy macros for getting
            those files queryable, just in time for modeling.
                            
        external:
          location:         # required: S3 file path, GCS file path, Snowflake stage, Synapse data source
          
          ...               # database-specific properties of external table
          
          partitions:       # optional
            - name: collector_date
              data_type: date
              ...           # database-specific properties

        # Specify ALL column names + datatypes.
        # Column order must match for CSVs, column names must match for other formats.
        # Some databases support schema inference.

        columns:
          - name: app_id
            data_type: varchar(255)
            description: "Application ID"
          - name: platform
            data_type: varchar(255)
            description: "Platform"
          ...

The stage_external_sources macro will use this YAML config to compile and execute the appropriate create, refresh, and/or drop commands:

19:01:48 + 1 of 1 START external source spectrum.my_partitioned_tbl
19:01:48 + 1 of 1 (1) drop table if exists "db"."spectrum"."my_partitioned_tbl"
19:01:48 + 1 of 1 (1) DROP TABLE
19:01:48 + 1 of 1 (2) create external table "db"."spectrum"."my_partitioned_tbl"...
19:01:48 + 1 of 1 (2) CREATE EXTERNAL TABLE
19:01:48 + 1 of 1 (3) alter table "db"."spectrum"."my_partitioned_tbl"...
19:01:49 + 1 of 1 (3) ALTER EXTERNAL TABLE

Resources

  • sample_sources: detailed example source specs, with annotations, for each database's implementation
  • sample_analysis: a "dry run" version of the compiled DDL/DML that stage_external_sources runs as an operation
  • tested specs: source spec variations that are confirmed to work on each database, via integration tests

If you encounter issues using this package or have questions, please check the open issues, as there's a chance it's a known limitation or work in progress. If not, you can:

  • open a new issue to report a bug or suggest an enhancement
  • post a technical question to the Community Forum
  • post a conceptual question to the relevant database channel (#db-redshift, #dbt-snowflake, etc) in the dbt Slack community

Additional contributions to this package are very welcome! Please create issues or open PRs against master. Check out this post on the best workflow for contributing to a package.

More Repositories

1

dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Python
7,503
star
2

metricflow

MetricFlow allows you to define, build, and maintain metrics in code.
Python
1,086
star
3

dbt-utils

Utility functions for dbt projects.
Python
929
star
4

corp

Assets related to the operation of Fishtown Analytics.
411
star
5

dbt-project-evaluator

This package contains macros and models to find DAG issues automatically
Shell
385
star
6

dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
Python
365
star
7

dbt-codegen

Macros that generate dbt code
Makefile
322
star
8

jaffle_shop

A self-contained dbt project for testing purposes
264
star
9

dbt-audit-helper

Useful macros when performing data audits
205
star
10

dbt_metrics

Macros for calculating metrics
Python
203
star
11

dbt-snowflake

dbt-snowflake contains all of the code enabling dbt to work with Snowflake
Python
199
star
12

dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
Python
163
star
13

dbt-project-maturity

132
star
14

dbt-labs-experimental-features

dbt support for database features which are not yet supported natively in dbt-core
Makefile
131
star
15

snowplow

Data models for snowplow analytics.
LookML
126
star
16

docs.getdbt.com

The code behind docs.getdbt.com
JavaScript
111
star
17

dbt-docs

Auto-generated data documentation site for dbt projects
JavaScript
111
star
18

dbt-jsonschema

Python
104
star
19

jaffle_shop_duckdb

Get started with dbt in less than 1 minute from `git clone` to `dbt docs serve` for free!
Shell
99
star
20

dbt-event-logging

a dbt package to make auditing dbt runs easy.
LookML
91
star
21

dbt-meshify

A dbt-core plugin that automates the management and creation of dbt groups, contracts, access, and versions.
Python
91
star
22

dbt-server

A web API for dbt.
Python
89
star
23

redshift

Redshift package for dbt (getdbt.com)
PLpgSQL
87
star
24

dbt-completion.bash

Adds autocompletion to the dbt CLI
Shell
82
star
25

dbt-redshift

dbt-redshift contains all of the code enabling dbt to work with Amazon Redshift
Python
80
star
26

terraform-provider-dbtcloud

dbt Cloud Terraform Provider
Go
73
star
27

mrr-playbook

Makefile
70
star
28

dbt-learn-group-training

The go to demo for public and private dbt Learn
69
star
29

segment

Data models for Segment built using dbt (getdbt.com).
Makefile
65
star
30

dbt-sql-formatter

makes your sql less bad
Python
57
star
31

dbt-semantic-interfaces

The shared semantic layer definitions that dbt-core and MetricFlow use.
Python
55
star
32

dbt-init

A dbt-init script for consulting projects
Python
51
star
33

jaffle-shop-template

Template for a DuckDB-based, Codespace-oriented sandbox project that is also dbt Cloud compatible, and includes code-first BI tooling via Evidence.
Python
49
star
34

dbt-starter-project

Cloned by the `dbt init` task
44
star
35

jaffle-shop

🥪🦘 An open source sandbox project exploring dbt workflows via a fictional sandwich shop's data.
42
star
36

dbt-learn-jinja

41
star
37

metrics-playbook

This repository contains files for the metrics framework playbook.
36
star
38

facebook-ads

dbt data models for facebook ads
35
star
39

dbt-presto

[ARCHIVED] The Presto adapter plugin for dbt Core
Python
33
star
40

spark-utils

Utility functions for dbt projects running on Spark
Python
30
star
41

stripe

TSQL
29
star
42

attribution-playbook

TSQL
27
star
43

dbt-databricks-demo

Demo project for dbt on Databricks
27
star
44

dbt-technical-blog-writing

Conversation around dbt technical tutorials, blogs, guides, etc
26
star
45

python-snowpark-formula1

Python
25
star
46

dbtdocs-to-lookml

[WIP] A script to add descriptions from dbt schema files to your lookml project
Python
25
star
47

jaffle_shop_metrics

This repository contains examples of how to use dbt's metric functionality on the jaffle shop dataset
24
star
48

tree-sitter-jinja2

C
23
star
49

quickbooks

dbt data models for Quickbooks Online.
TSQL
23
star
50

hub.getdbt.com

Package hub for dbt.
SCSS
23
star
51

dbt-jobs-as-code

Tools to handle dbt Jobs as well-defined YAML files
Python
22
star
52

semantic-layer-llm-benchmarking

Shell
22
star
53

dbt-extractor

Rust
21
star
54

airflow-fivetran-dbt

Example orchestration pipeline for Fivetran + dbt managed by Airflow
Python
20
star
55

rapid-onboarding-exemplar

dbt Project for Rapid Onboarding instructors to use in instruction and learners to reference throughout the course.
Python
20
star
56

jaffle-sl-template

19
star
57

2023-04-18---zero-to-dbt

Welcome! This dbt project is built to be imported to a freshly-initialized dbt project to work through the hands-on zero to dbt lab detailed in this repo's readme.
17
star
58

dbt-starburst-demo

dbt + Trino demo project, using TPC-H sample data
17
star
59

atom-dbt

Atom highlighter for dbt projects
16
star
60

adwords

dbt adwords models
16
star
61

dbot

An LLM-powered chatbot with the added context of the dbt knowledge base.
Python
16
star
62

dbt-adapter-tests

a pytest plugin for dbt adapter test suites
Python
16
star
63

dbt_faker

16
star
64

dbt-rpc

A server that can compile and run queries in the context of a dbt project. Additionally, it provides methods that can be used to list and terminate running processes.
Python
16
star
65

postgres

Postgres utility package for dbt (getdbt.com)
14
star
66

dbt-core-bundles

Generates bundles of verified adapters + core
Python
14
star
67

dbt-database-adapter-scaffold

Python
14
star
68

tap-framework

a framework for rapidly prototyping new singer taps
Python
14
star
69

actions

Common GitHub actions and workflows for maintaining dbt
Python
12
star
70

databricks_dbt_demo_project

dbt Cloud project for Databricks SQL Analytics Demos
Python
12
star
71

salesforce

11
star
72

homebrew-dbt

🍻 Homebrew formulae for installing dbt on macOS
Ruby
11
star
73

stitch-utils

Utility functions for Stitch-loaded data
Makefile
10
star
74

dbt-cloud-snowflake-demo-template

9
star
75

dbt-styleguide

Styles for dbt on the net
HTML
9
star
76

terraform-aws-dbt-cloud-single-tenant

HCL
9
star
77

tap-s3-csv

Singer tap for getting CSV and XLS(X) data out of Amazon S3
Python
9
star
78

analytics-engineering-survey

A survey of pains, gains, and areas of investment for global data teams.
9
star
79

dbt-package-workshop

The companion repo to the 2022 Coalesce New Orleans Workshop - dbt Packages You Didn't Know You Needed
9
star
80

jaffle-shop-generator

Python
8
star
81

hubcap

This app adds modules to the hubsite at hub.getdbt.com
Python
8
star
82

dbt-python-hands-on-lab-snowpark

Python
7
star
83

shopify-data-warehouse

LookML
7
star
84

ecommerce

LookML
7
star
85

python-string-parsing

Demo using dateutil library with pandas dataframes in Python
Python
7
star
86

hologram

A library for automatically generating Draft 7 JSON Schemas from Python dataclasses
Python
7
star
87

shopify

dbt data models for Shopify.
TSQL
7
star
88

dbt-cloud-openapi-spec

Python
6
star
89

new-python-wrench-demo

Demo of Python models to accompany the Snowflake blog post
Jupyter Notebook
6
star
90

dbt-integration-tests

Python
6
star
91

jaffle_shop-dev

The dev version of jaffle shop
6
star
92

snowflake-resource-monitoring

6
star
93

log-tables-example

Example of log table creation using dbt.
Shell
5
star
94

tap-amazon-mws

Singer.io tap for Amazon MWS
Python
5
star
95

tap-ringcentral

Singer.io tap for RingCentral
Python
5
star
96

snowflake_dbt_partner_demo

PLpgSQL
5
star
97

Coalesce2022-Training-Fundamentals

This is the repository for the Coalesce 2022 Fundamentals Live training
5
star
98

fishtown-ui

The design implementation used at Fishtown Analytics.
TypeScript
5
star
99

coalesce-2022-dag-workshop

Get more from your DAG! A Coalesce 2022 workshop that outlines how to use macros and DAG introspection to get more from your dbt project.
5
star
100

tap-amazon-advertising

Singer.io tap for Amazon Advertising
Python
5
star