  • License
    Apache License 2.0

Data models for Snowplow analytics.

Snowplow sessionization

⛔🏚️ As of January 2023, this package is obsolete and no longer developed. We strongly recommend that you instead use the snowplow/snowplow_web package, maintained by the team at Snowplow.


This dbt package:

  • Rolls up page_view and page_ping events into page views and sessions
  • Performs "user stitching" to tie all historical events associated with an anonymous cookie (domain_userid) to the same user_id

Adapted from Snowplow's web model.
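
The "user stitching" idea can be pictured with a small, self-contained sketch. It is written in Python rather than the SQL the package itself uses, and the function names `build_id_map` and `stitch` and the `inferred_user_id` field are hypothetical; only `domain_userid` and `user_id` come from the description above. The sketch assumes that the most recent non-null `user_id` seen for a cookie wins, which may differ from the rule the package actually applies.

```python
def build_id_map(events: list[dict]) -> dict[str, str]:
    """Map each domain_userid to the most recent non-null user_id seen for it.

    Assumption: "most recent wins"; the package's real rule may differ.
    """
    latest: dict[str, tuple[str, str]] = {}  # cookie -> (timestamp, user_id)
    for ev in events:
        user_id = ev.get("user_id")
        if user_id is None:
            continue  # anonymous event: contributes nothing to the map
        cookie = ev["domain_userid"]
        ts = ev["collector_tstamp"]  # ISO-8601 strings sort chronologically
        if cookie not in latest or ts >= latest[cookie][0]:
            latest[cookie] = (ts, user_id)
    return {cookie: uid for cookie, (_, uid) in latest.items()}


def stitch(events: list[dict]) -> list[dict]:
    """Return copies of the events with an inferred_user_id attached."""
    id_map = build_id_map(events)
    return [
        {**ev, "inferred_user_id": id_map.get(ev["domain_userid"])}
        for ev in events
    ]


events = [
    {"domain_userid": "c1", "user_id": None, "collector_tstamp": "2020-01-01T10:00:00"},
    {"domain_userid": "c1", "user_id": "alice", "collector_tstamp": "2020-01-02T09:00:00"},
    {"domain_userid": "c2", "user_id": None, "collector_tstamp": "2020-01-02T11:00:00"},
]
stitched = stitch(events)
```

In this sketch the earlier anonymous event on cookie `c1` is attributed to the user who later logged in on that cookie, while the event on `c2` stays anonymous because no `user_id` was ever observed for it.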

Models

The primary outputs of this package are page views and sessions. Several intermediate models are used to create these two models.

model | description
snowplow_page_views | Contains a list of pageviews with scroll depth, view timing, and optionally useragent and performance data.
snowplow_sessions | Contains a rollup of page views indexed by cookie id (domain_sessionid)
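
To make the relationship between the two models concrete, the sketch below rolls a handful of page views up into sessions keyed by `domain_sessionid`. It is an illustration in Python, not the package's SQL: the function name `roll_up_sessions` and every field other than `domain_sessionid` (`page_url`, `start_tstamp`, `time_engaged_in_s`, and the output columns) are assumptions chosen for the example.

```python
from collections import defaultdict


def roll_up_sessions(page_views: list[dict]) -> list[dict]:
    """Group page views by domain_sessionid and summarise each group."""
    by_session: dict[str, list[dict]] = defaultdict(list)
    for pv in page_views:
        by_session[pv["domain_sessionid"]].append(pv)

    sessions = []
    for session_id, views in by_session.items():
        # Order views chronologically so the first element is the landing page.
        views.sort(key=lambda pv: pv["start_tstamp"])
        sessions.append(
            {
                "domain_sessionid": session_id,
                "page_views": len(views),
                "session_start": views[0]["start_tstamp"],
                "first_page": views[0]["page_url"],
                "time_engaged_in_s": sum(pv["time_engaged_in_s"] for pv in views),
            }
        )
    return sessions


page_views = [
    {"domain_sessionid": "s1", "page_url": "/pricing",
     "start_tstamp": "2020-01-01T10:05:00", "time_engaged_in_s": 20},
    {"domain_sessionid": "s1", "page_url": "/",
     "start_tstamp": "2020-01-01T10:00:00", "time_engaged_in_s": 40},
    {"domain_sessionid": "s2", "page_url": "/docs",
     "start_tstamp": "2020-01-01T11:00:00", "time_engaged_in_s": 10},
]
sessions = {s["domain_sessionid"]: s for s in roll_up_sessions(page_views)}
```

Each session row is derived entirely from its page views, which is why the page view model has to be built first.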

[Figure: snowplow graph]

Prerequisites

This package takes the Snowplow JavaScript tracker as its foundation. It assumes that all Snowplow events are sent with a web_page context.

Mobile

It is possible to sessionize mobile (app) events by including two predefined contexts with all events:

As long as all events are associated with an anonymous user, a session, and a screen/page view, they can be made to fit the same canonical data model as web events fired from the JavaScript tracker. Whether this is the desired outcome will vary significantly; mobile-first analytics often makes different assumptions about user identity, engagement, referral, and inactivity cutoffs.

For specific implementation details:

Installation Instructions

Check dbt Hub for the latest installation instructions, or read the docs for more information on installing packages.

Configuration

The variables needed to configure this package are as follows:

variable | information | required
snowplow:timezone | Timezone in which analysis takes place. Used to calculate local times. | No
snowplow:page_ping_frequency | Configured timeout for page pings in tracker (seconds). Default=30 | No
snowplow:events | Schema and table containing all Snowplow events | Yes
snowplow:context:web_page | Schema and table for web page context | Yes
snowplow:context:performance_timing | Schema and table for perf timing context, or false if none is present | Yes
snowplow:context:useragent | Schema and table for useragent context, or false if none is available | Yes
snowplow:pass_through_columns | Additional columns for inclusion in final models | No
snowplow:page_view_lookback_days | Number of days to rescan to merge page_views in the same session | Yes

An example dbt_project.yml configuration:

# dbt_project.yml

...

vars:
  'snowplow:timezone': 'America/New_York'
  'snowplow:page_ping_frequency': 10
  'snowplow:events': "{{ ref('sp_base_events') }}"
  'snowplow:context:web_page': "{{ ref('sp_base_web_page_context') }}"
  'snowplow:context:performance_timing': false
  'snowplow:context:useragent': false
  'snowplow:pass_through_columns': []
  'snowplow:page_view_lookback_days': 1
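
As a rough illustration of why `snowplow:page_ping_frequency` should match the tracker's setting, the sketch below estimates engaged time by counting the distinct frequency-sized time buckets that contain a page ping and multiplying by the frequency. This is one common approach, assumed here purely for illustration; the package's actual SQL may compute view timing differently, and the function name `engaged_seconds` is hypothetical.

```python
def engaged_seconds(ping_epochs: list[int], page_ping_frequency: int = 30) -> int:
    """Estimate engaged time from page_ping timestamps (epoch seconds).

    Pings that fall in the same frequency-sized bucket are counted once,
    so duplicate or bursty pings do not inflate the estimate.
    """
    buckets = {ts // page_ping_frequency for ts in ping_epochs}
    return len(buckets) * page_ping_frequency


# Pings from a tracker that is (hypothetically) configured to fire every 10 seconds.
pings = [1000, 1010, 1011, 1020, 1040]
with_matching_setting = engaged_seconds(pings, page_ping_frequency=10)
with_default_setting = engaged_seconds(pings, page_ping_frequency=30)
```

The two calls give different estimates for the same pings, which is the practical reason to keep the variable in step with the tracker configuration.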

Database support

Core:

  • Redshift
  • Snowflake
  • BigQuery
  • Postgres

Plugins:

Contributions

Additional contributions to this package are very welcome! Please create issues or open PRs against master. Check out this post on the best workflow for contributing to a package.

Much of tracking can be the Wild West. Snowplow's canonical event model is a major asset in our ability to perform consistent analysis atop predictably structured data, but any detailed implementation is bound to diverge.

To that end, we aim to keep this package rooted in a garden-variety Snowplow web deployment. All PRs should seek to add or improve functionality that is contained within a plurality of Snowplow deployments.

If you need to change implementation-specific details, you have two avenues:

  • Override models from this package with versions that feature your custom logic. Create a model with the same name locally (e.g. snowplow_id_map) and disable the snowplow package's version in dbt_project.yml:
snowplow:
    ...
    identification:
      default:
        snowplow_id_map:
          +enabled: false
  • Fork this repository :)
