Macros for calculating metrics

Deprecation Notice

The dbt_metrics package will no longer be maintained with the release of dbt-core 1.6 in July 2023. All semantic layer capability will instead be found in MetricFlow.

dbt_metrics

About

This dbt package generates queries based on metrics, introduced to dbt Core in v1.0. For more information on metrics, such as available calculation methods, properties, and other definition parameters, please reference the dbt documentation.

Tenets

The tenets of dbt_metrics, which should be considered during development, issues, and contributions, are:

  • A metric value should be consistent everywhere that it is referenced
  • We prefer generalized metrics with many dimensions over specific metrics with few dimensions
  • It should be easier to use dbt’s metrics than it is to avoid them
  • Organization and discoverability are as important as precision
  • One-off models built to power metrics are an anti-pattern

Installation Instructions

Check dbt Hub for the latest installation instructions, or read the docs for more information on installing packages.

Include in your packages.yml

packages:
  - package: dbt-labs/metrics
    version: [">=1.5.0", "<1.6.0"]

Supported Adapters

The adapters that are currently supported in the dbt_metrics package are:

  • Snowflake
  • BigQuery
  • Redshift
  • Postgres
  • Databricks

Macros

Calculate

The calculate macro performs the metric aggregation and returns the dataset based on the specifications of the metric definition and the options selected in the macro. It can be accessed like any other macro:

select * 
from {{ metrics.calculate(
    metric('new_customers'),
    grain='week',
    dimensions=['plan', 'country'],
    secondary_calculations=[
        metrics.period_over_period(comparison_strategy="ratio", interval=1, alias="pop_1wk"),
        metrics.period_over_period(comparison_strategy="difference", interval=1),

        metrics.period_to_date(aggregate="average", period="month", alias="this_month_average"),
        metrics.period_to_date(aggregate="sum", period="year"),

        metrics.rolling(aggregate="average", interval=4, alias="avg_past_4wks"),
        metrics.rolling(aggregate="min", interval=4)
    ],
    start_date='2022-01-01',
    end_date='2022-12-31',
    where="plan='filter_value'"
) }}

If no grain is provided to the macro in the query, then the returned dataset will not be time-bound.

start_date and end_date are optional. When not provided, the spine will span all dates from oldest to newest in the metric's dataset. This default is likely to be correct in most cases, but you can use these arguments to either narrow the resulting table or expand it (e.g. if there were no new customers until 3 January but you want to include the first two days as well). Both values are inclusive.
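
For example (a sketch, assuming the new_customers metric from above), the first query below aggregates without a time grain, while the second narrows the spine to January 2022:

select *
from {{ metrics.calculate(
    metric('new_customers'),
    dimensions=['plan']
) }}

select *
from {{ metrics.calculate(
    metric('new_customers'),
    grain='day',
    start_date='2022-01-01',
    end_date='2022-01-31'
) }}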

Supported Inputs

  • metric_list (required): metric('some_metric'), or [metric('some_metric'), metric('some_other_metric')]. The metric(s) to be queried by the macro. If multiple metrics are required, provide them in list format.
  • grain (optional): day, week, month. The time grain that the metric will be aggregated to in the returned dataset.
  • dimensions (optional): [plan, country, some_predefined_dimension_name]. The dimensions you want the metric to be aggregated by in the returned dataset.
  • start_date (optional): 2022-01-01. Limits the date range of data used in the metric calculation by not querying data before this date.
  • end_date (optional): 2022-12-31. Limits the date range of data used in the metric calculation by not querying data after this date.
  • where (optional): plan='paying_customer'. A SQL statement, or series of SQL statements, that alters the final CTE in the generated SQL. Most often used to limit the data to specific values of the dimensions provided.
  • date_alias (optional): 'date_field'. A string value that aliases the date field in the final dataset.
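
As a small sketch of date_alias (the month_start alias is a hypothetical name), the date column in the returned dataset would be renamed accordingly:

select *
from {{ metrics.calculate(
    metric('new_customers'),
    grain='month',
    date_alias='month_start'
) }}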

Develop

There are times when you want to test what a metric might look like before defining it in your project. In these cases you can use the develop macro, which allows you to provide a single metric in a contained yml in order to simulate what the metric might look like if defined in your project.

{% set my_metric_yml -%}
{% raw %}

metrics:
  - name: develop_metric
    model: ref('fact_orders')
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: average
    expression: discount_total
    dimensions:
      - had_discount
      - order_country

{% endraw %}
{%- endset %}

select * 
from {{ metrics.develop(
        develop_yml=my_metric_yml,
        metric_list=['develop_metric'],
        grain='month'
        )
    }}

Supported Inputs

  • metric_list (required): 'some_metric', or ['some_metric', 'some_other_metric']. The metric(s) to be queried by the macro. If multiple metrics are required, provide them in list format. Do not use the metric('name') format, as that triggers dbt parsing for a metric that doesn't exist; just provide the name of the metric.
  • grain (optional): day, week, month. The time grain that the metric will be aggregated to in the returned dataset.
  • dimensions (optional): [plan, country, some_predefined_dimension_name]. The dimensions you want the metric to be aggregated by in the returned dataset.
  • start_date (optional): 2022-01-01. Limits the date range of data used in the metric calculation by not querying data before this date.
  • end_date (optional): 2022-12-31. Limits the date range of data used in the metric calculation by not querying data after this date.
  • where (optional): plan='paying_customer'. A SQL statement, or series of SQL statements, that alters the final CTE in the generated SQL. Most often used to limit the data to specific values of the dimensions provided.
  • date_alias (optional): 'date_field'. A string value that aliases the date field in the final dataset.

Multiple Metrics Or Derived Metrics

If you have a more complicated use case that you are interested in testing, the develop macro also supports this behavior. The only caveat is that you must include the raw tags around any provided metric yml that contains a derived metric. An example is shown below:

{% set my_metric_yml -%}
{% raw %}

metrics:
  - name: develop_metric
    model: ref('fact_orders')
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: average
    expression: discount_total
    dimensions:
      - had_discount
      - order_country

  - name: derived_metric
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: derived
    expression: "{{ metric('develop_metric') }} - 1 "
    dimensions:
      - had_discount
      - order_country

  - name: some_other_metric_not_using
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: derived
    expression: "{{ metric('derived_metric') }} - 1 "
    dimensions:
      - had_discount
      - order_country

{% endraw %}
{%- endset %}

select * 
from {{ metrics.develop(
        develop_yml=my_metric_yml,
        metric_list=['derived_metric'],
        grain='month'
        )
    }}

The above example will return a dataset that contains the metric provided in the metric list (derived_metric) and its parent metric (develop_metric). It will not contain some_other_metric_not_using, as it is neither designated in the metric list nor a parent of the metrics included.

Available calculation methods

The method of calculation (aggregation or derived) that is applied to the expression.

  • count: applies the count aggregation to the specified field
  • count_distinct: applies the count aggregation to the specified field, with an additional distinct statement inside the aggregation
  • sum: applies the sum aggregation to the specified field
  • average: applies the average aggregation to the specified field
  • min: applies the min aggregation to the specified field
  • max: applies the max aggregation to the specified field
  • median: applies the median aggregation to the specified field, or an alternative percentile_cont aggregation if median is not available
  • derived: any non-aggregating calculation of one or more metrics

Use cases and examples

Jaffle Shop Metrics

For those curious about how to implement metrics in a dbt project, please reference the jaffle_shop_metrics repository.

Secondary calculations

Secondary calculations are window functions which act on the primary metric or metrics. You can use them to compare values to an earlier period and calculate year-to-date sums or rolling averages. The use of secondary calculations requires a grain input in the macro.

Create secondary calculations using the convenience constructor macros. Alternatively, you can manually create a list of dictionary entries (not recommended).

Example of manual dictionary creation (not recommended)

Creating a calculation this way has no input validation.

[
    {"calculation": "period_over_period", "interval": 1, "comparison_strategy": "difference", "alias": "pop_1mth"},
    {"calculation": "rolling", "interval": 3, "aggregate": "sum"}
]

Column aliases are automatically generated, but you can override them by setting alias.
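
For reference, the constructor equivalents of the manual dictionaries above would look like this sketch:

[
    metrics.period_over_period(comparison_strategy="difference", interval=1, alias="pop_1mth"),
    metrics.rolling(aggregate="sum", interval=3)
]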

Period over Period (source)

The period over period secondary calculation performs a calculation against the metric(s) in question by determining either the difference or the ratio between two points in time. The other point in time is set by the interval input, which is measured in units of the grain selected in the macro.

Constructor: metrics.period_over_period(comparison_strategy, interval [, alias, metric_list])

  • comparison_strategy (required): ratio or difference. How to calculate the delta between the two periods.
  • interval (required): 1. Integer; the number of time grains to look back.
  • alias (optional): week_over_week. The column alias for the resulting calculation.
  • metric_list (optional): base_sum_metric. List of metrics that the secondary calculation should be applied to. Defaults to all selected metrics.
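
A minimal usage sketch, assuming the new_customers metric from earlier (the wow_ratio alias is illustrative):

select *
from {{ metrics.calculate(
    metric('new_customers'),
    grain='week',
    secondary_calculations=[
        metrics.period_over_period(comparison_strategy="ratio", interval=1, alias="wow_ratio")
    ]
) }}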

Period to Date (source)

The period to date secondary calculation performs an aggregation over a defined period of time that is equal to or coarser (higher, more aggregated) than the grain selected. A great example is displaying a month_to_date value alongside a weekly-grained metric.

Constructor: metrics.period_to_date(aggregate, period [, alias, metric_list])

  • aggregate (required): max, average. The aggregation to use in the window function. Options vary based on the primary aggregation and are enforced in validate_aggregate_coherence().
  • period (required): "day", "week". The time grain to aggregate to. One of ["day", "week", "month", "quarter", "year"]. Must be at equal or coarser (higher, more aggregated) granularity than the metric's grain (see Time Grains below). With a grain of month, for example, the acceptable periods would be month, quarter, or year.
  • alias (optional): month_to_date. The column alias for the resulting calculation.
  • metric_list (optional): base_sum_metric. List of metrics that the secondary calculation should be applied to. Defaults to all selected metrics.
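
A minimal usage sketch, assuming the new_customers metric: a weekly-grained metric with a month-to-date sum alongside it:

select *
from {{ metrics.calculate(
    metric('new_customers'),
    grain='week',
    secondary_calculations=[
        metrics.period_to_date(aggregate="sum", period="month", alias="month_to_date")
    ]
) }}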

Rolling (source)

The rolling secondary calculation performs an aggregation over a number of rows in the metric dataset. For example, if the user selects the week grain and sets a rolling interval of 4, then the value returned will be a rolling 4-week calculation of whatever aggregation type was selected. If the interval input is not provided, the rolling calculation will be unbounded over all preceding rows.

Constructor: metrics.rolling(aggregate [, interval, alias, metric_list])

  • aggregate (required): max, average. The aggregation to use in the window function. Options vary based on the primary aggregation and are enforced in validate_aggregate_coherence().
  • interval (optional): 1. Integer; the number of time grains to look back.
  • alias (optional): avg_past_4wks. The column alias for the resulting calculation.
  • metric_list (optional): base_sum_metric. List of metrics that the secondary calculation should be applied to. Defaults to all selected metrics.
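
For instance, omitting interval yields an unbounded running aggregate (a sketch, assuming the new_customers metric; the running_total alias is illustrative):

select *
from {{ metrics.calculate(
    metric('new_customers'),
    grain='week',
    secondary_calculations=[
        metrics.rolling(aggregate="sum", alias="running_total")
    ]
) }}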

Prior (source)

The prior secondary calculation returns the value from a specified number of intervals prior to the row.

Constructor: metrics.prior(interval [, alias, metric_list])

  • interval (required): 1. Integer; the number of time grains to look back.
  • alias (optional): 2_weeks_prior. The column alias for the resulting calculation.
  • metric_list (optional): base_sum_metric. List of metrics that the secondary calculation should be applied to. Defaults to all selected metrics.
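
A minimal usage sketch, assuming the new_customers metric:

select *
from {{ metrics.calculate(
    metric('new_customers'),
    grain='week',
    secondary_calculations=[
        metrics.prior(interval=2, alias="2_weeks_prior")
    ]
) }}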

Customisation

Most behaviour in the package can be overridden or customised.

Metric Configs

Metric nodes accept config dictionaries like other dbt resources (beginning in dbt-core v1.3). Metric configs can be specified in the metric yml itself, or for groups of metrics in the dbt_project.yml file.

# in metrics.yml
version: 2

metrics:
  - name: config_metric
    label: Example Metric with Config
    model: ref('my_model')
    calculation_method: count
    timestamp: date_field
    time_grains: [day, week, month]

    config:
      enabled: True

Or:

# in dbt_project.yml

metrics: 
  your_project_name: 
    +enabled: true

The metrics package contains validation on the configurations you're able to provide.

Accepted Metric Configurations

Below is the list of metric configs currently accepted by this package.

  • enabled (boolean, True/False; default True): Enables or disables a metric node. When disabled, dbt will not consider it as part of your project.
  • treat_null_values_as_zero (boolean, True/False; default True): Controls the coalesce behaviour for metrics. By default, when there are no observations for a metric, the output of the metric, as well as period over period secondary calculations, will include a coalesce({{ field }}, 0) to return 0s rather than nulls. Setting this config to False returns NULL values instead.
  • restrict_no_time_grain_false (boolean, True/False; default False): Controls whether this metric can be queried without a provided time grain. By default, all metrics can be queried without a grain and aggregated in a non-time-bound way. This config restricts that behaviour and requires a grain input in order to query the metric.
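
For example, to return NULLs rather than 0s when a metric has no observations, the config might look like this sketch (the metric shown is hypothetical):

# in metrics.yml
version: 2

metrics:
  - name: nullable_metric
    label: Example Metric Returning NULLs
    model: ref('my_model')
    calculation_method: count
    timestamp: date_field
    time_grains: [day, week, month]

    config:
      treat_null_values_as_zero: False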

Window Periods

Version 0.4.0 of this package, and beyond, offers support for the window attribute of the metric definition. This alters the underlying query to allow the metric definition to contain a window of time, such as the past 14 days or the past 3 months. Utilizing the window functionality requires a grain be provided in the query.

More information can be found on the metrics page of the dbt docs.

Derived Metrics

Note: in version 0.4.0, expression metrics were renamed to derived. Version 0.3.0 of this package, and beyond, offers support for derived metrics! More information on this calculation_method can be found on the metrics page of the dbt docs.

Multiple Metrics

There may be instances where you want to return multiple metrics within a single macro. This is possible by providing a list of metrics instead of a single metric. See example below:

select *
from {{ metrics.calculate(
    [metric('base_sum_metric'), metric('base_average_metric')],
    grain='day',
    dimensions=['had_discount']
) }}

Note: the metrics must share both the time_grain selected in the macro AND the dimensions selected in the macro. If these are not shared between the 2+ metrics, the query will fail. Additionally, secondary calculations can be used with multiple metrics, but each secondary calculation will be applied against each metric and returned in a field that matches the following pattern: metric_name_secondary_calculation_alias.

Where Clauses

Sometimes you'll want to see the metric in the context of a particular filter, but this filter isn't necessarily part of the metric definition. In this case, you can use the where input of the metrics package. It takes a list of SQL statements and adds them as filters to the final CTE in the generated SQL.

Additionally, BI tools can use this input as a way to pass filters from their UI through into the metric logic.
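
A sketch combining two filters, assuming the new_customers metric and its plan and country dimensions:

select *
from {{ metrics.calculate(
    metric('new_customers'),
    grain='month',
    dimensions=['plan', 'country'],
    where=["plan='paying_customer'", "country='US'"]
) }}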

Calendar

The package comes with a basic calendar table, running between 2010-01-01 and 2029-12-31 inclusive. You can replace it with any custom calendar table which meets the following requirements:

  • Contains a date_day column.
  • Contains the following columns: date_week, date_month, date_quarter, date_year, or equivalents.
  • Additional date columns need to be prefixed with date_, e.g. date_4_5_4_month for a 4-5-4 retail calendar date set. Dimensions can have any name (see dimensions on calendar tables).

To do this, set the value of the dbt_metrics_calendar_model variable in your dbt_project.yml file:

#dbt_project.yml
config-version: 2
[...]
vars:
    dbt_metrics_calendar_model: my_custom_calendar
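
A custom calendar model meeting the requirements above might look like this sketch (Snowflake syntax; my_date_spine and the is_weekend logic are illustrative assumptions):

-- models/my_custom_calendar.sql (hypothetical)
select
    date_day,
    date_trunc('week', date_day) as date_week,
    date_trunc('month', date_day) as date_month,
    date_trunc('quarter', date_day) as date_quarter,
    date_trunc('year', date_day) as date_year,
    dayofweek(date_day) in (0, 6) as is_weekend  -- example custom dimension
from {{ ref('my_date_spine') }}  -- hypothetical model with one row per date_day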

Dimensions from calendar tables

You may want to aggregate metrics by a dimension in your custom calendar table, for example is_weekend. You can include this within the list of dimensions in the macro call without it needing to be defined in the metric definition.

To do so, set a list variable at the project level called custom_calendar_dimension_list, as shown in the example below.

vars:
  custom_calendar_dimension_list: ["is_weekend"]

The is_weekend field can now be used by your metrics.
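
For example (a sketch, assuming the new_customers metric):

select *
from {{ metrics.calculate(
    metric('new_customers'),
    grain='day',
    dimensions=['is_weekend']
) }}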

Time Grains

The package protects against nonsensical secondary calculations, such as a month-to-date aggregate of data which has been rolled up to the quarter. If you customise your calendar (for example by adding a 4-5-4 retail calendar month), you will need to override the get_grain_order() macro. In that case, you might remove month and replace it with month_4_5_4. All date columns must be prefixed with date_ in the table. Do not include the prefix when defining your metric; it will be added automatically.
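
A hedged sketch of such an override, assuming get_grain_order() returns the list of grains ordered from finest to coarsest (the exact override mechanics depend on your project's dispatch configuration):

{% macro get_grain_order() %}
    {# swap the standard month for a 4-5-4 retail month #}
    {% do return (['day', 'week', 'month_4_5_4', 'quarter', 'year']) %}
{% endmacro %}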

Custom aggregations

To create a custom primary aggregation (as exposed through the calculation_method config of a metric), create a macro of the form metric_my_aggregate(expression), then override the gen_primary_metric_aggregate() macro to add it to the dispatch list. The package also protects against nonsensical secondary calculations such as an average of an average; you will need to override the get_metric_allowlist() macro to both add your new aggregate to the existing aggregations' allowlists and make an allowlist for your new aggregation:

    {% do return ({
        "average": ['max', 'min'],
        "count": ['max', 'min', 'average', 'my_new_aggregate'],
        [...]
        "my_new_aggregate": ['max', 'min', 'sum', 'average', 'my_new_aggregate']
    }) %}
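
The corresponding primary aggregate macro might look like this sketch (my_new_aggregate and the use of approx_percentile are illustrative assumptions):

{% macro metric_my_new_aggregate(expression) %}
    approx_percentile({{ expression }}, 0.9)
{% endmacro %}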

To create a custom secondary aggregation (as exposed through the secondary_calculations input in the metric macro), create a macro of the form secondary_calculation_my_calculation(metric_name, dimensions, calc_config), then override the perform_secondary_calculations() macro.

Secondary calculation column aliases

Aliases can be set for a secondary calculation. If no alias is provided, one will be automatically generated. To modify the existing alias logic, or add support for a custom secondary calculation, override generate_secondary_calculation_alias().
