• Stars
    star
    130
  • Rank 262,508 (Top 6 %)
  • Language
  • Created about 3 years ago
  • Updated 5 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

dbt package mimicking dplyr select-helpers semantics

dbtplyr

This add-on package enhances dbt by providing macros which programmatically select columns based on their column names. It is inspired by the across() function and the select helpers in the R package dplyr.

dplyr (>= 1.0.0) has helpful semantics for selecting and applying transformations to variables based on their names. For example, if one wishes to take the sum of all variables with name prefixes of N and the mean of all variables with name prefixes of IND in the dataset mydata, they may write:

summarize(
  mydata, 
  across( starts_with('N'), sum),
  across( starts_with('IND', mean)
)

This package enables us to similarly write dbt data models with commands like:

{% set cols = dbtplyr.get_column_names( ref('mydata') ) %}
{% set cols_n = dbtplyr.starts_with('N', cols) %}
{% set cols_ind = dbtplyr.starts_with('IND', cols) %}

select

  {{ dbtplyr.across(cols_n, "sum({{var}}) as {{var}}_tot") }},
  {{ dbtplyr.across(cols_ind, "mean({{var}}) as {{var}}_avg") }}

from {{ ref('mydata') }}

which dbt then compiles to standard SQL.

Alternatively, to protect against cases where no column names matched the pattern provided (e.g. no variables start with n so cols_n is an empty list), one may instead internalize the final comma so that it is only compiled to SQL when relevant by using the final_comma parameter of across.

  {{ dbtplyr.across(cols_n, "sum({{var}}) as {{var}}_tot", final_comma = true) }}

Note that, slightly more dplyr-like, you may also write:

select

  {{ dbtplyr.across(dbtplyr.starts_with('N', ref('mydata')), "sum({{var}}) as {{var}}_tot") }},
  {{ dbtplyr.across(dbtplyr.starts_with('IND', ref('mydata')), "mean({{var}}) as {{var}}_avg") }}

from {{ ref('mydata') }}

But, as each function call is a bit longer than the equivalent dplyr code, I personally find the first form more readable.

Macros

The complete list of macros included are:

Functions to apply operation across columns

  • across(var_list, script_string, final_comma)
  • c_across(var_list, script_string)

Functions to evaluation condition across columns

  • if_any(var_list, script_string)
  • if_all(var_list, script_string)

Functions to subset columns by naming conventions

  • starts_with(string, relation or list)
  • ends_with(string, relation or list)
  • contains(string, relation or list)
  • not_contains(string, relation or list)
  • one_of(string_list, relation or list)
  • not_one_of(string_list, relation or list)
  • matches(string, relation)
  • everything(relation)
  • where(fn, relation) where fn is the string name of a Column type-checker (e.g. "is_number")

Note that all of the select-helper functions that take a relation as an argument can optionally be passed a list of names instead.

Documentation for these functions is available on the package website and in the macros/macro.yml file.

More Repositories

1

projmgr

R-based project management tools
R
118
star
2

Rtistic

A hackathon-in-a-box / "cookbook" to help build an R package with custom RMarkdown themes and ggplot2 themes & palettes. This looks like a package but it is not intended to be installed as-is. It is a wireframe to be used by an individual or group to create their *own* package!
R
81
star
3

data-disasters

HTML
41
star
4

demo-rmd-snow

HTML
30
star
5

conf-speaker-experience

A list of "good practices" for creating a good environment for conference speakers. Based on my own experience as a speaker but prepared in my role as an organizer for satRdays Chicago 2019
30
star
6

xaringan_columns

JavaScript macro for adding multiple columns to xaringan (.Rmd) slides
HTML
28
star
7

demo-shiny-modules

A demo of and argument for using Shiny modules even for simple Shiny applications
R
25
star
8

convo

R package based on "Column Names as Contracts" blog post (https://emilyriederer.netlify.app/post/column-name-contracts/)
R
23
star
9

data-validation-demo

Demo of pointblank / projmgr / GitHub Actions / Slack workflow for data quality monitoring
15
star
10

demo-crosstalk

A short lesson on using crosstalk for adding interactivity to an R Markdown document
HTML
11
star
11

nc-votes-duckdb

Demonstrating a "minimum viable approach" to large-scale OOM data processing with DuckDB and Arrow using NC voterfile data for the 2022 midterms
Python
10
star
12

dbt-convo-covid

Demo repo with full code described in blog post
9
star
13

wigo

knitr engine to help understand what is going on in an unruly RMarkdown script
R
7
star
14

talk-prep

A few thoughts on preparing a tech talk
3
star
15

website

Blog / website repo
JavaScript
3
star
16

sticker-driven-maintenance

1
star
17

chirunconf-dashboard

Repo for chirunconf Shiny dashboard
R
1
star
18

codespaces-experiment

Python
1
star
19

devspace-test

1
star