• This repository has been archived on 24/Jul/2024
  • Language: Python
  • License: MIT License

Apache Airflow integration for dbt

airflow-dbt

This is a collection of Airflow operators to provide easy integration with dbt.

from airflow import DAG
from airflow_dbt.operators.dbt_operator import (
    DbtSeedOperator,
    DbtSnapshotOperator,
    DbtRunOperator,
    DbtTestOperator,
    DbtCleanOperator,
)
from airflow.utils.dates import days_ago

default_args = {
  'dir': '/srv/app/dbt',
  'start_date': days_ago(0)
}

with DAG(dag_id='dbt', default_args=default_args, schedule_interval='@daily') as dag:

  dbt_seed = DbtSeedOperator(
    task_id='dbt_seed',
  )

  dbt_snapshot = DbtSnapshotOperator(
    task_id='dbt_snapshot',
  )

  dbt_run = DbtRunOperator(
    task_id='dbt_run',
  )

  dbt_test = DbtTestOperator(
    task_id='dbt_test',
    retries=0,  # Failing tests would fail the task, and we don't want Airflow to try again
  )

  dbt_clean = DbtCleanOperator(
    task_id='dbt_clean',
  )

  dbt_seed >> dbt_snapshot >> dbt_run >> dbt_test >> dbt_clean

Installation

Install from PyPI:

pip install airflow-dbt

The operators also need access to the dbt CLI, which should either be on your PATH or be set with the dbt_bin argument on each operator.
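If you are not sure whether dbt is on the PATH, a small helper can pick a value for dbt_bin up front. This is a standard-library sketch; resolve_dbt_bin is a hypothetical helper and is not part of airflow-dbt.

```python
import os
import shutil
from typing import Iterable, Optional


def resolve_dbt_bin(name: str = "dbt", fallbacks: Iterable[str] = ()) -> Optional[str]:
    """Hypothetical helper: return a path to use as dbt_bin, or None if none is found.

    The PATH lookup comes first; explicit fallback paths are then tried in order.
    """
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for candidate in fallbacks:
        # Accept only existing files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Note that a call like this at module level runs wherever the DAG file is parsed, which is not necessarily the machine that executes the task, so treat the result as a hint rather than a guarantee.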

Usage

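There are five operators currently implemented:

  • DbtSeedOperator
    • Calls dbt seed
  • DbtSnapshotOperator
    • Calls dbt snapshot
  • DbtRunOperator
    • Calls dbt run
  • DbtTestOperator
    • Calls dbt test
  • DbtCleanOperator
    • Calls dbt clean

Each of the above operators accepts the following arguments: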

  • env
    • If set as a dict (keyword argument), its entries are passed as environment variables to the dbt process
  • profiles_dir
    • If set, passed as the --profiles-dir argument to the dbt command
  • target
    • If set, passed as the --target argument to the dbt command
  • dir
    • The directory to run the dbt command in
  • full_refresh
    • If set to True, passes the --full-refresh flag to the dbt command
  • vars
    • If set, passed as the --vars argument to the dbt command. Should be set as a Python dictionary, which is passed to the dbt command as YAML
  • models
    • If set, passed as the --models argument to the dbt command
  • exclude
    • If set, passed as the --exclude argument to the dbt command
  • select
    • If set, passed as the --select argument to the dbt command
  • selector
    • If set, passed as the --selector argument to the dbt command
  • dbt_bin
    • Path to the dbt CLI. Defaults to dbt, which assumes it is on your PATH
  • verbose
    • If enabled, the operator logs verbosely to the Airflow logs
  • warn_error
    • If set to True, passes the --warn-error argument to the dbt command, so warnings are treated as errors
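To make the argument-to-flag mapping above concrete, here is a minimal sketch that assembles a dbt command line from the same options. build_dbt_command is hypothetical and is not the package's implementation. It serializes vars with json.dumps on the assumption that a JSON string is acceptable to --vars (JSON is essentially a subset of YAML), and it leaves out env, dir and verbose because those control the process environment, working directory and logging rather than CLI flags.

```python
import json
from typing import Any, Dict, List, Optional


def build_dbt_command(
    command: str,
    dbt_bin: str = "dbt",
    profiles_dir: Optional[str] = None,
    target: Optional[str] = None,
    full_refresh: bool = False,
    vars: Optional[Dict[str, Any]] = None,
    models: Optional[str] = None,
    exclude: Optional[str] = None,
    select: Optional[str] = None,
    selector: Optional[str] = None,
    warn_error: bool = False,
) -> List[str]:
    """Hypothetical sketch: turn operator-style options into a dbt argv list."""
    cmd = [dbt_bin]
    # --warn-error is a global dbt flag, so it goes before the subcommand.
    if warn_error:
        cmd.append("--warn-error")
    cmd.append(command)

    # Options that take a value are only added when they are set.
    valued = [
        ("--profiles-dir", profiles_dir),
        ("--target", target),
        # A JSON dump is one way to produce a YAML-compatible --vars string.
        ("--vars", json.dumps(vars) if vars is not None else None),
        ("--models", models),
        ("--exclude", exclude),
        ("--select", select),
        ("--selector", selector),
    ]
    for flag, value in valued:
        if value is not None:
            cmd.extend([flag, value])

    if full_refresh:
        cmd.append("--full-refresh")
    return cmd
```

The resulting list could then be run with subprocess.run(cmd, cwd=<your dir>, check=True), where cwd plays the role of the dir argument.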

Typically you will want to use the DbtRunOperator, followed by the DbtTestOperator, as shown earlier.

You can also use the hook directly. This is typically useful when you need to combine the dbt command with another task in the same operator, for example running dbt docs and uploading the generated docs somewhere they can be served from.
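As a sketch of that pattern, the function below runs dbt docs generate through a hook and then hands each generated file to an upload callable. It assumes only that the hook object has a run_cli(*command) method; the package's hook is believed to be DbtCliHook in airflow_dbt.hooks.dbt_hook, but check this against your installed version. The upload callable, the artifact list and the default target directory are also assumptions.

```python
import os
from typing import Callable, Iterable, List, Protocol


class DbtHookLike(Protocol):
    """Anything exposing run_cli(*command), as the package's dbt hook is assumed to."""

    def run_cli(self, *command: str) -> object: ...


# Files that `dbt docs generate` writes to the project's target directory by default.
DOCS_ARTIFACTS = ("index.html", "manifest.json", "catalog.json")


def generate_and_upload_docs(
    hook: DbtHookLike,
    upload: Callable[[str], None],
    project_dir: str,
    artifacts: Iterable[str] = DOCS_ARTIFACTS,
    target_path: str = "target",
) -> List[str]:
    """Run `dbt docs generate` via the hook, then pass each artifact path to upload."""
    hook.run_cli("docs", "generate")
    uploaded = []
    for name in artifacts:
        path = os.path.join(project_dir, target_path, name)
        upload(path)  # e.g. copy to a bucket or web server of your choice
        uploaded.append(path)
    return uploaded
```

In a DAG you might call this from a PythonOperator callable, passing the package's hook (for example DbtCliHook(dir='/srv/app/dbt'), assuming that constructor) and a function that copies each file to your storage. Taking the hook as a parameter also makes the function easy to test with a fake.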

Building Locally

To install from the repository, it's recommended to first create a virtual environment:

python3 -m venv .venv

source .venv/bin/activate

Install using pip:

pip install .

Testing

To run tests locally, first create a virtual environment (see the Building Locally section).

Install dependencies:

pip install . pytest

Run the tests:

pytest tests/

Code style

This project uses flake8.

To check your code, first create a virtual environment (see the Building Locally section), then run:

pip install flake8
flake8 airflow_dbt/ tests/ setup.py

Package management

If you use dbt's package manager, you should install all package dependencies before deploying your dbt project.

For Docker users, packages specified in packages.yml should be included as part of your Docker image by running dbt deps in your Dockerfile.

Amazon Managed Workflows for Apache Airflow (MWAA)

If you use MWAA, you just need to add airflow-dbt and dbt to your requirements.txt file.

You can then keep your dbt code in a folder {DBT_FOLDER} inside the dags folder on S3 and configure the dbt task as follows:

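from airflow_dbt.operators.dbt_operator import DbtRunOperator

DBT_FOLDER = 'my_dbt_project'  # placeholder: the name of your dbt folder under dags/

dbt_run = DbtRunOperator(
  task_id='dbt_run',
  dbt_bin='/usr/local/airflow/.local/bin/dbt',
  profiles_dir=f'/usr/local/airflow/dags/{DBT_FOLDER}/',
  dir=f'/usr/local/airflow/dags/{DBT_FOLDER}/'
)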

Templating and parsing environment variables

If you want to run dbt with a custom profile definition template that uses environment-specific variables, for example a profiles.yml using Jinja:

<profile_name>:
  outputs:
    <source>:
      database: "{{ env_var('DBT_ENV_SECRET_DATABASE') }}"
      password: "{{ env_var('DBT_ENV_SECRET_PASSWORD') }}"
      schema: "{{ env_var('DBT_ENV_SECRET_SCHEMA') }}"
      threads: "{{ env_var('DBT_THREADS') }}"
      type: <type>
      user: "{{ env_var('USER_NAME') }}_{{ env_var('ENV_NAME') }}"
  target: <source>

You can pass the environment variables via the env keyword argument:

import os
...

dbt_run = DbtRunOperator(
  task_id='dbt_run',
  env={
    'DBT_ENV_SECRET_DATABASE': '<DATABASE>',
    'DBT_ENV_SECRET_PASSWORD': '<PASSWORD>',
    'DBT_ENV_SECRET_SCHEMA': '<SCHEMA>',
    'USER_NAME': '<USER_NAME>',
    # os.getenv() returns None if a variable is unset; every env value must be a string
    'DBT_THREADS': os.getenv('<DBT_THREADS_ENV_VARIABLE_NAME>'),
    'ENV_NAME': os.getenv('ENV_NAME')
  }
)
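Because os.getenv returns None for unset variables, it can help to validate the mapping before handing it to the operator. build_dbt_env below is a hypothetical helper: it fails fast on missing values and can optionally start from the worker's own environment. Whether you need that inheritance depends on how your airflow-dbt version passes env to the dbt process; if env replaces the process environment entirely, variables such as PATH would otherwise be missing.

```python
import os
from typing import Dict, Mapping, Optional


def build_dbt_env(
    values: Mapping[str, Optional[object]],
    inherit: bool = True,
) -> Dict[str, str]:
    """Hypothetical helper: return a str-to-str mapping for the operator's env argument.

    Raises ValueError listing every key whose value is None (for example
    because os.getenv() found nothing), since process environments only
    hold strings.
    """
    missing = sorted(key for key, value in values.items() if value is None)
    if missing:
        raise ValueError("unset environment variables: " + ", ".join(missing))

    # Optionally start from the worker's environment so PATH, HOME, etc. survive.
    env = dict(os.environ) if inherit else {}
    # Coerce non-string values (e.g. an integer thread count) to strings.
    env.update({key: str(value) for key, value in values.items()})
    return env
```

For example, env=build_dbt_env({'ENV_NAME': os.getenv('ENV_NAME'), 'DBT_THREADS': 4}) raises at DAG-parse time if ENV_NAME is unset, rather than failing later inside the dbt task.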

License & Contributing

GoCardless ♥ open source. If you do too, come join us.
