Astronomer Core Docker Images

Astronomer makes it easy to run, monitor, and scale Apache Airflow deployments in our cloud or yours. Source code is made available for the benefit of customers.

Terminology

Term            Example                  Description
edge build      main-dev                 Built from the current main branch of astronomer/airflow
dev build       2.2.4-4-dev              Development build, released during ap-airflow changes, including pre-releases and version releases
nightly build   2.2.4-nightly-20220314   Nightly build, triggered regularly by a CircleCI pipeline during the midnight hour UTC
release build   2.2.4-4                  Release build, triggered by a release PR

Note: Edge builds are always development builds.

Build matrix

Build                                 Nightly   Pre-release PR   Release PR
edge build                            ✅        ✅               ✅
nightly build                         ✅        ✅
dev build (only during pre-release)             ✅               ✅
release build                                                    ✅
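The tag styles above can be told apart mechanically. A minimal sketch (the classify_tag helper and its regexes are illustrative, not the exact rules CI applies):

```python
import re

def classify_tag(tag):
    """Classify an ap-airflow image tag per the terminology table above."""
    if tag.endswith("-dev"):
        # covers both edge builds (main-dev) and dev builds (2.2.4-4-dev)
        return "edge" if tag == "main-dev" else "dev"
    if re.fullmatch(r"\d+\.\d+\.\d+-nightly-\d{8}", tag):
        return "nightly"
    if re.fullmatch(r"\d+\.\d+\.\d+-\d+", tag):
        return "release"
    return "unknown"

print(classify_tag("main-dev"))                # edge
print(classify_tag("2.2.4-4-dev"))             # dev
print(classify_tag("2.2.4-nightly-20220314"))  # nightly
print(classify_tag("2.2.4-4"))                 # release
```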

Docker images

Docker images for deploying and running Astronomer Core are currently available on Quay.

We publish two variants for each AC version (example: 2.3.4-7):

  1. quay.io/astronomer/ap-airflow:2.3.4-7
  2. quay.io/astronomer/ap-airflow:2.3.4-7-onbuild

The only difference between them is that the -onbuild images use Docker ONBUILD instructions to copy packages.txt, requirements.txt, and the entire project directory (including the dags and plugins folders) into the image at build time.

We also publish a "floating" or movable tag that points at the latest release of the Airflow version:

  1. quay.io/astronomer/ap-airflow:2.3.4
  2. quay.io/astronomer/ap-airflow:2.3.4-onbuild
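Putting the two variants and the floating tags together, each AC release yields four image tags. A small sketch of that expansion (the published_tags helper is hypothetical, for illustration only):

```python
def published_tags(ac_version, repo="quay.io/astronomer/ap-airflow"):
    """All tags published for one AC release, e.g. '2.3.4-7'."""
    # The floating tag drops the AC build postfix ("-7").
    airflow_version = ac_version.rsplit("-", 1)[0]
    tags = []
    for base in (ac_version, airflow_version):
        tags.append(f"{repo}:{base}")
        tags.append(f"{repo}:{base}-onbuild")
    return tags

for tag in published_tags("2.3.4-7"):
    print(tag)
# quay.io/astronomer/ap-airflow:2.3.4-7
# quay.io/astronomer/ap-airflow:2.3.4-7-onbuild
# quay.io/astronomer/ap-airflow:2.3.4
# quay.io/astronomer/ap-airflow:2.3.4-onbuild
```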

Version Life Cycle & Maintenance policy

The support and maintenance of the Docker images are described in the Version Life Cycle.

Contents of this repo

  • The official Dockerfiles that build Astronomer Core Images
  • Example docker-compose files for running various pieces and configurations of the platform.

Contribute

Step-by-step instructions for common activities

Release a new Astronomer Certified major, minor, or bugfix version (e.g. X.Y.Z)

  1. Remove the -dev part of the relevant version in IMAGE_MAP in .circleci/common.py.

    Example: The latest dev version is 2.2.1-1-dev, and we want to release 2.2.1-1.

    diff --git a/.circleci/common.py b/.circleci/common.py
    index xxxxxxx..yyyyyyy 100644
    --- a/.circleci/common.py
    +++ b/.circleci/common.py
    @@ -35,7 +35,7 @@ IMAGE_MAP = collections.OrderedDict([
         ("2.1.3-2", ["buster"]),
         ("2.1.4-2", ["buster"]),
         ("2.2.0-3-dev", ["bullseye", "buster"]),
    -    ("2.2.1-1-dev", ["bullseye", "buster"]),
    +    ("2.2.1-1", ["bullseye", "buster"]),
    ])
    
    # Airflow Versions for which we don't publish Python Wheels
  2. Run the update-dockerfiles pre-commit hook (the hook is expected to fail, but it will modify the relevant Dockerfile).

    Example:

    pre-commit run update-dockerfiles
  3. Add the changed Dockerfile and commit (this should succeed).

    Example: The update-dockerfiles hook updated 2.2.1/bullseye/Dockerfile:

    git add 2.2.1/bullseye/Dockerfile; git commit
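The edit in step 1 is purely mechanical: drop the -dev suffix from one IMAGE_MAP key while preserving entry order. A sketch of that transformation (the promote_to_release helper is hypothetical; the IMAGE_MAP entries are copied from the diff above):

```python
import collections

IMAGE_MAP = collections.OrderedDict([
    ("2.2.0-3-dev", ["bullseye", "buster"]),
    ("2.2.1-1-dev", ["bullseye", "buster"]),
])

def promote_to_release(image_map, dev_version):
    """Rename '<X.Y.Z-N>-dev' to '<X.Y.Z-N>', keeping entry order."""
    assert dev_version.endswith("-dev")
    return collections.OrderedDict(
        (k.removesuffix("-dev") if k == dev_version else k, v)
        for k, v in image_map.items()
    )

print(list(promote_to_release(IMAGE_MAP, "2.2.1-1-dev")))
# ['2.2.0-3-dev', '2.2.1-1']
```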

Release an existing Astronomer Certified version with an updated version of Airflow

  1. Update the postfix version of the relevant version in IMAGE_MAP in .circleci/common.py.

    Example: The latest AC version is 2.2.1-1 and we want to release 2.2.1-2.

    diff --git a/.circleci/common.py b/.circleci/common.py
    index xxxxxxx..yyyyyyy 100644
    --- a/.circleci/common.py
    +++ b/.circleci/common.py
    @@ -35,7 +35,7 @@ IMAGE_MAP = collections.OrderedDict([
         ("2.1.3-2", ["buster"]),
         ("2.1.4-2", ["buster"]),
         ("2.2.0-3-dev", ["bullseye", "buster"]),
    -    ("2.2.1-1", ["bullseye", "buster"]),
    +    ("2.2.1-2", ["bullseye", "buster"]),
     ])
    
     # Airflow Versions for which we don't publish Python Wheels
  2. Run the update-dockerfiles pre-commit hook (the hook is expected to fail, but it will modify the relevant Dockerfile).

    Example:

    pre-commit run update-dockerfiles
  3. Add the changed Dockerfile and commit (this should succeed).

    Example: The update-dockerfiles hook updated 2.2.0/bullseye/Dockerfile:

    git add 2.2.0/bullseye/Dockerfile; git commit
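Step 1 here bumps only the AC build postfix; the Airflow version itself is unchanged. A sketch of that bump (the bump_postfix helper is hypothetical):

```python
def bump_postfix(ac_version):
    """'2.2.1-1' -> '2.2.1-2': increment only the AC build postfix."""
    airflow_version, postfix = ac_version.rsplit("-", 1)
    return f"{airflow_version}-{int(postfix) + 1}"

print(bump_postfix("2.2.1-1"))  # 2.2.1-2
print(bump_postfix("2.2.0-3"))  # 2.2.0-4
```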

Add new Astronomer Certified development version

  1. Add the Astronomer Certified version to IMAGE_MAP in .circleci/common.py.

    Example: The latest previous release was 2.2.1-1 and we're adding 2.3.0-1-dev.

    diff --git a/.circleci/common.py b/.circleci/common.py
    index xxxxxxx..yyyyyyy 100644
    --- a/.circleci/common.py
    +++ b/.circleci/common.py
    @@ -36,6 +36,7 @@ IMAGE_MAP = collections.OrderedDict([
         ("2.1.4-2", ["buster"]),
         ("2.2.0-3-dev", ["bullseye", "buster"]),
         ("2.2.1-1", ["bullseye", "buster"]),
    +    ("2.3.0-1-dev", ["bullseye"]),
     ])
    
     # Airflow Versions for which we don't publish Python Wheels
  2. Edit the new CHANGELOG.md to show what has changed in this release.

    Example:

    nano 2.3.0/CHANGELOG.md
  3. Add the new directory to the Git staging area.

    Example:

    git add 2.3.0
  4. Run the update-dockerfiles pre-commit hook (the hook is expected to fail, but it will modify the relevant Dockerfile).

    Example:

    pre-commit run update-dockerfiles

    The pre-commit hook should change some lines in the new Dockerfile.

    diff --git a/2.3.0/bullseye/Dockerfile b/2.3.0/bullseye/Dockerfile
    index xxxxxxx..yyyyyyy 100644
    --- a/2.3.0/bullseye/Dockerfile
    +++ b/2.3.0/bullseye/Dockerfile
    @@ -110,10 +110,10 @@ RUN apt-get update \
         && apt-get clean \
         && rm -rf /var/lib/apt/lists/*
    
    -ARG VERSION="2.2.1-1"
    +ARG VERSION="2.3.0-1.*"
     ARG SUBMODULES="async,azure,amazon,elasticsearch,google,password,cncf.kubernetes,mysql,postgres,redis,slack,ssh,statsd,virtualenv"
     ARG AIRFLOW_MODULE="astronomer_certified[${SUBMODULES}]==$VERSION"
    -ARG AIRFLOW_VERSION="2.2.1"
    +ARG AIRFLOW_VERSION="2.3.0"
    
     # Make pip look at our pip repo too, and force it to install these specific
     # versions when ever it installs a module.
    @@ -145,8 +145,8 @@ RUN apt-get update \
         && apt-get clean \
         && rm -rf /var/lib/apt/lists/*
    
    -ARG VERSION="2.2.1-1"
    -ARG AIRFLOW_VERSION="2.2.1"
    +ARG VERSION="2.3.0-1.*"
    +ARG AIRFLOW_VERSION="2.3.0"
     LABEL io.astronomer.docker.airflow.version="${AIRFLOW_VERSION}"
     LABEL io.astronomer.docker.ac.version="${VERSION}"
    
  5. Stage the changes to the Dockerfile and commit (this should succeed).

    Example:

    git add 2.3.0/bullseye/Dockerfile && git commit
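Step 1 of this procedure just appends a new -dev entry to IMAGE_MAP. Sketched below (the add_dev_version helper is hypothetical; the entries are copied from the diff above):

```python
import collections

IMAGE_MAP = collections.OrderedDict([
    ("2.2.0-3-dev", ["bullseye", "buster"]),
    ("2.2.1-1", ["bullseye", "buster"]),
])

def add_dev_version(image_map, airflow_version, distros):
    """Append '<airflow_version>-1-dev' as the newest IMAGE_MAP entry."""
    image_map[f"{airflow_version}-1-dev"] = list(distros)
    return image_map

add_dev_version(IMAGE_MAP, "2.3.0", ["bullseye"])
print(list(IMAGE_MAP)[-1])  # 2.3.0-1-dev
```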

Add a new base build image (e.g. a new Debian stable release)

  1. Add or adjust the Debian release name in IMAGE_MAP.

    Example: Previous Astronomer Certified versions only built with Debian Buster, but Debian Bullseye has just been released as the new Debian stable version and we'd like to add support for that.

    diff --git a/.circleci/common.py b/.circleci/common.py
    index xxxxxxx..yyyyyyy 100644
    --- a/.circleci/common.py
    +++ b/.circleci/common.py
    @@ -36,7 +36,7 @@ IMAGE_MAP = collections.OrderedDict([
         ("2.1.4-2", ["buster"]),
         ("2.2.0-3-dev", ["bullseye", "buster"]),
         ("2.2.1-1", ["bullseye", "buster"]),
    -    ("2.3.0-1-dev", ["buster"]),
    +    ("2.3.0-1-dev", ["bullseye", "buster"]),
     ])
    
     # Airflow Versions for which we don't publish Python Wheels
  2. Add a new version directory for it.

    Example: There is currently a 2.3.0/buster directory that we need to copy to 2.3.0/bullseye and then modify that Dockerfile to use Debian Bullseye.

    cp -a 2.3.0/buster 2.3.0/bullseye
  3. Adjust the relevant Dockerfile.

    Example: Update the 2.3.0/bullseye/Dockerfile to use the upstream Debian Bullseye image.

    diff --git a/2.3.0/bullseye/Dockerfile b/2.3.0/bullseye/Dockerfile
    index xxxxxxx..yyyyyyy 100644
    --- a/2.3.0/bullseye/Dockerfile
    +++ b/2.3.0/bullseye/Dockerfile
    @@ -14,7 +14,7 @@
     # limitations under the License.
     ARG APT_DEPS_IMAGE="airflow-apt-deps"
     ARG PYTHON_MAJOR_MINOR_VERSION="3.9"
    -ARG PYTHON_BASE_IMAGE="python:${PYTHON_MAJOR_MINOR_VERSION}-slim-buster"
    +ARG PYTHON_BASE_IMAGE="python:${PYTHON_MAJOR_MINOR_VERSION}-slim-bullseye"
    
     FROM ${PYTHON_BASE_IMAGE} as airflow-apt-deps
    
  4. Stage the changes to the Dockerfile and commit (the pre-commit hooks should all succeed).

    Example:

    git add .circleci/common.py 2.3.0/bullseye && git commit
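Step 1 here only extends one version's distro list in IMAGE_MAP. A sketch (the add_distro helper is hypothetical; listing the newest Debian release first matches the diff above):

```python
IMAGE_MAP = {
    "2.2.1-1": ["bullseye", "buster"],
    "2.3.0-1-dev": ["buster"],
}

def add_distro(image_map, version, distro):
    """Add a new Debian release to one version's build list (newest first)."""
    if distro not in image_map[version]:
        image_map[version].insert(0, distro)
    return image_map

add_distro(IMAGE_MAP, "2.3.0-1-dev", "bullseye")
print(IMAGE_MAP["2.3.0-1-dev"])  # ['bullseye', 'buster']
```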

Changelog

All changes applied to available point releases are documented in the CHANGELOG.md file within each version folder.

Testing

Local testing

These tests run automatically in CI, but trying them locally first can save time.

Airflow is launched into a local Kubernetes cluster using kind and the most recent version of the Astronomer Airflow chart. Python's testinfra module is then used to run system tests against the components while they are running in kind.

Ensure prerequisites are met:

  • docker
  • python3
  • virtualenv

Ensure Docker is installed and your user has permission to run it

docker run -it --rm hello-world

Ensure Python3 is installed and in PATH

python3 -c "print('Confirmed python3 installed.')"

Ensure virtualenv is installed

which virtualenv

Set up virtual environment

virtualenv --python=python3 venv
source venv/bin/activate
pip install -r .circleci/test-requirements.txt

Run system testing

Build the image you want to test

docker build -t airflow ./1.10.5/buster

Run system testing

.circleci/bin/test-airflow airflow

The first build and the first system-test run take longer than subsequent runs. The system test installs the tested versions of the CI tools (helm, kubectl, kind) in /tmp/bin and leaves an Airflow deployment running on your kind cluster, test-cluster. When you run it again, it deletes the namespace of your most recent deployment and redeploys into a new namespace. If you make changes to the image, don't forget to rebuild it before testing.

Use the newly installed tools

export PATH=/tmp/bin:$PATH

Ensure kubectl is configured to use kind

kubectl cluster-info --context kind-test-cluster

Look at the pods

kubectl get pods --all-namespaces

Clean up

kind delete cluster --name test-cluster

Scheduled Tasks

The regularly scheduled tasks are:

Edge Builds

  • Rebase the astro-main branch of astronomer/airflow onto main and push it to astronomer/airflow:astro-main. This kicks off a GitHub Actions workflow that builds the Airflow and Astronomer Certified Python packages/wheels and pushes them to our PyPI package repository.
  • Build nightly Docker images for QA (using those nightly Airflow and Astronomer Certified wheels) and push them to the dev image repository

CircleCI Schedules

The CircleCI documentation on scheduled pipelines is very new and in slight disarray.

See also:

The API for manipulating schedules is documented here, including examples.

Here is an example listing all schedules with HTTPie (and colorizing the response with jq):

$ http https://circleci.com/api/v2/project/gh/astronomer/ap-airflow/schedule \
       circle-token:<CIRCLECI_PERSONAL_ACCESS_TOKEN> \
       | jq

To create a new schedule (refer to the HTTPie docs about raw JSON):

$ http --verbose \
       https://circleci.com/api/v2/project/gh/astronomer/ap-airflow/schedule \
       circle-token:<CIRCLECI_PERSONAL_ACCESS_TOKEN> \
       name="every-morning-0200-UTC" \
       description="Every morning at 02:00 UTC" \
       attribution-actor="system" \
       parameters:='{ "branch": "master" }' \
       timetable:='{ "per-hour": 1, "hours-of-day": [2], "days-of-week": ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]}'
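The same request body can be assembled in Python before sending it with any HTTP client. A sketch that only builds and prints the payload (the schedule_payload helper is hypothetical; the field names are taken from the HTTPie call above, and nothing is actually sent):

```python
import json

def schedule_payload(name, description, branch, hour):
    """Build the JSON body for creating a CircleCI scheduled pipeline."""
    return {
        "name": name,
        "description": description,
        "attribution-actor": "system",
        "parameters": {"branch": branch},
        "timetable": {
            "per-hour": 1,
            "hours-of-day": [hour],
            "days-of-week": ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"],
        },
    }

payload = schedule_payload("every-morning-0200-UTC",
                           "Every morning at 02:00 UTC", "master", 2)
print(json.dumps(payload, indent=2))
```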

Note that updating and deleting schedules uses a different URL path:

$ http PATCH \
       https://circleci.com/api/v2/schedule/<schedule_uuid> \
       circle-token:<CIRCLECI_PERSONAL_ACCESS_TOKEN> \
       name=every-sunday-0100-utc \
       description="Every Sunday at 01:00 UTC"

You can create a CircleCI personal API token in your CircleCI user settings. Note that a PAT authenticates as you and has full read and write access on CircleCI, so keep it secret and do not publish it anywhere!

License

Apache 2.0 with Commons Clause
