  • Stars
    1,670
  • Rank 27,991 (Top 0.6 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created almost 8 years ago
  • Updated 5 months ago

Repository Details

A series of DAGs/Workflows to help maintain the operation of Airflow

airflow-maintenance-dags

DAGs/Workflows

  • backup-configs
    • A maintenance workflow that you can deploy into Airflow to periodically take backups of various Airflow configurations and files.
  • clear-missing-dags
    • A maintenance workflow that you can deploy into Airflow to periodically clean out entries in the DAG table that no longer have a corresponding Python file. This keeps the DAG table free of stale entries, so the Airflow Web Server displays only the DAGs that are actually available.
  • db-cleanup
    • A maintenance workflow that you can deploy into Airflow to periodically clean out the DagRun, TaskInstance, Log, XCom, Job DB, and SlaMiss entries so that your Airflow MetaStore does not accumulate too much data.
  • kill-halted-tasks
    • A maintenance workflow that you can deploy into Airflow to periodically kill off tasks that are still running in the background but don't correspond to a running task in the DB.
    • This is useful because when you kill off a DAG Run or Task through the Airflow Web Server, the task keeps running in the background on one of the executors until it completes.
  • log-cleanup
    • A maintenance workflow that you can deploy into Airflow to periodically clean out the task logs so they don't grow too large.
  • delete-broken-dags
    • A maintenance workflow that you can deploy into Airflow to periodically delete DAG files and clean out entries in the ImportError table for DAGs which Airflow cannot parse or import properly. This ensures that the ImportError table is cleaned every day.
  • sla-miss-report
    • DAG providing an extensive analysis report of SLA misses broken down on a daily, hourly, and task level
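
As an illustration of the age-based cleanup idea behind a workflow like log-cleanup, here is a minimal sketch in plain Python. It is not the repository's DAG code: the function names, the `max_age_days` parameter, and the `dry_run` flag are all hypothetical, and it assumes logs are ordinary files under a single directory tree.

```python
import time
from pathlib import Path


def find_stale_logs(log_dir, max_age_days, now=None):
    """Return files under log_dir last modified more than max_age_days ago.

    Hypothetical helper for illustration; not part of Airflow or this repo.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return sorted(
        path
        for path in Path(log_dir).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def clean_logs(log_dir, max_age_days, dry_run=True, now=None):
    """Delete stale log files; with dry_run=True only report them."""
    stale = find_stale_logs(log_dir, max_age_days, now=now)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Defaulting to a dry run is a deliberate choice in this sketch: a maintenance job that deletes files is easier to trust if its first run only lists what it would remove.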

More Repositories

1. airflow-rest-api-plugin (Python, 325 stars)
   A plugin for Apache Airflow that exposes rest end points for the Command Line Interfaces
2. airflow-scheduler-failover-controller (Python, 232 stars)
   A process that runs in unison with Apache Airflow to control the Scheduler process to ensure High Availability
3. hadoop-deployment-bash (Shell, 34 stars)
   Code for the deployment of Hadoop clusters, written in Bourne or Bourne Again shell.
4. apache-airflow-cloudera-csd (Shell, 20 stars)
   CSD for Apache Airflow
5. airflow_demo (Java, 18 stars)
   Airflow script for incremental data import from Mysql to Hive using Sqoop.
6. apache-airflow-cloudera-parcel (Dockerfile, 17 stars)
   Parcel for Apache Airflow
7. jenkins-workspace-cleanup-groovy-script (Groovy, 16 stars)
   Jenkins Workspace Cleanup script to automate folders clean up for all the jobs
8. airflow-user-management-plugin (Python, 14 stars)
   A plugin for Apache Airflow that allows you to manage the users that can login
9. hadoop-smoke-tests (8 stars)
   Basic smoke tests to determine component functionality of a Hadoop cluster.
10. terraform-hadoop-talk (HCL, 6 stars)
    Set up the AWS infrastructure for a small Hadoop cluster as well as install the Cloudera Manager server and agents.
11. airflow-plugins (Python, 5 stars)
    A series of Plugins for Apache Airflow (https://airflow.incubator.apache.org/)
12. intro-to-spark (Java, 3 stars)
13. NameDatabases (3 stars)
    List of public, open source Name Databases
14. cdp-azure (HCL, 2 stars)
    Bits and pieces to make it easy to set up CDP on Azure
15. MongoDB_OPSLOG (Python, 2 stars)
16. SparkCluster_Ansible (Shell, 2 stars)
17. database-comparison-tool (Java, 2 stars)
18. spark-streaming-workshop (Java, 2 stars)
19. nagios-plugins (Python, 2 stars)
    Plugins built for Nagios
20. saleor-storefront-poc (TypeScript, 2 stars)
    Customizing Saleor storefront to add more features and evaluate.
21. clairthon-ambivalent-aardvarks (TypeScript, 1 star)
22. rabbitmq-cloudera-parcel (Python, 1 star)
    RabbitMQ parcel to be deployed and managed through Cloudera Manager
23. skills-base (Java, 1 star)
24. spark-workshop-2x (Java, 1 star)
25. automated-hadoop-smoke-test (Shell, 1 star)
    Basic smoke tests to determine component functionality of a Hadoop cluster.
26. data-scalaxy-test-util (Scala, 1 star)
    A scala library that provides additional utilities for testing spark applications.
27. spark-batch (Java, 1 star)
    Template repository for spark-batch
28. GCP-serv (1 star)
29. restonomer (Scala, 1 star)
    Framework to ingest data from REST APIs, transform and persist the data.
30. snowflake-poc (1 star)
    Snowflake PoC
31. minimal-ai (Python, 1 star)
    Framework to automate ETL pipeline creation with a touch of AI.
32. vagrant-sparkbuilder (Puppet, 1 star)
    Simple environment to help rebuild Cloudera's Apache Spark.
33. auto-etl (Python, 1 star)
34. IntroToMachineLearning (Python, 1 star)
    Intro to machine learning - Code for article at http://blog.clairvoyantsoft.com/2015/03/intro-to-machine-learning/