afctl

afctl is a CLI tool authored to make the creation and deployment of Apache Airflow (https://airflow.apache.org/) projects faster and smoother. As of now, there is no other tool that empowers the user to create a boilerplate code structure for Airflow projects and makes development and deployment seamless.

Requirements

  • Python 3.5+
  • Docker

Getting Started

1. Installation

Create a new Python virtualenv. You can use the following command:

python3 -m venv <name>

Activate your virtualenv and install afctl:

source /path_to_venv/bin/activate
pip3 install afctl
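
If the installation succeeded, the top-level help should print. This is a quick sanity check; the README only documents the per-command -h flag (see Usage below), so a bare top-level -h is an assumption:

afctl -h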

2. Initialize a new afctl project.

The project is created in your present working directory. Along with it, a configuration file with the same name is generated in the /home/.afctl_configs directory (see the quick check at the end of this step).

afctl init <name of the project>

For example:

afctl init project_demo
The following directory structure will be generated:
.
β”œβ”€β”€ deployments
β”‚Β Β  └── project_demo-docker-compose.yml
β”œβ”€β”€ migrations
β”œβ”€β”€ plugins
β”œβ”€β”€ project_demo
β”‚Β Β  β”œβ”€β”€ commons
β”‚Β Β  └── dags
β”œβ”€β”€ requirements.txt
└── tests

If you already have a Git repository and want to turn it into an afctl project, run the following command:

afctl init .
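
As noted above, init also drops a configuration file named after the project into /home/.afctl_configs. A quick way to confirm it was created (the exact filename and extension may vary):

ls /home/.afctl_configs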

3. Add a new module in the project.

afctl generate module -n <name of the module>

For example, running:

afctl generate module -n first_module
afctl generate module -n second_module

will generate the following directory structure:

.
β”œβ”€β”€ deployments
β”‚Β Β  └── project_demo-docker-compose.yml
β”œβ”€β”€ migrations
β”œβ”€β”€ plugins
β”œβ”€β”€ project_demo
β”‚Β Β  β”œβ”€β”€ commons
β”‚Β Β  └── dags
β”‚Β Β      β”œβ”€β”€ first_module
β”‚Β Β      └── second_module
β”œβ”€β”€ requirements.txt
└── tests
    β”œβ”€β”€ first_module
    └── second_module

4. Generate dag

afctl generate dag -n <name of dag> -m <name of module>

For example, running:

afctl generate dag -n new -m first_module

will generate the following directory structure:

.
β”œβ”€β”€ deployments
β”‚Β Β  └── project_demo-docker-compose.yml
β”œβ”€β”€ migrations
β”œβ”€β”€ plugins
β”œβ”€β”€ project_demo
β”‚Β Β  β”œβ”€β”€ commons
β”‚Β Β  └── dags
β”‚Β Β      β”œβ”€β”€ first_module
β”‚Β Β      β”‚Β Β  └── new_dag.py
β”‚Β Β      └── second_module
β”œβ”€β”€ requirements.txt
└── tests
    β”œβ”€β”€ first_module
    └── second_module

The generated DAG file will look like this:

from airflow import DAG
from datetime import datetime, timedelta

default_args = {
    'owner': 'project_demo',
    # 'depends_on_past': ,
    # 'start_date': ,
    # 'email': ,
    # 'email_on_failure': ,
    # 'email_on_retry': ,
    # 'retries': 0
}

dag = DAG(dag_id='new', default_args=default_args, schedule_interval='@once')
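
As a rough sketch of how this template might be filled in, here is the same file with the placeholders populated and a single no-op task added. The values and the task are illustrative, not generated by afctl, and the DummyOperator import path assumes Airflow 1.x:

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator  # Airflow 1.x import path
from datetime import datetime, timedelta

default_args = {
    'owner': 'project_demo',
    'depends_on_past': False,             # illustrative value
    'start_date': datetime(2020, 1, 1),   # illustrative value
    'retries': 1,                         # illustrative value
    'retry_delay': timedelta(minutes=5),  # illustrative value
}

dag = DAG(dag_id='new', default_args=default_args, schedule_interval='@once')

# An illustrative no-op task; afctl generates only the skeleton above.
start = DummyOperator(task_id='start', dag=dag)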

5. Deploy project locally

You can add Python packages that your DAGs require to requirements.txt; they will be installed automatically (see the sample below).
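
For instance, a requirements.txt pinning two packages might look like this (the package names and versions are purely illustrative):

pandas==1.0.5
requests==2.23.0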

  • To deploy your project, run the following command (make sure Docker is running):
afctl deploy local

If you do not want to see the logs, you can run:

afctl deploy local -d

This runs the deployment in detached mode and does not print logs to the console.

  • You can access the Airflow webserver in your browser at localhost:8080
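
When running detached, standard Docker tooling can be used to inspect the deployment. These commands are illustrative, run from the project root; the compose file name matches the one generated under deployments/:

docker ps
docker-compose -f deployments/project_demo-docker-compose.yml logs -f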

6. Deploy project on production

  • Here we will deploy our project to Qubole. Sign up at us.qubole.com.
  • Add the git origin and access token (needed if you keep the project as a private repo on GitHub) to the configs, as described under "Manage configurations" below.
  • Push the completed project to GitHub.
  • Deploying to Qubole requires adding deployment configurations:
afctl config add -d qubole -n <name of deployment> -e <env> -c <cluster-label> -t <auth-token>

This command will modify your config file. You can view your config file with the following command:

afctl config show

For example:

afctl config add -d qubole -n demo -e https://api.qubole.com -c airflow_1102 -t khd34djs3
  • To deploy, run the following command:
afctl deploy qubole -n <name>
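
Continuing the example above, the deployment configuration named demo would be deployed with:

afctl deploy qubole -n demo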

The following video also walks through all the steps of deploying a project using afctl:

https://www.youtube.com/watch?v=A4rcZDGtJME&feature=youtu.be

Manage configurations

The configuration file used for deployment contains the following information:

global:
  airflow_version:
  git:
    origin:
    access-token:
deployment:
  qubole:
    local:
      compose:

  • airflow_version can be added when you initialize the project:
afctl init <name> -v <version>
  • Global configs (airflow_version, origin, access-token) can all be added or updated with the following command:
afctl config global -o <git-origin> -t <access-token> -v <airflow_version>
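
Put together, the global block of a populated config might look like this. This is a sketch only: the airflow_version value is illustrative, the angle-bracket placeholders are yours to fill in, and the deployment block is best managed through afctl config add as shown above:

global:
  airflow_version: 1.10.10
  git:
    origin: https://github.com/<user>/<repo>.git
    access-token: <access-token>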

Usage

The commands currently supported are:

  • init
  • config
  • deploy
  • list
  • generate

To learn more, run:

afctl <command> -h

Caution

afctl has not yet been ported to Windows.

Credits

Docker-compose file: https://github.com/puckel/docker-airflow
