  • Stars: 149 (rank 240,035, top 5%)
  • Language: Go
  • License: MIT License
  • Created over 6 years ago; updated almost 4 years ago


Repository Details

A tool that helps automate deployments to an Apache Flink cluster


Flink-deployer

A Go command-line utility to facilitate deployments to Apache Flink.

Currently, it supports several features:

  1. Listing jobs
  2. Deploying a new job
  3. Updating an existing job
  4. Terminating an existing job
  5. Querying Flink queryable state

For a full overview of the commands and flags, run flink-job-deployer help

How to run locally

To be able to test the deployer locally, follow these steps:

  1. Build the CLI tool Docker image: docker-compose build deployer
  2. Optional: cd flink-sample-job; sbt clean assembly; cd .. (builds a jar with a small stateful test job)
  3. docker-compose up -d jobmanager taskmanager (starts a Flink jobmanager and taskmanager)
  4. docker-compose run deployer help (runs the Flink deployer with the argument help)

Repeat step 4 with any commands you'd like to try.
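For example, to list the jobs currently running on the cluster (assuming the list subcommand reported by help):

docker-compose run deployer list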

Run a sample job

Provided you ran step 2 of the above guide, a jar with a sample Flink job is available in the deployer. It will be mounted in the deployer container at the following path:

/tmp/flink-sample-job/flink-stateful-wordcount-assembly-0.jar

To deploy it, you can simply run the following (it's the default command specified in docker-compose.yml):

docker-compose run deployer

This will print a simple word count to the output console. You can view it by checking the logs of the taskmanager as follows:

docker-compose logs -f taskmanager

If all went well, you should see the word counter continue from where it left off.

A list of some example commands to run can be found here.
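Under the hood, that default corresponds to an explicit deploy invocation along these lines (a sketch: the flags mirror the Kubernetes example later in this README, applied to the sample jar path above):

docker-compose run deployer deploy \
  --file-name /tmp/flink-sample-job/flink-stateful-wordcount-assembly-0.jar \
  --entry-class WordCountStateful \
  --parallelism 2 \
  --program-args "--intervalMs 1000"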

Authentication

Apache Flink doesn't support any Web UI authentication out of the box. One custom approach is to run NGINX in front of Flink to protect the user interface. With NGINX, there are in turn a lot of different ways to add that authentication layer. To support the most basic one, we've added support for Basic Authentication.

You can inject the FLINK_BASIC_AUTH_USERNAME and FLINK_BASIC_AUTH_PASSWORD environment variables to configure basic authentication.

Supported environment variables

  • FLINK_BASE_URL: Base URL of Flink's API (required, e.g. http://jobmanageraddress:8081/)
  • FLINK_BASIC_AUTH_USERNAME: Basic authentication username used for authenticating to Flink
  • FLINK_BASIC_AUTH_PASSWORD: Basic authentication password used for authenticating to Flink
  • FLINK_API_TIMEOUT_SECONDS: Number of seconds until requests to the Flink API time out (e.g. 10)
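For example, a local invocation with all four variables set could look like the following (the values are placeholders, and the list subcommand is an assumption based on the feature list):

FLINK_BASE_URL=http://jobmanageraddress:8081/ \
FLINK_BASIC_AUTH_USERNAME=deployer \
FLINK_BASIC_AUTH_PASSWORD=secret \
FLINK_API_TIMEOUT_SECONDS=10 \
flink-job-deployer list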

Development

Managing dependencies

This project uses dep to manage all project dependencies residing in the vendor folder.

Run dep status to review the status of the included and most recent available dependencies.
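If the vendor folder gets out of sync, dep can restore it from the manifest and lock file. A minimal sketch, assuming dep is installed:

dep ensure   # sync the vendor folder with Gopkg.toml and Gopkg.lock
dep status   # review included and most recent available dependencies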

Build

Build from source for your current machine:

go build ./cmd/cli

Build from source for a specific machine architecture:

env GOOS=linux GOARCH=amd64 go build ./cmd/cli
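The same pattern works for any supported target; for example, a Windows build:

env GOOS=windows GOARCH=amd64 go build ./cmd/cli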

Build the Docker container locally to test the CLI tool:

docker-compose build deployer

Test

go test ./cmd/cli ./cmd/cli/flink ./cmd/cli/operations

Or with coverage:

sh test-with-coverage.sh

Docker

A Docker image for this repo is available from Docker Hub: nielsdenissen/flink-deployer

The image expects the following env vars:

FLINK_BASE_URL=http://localhost:8080
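A minimal sketch of running the published image directly, assuming its entrypoint is the deployer binary and using the list subcommand with a placeholder jobmanager address:

docker run --rm -e FLINK_BASE_URL=http://jobmanageraddress:8081 nielsdenissen/flink-deployer list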

Kubernetes

When running in Kubernetes (or OpenShift), you'll have to deploy the container to the cluster itself. One reason is that Flink will try to reroute you to the internal Kubernetes address of the cluster, which doesn't resolve from outside. Besides that, running inside the cluster gives the deployer the necessary access to stored savepoints when you're using persistent volumes to store them.

This section is aimed at providing you with a quick getting-started guide for deploying our container to Kubernetes. There are a few steps we'll need to take, which we describe below:

0. Run a Kubernetes cluster

If you don't have a Kubernetes cluster readily available, you can quickly get started by setting up a Minikube cluster.

minikube start
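Step 3 of this guide uses the Kubernetes web dashboard; with Minikube you can open it via:

minikube dashboard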

1. Set up a Flink cluster in Kubernetes

Flink has a guide on how to run a cluster in Kubernetes; you can find it here.

If you're using Minikube, be sure to first pull the images that Flink uses in its deploy configurations to your local machine; otherwise Minikube will not be able to find them. So perform a docker pull flink:latest on your host.

2. Add the test jar (or your own job you want to run) to the deployer image

We now need to package the jar into the container so we can deploy it in Kubernetes. There are other ways around this, like storing the jar on a Persistent Volume or downloading it at runtime inside the container. Baking it into the image is the easiest way to get started, though, and still the technique we use.

To build the container with the jar packaged, you can use the Dockerfile-including-sample-job. Be sure to have created the jar for the test job in case you want to use it; see step 2 in the How to run locally section. Run this from the root of this repository:

docker build -t flinkdeployerstatefulwordcount:test -f Dockerfile-including-sample-job .

3. Run the deployer in Kubernetes

In this example, we show how to do a simple deploy of the sample job in this project to the cluster. For this we need a YAML file that tells Kubernetes what to run. Here's an example of what such a Kubernetes YAML could look like:

apiVersion: v1
kind: Pod
metadata:
    generateName: "flink-stateful-wordcount-deployer-"
spec:
    dnsPolicy: ClusterFirst
    restartPolicy: OnFailure
    containers:
    -   name: "flink-stateful-wordcount-deployer"
        image: "flinkdeployerstatefulwordcount:test"
        args:
        - "deploy"
        - "--file-name"
        - "/tmp/flink-stateful-wordcount-assembly-0.jar"
        - "--entry-class"
        - "WordCountStateful"
        - "--parallelism"
        - "2"
        - "--program-args"
        - "--intervalMs 1000"
        imagePullPolicy: Never
        env:
        -   name: FLINK_BASE_URL
            value: "http://flink-jobmanager:8081"

Go to the Kubernetes dashboard, click the Create + button, and copy-paste the above YAML. This should trigger a pod that runs once and stops after deploying the sample job to the Flink cluster running in Kubernetes.
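If you prefer the command line over the dashboard, you can submit the same YAML with kubectl (assuming you saved it as deployer-pod.yaml; kubectl create is needed here because the spec uses generateName, which kubectl apply doesn't accept):

kubectl create -f deployer-pod.yaml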

MINIKUBE USERS: In order to use local images with Minikube (i.e. images in your local Docker installation instead of Docker Hub), you need to perform the following steps:

  • Point minikube to your local docker: eval $(minikube docker-env) (See this guide for more info)
  • Rebuild the image as done in step 2 of this guide.
  • The imagePullPolicy in the yaml above must be set to Never.

4. Attach Persistent Volumes to all Flink containers

This step we won't outline completely, as it's a bit more involved for a getting started guide. In order to recover jobs from savepoints, you'll need a Persistent Volume shared among all Flink nodes and the deployer. You'll need this in any case if you want persistence and thus don't want to lose any data in your Flink cluster running in Kubernetes. After creating a Persistent Volume and hooking it up to the existing Flink containers, you'll need to add something like the following to the YAML of the deployer (besides, of course, changing the command to, for instance, update):

        volumeMounts:
        -   name: flink-data
            mountPath: "/data/flink"
    volumes:
    -   name: flink-data
        persistentVolumeClaim:
            claimName: "PVC_FLINK"

The directory you put in your Persistent Volume should be the directory to which Flink stores its savepoints.
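With the volume in place, the deployer pod's args from step 3 could then switch from deploy to update along the lines below. This is only a sketch: the update command itself is mentioned above, but --job-name-base and --savepoint-dir are illustrative flag names, so check flink-job-deployer help for the real ones.

        args:
        - "update"                # replaces "deploy"
        - "--job-name-base"       # illustrative flag: which running job to update
        - "WordCountStateful"
        - "--savepoint-dir"       # illustrative flag: savepoint location on the mounted volume
        - "/data/flink"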

Copyright

All copyright of the project flink-job-deployer is held by Marc Rooding and Niels Denissen, 2017-2018.

More Repositories

  1. lion (JavaScript, 1,680 stars): Fundamental white label web component features for your design system.
  2. popmon (Python, 478 stars): Monitor the stability of a Pandas or Spark dataframe ⚙︎
  3. sparse_dot_topn (C++, 379 stars): Python package to accelerate the sparse matrix multiplication and top-n similarity selection
  4. baker (Scala, 317 stars): Orchestrate microservice-based process flows
  5. threshold-signatures (Rust, 198 stars): Threshold Signature Scheme for ECDSA
  6. zkrp (Go, 170 stars): Reusable library for creating and verifying zero-knowledge range proofs and set membership proofs.
  7. probatus (Python, 121 stars): Validation (like Recursive Feature Elimination for SHAP) of (multiclass) classifiers & regressors and data used to develop them.
  8. scruid (Scala, 115 stars): Scala + Druid: Scruid. A library that allows you to compose queries in Scala, and parse the result back into typesafe classes.
  9. skorecard (Python, 78 stars): scikit-learn compatible tools for building credit risk acceptance models
  10. rokku (Scala, 65 stars): Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, short-term tokens, and lineage.
  11. cassandra-jdbc-wrapper (Java, 57 stars): A JDBC wrapper of Java Driver for Apache Cassandra®, which offers a simple JDBC compliant API to work with CQL3.
  12. ing-open-banking-cli (Shell, 42 stars)
  13. bdd-mobile-security-automation-framework (Ruby, 40 stars): Mobile Security testing Framework
  14. ing-open-banking-sdk (Mustache, 34 stars)
  15. doing-cli (Python, 32 stars): CLI tool to simplify the development workflow on Azure DevOps
  16. spark-matcher (Python, 28 stars): Record matching and entity resolution at scale in Spark
  17. industry2vec (Jupyter Notebook, 27 stars)
  18. gohateoas (Go, 24 stars): Plug-and-play HATEOAS for REST APIs written in Go
  19. rokku-dev-apache-atlas (Shell, 20 stars): Apache Atlas development image for the Rokku project: https://github.com/ing-bank/rokku
  20. vscode-psl (TypeScript, 19 stars): For distributing plugins to the community and fostering further developments with the community going forward.
  21. apache-ranger-s3-plugin (Java, 18 stars): Apache Ranger Plugin for S3
  22. EntityMatchingModel (Python, 16 stars): Entity Matching Model solves the problem of matching company names between two possibly very large datasets.
  23. skafos (Python, 13 stars): Kubernetes operator framework in Python
  24. zkkrypto (Kotlin, 12 stars): Collection of ZKP-related cryptographic primitives
  25. zkflow (Kotlin, 11 stars): The ZKFlow consensus protocol enables private transactions on Corda for arbitrary smart contracts using Zero Knowledge Proofs
  26. quota-scaler (Go, 10 stars): Kubernetes Autoscaling operator
  27. rokku-sts (Scala, 9 stars): STS service for the Rokku project: https://github.com/ing-bank/rokku
  28. tsforecast (R, 7 stars): A pipeline to execute time series forecasts and visualize them in a dashboard. The employable forecast models fall into three categories: simple heuristics (mean of last 12 months, last 3 months, etc.), classical time series econometrics (ARIMA, Holt-Winters, Kalman filters, etc.) and machine learning (neural networks, Facebook's Prophet, etc.)
  29. prometheus-scenarios (Go, 6 stars): This repo contains a collection of learning scenarios. Each scenario is meant to teach a topic through explanation and practical exercises.
  30. orchestration-pkg (Go, 5 stars)
  31. rokku-dev-apache-ranger (Shell, 5 stars): Apache Ranger development image for the Rokku project: https://github.com/ing-bank/rokku
  32. ing-ideal-connectors-java (Java, 4 stars): Open-source tools and API to connect webshops and merchants to ING using iDeal
  33. ing-ideal-connectors-php (PHP, 4 stars): Open-source tools and API to connect webshops and merchants to ING using iDeal
  34. ing-ideal-connectors-net (C#, 4 stars): Open-source tools and API to connect webshops and merchants to ING using iDeal
  35. mint (Kotlin, 4 stars): An automated exploratory testing tool for Android
  36. gormtestutil (Go, 4 stars): Utilities for writing unit tests with Gorm
  37. tsclean (R, 4 stars): Takes a time series, possibly with multiple groupings, and starts up a dashboard to visualize potential anomalies. Anomalies are detected via several algorithms. If anomalies are erroneous, the user can correct them from within the dashboard.
  38. psl-linter (TypeScript, 3 stars)
  39. ginerr (Go, 3 stars): Error registry for Gin; translates rough error messages to user-friendly objects and status codes
  40. psl-parser (TypeScript, 2 stars)
  41. tstools (R, 2 stars): A set of helper functions, geared mostly towards data with a time dimension
  42. rokku-dev-keycloak (Dockerfile, 1 star): Keycloak development image for the Rokku project: https://github.com/ing-bank/rokku
  43. gintestutil (Go, 1 star): Utilities for writing unit tests with Gin
  44. rokku-dev-mariadb (Dockerfile, 1 star): MariaDB development image for the Rokku project: https://github.com/ing-bank/rokku