  • Stars: 449
  • Rank: 97,328 (Top 2 %)
  • Language: Go
  • License: MIT License
  • Created over 7 years ago
  • Updated over 1 year ago

Repository Details

Kubernetes controller to spread preemption for preemptible VMs in GKE to avoid mass deletion after 24 hours

estafette-gke-preemptible-killer

This small Kubernetes application loops through a given preemptible node pool and kills a node before the regular 24h lifetime of a preemptible VM.

Why?

When a cluster is created, all the nodes are created at the same time and are expected to be deleted after 24h of activity. To prevent a large disruption, estafette-gke-preemptible-killer can be used to kill instances at a random time between 12h and 24h. It uses a node annotation to store the time to kill value.

How does that work?

At a given interval, the application gets the list of preemptible nodes and checks whether each node should be deleted. If the annotation doesn't exist, a time to kill value is added to the node annotations, chosen at random between 12h and 24h after the node creation timestamp. Once the time to kill has passed, the Kubernetes node is marked as unschedulable and drained, and the instance is deleted in Google Cloud.

Known limitations

  • Selecting a node pool is not supported yet: the code processes ALL preemptible nodes attached to the cluster, and there is no way to limit it, even via taints or annotations.
  • This tool makes many small disruptions more likely, instead of one major disruption.
  • This tool does not guarantee that a major disruption is avoided: GCP can still trigger a large disruption because of the way preemptible instances are managed. Ensure you have PDBs and enough replicas; for better safety, use non-preemptible nodes in different zones. You may also be interested in estafette-gke-node-pool-shifter.

Usage

You can either use environment variables or flags to configure the following settings:

Environment variable | Flag | Default | Description
BLACKLIST_HOURS | --blacklist-hours (-b) | | List of UTC time intervals in the form of 09:00 - 12:00, 13:00 - 18:00 in which deletion is NOT allowed
DRAIN_TIMEOUT | --drain-timeout | 300 | Max time in seconds to wait before deleting a node
FILTERS | --filters (-f) | | Label filters in the form of key1: value1[, value2[, ...]][; key2: value3[, value4[, ...]], ...]
INTERVAL | --interval (-i) | 600 | Time in seconds to wait between each node check
KUBECONFIG | --kubeconfig | | Path to the kube config file, usually located in ~/.kube/config. This argument is only needed if you're running the killer outside of your k8s cluster
METRICS_LISTEN_ADDRESS | --metrics-listen-address | :9001 | The address to listen on for Prometheus metrics requests
METRICS_PATH | --metrics-path | /metrics | The path to listen on for Prometheus metrics requests
WHITELIST_HOURS | --whitelist-hours (-w) | | List of UTC time intervals in the form of 09:00 - 12:00, 13:00 - 18:00 in which deletion is allowed and preferred

Create a Google Service Account

To let the estafette-gke-preemptible-killer instance delete nodes, create a service account and grant it the compute.instances.delete permission.

You can create the service account and associate the role using either the Google Cloud web console or the CLI:

$ export project_id=<PROJECT>
$ gcloud iam --project=$project_id service-accounts create preemptible-killer \
    --display-name preemptible-killer
$ gcloud iam --project=$project_id roles create preemptible_killer \
    --project $project_id \
    --title preemptible-killer \
    --description "Delete compute instances" \
    --permissions compute.instances.delete
$ export service_account_email=$(gcloud iam --project=$project_id service-accounts list --filter preemptible-killer --format 'value([email])')
$ gcloud projects add-iam-policy-binding $project_id \
    --member=serviceAccount:${service_account_email} \
    --role=projects/${project_id}/roles/preemptible_killer
$ gcloud iam --project=$project_id service-accounts keys create \
    --iam-account $service_account_email \
    google_service_account.json

Installation

Prepare using Helm:

brew install kubernetes-helm
kubectl -n kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --wait

Then install or upgrade with Helm:

helm repo add estafette https://helm.estafette.io
helm upgrade --install estafette-gke-preemptible-killer --namespace estafette estafette/estafette-gke-preemptible-killer

Deploy with Kustomize

Create a kustomization.yaml file:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: default
commonLabels:
  app: preemptible-killer
bases:
- github.com/estafette/estafette-gke-preemptible-killer//manifests
images:
- name: estafette/estafette-gke-preemptible-killer
  newTag: 1.1.21
secretGenerator:
- name: preemptible-killer-secrets
  files:
  - google-service-account.json=google_service_account.json
  type: "Opaque"

Apply manifests:

kubectl apply -k .

Development

To start development, run:

git clone git@github.com:estafette/estafette-gke-preemptible-killer.git
cd estafette-gke-preemptible-killer

Before committing your changes, run:

go test ./...
go mod tidy

Testing

In order to test your local changes against an external Kubernetes cluster use the following commands:

# proxy master
kubectl proxy

# in another shell
go build && ./estafette-gke-preemptible-killer -i 10

Note: you can also set the environment variable KUBECONFIG=~/.kube/config if you don't want to use the kubectl proxy command.

For an all-in-one script that launches a kind cluster with 3 nodes, runs estafette-gke-preemptible-killer and then reports on the kill time, run:

go build && ./scripts/all-in-one-test -i 10

Here -i 10 stands for the arguments passed to estafette-gke-preemptible-killer; replace it with your own test arguments. For safety, the script does not remove the kind cluster it creates.

More Repositories

  • estafette-gke-node-pool-shifter (Go, 130 stars): Kubernetes controller that can shift nodes from one node pool to another, to favour for example preemptibles over regular VMs
  • estafette-vulnerability-scanner (Go, 51 stars): An application that regularly scans all containers in a Kubernetes cluster for vulnerabilities
  • estafette-cloudflare-dns (Go, 39 stars): Kubernetes controller to set and update DNS records in Cloudflare for annotated services and ingresses
  • estafette-k8s-node-compactor (Go, 25 stars): Kubernetes controller to remove underutilized nodes to improve resource utilization
  • estafette-ci-builder (Go, 15 stars): Component of Estafette CI that executes build steps
  • estafette-ci-api (Go, 14 stars): The API of the CI system that handles all incoming webhooks, bot requests, UI requests, etc
  • estafette-ci (Smarty, 11 stars): The overarching project for Estafette CI; used for tracking issues
  • estafette-letsencrypt-certificate (Go, 10 stars): Kubernetes controller to retrieve and renew TLS certificates from Letsencrypt for annotated Kubernetes secrets
  • estafette-k8s-hpa-scaler (Go, 10 stars): Kubernetes controller to set minimum replicas from a Prometheus query on annotated HorizontalPodAutoscalers to avoid collapsing deployments in case of errors
  • estafette-extension-gke (Go, 8 stars): This extension provides a base container to run commands against Kubernetes Engine
  • estafette-ci-web (Vue, 7 stars): The web interface of Estafette CI
  • estafette-google-cloud-dns (Go, 4 stars): Kubernetes controller to update DNS records in a Google Cloud DNS zone for annotated services and ingresses
  • estafette.io (HTML, 4 stars): Resilient and cloud-native CI/CD
  • estafette-gcloud-mig-scaler (Go, 4 stars): Controller to scale Google Cloud managed instance groups based on request rate retrieved from Prometheus
  • estafette-foundation (Go, 4 stars): Handles common logic like graceful shutdown, reloads on configmap or secret updates, etc
  • estafette-ci-manifest (Go, 3 stars): A library with the logic to deserialize the Estafette manifest, so it can be used from both the api and the builder
  • nginx-sidecar (Lua, 3 stars): A sidecar container to take care of TLS termination
  • estafette-extension-docker (Go, 3 stars): This extension allows you to build, push and tag docker images
  • estafette-gcloud-quota-exporter (Go, 3 stars): Prometheus exporter to turn Google Cloud quota into Prometheus timeline series
  • estafette-extension-github-status (Go, 2 stars): This Estafette extension updates the build status in Github
  • k8s-node-termination-handler (HTML, 2 stars): Helm chart for GoogleCloudPlatform/k8s-node-termination-handler
  • estafette-gcp-service-account (Go, 2 stars): Kubernetes controller to fetch GCP service account keyfiles for annotated secrets
  • estafette-extension-bitbucket-status (Go, 2 stars): This Estafette extension updates the build status in Bitbucket
  • helm-charts (2 stars): Repository for the official Estafette helm charts
  • estafette-extension-git-clone (Go, 2 stars): This Estafette extension clones the git repository to build
  • estafette-ci-crypt (Go, 2 stars): This library has encryption/decryption helpers for Estafette secrets stored in plain sight
  • estafette-extension-github-release (Go, 2 stars): This Estafette extension assists in creating a release with resolved issues from a milestone if it exists
  • estafette-extension-git-trigger (Go, 1 star): This extension can be used to trigger another pipeline by committing an empty commit to a repository
  • estafette-docker-cache-heater (Go, 1 star): Runs as a sidecar to the pull through cache in order to warm up new pods with frequently used container images
  • estafette-ci-hanging-job-cleaner (Go, 1 star): This cronjob checks for jobs that have been running for too long and cleans them up
  • estafette-extension-slack-build-status (Go, 1 star): This Estafette extension makes it easy to send a build status message to a Slack channel
  • estafette-extension-dotnet (Go, 1 star): This extension allows you to build and publish ASP.NET Core applications and libraries
  • estafette-extension-cloud-function (Go, 1 star): This extension can be used to create and deploy a cloud function
  • estafette-extension-envvars (Go, 1 star): This Estafette extension logs all Estafette envvars available to your pipeline build
  • prometheus-bigquery-adapter (Go, 1 star): Adapter for using BigQuery as remote storage for Prometheus
  • estafette-promote-container (1 star): This repository provides just a manifest that can tag a specific image with another tag in order to promote a dev version to beta or stable
  • openresty-sidecar (Lua, 1 star): A sidecar container to take care of TLS termination
  • estafette-ci-builder-cached-extensions (Dockerfile, 1 star): Estafette-ci-builder image with extensions pre-cached
  • estafette-google-cloud-catalog-extractor (Go, 1 star): A job that extracts information from your Google Cloud Platform and stores it in Estafette's catalog
  • estafette (Go, 1 star): The CLI for Estafette
  • istio-helm-chart (1 star): Turns the helm chart bundled in the istio repository into a hosted helm chart
  • estafette-cloudflare-loadbalancer (Go, 1 star): Kubernetes controller to create a Cloudflare load balancer with all GKE nodes as a backend pool
  • estafette-gke-node-recycler (Go, 1 star): This Kubernetes controller cycles VMs on an interval to prevent hosts from filling up too early with containers or logs
  • estafette-extension-helm (Go, 1 star): This extension helps with linting, packaging, testing and adding Helm charts to repositories