

Kubernetes network monitoring


Kubenurse

kubenurse is a little service that monitors all network connections in a Kubernetes cluster. Kubenurse measures request durations, records errors and exports those metrics in Prometheus format.

Deployment

You can get the Docker image from Docker Hub. The examples directory contains manifests which can be used to deploy kubenurse to the kube-system namespace of your cluster.

Helm deployment

You can also deploy kubenurse with Helm. The chart is available in the repository https://postfinance.github.io/kubenurse/ or in the directory ./helm/kubenurse/. To install kubenurse with Helm, run:

helm upgrade [RELEASE_NAME] --install --repo https://postfinance.github.io/kubenurse/ kubenurse

Configuration settings

| Setting | Description | Default |
| --- | --- | --- |
| daemonset.image.repository | The repository name | postfinance/kubenurse |
| daemonset.image.tag | The tag/version of the image | v1.4.0 |
| daemonset.podLabels | Additional labels to be added to the pods of the daemonset | [] |
| daemonset.podAnnotations | Additional annotations to be added to the pods of the daemonset | [] |
| daemonset.podSecurityContext | The security context of the daemonset | {} |
| daemonset.containerSecurityContext | The security context of the containers within the pods of the daemonset | {} |
| daemonset.containerResources | The container resources of the containers within the pods of the daemonset | {} |
| daemonset.containerImagePullPolicy | The container image pull policy of the pods of the daemonset | IfNotPresent |
| daemonset.tolerations | The tolerations of the daemonset | See Default tolerations below |
| daemonset.dnsConfig | Specifies the DNS parameters of the pods in the daemonset | {} |
| daemonset.volumeMounts | Additional volumeMounts to be added to the pods of the daemonset | [] |
| daemonset.volumes | Additional volumes to be added to the daemonset | [] |
| serviceMonitor.enabled | Adds a ServiceMonitor for use with Prometheus-operator | false |
| serviceMonitor.labels | Additional labels to be added to the ServiceMonitor | {} |
| serviceAccount.name | The name of the service account which is used | Release.Name |
| service.name | The name of the service which exposes the kubenurse application | 8080-8080 |
| service.port | The port number of the service | 8080 |
| service.labels | Additional labels to be added to the Service | |
| ingress.enabled | Enables/disables the ingress | true |
| ingress.className | The class name of the ingress controller (e.g. the nginx ingress controller) | nginx |
| ingress.url | The URL of the ingress, e.g. kubenurse.westeurope.cloudapp.example.com | dummy-kubenurse.example.com |
| insecure | Sets KUBENURSE_INSECURE environment variable | true |
| allow_unschedulable | Sets KUBENURSE_ALLOW_UNSCHEDULABLE environment variable | false |
| neighbour_filter | Sets KUBENURSE_NEIGHBOUR_FILTER environment variable | app.kubernetes.io/name=kubenurse |
| extra_ca | Sets KUBENURSE_EXTRA_CA environment variable | |
| check_api_server_direct | Sets KUBENURSE_CHECK_API_SERVER_DIRECT environment variable | true |
| check_api_server_dns | Sets KUBENURSE_CHECK_API_SERVER_DNS environment variable | true |
| check_me_ingress | Sets KUBENURSE_CHECK_ME_INGRESS environment variable | true |
| check_me_service | Sets KUBENURSE_CHECK_ME_SERVICE environment variable | true |
| check_neighbourhood | Sets KUBENURSE_CHECK_NEIGHBOURHOOD environment variable | true |
| check_interval | Sets KUBENURSE_CHECK_INTERVAL environment variable | 5s |
| use_tls | Sets KUBENURSE_USE_TLS environment variable | false |
| cert_file | Sets KUBENURSE_CERT_FILE environment variable | |
| cert_key | Sets KUBENURSE_CERT_KEY environment variable | |

Default tolerations:

- effect: NoSchedule
  key: node-role.kubernetes.io/master
  operator: Equal
- effect: NoSchedule
  key: node-role.kubernetes.io/control-plane
  operator: Equal

After everything is set up and Prometheus scrapes the kubenurses, you can build dashboards like this example that show network latencies and errors, or use the metrics for alerting.

[Figures: Grafana ingress view, Grafana path view]

Configuration

kubenurse is configured with environment variables:

  • KUBENURSE_INGRESS_URL: A URL to the kubenurse in order to check the ingress
  • KUBENURSE_SERVICE_URL: A URL to the kubenurse in order to check the Kubernetes service
  • KUBENURSE_INSECURE: If "true", TLS connections will not validate the certificate
  • KUBENURSE_EXTRA_CA: Additional CA cert path for TLS connections
  • KUBENURSE_NAMESPACE: Namespace in which to look for the neighbour kubenurses
  • KUBENURSE_NEIGHBOUR_FILTER: A Kubernetes label selector (e.g. app=kubenurse) to filter neighbour kubenurses
  • KUBENURSE_ALLOW_UNSCHEDULABLE: If this is "true", path checks to neighbouring kubenurses are made even if they are running on unschedulable nodes.
  • KUBENURSE_CHECK_API_SERVER_DIRECT: If this is "true", kubenurse will perform the check API Server Direct. Default is "true"
  • KUBENURSE_CHECK_API_SERVER_DNS: If this is "true", kubenurse will perform the check API Server DNS. Default is "true"
  • KUBENURSE_CHECK_ME_INGRESS: If this is "true", kubenurse will perform the check Me Ingress. Default is "true"
  • KUBENURSE_CHECK_ME_SERVICE: If this is "true", kubenurse will perform the check Me Service. Default is "true"
  • KUBENURSE_CHECK_NEIGHBOURHOOD: If this is "true", kubenurse will perform the check Neighbourhood. Default is "true"
  • KUBENURSE_CHECK_INTERVAL: The frequency at which the kubenurse checks are performed. The string must be in a format accepted by time.ParseDuration. Defaults to 5s
  • KUBENURSE_USE_TLS: If this is "true", enable TLS endpoint on port 8443
  • KUBENURSE_CERT_FILE: Certificate to use with TLS endpoint
  • KUBENURSE_CERT_KEY: Key to use with TLS endpoint
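
As an illustration of how such settings can be read, here is a minimal Go sketch. It is not kubenurse's actual code: the Config type and the helpers boolEnv and loadConfig are hypothetical names, only a few of the variables are covered, and treating an unset KUBENURSE_INSECURE as false is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Config holds a subset of the settings listed above (hypothetical type).
type Config struct {
	IngressURL         string
	ServiceURL         string
	Insecure           bool
	CheckNeighbourhood bool
	CheckInterval      time.Duration
}

// boolEnv returns def when the variable is unset or empty,
// otherwise it reports whether the value equals "true".
func boolEnv(get func(string) string, key string, def bool) bool {
	v := get(key)
	if v == "" {
		return def
	}
	return v == "true"
}

// loadConfig reads the settings through get (os.Getenv in production).
func loadConfig(get func(string) string) (Config, error) {
	cfg := Config{
		IngressURL: get("KUBENURSE_INGRESS_URL"),
		ServiceURL: get("KUBENURSE_SERVICE_URL"),
		// Assumption: certificate validation stays on unless explicitly disabled.
		Insecure:           boolEnv(get, "KUBENURSE_INSECURE", false),
		CheckNeighbourhood: boolEnv(get, "KUBENURSE_CHECK_NEIGHBOURHOOD", true),
		CheckInterval:      5 * time.Second, // documented default
	}
	if v := get("KUBENURSE_CHECK_INTERVAL"); v != "" {
		d, err := time.ParseDuration(v)
		if err != nil {
			return Config{}, fmt.Errorf("KUBENURSE_CHECK_INTERVAL: %w", err)
		}
		cfg.CheckInterval = d
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps the parsing testable without touching the process environment.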

The following variables are injected into the Pod by Kubernetes and should not be defined manually:

  • KUBERNETES_SERVICE_HOST: Host to communicate to the kube-apiserver
  • KUBERNETES_SERVICE_PORT: Port to communicate to the kube-apiserver

The HTTP client appends the certificate /var/run/secrets/kubernetes.io/serviceaccount/ca.crt to its trusted CA certificates if the file is found.

HTTP Endpoints

The kubenurse service listens for HTTP requests on port 8080 (optionally HTTPS on port 8443) and exposes the following endpoints:

  • /: Redirects to /alive
  • /alive: Returns pretty-printed JSON with the check results, described below
  • /alwayshappy: Returns HTTP 200; kubenurse uses this endpoint to test itself
  • /metrics: Exposes Prometheus metrics

The /alive endpoint returns JSON like the following, with status code 200 if everything is OK and 500 otherwise:

{
  "api_server_direct": "ok",
  "api_server_dns": "ok",
  "me_ingress": "ok",
  "me_service": "ok",
  "hostname": "kubenurse-1234-x2bwx",
  "neighbourhood_state": "ok",
  "neighbourhood": [
   {
    "PodName": "kubenurse-1234-8fh2x",
    "PodIP": "10.10.10.67",
    "HostIP": "10.12.12.66",
    "NodeName": "k8s-66.example.com",
    "Phase": "Running"
   },
   {
    "PodName": "kubenurse-1234-ffjbs",
    "PodIP": "10.10.10.138",
    "HostIP": "10.12.12.89",
    "NodeName": "k8s-89.example.com",
    "Phase": "Running"
   }
  ],
  "headers": {
   "Accept": [
    "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
   ],
   "Accept-Encoding": [
    "gzip, deflate, br"
   ],
   ...
  }
}
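
To make the status code behaviour concrete, here is a rough Go sketch of handlers for /, /alive and /alwayshappy. It is not kubenurse's implementation: the names result, newMux and statusFor are hypothetical, the 302 redirect code is an assumption, and so is the rule that any value other than "ok" counts as a failure.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// result mirrors a few fields of the JSON document shown above.
type result struct {
	APIServerDirect    string `json:"api_server_direct"`
	APIServerDNS       string `json:"api_server_dns"`
	MeIngress          string `json:"me_ingress"`
	MeService          string `json:"me_service"`
	NeighbourhoodState string `json:"neighbourhood_state"`
}

// healthy reports whether every check returned "ok" (assumed rule).
func (r result) healthy() bool {
	for _, s := range []string{r.APIServerDirect, r.APIServerDNS, r.MeIngress, r.MeService, r.NeighbourhoodState} {
		if s != "ok" {
			return false
		}
	}
	return true
}

// newMux wires the endpoints; run produces the current check results.
func newMux(run func() result) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/alwayshappy", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/alive", func(w http.ResponseWriter, _ *http.Request) {
		res := run()
		body, err := json.MarshalIndent(res, "", "  ")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		if !res.healthy() {
			w.WriteHeader(http.StatusInternalServerError)
		}
		_, _ = w.Write(body)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/" {
			http.NotFound(w, r)
			return
		}
		http.Redirect(w, r, "/alive", http.StatusFound)
	})
	return mux
}

// statusFor sends an in-memory GET request and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux(func() result {
		return result{APIServerDirect: "ok", APIServerDNS: "ok", MeIngress: "ok", MeService: "ok", NeighbourhoodState: "ok"}
	})
	fmt.Println(statusFor(mux, "/alive"))
}
```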

Health Checks

The checks described below run every five seconds by default (configurable with KUBENURSE_CHECK_INTERVAL) and on every access of /alive. Check results are cached for 3 seconds to prevent excessive network traffic.
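
The caching behaviour can be sketched in Go as follows. This is a simplified illustration under assumptions, not the project's code: cachedChecker, newCachedChecker and fakeClock are hypothetical names, and a single string stands in for the full check result.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheTTL matches the 3 second cache lifetime mentioned above.
const cacheTTL = 3 * time.Second

// cachedChecker runs the wrapped check at most once per cacheTTL.
type cachedChecker struct {
	mu      sync.Mutex
	now     func() time.Time // injectable clock, time.Now in production
	run     func() string    // the expensive check
	last    string
	expires time.Time
}

func newCachedChecker(now func() time.Time, run func() string) *cachedChecker {
	return &cachedChecker{now: now, run: run}
}

// Result returns the cached value, refreshing it once the TTL has elapsed.
func (c *cachedChecker) Result() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t := c.now(); !t.Before(c.expires) {
		c.last = c.run()
		c.expires = t.Add(cacheTTL)
	}
	return c.last
}

// fakeClock is a manually advanced clock for demonstrations and tests.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time        { return f.t }
func (f *fakeClock) AdvanceSeconds(n int)  { f.t = f.t.Add(time.Duration(n) * time.Second) }

func main() {
	calls := 0
	c := newCachedChecker(time.Now, func() string { calls++; return "ok" })
	c.Result()
	c.Result() // served from the cache
	fmt.Println("checks executed:", calls)
}
```

The mutex matters because the periodic run and requests to /alive can ask for results concurrently.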

The following illustration shows which communication takes place:

[Figure: Communication]

API Server Direct

Checks the /version endpoint of the Kubernetes API Server through the direct link (KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT).

Metric type: api_server_direct

API Server DNS

Checks the /version endpoint of the Kubernetes API Server through the Cluster DNS URL https://kubernetes.default.svc:$KUBERNETES_SERVICE_PORT. This also verifies a working kube-dns deployment.

Metric type: api_server_dns
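
The two API server checks differ only in how the target address is built. A small Go sketch of that difference follows; the function names directURL and dnsURL are hypothetical, and using https for the direct link is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// directURL targets the API server through the address injected by Kubernetes.
// net.JoinHostPort adds brackets when the host is an IPv6 address.
func directURL(host, port string) string {
	return "https://" + net.JoinHostPort(host, port) + "/version"
}

// dnsURL targets the API server through the cluster DNS name,
// so a failure here can also point at a DNS problem.
func dnsURL(port string) string {
	return "https://" + net.JoinHostPort("kubernetes.default.svc", port) + "/version"
}

func main() {
	host := os.Getenv("KUBERNETES_SERVICE_HOST")
	port := os.Getenv("KUBERNETES_SERVICE_PORT")
	fmt.Println(directURL(host, port))
	fmt.Println(dnsURL(port))
}
```

If the direct check succeeds while the DNS check fails, the API server itself is reachable and the cluster DNS is the more likely culprit.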

Me Ingress

Checks if the kubenurse is reachable at the /alwayshappy endpoint behind the ingress. This address is provided by the environment variable KUBENURSE_INGRESS_URL that could look like https://kubenurse.example.com. This also verifies a correct upstream DNS resolution.

Metric type: me_ingress

Me Service

Checks if the kubenurse is reachable at the /alwayshappy endpoint through the Kubernetes service. The address is provided by the environment variable KUBENURSE_SERVICE_URL that could look like http://kubenurse.mynamespace.svc:8080. This also verifies a working kube-proxy setup.

Metric type: me_service

Neighbourhood

Checks if every neighbour kubenurse is reachable at the /alwayshappy endpoint. Neighbours are discovered by querying the kube-apiserver for every Pod in the KUBENURSE_NAMESPACE that matches the label selector KUBENURSE_NEIGHBOUR_FILTER. The request is sent directly to the Pod IP (port 8080, or 8443 if TLS is enabled). The metric type consists of the prefix path_ followed by the hostname of the kubelet on which the neighbour kubenurse should run. Only kubenurses on schedulable nodes are considered neighbours; this can be changed by setting KUBENURSE_ALLOW_UNSCHEDULABLE="true".

Metric type: path_$KUBELET_HOSTNAME
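
The selection and naming rules above can be sketched in Go as follows. This only illustrates the described behaviour and is not taken from the kubenurse source: the neighbour type, its Schedulable field and the three functions are hypothetical, and the discovery call to the kube-apiserver is left out. Using the https scheme when TLS is enabled is an assumption.

```go
package main

import (
	"fmt"
	"net"
)

// neighbour describes one discovered kubenurse Pod (hypothetical type).
type neighbour struct {
	PodName     string
	PodIP       string
	NodeName    string
	Schedulable bool // whether the node the Pod runs on is schedulable
}

// selectNeighbours drops Pods on unschedulable nodes unless
// allowUnschedulable is set (KUBENURSE_ALLOW_UNSCHEDULABLE="true").
func selectNeighbours(all []neighbour, allowUnschedulable bool) []neighbour {
	var out []neighbour
	for _, n := range all {
		if n.Schedulable || allowUnschedulable {
			out = append(out, n)
		}
	}
	return out
}

// targetURL builds the address that is requested directly on the Pod IP.
func targetURL(n neighbour, useTLS bool) string {
	scheme, port := "http", "8080"
	if useTLS {
		scheme, port = "https", "8443"
	}
	return scheme + "://" + net.JoinHostPort(n.PodIP, port) + "/alwayshappy"
}

// metricType returns the metric type for a neighbour check.
func metricType(n neighbour) string {
	return "path_" + n.NodeName
}

func main() {
	pods := []neighbour{
		{PodName: "kubenurse-1234-8fh2x", PodIP: "10.10.10.67", NodeName: "k8s-66.example.com", Schedulable: true},
		{PodName: "kubenurse-1234-ffjbs", PodIP: "10.10.10.138", NodeName: "k8s-89.example.com", Schedulable: false},
	}
	for _, n := range selectNeighbours(pods, false) {
		fmt.Println(metricType(n), targetURL(n, false))
	}
}
```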

Metrics

All performed checks expose metrics which can be used to monitor/alert:

  • SDN network latencies and errors
  • kubelet-to-kubelet network latencies and errors
  • pod-to-apiserver communication
  • Ingress roundtrip latencies and errors
  • Service roundtrip latencies and errors (kube-proxy)
  • Major kube-apiserver issues
  • kube-dns (or CoreDNS) errors
  • External DNS resolution errors (ingress URL resolution)

At /metrics you will find these:

  • kubenurse_errors_total: Kubenurse error counter partitioned by error type
  • kubenurse_request_duration: a histogram for Kubenurse request duration partitioned by error type
