• Stars
    star
    145
  • Rank 254,144 (Top 6 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created almost 5 years ago
  • Updated 7 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

MagTape Policy-as-Code for Kubernetes

Latest release License Gitter chat python-checks e2e-checks image-build

magtape-logo

MagTape

MagTape is a Policy-as-Code tool for Kubernetes that allows for evaluating Kubernetes resources against a set of defined policies to inform and enforce best practice configurations. MagTape includes variable policy enforcement, notifications, and targeted metrics.

MagTape builds on the Kubernetes Admission Webhook concept and uses Open Policy Agent (OPA) for its generic policy language and engine.

Our goal with MagTape is to show an example of wrapping additional business logic and features around OPA's core, not to be a competitor. While MagTape is not primarily meant to be a security tool, it can easily enforce security policy.

Overview

MagTape examines kubernetes objects against a set of defined policies (best practice configurations/security concepts) and can deny/alert on objects that fail policy checks. The webhook is written in Python using the Flask framework.

Prereqs

A modern version of Kubernetes with the admissionregistration.k8s.io API enabled. Verify that by the following command:

$ kubectl api-versions | grep admissionregistration.k8s.io

The result should be:

admissionregistration.k8s.io/v1

In addition, the MutatingAdmissionWebhook and ValidatingAdmissionWebhook admission controllers should be added and listed in the correct order in the admission-control flag of kube-apiserver.

NOTE: MagTape has been tested and is known to work for Kubernetes versions 1.13+ on various Distros/Cloud Providers (DOKS, GKE, EKS, AKS, PKS, and KinD) NOTE: MagTape v2.4.0+ no longer supports Kubernetes versions below v1.19.0. Please use MagTape v2.3.3 for earlier versions of Kubernetes

Permissions

MagTape requires cluster-admin permissions to deploy to Kubernetes since it requires access to create/read/update/delete cluster scoped resources (ValidatingWebhookConfigurations, Events, etc.)

MagTape's default RBAC permissions include get, list, and watch access to Secret resources across all namespaces in the cluster. This is to allow for lookup of user-defined Slack Incoming Webhook URL's. If this feature is not needed. the magtape-read ClusterRole can be adjusted to remove these permissions.

Quickstart

You can use the following command to install MagTape and the example policies from this repo with sane defaults. This won't have all features turned on as they require more configuration up front. Please see the Advanced Install section for more details.

NOTE: The quickstart installation is not meant for production use. Please read through the Advanced Install and Cautions sections, and as always, use your best judgement when configuring MagTape for production scenarios.

NOTE: The master branch of this repository is considered a working branch and may not always be in a functioning state. It's best to select a specific tag for a stable version of MagTape

$ kubectl apply -f https://raw.githubusercontent.com/tmobile/magtape/v2.4.0/deploy/install.yaml

This will do the following

  • Create the magtape-system namespace
  • Create cluster and namespace scoped roles/rolebindings
  • Deploy the MagTape workload and related configs
  • Deploy the example policies from this repo

Once this is complete you can do the following to test

Create and label a test namespace

$ kubectl create ns test1
$ kubectl label ns test1 k8s.t-mobile.com/magtape=enabled

Deploy some test workloads

# These examples assume you're in the root directory of this repo
# Example with no failures

$ kubectl apply -f ./testing/deployments/test-deploy01.yaml -n test1

# Example with deny
# You should get immediate feedback that this request was denied.

$ kubectl apply -f ./testing/deployments/test-deploy02.yaml -n test1

# Example with failures, but no deny
# While this request won't be denied, a K8s Event will be generated
# and can be viewed with "kubectl get events -n test1"

$ kubectl apply -f ./testing/deployments/test-deploy03.yaml -n test1

Beyond the Basics

Now that you've seen the basics of MagTape, try out some of the other features

Cleanup

Remove all MagTape deployed resources

# Assumes you're in the root directory of this repo
$ kubectl delete -f deploy/install.yaml
$ kubectl delete validatingwebhookconfiguration magtape-webhook

Policies

The below policy examples are available within this repo. The can be ignored or custom policies can be added. Policies use OPA's Rego language with a specific format to define policy metadata and the output message. This special formatting is required as it enables the additional functionality of MagTape.

  • Liveness Probe (Check ID: MT1001)
  • Readiness Probe (Check ID: MT1002)
  • Resource Limits (Check ID: MT1003)
  • Resource Requests (Check ID: MT1004)
  • Pod Disruption Budget (Check ID: MT1005)
  • Istio Port Name/Number Mismatch (Check ID: MT1006)
  • Singleton Pods (Check ID: MT1007)
  • Host Port (Check ID: MT1008)
  • emptyDir Volume (Check ID: MT1009)
  • Host Path (Check ID: MT1010)
  • Privileged Pod Security Context (Check ID: MT2001)
  • Node Port Range (Check ID: MT2002)

More detailed info about these policies can be found here.

The policy metadata is defined within each policy similar to this:

policy_metadata = {

    # Set MagTape Policy Info
    "name": "policy-resource-requests",
    "severity": "LOW",
    "errcode": "MT1004",
    "targets": {"Deployment", "StatefulSet", "DaemonSet", "Pod"},

}
  • name - Defines the name of the specific policy. This should be unique per policy.
  • severity - Defines the severity level of a specific policy. This correlates with the DENY_LEVEL to determine if a policy should result in a deny or not.
  • errcode - A unique code that can be used, typically in reference to an FAQ, to look up additional information about the policy, what produces a failure, and how to resolve failures.
  • targets - This controls which Kubernetes resources the policy targets. Each target should be the singular of the Kubernetes resource as found in the Kind field. Special care should be taken to make sure all target resources maintain similar JSON data paths within the policy logic, or that differences are handled appropriately.

Policies follow normal OPA operations for policy discovery. MagTape provides configuration to OPA to filter which configmaps it targets for discovery. If you're adding your own policies make sure to apply the following labels to the configmap:

app=opa
openpolicyagent.org/policy=rego

Example creating a policy configmap with appropriate labels from an existing Rego file

# Create a policy from a Rego file
$ kubectl create cm my-special-policy -n magtape-system --from-file=my-special-policy.rego --dry-run -o yaml | \
kubectl label --local app=opa openpolicyagent.org/policy=rego -f - --dry-run -o yaml > my-special-policy-cm.yaml

OPA will add/update the openpolicyagent.org/policy-status annotation on the policy configmaps to show they've been loaded successfully or if there are any syntax/validation issues.

Writing policies that Reference resources outside of the request object

As part of the integration MagTape has with OPA, the kube-mgmt service is also deployed within the MagTape pod. In short, kube-mgmt replicates resources from the Kubernetes cluster into OPA to allow for additional context with policies. kube-mgmt requires permissions to build the resource cache and those permissions should be updated accordingly when policies are developed that expand the scope of resources needed.

Please reference the kube-mgmt documentation on caching for additional information on how to configure kube-mgmt to watch new resource types and adjust the permissions in the magtape-read clusterrole accordingly.

Deny Level

Each policy is assigned a Severity level "LOW", "MED", or "HIGH". This is used to influence what policy checks result in an actual deny, or just become passive (alerting only)

The Deny Level is set within the deployment via an environment variable (MAGTAPE_DENY_VOLUME) and can be set to "OFF", "LOW", "MED", or "HIGH". The Deny Level has an inverse relationship to the Severity of the defined checks, which works as follows:

Deny Level Severities Blocked
OFF None
LOW HIGH
MED HIGH, MED
HIGH HIGH, MED, LOW

This configuration provides flexibility around controlling which checks should result in a "deny" and allows for a progressive approach as the platform and its users mature

Health Check

MagTape has a rudimentary healthcheck endpoint configured at /healthz. The endpoint displays a json output including the name of the pod running the webhook, the datetime of the request, and the overall health. This is nothing fancy. If the Flask app is running at all the health will report ok.

Image

MagTape uses a few images for operation. Please reference the image repos for more information on the image structure and contents

K8s Events

K8s Events can be generated for policy failures via the MAGTAPE_K8S_EVENTS_ENABLED environment variable.

Setting this variable to TRUE will cause a Kubernetes event to be created in the target namespace of the request object when a policy failure occurs. This will provide a more native method to passively inform users on policy failures (regardless of whether or not the request is denied).

Slack Alerts

Slack alerts can be enabled and controlled via environment variables (noted above):

  • MAGTAPE_SLACK_ENABLED
  • MAGTAPE_SLACK_PASSIVE
  • MAGTAPE_SLACK_WEBHOOK_URL_BASE
  • MAGTAPE_SLACK_WEBHOOK_URL_DEFAULT
  • MAGTAPE_SLACK_USER
  • MAGTAPE_SLACK_ICON

Override base domain for Slack Incoming Webhook URL

Some airgapped environments may need to use a forwarder/proxy service to assist in sending alerts to the Slack API. the MAGTAPE_SLACK_WEBHOOK_URL_BASE environment variable allows you to override the base domain for the Slack Incoming Webhook URL to target the forwarding/proxy service. This is very assumptive that the forwarding/proxy service will accept a Slack compliant payload and that the endpoint differs from the default Slack Incoming Webhook URL in domain only (ie. the protocol and trailing paths remain the same).

EXAMPLE:

MAGTAPE_SLACK_WEBHOOK_URL_DEFAULT="https://hooks.slack.com/services/XXXXXXXX/XXXXXXXXXXXX"
MAGTAPE_SLACK_WEBHOOK_URL_BASE="slack-proxy.example.com"

This configuration will override hooks.slack.com to be slack-proxy.example.com and the outcome will be:

https://slack-proxy.example.com/services/XXXXXXXX/XXXXXXXXXXXX

NOTE: The MAGTAPE_SLACK_WEBHOOK_URL_BASE environment variable is optional and if not specified the URL will remain unchanged from what is set in MAGTAPE_SLACK_WEBHOOK_URL_DEFAULT

Default Alert Target

When alerts are enabled they will be sent to the Slack Incoming Webhook URL defined in the MAGTAPE_SLACK_WEBHOOK_URL_DEFAULT environment variable. This is meant to be a channel controlled by the MagTape Webhook administrators.

User-defined Alert Target

When alerts are enabled they can be sent to a user-defined Slack Incoming Webhook URL in addition to the default mentioned above. This can be configured via a Kubernetes Secret resource in a target namespace. The secret should be named magtape-slack and the Slack Incoming Webhook URL should be set as the value (typical base64 encoding) for the webhook-url key. This will allow end-users to receive alerts in their desired Slack Channel for request objects targeting their own namespace.

EXAMPLE:

$ kubectl create secret generic magtape-slack -n my-cool-namespace --from-literal=webhook-url="https://hooks.slack.com/services/XXXXXXXX/XXXXXXXXXXXX"

Alert Format

Slack alert examples:

Slack Alert Deny Screenshot

Slack Alert Fail Screenshot

NOTE: For Slack Alerts to work, you will need to configure a Slack Incoming Webhook and set the environment variable for the webhook deployment as noted above.

Metrics

Prometheus formatted metrics are exposed on the /metrics endpoint. Metrics track counters for requests by:

  • CPU, Memory, and HTTP error rate
  • Number of requests passed, failed, and total
  • Breakdown by namespace
  • Breakdown by policy

Grafana dashboards showing Cluster, Namespace, and Policy scoped metrics are available in the metrics directory. An example Prometheus ServiceMonitor resource is located here.

These dashboards are simple, but serve a few purposes:

  • How busy the MagTape app itself is (ie. should the resources or replica count be increased/decreased)
  • What Namespaces seem to produce the most policy failures (Could indicate the team is struggling with certain concepts, there's something malicious going on, etc.)
  • What policies seem to be the most problematic (Maybe an opportunity to target education/training for specific topics based on the policy scope)

We've found that sometimes thinking about operations from a metrics perspective can lead you to develop a policy that is more about tracking how frequently some action occurs rather than explicitly if it should be allowed or denied. Your mileage may very!

Testing

  • Create namespace for testing and label it appropriately

    $ kubectl create ns test1
    $ kubectl label ns test1 k8s.t-mobile.com/magtape=enabled
  • Deploy test deployment to Kubernetes cluster

    $ kubectl apply -f test-deploy02.yaml -n test1

    NOTE: MagTape should deny this workload and should provide feedback similar to this:

    $ kubectl apply -f test-deploy02.yaml -n test1
    
    Error from server: error when creating "test-deploy02.yaml": admission webhook "magtape.webhook.k8s.t-mobile.com" denied the request: [FAIL] HIGH - Found privileged Security Context for container "test-deploy02" (MT2001), [FAIL] LOW - Liveness Probe missing for container "test-deploy02" (MT1001), [FAIL] LOW - Readiness Probe missing for container "test-deploy02" (MT1002), [FAIL] LOW - Resource limits missing (CPU/MEM) for container "test-deploy02" (MT1003), [FAIL] LOW - Resource requests missing (CPU/MEM) for container "test-deploy02" (MT1004)

Test Samples Available

Info on testing resources can be found in the testing directory

NOTE: These manifests are meant to test deploy-time validation, some pods related to these test manifests may fail to come up properly. A failing pod doesn't represent an issue with MagTape.

Cautions

Production Considerations

  • By Default the MagTape Validating Webhook Configuration is set to fail "closed". Meaning if the webhook is unreachable or doesn't return an expected response, requests to the Kubernetes API will be blocked. Please adjust the configuration if this is not something that fits your environment.
  • MagTape supports operation with multiple replicas that can increase availability and performance for critical clusters.

Break Glass Scenarios

MagTape can be enabled and disabled on a per namespace basis by utilizing the k8s.t-mobile.com/magtape label on namespace resources. In emergency situations the label can be removed from a namespace to disable policy assessment for workloads in that namespace.

If there are cluster-wide issues you can disable MagTape completely by removing the magtape-webhook Validating Webhook Configuration and deleting the MagTape deployment.

Troubleshooting

Certificate Trust

The ValidatingWebhookConfiguration needs to have a CA Bundle that includes the CA that signed the TLS cert used to secure the MagTape webhook. If this is not done the required trust between the K8s API and webhook will not exist and the webhook won't function correctly. More info is available here

Access MagTape API from local machine

$ kubectl get pods # to get the name of the running pod
$ kubectl port-forward <pod_name> -n <namespace> 5000:5000

Use Curl to perform HTTP POST to MagTape

$ curl -vX POST https://localhost:5000/ -d @test.json -H "Content-Type: application/json"

Follow logs of the webhook pod

$ kubectl get pods # to get the name of the running pod
$ kubectl logs <pod_name> -n <namespace> -f

More Repositories

1

pacbot

PacBot (Policy as Code Bot)
Java
1,288
star
2

t-vault

Simplified secrets management solution
Java
364
star
3

jazz

Platform to develop and manage serverless applications at an enterprise scale!
JavaScript
298
star
4

r-tensorflow-api

A small Docker container for using R and TensorFlow as an enterprise API
R
261
star
5

kardio

Service Health Dashboard for Kubernetes, Containers and more...
Java
222
star
6

loadtest

an R package that automates performance testing of ML models and summarizes the results in a dashboard w/ rad visualizations
R
93
star
7

magentaA11y

Magenta A11y is a tool built to simplify the process of accessibility testing.
HTML
60
star
8

POET-pipeline-library

POET pipeline framework automation code.
Groovy
46
star
9

hyperdirectory

Blockchain-based, highly auditable access management solution (directory service)
TypeScript
34
star
10

monarch

App-level Chaos Engineering
Python
28
star
11

tmus-geofeed

26
star
12

jazz-installer

Installer for Jazz Serverless Developer Platform!
Python
25
star
13

casquatch

Casquatch: an open source Java abstraction layer for Cassandra databases
Java
21
star
14

faas-java-templates

Java templates for OpenFaas, a serverless functions as a service platform built on Docker
Java
19
star
15

themes-platform-vendor-tmobile-apps-ThemeChooser

Java
18
star
16

DevEdge-IoTDevKit-ZephyrSDK

ZephyrSDK (TMO_shell) is a Zephyr application built by T-Mobile and comes shipped on the DevEdge - IoT Developer Kit
C
18
star
17

codeless

ETP codeless project allows you to quickly write tests using basic yaml and spreadsheet files without having to have in-depth understanding of writing Java automation tests.
Java
18
star
18

themes-platform-vendor-tmobile-themes-Androidian

13
star
19

DevEdge-IoTDevKit-ZephyrRTOS

T-Mobile Zephyr OS is a fork of zephyrproject-rtos/zephyr that is shipped on the DevKit and used for contributing upstream.
C
13
star
20

developer-puzzle

T-Mobile Developer Candidate Puzzles. Do not Fork, Do not submit a PR. Simply clone to your local and create a new repo in your public account. Solve the puzzle and send us the address of your repo. DO NOT FORK THIS REPO OR SUBMIT PULL REQUEST (your submission will be deleted!)
TypeScript
13
star
21

jest-jsdom-browser-compatibility

This is a matrix of issues and risks of using Jest with JSDOM to test browser applications. This will include several sub projects with example tests to demonstrate the failures.
TypeScript
12
star
22

passport-tmobileid

T-Mobile ID enabled authentication strategy for Passport and Node.js
JavaScript
12
star
23

percy-cake

Percival: A Configuration As Kode Editor to simplify managing distributed applications and services.
TypeScript
12
star
24

docinator

Build a website from your code's documentation with zero configuration.
JavaScript
11
star
25

pi-alarm

Raspberry Pi Alarm
Python
11
star
26

chaostoolkit-turbulence

Tools and resources to support Chaos Engineering
Python
11
star
27

themes-platform-vendor-tmobile-libs-com.tmobile.themes

Java
11
star
28

springboot-restapi-generator

Custom yoman generator for scaffolding REST API projects based on springboot
JavaScript
10
star
29

keybiner

Entitlements compression and validation library
Java
9
star
30

themes-platform-manifest

8
star
31

themes-platform-vendor-tmobile-providers-ThemeManager

Java
8
star
32

developer-kata

Some coding exercises for developers. There is not sample code here, only descriptions of problems to solve.
8
star
33

stf_ios_mirrorfeed

Stream IOS mirroring via USB to websocket
Go
7
star
34

themes-platform-frameworks-base

Java
7
star
35

opensource

t-mobile's open source microsite
CSS
7
star
36

cf-smoke-tests

Python
7
star
37

qapi

Query API
Java
5
star
38

orchestration-desk

Node module to connect to orchestration services like marathon to get details about applications, containers and more.
TypeScript
5
star
39

ducklett

Ducklett: managing all the little nodes of a Conducktor cluster
Go
5
star
40

themes-platform-vendor-tmobile-products-themes

Shell
5
star
41

parallelizer

A go library for building work pools.
Go
4
star
42

t-rover

Java
4
star
43

node-red-contrib-sms-send

JavaScript
4
star
44

DevEdge-IoTDevKit-Binaries

Zephyr DevKit Supporting Binaries
Makefile
4
star
45

themes-platform-packages-apps-Settings

Java
3
star
46

themes-platform-packages-apps-Contacts

Java
3
star
47

themes-platform-vendor-tmobile-libs-com.tmobile.themehelper

Java
3
star
48

DevEdge-IoTDevKit-SiLabs-WiseConnect

WiSeConnect Wi-Fi and Bluetooth Software used as a supporting module to DevEdge-IoTDevKit-ZephyrRTOS
C
3
star
49

node-red-contrib-yolo-object-detection

Node-RED contrib module that uses YOLO3 object detection to identify items in images.
Python
3
star
50

themes-platform-frameworks-policies-base

Java
3
star
51

node-red-contrib-object-to-array

Given a JavaScript object, when this node is executed it will transform the object key/value properties to an array of object properties with specified property names..
TypeScript
3
star
52

DevEdge-IoTDevKit-Sony-cxd5605

Zephyr module supporting Sony CXD5605 GNSS
C
3
star
53

node-red-contrib-rpi-adeept-motor

HTML
2
star
54

node-red-contrib-summarizer

Node-RED node to summarize arrays of data
HTML
2
star
55

themes-platform-packages-apps-Launcher2

Java
2
star
56

themes-platform-packages-apps-Email

Java
2
star
57

depaginator

Go
2
star
58

common-platform-vendor-tmobile-build-common

Shell
2
star
59

node-red-contrib-differences

A Node-RED node to detect differences between two sets of data
TypeScript
2
star
60

jazz-content

Static content for Jazz - https://github.com/tmobile/jazz
Dockerfile
2
star
61

node-red-contrib-array-iterator

Given an array input, when this node is executed it will pass the โ€œnextโ€ value of the array to the succeeding connected node(s).
TypeScript
2
star
62

qlkube-client

JS library to make interacting with qlkube easier
JavaScript
2
star
63

themes-platform-packages-apps-Music

Java
2
star
64

themes-platform-packages-apps-Mms

Java
2
star
65

emerald-platform-vendor-tmobile-themes-Androidian

Java
1
star
66

t-racer-legacy

consolidate t-racer projects into one repo
C++
1
star
67

gatsby-starter-gitlab

A Gatsby starter to generate a website from a list of Gitlab groups and/or projects.
TypeScript
1
star
68

node-red-contrib-tm-mjpg-server

Node red module to start a python mjpg server
Python
1
star
69

gatsby-source-gitlab

A Gatsby starter to generate a website from a list of Gitlab groups and/or projects.
TypeScript
1
star