• Stars
    star
    176
  • Rank 216,987 (Top 5 %)
  • Language
    Go
  • License
    Other
  • Created about 5 years ago
  • Updated about 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Crash-Diagnostics (Crashd) is a tool to help investigate, analyze, and troubleshoot unresponsive or crashed Kubernetes clusters.

Go Report Card

Crashd - Crash Diagnostics

Crash Diagnostics (Crashd) is a tool that helps human operators to easily interact and collect information from infrastructures running on Kubernetes for tasks such as automated diagnosis and troubleshooting.

Crashd Features

  • Crashd uses the Starlark language, a Python dialect, to express and invoke automation functions
  • Easily automate interaction with infrastructures running Kubernetes
  • Interact and capture information from compute resources such as machines (via SSH)
  • Automatically execute commands on compute nodes to capture results
  • Capture object and cluster log from the Kubernetes API server
  • Easily extract data from Cluster-API managed clusters

How Does it Work?

Crashd executes script files, written in Starlark, that interacts a specified infrastructure along with its cluster resources. Starlark script files contain predefined Starlark functions that are capable of interacting and collect diagnostics and other information from the servers in the cluster.

For detail on the design of Crashd, see this Google Doc design document here.

Installation

There are two ways to get started with Crashd. Either download a pre-built binary or pull down the code and build it locally.

Download binary

  1. Dowload the latest binary release for your platform
  2. Extract tarball from release
    tar -xvf <RELEASE_TARBALL_NAME>.tar.gz
    
  3. Move the binary to your operating system's PATH

Compiling from source

Crashd is written in Go and requires version 1.11 or later. Clone the source from its repo or download it to your local directory. From the project's root directory, compile the code with the following:

GO111MODULE=on go build -o crashd .

Or, yo can run a versioned build using the build.go source code:

go run .ci/build/build.go

Build amd64/darwin OK: .build/amd64/darwin/crashd
Build amd64/linux OK: .build/amd64/linux/crashd

Getting Started

A Crashd script consists of a collection of Starlark functions stored in a file. For instance, the following script (saved as diagnostics.crsh) collects system information from a list of provided hosts using SSH. The collected data is then bundled as tar.gz file at the end:

# Crashd global config
crshd = crashd_config(workdir="{0}/crashd".format(os.home))

# Enumerate compute resources 
# Define a host list provider with configured SSH
hosts=resources(
    provider=host_list_provider(
        hosts=["170.10.20.30", "170.40.50.60"], 
        ssh_config=ssh_config(
            username=os.username,
            private_key_path="{0}/.ssh/id_rsa".format(os.home),
        ),
    ),
)

# collect data from hosts
capture(cmd="sudo df -i", resources=hosts)
capture(cmd="sudo crictl info", resources=hosts)
capture(cmd="df -h /var/lib/containerd", resources=hosts)
capture(cmd="sudo systemctl status kubelet", resources=hosts)
capture(cmd="sudo systemctl status containerd", resources=hosts)
capture(cmd="sudo journalctl -xeu kubelet", resources=hosts)

# archive collected data
archive(output_file="diagnostics.tar.gz", source_paths=[crshd.workdir])

The previous code snippet connects to two hosts (specified in the host_list_provider) and execute commands remotely, over SSH, and capture and stores the result.

See the complete list of supported functions here.

Running the script

To run the script, do the following:

$> crashd run diagnostics.crsh 

If you want to output debug information, use the --debug flag as shown:

$> crashd run --debug diagnostics.crsh

DEBU[0000] creating working directory /home/user/crashd
DEBU[0000] run: executing command on 2 resources
DEBU[0000] run: executing command on localhost using ssh: [sudo df -i]
DEBU[0000] ssh.run: /usr/bin/ssh -q -o StrictHostKeyChecking=no -i /home/user/.ssh/id_rsa -p 22  user@localhost "sudo df -i"
DEBU[0001] run: executing command on 170.10.20.30 using ssh: [sudo df -i]
...

Compute Resource Providers

Crashd utilizes the concept of a provider to enumerate compute resources. Each implementation of a provider is responsible for enumerating compute resources on which Crashd can execute commands using a transport (i.e. SSH). Crashd comes with several providers including

  • Host List Provider - uses an explicit list of host addresses (see previous example)
  • Kubernetes Nodes Provider - extracts host information from a Kubernetes API node objects
  • CAPV Provider - uses Cluster-API to discover machines in vSphere cluster
  • CAPA Provider - uses Cluster-API to discover machines running on AWS
  • More providers coming!

Accessing script parameters

Crashd scripts can access external values that can be used as script parameters.

Environment variables

Crashd scripts can access environment variables at runtime using the os.getenv method:

kube_capture(what="logs", namespaces=[os.getenv("KUBE_DEFAULT_NS")])

Command-line arguments

Scripts can also access command-line arguments passed as key/value pairs using the --args or --args-file flags. For instance, when the following command is used to start a script:

$ crashd run --args="kube_ns=kube-system, username=$(whoami)" diagnostics.crsh

Values from --args can be accessed as shown below:

kube_capture(what="logs", namespaces=["default", args.kube_ns])

More Examples

SSH Connection via a jump host

The SSH configuration function can be configured with a jump user and jump host. This is useful for providers that requires a host proxy for SSH connection as shown in the following example:

ssh=ssh_config(username=os.username, jump_user=args.jump_user, jump_host=args.jump_host)
hosts=host_list_provider(hosts=["some.host", "172.100.100.20"], ssh_config=ssh)
...

Connecting to Kubernetes nodes with SSH

The following uses the kube_nodes_provider to connect to Kubernetes nodes and execute remote commands against those nodes using SSH:

# SSH configuration
ssh=ssh_config(
    username=os.username,
    private_key_path="{0}/.ssh/id_rsa".format(os.home),
    port=args.ssh_port,
    max_retries=5,
)

# enumerate nodes as compute resources
nodes=resources(
    provider=kube_nodes_provider(
        kube_config=kube_config(path=args.kubecfg),
        ssh_config=ssh,
    ),
)

# exec `uptime` command on each node
uptimes = run(cmd="uptime", resources=nodes)

# print `run` result from first node
print(uptimes[0].result)

Retreiving Kubernetes API objects and logs

Thekube_capture is used, in the following example, to connect to a Kubernetes API server to retrieve Kubernetes API objects and logs. The retrieved data is then saved to the filesystem as shown below:

nspaces=[
    "capi-kubeadm-bootstrap-system",
    "capi-kubeadm-control-plane-system",
    "capi-system capi-webhook-system",
    "cert-manager tkg-system",
]

conf=kube_config(path=args.kubecfg)

# capture Kubernetes API object and store in files
kube_capture(what="logs", namespaces=nspaces, kube_config=conf)
kube_capture(what="objects", kinds=["services", "pods"], namespaces=nspaces, kube_config=conf)
kube_capture(what="objects", kinds=["deployments", "replicasets"], namespaces=nspaces, kube_config=conf)

Interacting with Cluster-API managed machines running on vSphere (CAPV)

As mentioned, Crashd provides the capv_provider which allows scripts to interact with Cluster-API managed clusters running on a vSphere infrastructure (CAPV). The following shows an abbreviated snippet of a Crashd script that retrieves diagnostics information from the management cluster machines managed by a CAPV-initiated cluster:

# enumerates management cluster nodes
nodes = resources(
    provider=capv_provider(
        ssh_config=ssh_config(username="capv", private_key_path=args.private_key),
        kube_config=kube_config(path=args.mc_config)
    )
)

# execute and capture commands output from management nodes
capture(cmd="sudo df -i", resources=nodes)
capture(cmd="sudo crictl info", resources=nodes)
capture(cmd="sudo cat /var/log/cloud-init-output.log", resources=nodes)
capture(cmd="sudo cat /var/log/cloud-init.log", resources=nodes)
...

The previous snippet interact with management cluster machines. The provider can be configured to enumerate workload machines (by specifying the name of a workload cluster) as shown in the following example:

# enumerates workload cluster nodes
nodes = resources(
    provider=capv_provider(
        workload_cluster=args.cluster_name,
        ssh_config=ssh_config(username="capv", private_key_path=args.private_key),
        kube_config=kube_config(path=args.mc_config)
    )
)

# execute and capture commands output from workload nodes
capture(cmd="sudo df -i", resources=nodes)
capture(cmd="sudo crictl info", resources=nodes)
...

All Examples

See all script examples in the ./examples directory.

Roadmap

This project has numerous possibilities ahead of it. Read about our evolving roadmap here.

Contributing

New contributors will need to sign a CLA (contributor license agreement). Details are described in our contributing documentation.

License

This project is available under the Apache License, Version 2.0

More Repositories

1

velero

Backup and migrate Kubernetes applications and their persistent volumes
Go
8,593
star
2

kubeapps

A web-based UI for deploying and managing applications in Kubernetes clusters
Go
4,954
star
3

sonobuoy

Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests and other plugins in an accessible and non-destructive manner.
Go
2,822
star
4

community-edition

VMware Tanzu Community Edition is no longer an actively maintained project. Code is available for historical purposes only.
Go
1,333
star
5

carvel-ytt

YAML templating tool that works on YAML structure instead of text
Go
1,286
star
6

carvel-kapp

kapp is a simple deployment tool focused on the concept of "Kubernetes application" — a set of resources with the same label
Go
707
star
7

pinniped

Pinniped is the easy, secure way to log in to your Kubernetes clusters.
Go
493
star
8

cartographer

Cartographer is a Supply Chain Choreographer.
Go
436
star
9

k-bench

Workload Benchmark for Kubernetes
Go
372
star
10

helm-charts

Contains Helm charts for Kubernetes related open source tools
Mustache
248
star
11

carvel

Carvel provides a set of reliable, single-purpose, composable tools that aid in your application building, configuration, and deployment to Kubernetes. This repo contains information regarding the Carvel open-source community.
HTML
243
star
12

velero-plugin-for-aws

Plugins to support Velero on AWS
Go
208
star
13

tanzu-framework

Tanzu Framework provides a set of building blocks to build atop of the Tanzu platform and leverages Carvel packaging and plugins to provide users with a much stronger, more integrated experience than the loose coupling and stand-alone commands of the previous generation of tools.
Go
198
star
14

cluster-api-provider-bringyourownhost

Kubernetes Cluster API Provider BYOH for already-provisioned hosts running Linux.
Go
196
star
15

carvel-kbld

kbld seamlessly incorporates image building and image pushing into your development and deployment workflows
Go
182
star
16

carvel-vendir

Easy way to vendor portions of git repos, github releases, helm charts, docker image contents, etc. declaratively
Go
168
star
17

carvel-kapp-controller

Continuous delivery and package management for Kubernetes.
Go
153
star
18

carvel-kwt

Kubernetes Workstation Tools CLI
Go
145
star
19

carvel-imgpkg

Store application configuration files in Docker/OCI registries
Go
134
star
20

tanzu-dev-portal

Content for Tanzu dev portal
HTML
132
star
21

velero-plugin-for-csi

Velero plugins for integrating with CSI snapshot API
Go
111
star
22

cloud-native-security-inspector

This project scans and assesses workloads in Kubernetes at runtime. It can apply protection rules to workloads to avoid further risks as well.
Go
104
star
23

velero-plugin-for-microsoft-azure

Plugins to support Velero on Microsoft Azure
Go
99
star
24

vm-operator

Self-service manage your virtual infrastructure...
Go
86
star
25

cloud-suitability-analyzer

Automated, rule based source code scanning to determine cloud suitability
Go
77
star
26

secrets-manager

VMware Secrets Manager for Cloud-Native Apps is a lightweight secrets manager to protect your sensitive data. It’s perfect for edge deployments where energy and footprint requirements are strict—See more: https://vsecm.com/
Go
77
star
27

velero-plugin-for-gcp

Plugins to support Velero on Google Cloud Platform (GCP)
Go
75
star
28

sonobuoy-plugins

Plugins for Sonobuoy
Go
60
star
29

carvel-secretgen-controller

secretgen-controller provides CRDs to specify what secrets need to be on Kubernetes cluster (to be generated or not)
Go
57
star
30

velero-plugin-for-vsphere

Plugin to support Velero on vSphere
Go
57
star
31

service-installer-for-vmware-tanzu

Service Installer for VMware Tanzu is a one-click automation solution that enables VMware field engineers to easily and rapidly install, configure, and operate VMware Tanzu services across a variety of cloud infrastructures.
Python
55
star
32

velero-plugin-example

Example project for plugins for Velero, a Kubernetes disaster recovery utility
Go
50
star
33

application-accelerator-samples

Project for samples to be used with "Application Accelerator for VMware Tanzu" which is part of "VMware Tanzu Platform".
Java
45
star
34

terraform-provider-carvel

Carvel Terraform provider with resources for ytt and kapp to template and deploy to Kubernetes
Go
43
star
35

asset-relocation-tool-for-kubernetes

A tool for relocating Kubernetes Assets
Go
38
star
36

astrolabe

Data protection framework for complex applications
Go
37
star
37

carvel-community

Carvel provides a set of reliable, single-purpose, composable tools that aid in your application building, configuration, and deployment to Kubernetes. This repo contains information regarding the Carvel open-source community.
Shell
35
star
38

servicebinding

Service Bindings for Kubernetes
Go
32
star
39

cert-injection-webhook

Provides a Kubernetes webhook to inject CA certificates and proxy environment variables into pods.
Go
31
star
40

carvel-simple-app-on-kubernetes

K8s simple Go app example deployed with k14s tools
Shell
29
star
41

vsphere-kubernetes-drivers-operator

vSphere Kubernetes Driver Operator to simplify and automate the lifecycle management of CSI and CPI for Kubernetes cluster running on vSphere
Go
28
star
42

application-portfolio-auditor

Application Portfolio Auditor is a tool assessing cloud readiness, quality, and security of large sets of apps. It gathers and aggregates insights of multiple software analyzers.
Shell
28
star
43

graph-framework-for-microservices

Graph Framework for Microservices is a platform software stack that bootstraps and accelerates cloud-native microservice development, that is out-of-the-box ready to thrive in the ever challenging world of distributed systems and SaaS.
Go
28
star
44

load-balancer-operator-for-kubernetes

A Cluster API speaking operator for load balancers
Go
27
star
45

tanzu-cli

The Tanzu Core CLI project provides the core functionality of the Tanzu CLI. The CLI is based on a plugin architecture where CLI command functionality can be delivered through independently developed plugin binaries
Go
27
star
46

sources-for-knative

VMware-related event sources for Knative.
Go
26
star
47

kpack-cli

A command line interface for interacting with kpack.
Go
26
star
48

cross-cluster-connectivity

Multi-cluster DNS for Cluster API
Go
26
star
49

thepodlets

A VMware cloud native podcast. Exploring cloud native, one buzzword at a time!
HTML
25
star
50

carvel-ytt-library-for-kubernetes

ytt (https://github.com/k14s/ytt) library that includes reusable K8s components (app, ...)
Shell
21
star
51

function-buildpacks-for-knative

Buildpacks for Knative Functions
Go
21
star
52

vsphere-tanzu-kubernetes-grid-image-builder

For building virtual machine images with VMware vSphere
Python
18
star
53

carvel-setup-action

Github Action for setting up Carvel apps (ytt, kbld, kapp, kctrl, kwt, imgpkg and vendir)
TypeScript
17
star
54

apps-cli-plugin

Apps Plugin for the Tanzu CLI
Go
16
star
55

cartographer-conventions

Conventions provide a mechanism for platform operators to define cross cutting behavior that is applied to Kubernetes resources by understanding the developers intent and the semantics of the resources being advised.
Go
15
star
56

community-engagement

Go
14
star
57

projects-operator

Provides a `Project` CRD and controller for k8s to help with organising resources
Go
13
star
58

vscode-ytt

Visual Studio Code extension for working with ytt yaml files
12
star
59

carvel-ytt-library-for-kubernetes-demo

Demo of ytt + kapp + k8s-lib to deploy a simple app with basic autoscaling
Go
10
star
60

tanzu-toolkit-for-visual-studio

C#
9
star
61

nsx-operator

Kubernetes Operator for managing NSX network resources
Go
9
star
62

dependency-labeler

Dependency Labeler adds metadata about a container image's dependencies as a label to the container image. Formerly maintained by the NavCon team, currently maintained by the Source Insight Tooling team
Go
9
star
63

tanzu-application-platform-reference-service-packages

Reference Service Instance Packages for Tanzu Application Platform.
Shell
9
star
64

homebrew-carvel

Provides tools from https://carvel.dev via Homebrew package.
Ruby
8
star
65

carvel-ytt-starter-for-kubernetes

Use this repo as an example for organizing ytt templates within your application repo
HTML
8
star
66

tanzu-plugin-runtime

The Tanzu Plugin Runtime provides functionality and helper methods to develop Tanzu CLI plugins
Go
8
star
67

carvel-docker-image

Source for ghcr.io/vmware-tanzu/carvel-docker-image:latest that includes various Carvel tools
Dockerfile
8
star
68

homebrew-tanzu

Homebrew tap and formulas for installing Tanzu Community Edition
Ruby
7
star
69

build-tooling-for-integrations

This project enables developers to start building and packaging new Tanzu integrations, including cluster packages and custom CLI commands.
Go
7
star
70

app-migrator-for-cloud-foundry

A CLI tool for exporting and importing Cloud Foundry applications between Cloud Foundry installations.
Go
7
star
71

package-for-cartographer

carvel-based Packaging for Cartographer
Shell
6
star
72

tanzu-source-controller

Tanzu Source Controller enables app devs to fetch OCI images and maven artifacts from remote source code repository. The controller follows the spirit of the FluxCD Source Controller.
Go
6
star
73

image-registry-operator-api

As part of the vSphere on Tanzu project, this VM Image Service offers a Kubernetes API to upload/download/share VM images backed by vSphere Content Library.
Go
5
star
74

observability-event-resource

Go
5
star
75

service-instance-migrator-for-cloud-foundry

A CLI tool for exporting and importing Cloud Foundry service instances between Cloud Foundry installations.
Go
5
star
76

ytt.vim

syntax for ytt
Vim Script
5
star
77

package-for-kpack

This repo will house the carvel tooling specific configuration and templating for a deployment of kpack (https://github.com/pivotal/kpack) that will be leveraged by TCE and TBS
Shell
5
star
78

package-for-kubeapps

This repo will house the carvel tooling specific configuration and templating for a deployment of Kubeapps (https://github.com/vmware-tanzu/kubeapps) that will be leveraged by TCE
Mustache
5
star
79

tanzu-plug-in-for-asdf

This ASDF plugin enables the download of Tanzu related tools from Github.
Shell
4
star
80

concourse-kpack-resource

Use a kpack image in a concourse pipeline naturally.
Go
4
star
81

net-operator-api

A client API for the Net Operator project, designed to allow for integration with vSphere 7 with Kubernetes
Go
4
star
82

carvel-guestbook-example-on-kubernetes

K8s guestbook example deployed with k14s tools
JavaScript
4
star
83

oss-httpd-build

This project is a schema to build Apache HTTP Server (httpd), along with a number of frequently updated library components (dependencies), on Linux or Windows. The results of this build are also distributed periodically to the general public from the https://network.tanzu.vmware.com/products/p-apache-http-server (login required)
PowerShell
4
star
84

carvel-release-scripts

contains scripts for releasing carvel tools
Shell
3
star
85

rotate-instance-identity-certificates

Tooling to rotate the Diego Instance Identity Certificates on Tanzu Application Service 2.4-2.6.
Go
3
star
86

build-image-action

A GitHub Action that can be used to call into a Tanzu Application Platform (TAP) installation and use Tanzu Build Service (TBS) to build an image from source.
Go
3
star
87

cartographer-catalog

Reusable Cartographer blueprints
Shell
2
star
88

homebrew-pinniped

Homebrew tap for Pinniped
Ruby
2
star
89

homebrew-kpack-cli

Homebrew Tap for kpack-cli (https://github.com/vmware-tanzu/kpack-cli)
Ruby
2
star
90

azure-log-analytics-nozzle-release

Go
2
star
91

vmotion-migration-tool-for-bosh-deployments

Tooling and instructions to seamlessly move a BOSH deployed Cloud Foundry installation to new vSphere hardware via vMotion. This tool allows you to live migrate without needing to take on an expensive and disruptive migration to a new platform just to support a hardware refresh.
Go
2
star
92

cartographer-site

Cartographer website
JavaScript
1
star
93

tanzu-observability-slug-generator

Java
1
star
94

asdf-carvel

k14s asdf plugin
Shell
1
star
95

package-for-source-controller

This repo will house the carvel tooling specific configuration and templating for a deployment of fluxcd-source-controller (https://github.com/fluxcd/source-controller) that will be leveraged by TCE
Shell
1
star
96

service-apis

Fork of kubernetes-sigs/service-apis
1
star