  • Stars: 5,960
  • Rank 6,776 (Top 0.2 %)
  • Language
  • License
    Creative Commons ...
  • Created over 7 years ago
  • Updated 11 months ago

Awesome Chaos Engineering

A curated list of awesome Chaos Engineering resources.

What is Chaos Engineering?

Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. - Principles Of Chaos Engineering website.

Contents

Culture

Books

Education

Notable Tools

  • Chaos Monkey - A resiliency tool that helps applications tolerate random instance failures.
  • orchestrator - MySQL replication topology management and HA.
  • kube-monkey - An implementation of Netflix's Chaos Monkey for Kubernetes clusters.
  • Gremlin Inc. - Failure as a Service.
  • Chaos Toolkit - A chaos engineering toolkit to help you build confidence in your software system.
  • steadybit - A Chaos Engineering platform (SaaS or On-Prem) with auto discovery features, different attack types, user management, and more.
  • PowerfulSeal - Adds chaos to your Kubernetes clusters, so that you can detect problems in your systems as early as possible. It kills targeted pods and takes VMs up and down.
  • drax - DC/OS Resilience Automated Xenodiagnosis tool. It helps to test DC/OS deployments by applying a Chaos Monkey-inspired, proactive and invasive testing approach.
  • Wiremock - API mocking (Service Virtualization) which enables modeling real-world faults and delays.
  • MockLab - API mocking (Service Virtualization) as a service which enables modeling real-world faults and delays.
  • Pod-Reaper - A rules based pod killing container. Pod-Reaper was designed to kill pods that meet specific conditions that can be used for Chaos testing in Kubernetes.
  • Muxy - A chaos testing tool for simulating real-world distributed system failures.
  • Toxiproxy - A TCP proxy to simulate network and system conditions for chaos and resiliency testing.
  • Chaos engineering for Docker:
    • Pumba - Chaos testing and network emulation for Docker containers (and clusters).
    • Blockade - Docker-based utility for testing network failures and partitions in distributed applications.
  • chaos-lambda - Randomly terminate ASG instances during business hours.
  • Namazu - Programmable fuzzy scheduler for testing distributed systems.
  • Chaos Monkey for Spring Boot - Injects latencies, exceptions, and terminations into Spring Boot applications
  • Byte-Monkey - Bytecode-level fault injection for the JVM. It works by instrumenting application code on the fly to deliberately introduce faults like exceptions and latency.
  • GomJabbar - ChaosMonkey for your private cloud
  • Turbulence - Tool focused on BOSH environments capable of stressing VMs, manipulating network traffic, and more. It is very similar to Gremlin.
  • chaosblade - An Easy to Use and Powerful Chaos Engineering Toolkit.
  • KubeInvaders - Gamified Chaos Engineering tool for Kubernetes clusters.
  • Cthulhu - Chaos Engineering tool that helps evaluate the resiliency of microservice systems by simulating various disaster scenarios against a target infrastructure in a data-driven manner.
  • VMware Mangle - Orchestrating Chaos Engineering.
  • Byteman - A Swiss Army Knife for Byte Code Manipulation.
  • Litmus - Framework for Kubernetes environments that enables users to run test suites, capture logs, generate reports and perform chaos tests.
  • Perses - A project to cause (controlled) destruction to a JVM application.
  • ChaosKube - chaoskube periodically kills random pods in your Kubernetes cluster.
  • Chaos Mesh - Chaos Mesh is a cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments.
  • failure-lambda - A small Node module for injecting failure into AWS Lambda using latency, exception, statuscode or diskspace.
  • aws-chaos-scripts - Collection of python scripts to run failure injection on AWS infrastructure
  • chaos-ssm-documents - Collection of AWS SSM Documents to perform Chaos Engineering experiments
  • aws-lambda-chaos-injection - A library injecting chaos into AWS Lambda. It offers simple python decorators to do delay, exception and statusCode injection and a Class to add delay to any 3rd party dependencies.
  • chaos-dingo - A tool to mess with Azure services using the Azure NodeJS SDK.
  • Chaos HTTP Proxy - Introduce failures into HTTP requests via a proxy server
  • Chaos Lemur - A self-hostable application to randomly destroy virtual machines in a BOSH-managed environment
  • Simoorg - LinkedIn’s very own failure inducer framework.
  • react-chaos - A chaos engineering tool for your React apps
  • vue-chaos - A chaos engineering tool for your Vue apps
  • Chaos Engine - A tool designed to intermittently destroy or degrade application resources running in cloud-based infrastructure.
  • kubedoom - Kill Kubernetes pods by playing Id's DOOM.
  • kubethanos - Kills half of your Kubernetes pods, selected at random.
  • go-fault - Fault injection middleware in Go
  • Proofdock's Chaos Engineering Platform - A chaos engineering platform that seamlessly integrates in Azure DevOps and has a focus on the Azure cloud platform.
  • Pystol - Pystol is a fault injection platform allowing users to execute fault injection Actions in cloud-native environments in a controlled and prescribed way.
  • AWSSSMChaosRunner - Amazon's light-weight open-source library for chaos engineering on AWS. It can be used for EC2, ECS (with EC2 launch type) and Fargate.
  • Kraken - Chaos and resiliency testing tool for Kubernetes and OpenShift.
  • kube-burner - A tool aimed at stressing Kubernetes clusters by creating or deleting a high quantity of objects.
  • Chaos Experimentation Framework - An extensible platform for infrastructure management including Chaos Engineering
  • NetHavoc - A Chaos Engineering Tool for Linux, K8s, Windows, PCF, Cloud, and Containers for injecting Resource, Infrastructure, Network, and Application failures.
  • gorm-sqlchaos - A runtime SQL manipulator for your Golang applications based on gorm.
  • Chaos Frontend Toolkit - A set of tools to apply Chaos Engineering to frontend
  • Mitigant - The Continuous Security Verification Platform, which enables confidence in cloud security posture by leveraging security chaos engineering.
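
Several of the tools above describe injecting latency or exceptions into application code. The following is a minimal Python sketch of that general idea, not the API of any tool listed here: the names inject_fault, InjectedFault, fetch_profile, and fetch_profile_with_fallback are hypothetical and exist only for illustration. It wraps a function so that a configurable fraction of calls is delayed or fails, which makes it possible to check whether the caller degrades gracefully.

```python
import functools
import random
import time


class InjectedFault(RuntimeError):
    """Raised in place of a real failure when a fault is injected."""


def inject_fault(rate=0.1, delay_s=0.0, exception=None, rng=None):
    """Wrap a function so that a fraction of its calls is delayed or fails.

    rate      -- probability in [0, 1] that a given call is disturbed
    delay_s   -- seconds of latency added to a disturbed call
    exception -- exception instance raised on a disturbed call, if any
    rng       -- random.Random instance, injectable so runs can be seeded
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = rng or random.Random()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Disturb this call with probability `rate`.
            if rng.random() < rate:
                if delay_s > 0:
                    time.sleep(delay_s)
                if exception is not None:
                    raise exception
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_fault(rate=1.0, exception=InjectedFault("simulated dependency outage"))
def fetch_profile(user_id):
    # Stand-in for a call to a real downstream service.
    return {"id": user_id}


def fetch_profile_with_fallback(user_id):
    # The behaviour under test: does the caller degrade gracefully?
    try:
        return fetch_profile(user_id)
    except InjectedFault:
        return {"id": user_id, "degraded": True}
```

With rate=1.0 every call to fetch_profile fails, so fetch_profile_with_fallback always takes the fallback path. In an actual experiment the rate would typically be small and the blast radius limited; those choices depend on the system under test.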

Retired tools

  • The Simian Army - A suite of tools for keeping your cloud operating in top form.
  • ChaoSlingr - Introducing Security Chaos Engineering. ChaoSlingr focuses primarily on the experimentation on AWS Infrastructure to proactively instrument system security failure through experimentation.

Cloud Services

Papers

Gamedays

Blogs & Newsletters

Podcasts

  • Break Things On Purpose - Monthly podcast about Chaos Engineering presented by Gremlin Inc. Also available on Spotify, Google Play, and Stitcher.

Conferences & Meetups

Forums

Contributing

Please take a look at the contribution guidelines first. Contributions are always welcome!
