• Stars: 122
• Rank: 292,031 (Top 6 %)
• Language: Python
• Created over 14 years ago
• Updated over 10 years ago


Repository Details

Gremlins is a Python program and framework that assists in fault-testing distributed systems.

Overview

Gremlins is a fault injector - it is an evil program that sits on a machine and does nasty things to other programs on that machine (or the machine overall).

Its primary purpose is for black-box long-lived testing of a distributed system, as it does not make any effort to provide concepts like "test cases" or code instrumentation. The idea is not to trigger specific error cases, but to simulate generally faulty hardware and networking so that a larger class of problems can be found on a small test cluster.

The hope is that, if a fault tolerant system can make progress without corruption or unrecoverable errors on a 5-node cluster full of gremlins, it will also be free of such errors on a 500-node cluster where errors may occur every day by natural causes.

Setup

The very simplest way to run gremlins is to just set your python path:

$ PYTHONPATH=. ./gremlins/gremlin.py

The second simplest way (and the recommended one) is to run python setup.py develop inside a Python virtual environment, and then invoke the installed command:

$ gremlins

Concepts

Gremlins is built around a few key concepts: faults, triggers, and profiles.

Faults

A fault is the most specific unit of abstraction: it is simply a thing that can go wrong. One example fault is "kill -9 the DataNode JVM, wait 60 seconds, then start it back up". Another example might be "turn off networking for 5 minutes".

Faults are simply python callables. Anything that can be called can be run by the framework.

Some handy faults are provided in the gremlins.faults module. Note that most of these faults are functions that return other functions. This is a useful pattern, though some people might prefer making classes whose instances are callable.
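To make the "functions that return other functions" pattern concrete, here is a minimal sketch of both styles. The names make_log_fault and LogFault are hypothetical and are not part of gremlins.faults; the fault itself is deliberately harmless and only logs a message.

```python
import logging


def make_log_fault(message):
    """Factory style: return a zero-argument callable that acts as the fault.

    Hypothetical example, not a function from gremlins.faults.
    """
    def do():
        # A real fault would kill a process or drop packets here.
        logging.info("injecting fault: %s", message)
        return message
    return do


class LogFault:
    """Class style: instances are callable, so they can be used as faults too."""

    def __init__(self, message):
        self.message = message

    def __call__(self):
        logging.info("injecting fault: %s", self.message)
        return self.message


# Both objects are plain callables, which is all the framework is said to need.
closure_fault = make_log_fault("pause the region server")
object_fault = LogFault("pause the region server")
```

The factory style keeps the configuration (here, the message) in a closure, while the class style keeps it in instance attributes; either way the result is invoked with no arguments.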

A related concept is "metafaults". These are simply faults that provide nice containers around other faults. Currently the only example of a metafault is gremlins.metafaults.pick_fault, which takes a list of (weight, fault) pairs, and picks one of the subfaults according to the provided weights.
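The weighted selection described above could be sketched as follows. This is an illustration of the idea only, not the actual source of gremlins.metafaults.pick_fault; the name pick_fault_sketch and the rng parameter are assumptions made for this example.

```python
import random


def pick_fault_sketch(weighted_faults, rng=random):
    """Return a fault that, on each call, runs one subfault chosen by weight.

    weighted_faults is a list of (weight, fault) pairs, as in the description
    above. This is a hypothetical reimplementation for illustration.
    """
    weights = [weight for weight, _ in weighted_faults]
    faults = [fault for _, fault in weighted_faults]

    def do():
        # random.choices performs weighted sampling with replacement.
        chosen = rng.choices(faults, weights=weights, k=1)[0]
        return chosen()

    return do
```

Because the result is itself a zero-argument callable, a metafault built this way can be handed to a trigger exactly like any other fault.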

Triggers

A trigger is a way of running faults. The simplest trigger (and the only one that really works well at the moment) is gremlins.triggers.Periodic. This trigger is constructed with an interval and a fault. It simply repeats a process of sleeping for the given interval, and then running the fault.
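The sleep-then-run loop can be sketched in a few lines. PeriodicSketch is a hypothetical stand-in written for this example, not the real gremlins.triggers.Periodic; the max_runs and sleep parameters are assumptions added so the loop can be bounded and tested.

```python
import time


class PeriodicSketch:
    """Hypothetical periodic trigger: sleep for an interval, then run the fault."""

    def __init__(self, interval, fault, sleep=time.sleep):
        self.interval = interval  # seconds between fault runs
        self.fault = fault        # any zero-argument callable
        self._sleep = sleep       # injectable so tests need not actually wait

    def run(self, max_runs=None):
        """Repeat sleep + fault forever, or max_runs times if it is given."""
        runs = 0
        while max_runs is None or runs < max_runs:
            self._sleep(self.interval)
            self.fault()
            runs += 1
```

Note that the fault fires only after the first sleep, which matches the "sleep, then run" order described above.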

Another trigger in development is the webserver trigger. This exposes a CherryPy webserver which accepts POST requests (e.g. via curl) so that cross-machine fault testing can be controlled from a central location.

Future trigger ideas include the ability to watch a log for a given line before triggering a fault, etc.

Profiles

A profile currently doesn't have a type, but is just a normal python list of triggers. When running gremlins, one can provide a profile, and each of the triggers will be started. This concept allows one to start both a webserver trigger and a periodic trigger from a single invocation.
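As a rough illustration of "a list of triggers, each of which is started", the sketch below runs every trigger in a profile on its own thread. OneShotTrigger and run_profile are hypothetical names, and the start/join interface is an assumption; the real gremlins trigger classes may look different.

```python
import threading


class OneShotTrigger:
    """Hypothetical trigger that runs its fault once on a background thread."""

    def __init__(self, fault):
        self.fault = fault
        self._thread = threading.Thread(target=self.fault, daemon=True)

    def start(self):
        self._thread.start()

    def join(self):
        self._thread.join()


def run_profile(profile):
    """Start every trigger in the profile, then wait for all of them."""
    for trigger in profile:
        trigger.start()
    for trigger in profile:
        trigger.join()


results = []
# A profile is just a plain list; no dedicated type is involved.
profile = [
    OneShotTrigger(lambda: results.append("fault a")),
    OneShotTrigger(lambda: results.append("fault b")),
]
run_profile(profile)
```

Starting all triggers before joining any of them is what lets, say, a webserver trigger and a periodic trigger run side by side in one process.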

Running gremlins

Gremlins can be run in two different modes. In the first mode, the user specifies a list of faults. These faults are executed immediately, and then gremlins exits. In the second mode, the user specifies a profile to start. This profile runs until the user kills the gremlins process.

For example, to run the hbase.rs_pause fault just once, simply execute:

$ gremlins -m gremlins.profiles.hbase -f hbase.rs_pause

The -m flag causes gremlins to import the given python module. This can be useful to specify faults from a module that is not included with gremlins itself.

Multiple faults can be executed in sequence by passing multiple -f options.

To run a fault profile, simply pass it with a -p flag:

$ gremlins -m gremlins.profiles.hbase -p hbase.profile
