+---------------------------------------------------------+
| Canonical repo at https://github.com/dLuna/chaos_monkey |
+---------------------------------------------------------+
This is The CHAOS MONKEY. It will kill your processes.
"What kills you, makes you stronger" -- The Chaos Monkey
The purpose of The Chaos Monkey is to find out if your system is
stable or not. What will your system do when things start to go wrong
and your processes die randomly? The Chaos Monkey will show you.
With a stick.
Start by including the chaos_monkey amongst your applications. The
Chaos Monkey sits in its cage and eats bananas. Let it out by running
chaos_monkey:on() or have him kill a single process using
chaos_monkey:kill(). More installation instructions can be found in
INSTALL.
chaos_monkey:on() -> {ok, started} | {error, already_running}
chaos_monkey:on(Opts) ->
{ok, started}
| {error, already_running}
| {error, badarg}
chaos_monkey:off() -> {ok, stopped} | {error, not_running}
Types:
Opts :: [Opt]
Opt :: {ms, non_neg_integer()}
| {apps, all | all_but_otp | [atom()]}
Will let The Chaos Monkey wreck reasonable havoc over time on your
system. This is the command to use if you want The Chaos Monkey
running all the time.
It will stay away from system processes and supervisors like a good
monkey.
Opts default to [{ms, 5000}, {apps, all_but_otp}] which allows The
Chaos Monkey to kill one process every five seconds on average. It
can deviate from this number by 30%. If your restart frequency
setting doesn't allow for this then you could be in for a surprise.
chaos_monkey:almost_kill() ->
{ok, NumberOfKilledProcesses} | {error, Error}
chaos_monkey:almost_kill(Applications) ->
{ok, NumberOfKilledProcesses} | {error, Error}
Types:
Applications :: all | all_but_otp | [application()]
NumberOfKilledProcesses :: non_neg_integer()
Error :: term()
Synchronous.
This function call will almost kill your system. If it works as
published, The Chaos Monkey should stay one process away from
bringing your system down. Can you recover from that?
The Chaos Monkey will randomly walk through processes belonging to
the list of applications and kill things. Supervisors are too
strong for The Chaos Monkey, so it will kill their children instead,
aiming to kill them by going above the restart threshold. The Chaos
Monkey is not suicidal so it will respect restart thresholds of
permanent top level supervisors.
As well as not killing supervisors; system processes and processes
in the kernel application are too strong. As mentioned above The
Chaos Monkey will avoid suicide and by extension its siblings and
parent.
The Applications argument, tells The Chaos Monkey to focus its
killing spree on:
all -- All applications are available for killing.
all_but_otp -- The Chaos Monkey will stay away from applications
in OTP. Everything else is fair bait. Default.
[ListOfApps] -- Sic The Chaos Monkey on the list of application.
Remember that the Monkey will always see lonesome
processes that don't have the protection of an
application as available for harassment.
chaos_monkey:kill() -> {ok, ProcData}.
Kills a single random non-OTP process in your system.
ProcData contains information about the process that was killed.
chaos_monkey:find_orphans() -> [Pid]
The Chaos Monkey will smell your processes and find the ones which
lack protection from an application. It gladly hands them over to
you to do with as you please. A well-behaved system should return
[] when calling this function.