Interferon
This repo contains the interferon gem.
This gem enables you to store your alerts configuration in code.
You should create your own repository, with a Gemfile that imports the interferon gem.
For an example of such a repository, along with example configuration and alerts files, see https://www.github.com/airbnb/alerts
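As a minimal sketch, the Gemfile in such a repository only needs to pull in interferon (assuming you install it from your usual gem source):

```ruby
# Gemfile for your alerts repository: makes the interferon executable available
source 'https://rubygems.org'

gem 'interferon'
```

Your alert DSL files and group definitions then live alongside this Gemfile, and interferon is run against them.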
Running This Gem
This gem provides a single executable, called interferon.
You are meant to invoke it like so:
$ bundle exec interferon --config /path/to/config_file
Additional options:
- -h, --help -- prints out usage information
- -n, --dry-run -- runs interferon without making any changes to alerting destinations
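For example, to preview what interferon would do without changing any alerting destinations:
$ bundle exec interferon --config /path/to/config_file --dry-run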
Configuration File
The configuration file is written in YAML. It accepts the following parameters:
- verbose_logging -- whether to print more output
- alerts_repo_path -- the location of your alerts repo, containing your interferon DSL files
- group_sources -- a list of sources which can return groups of people to alert
- host_sources -- a list of sources which can read inventory systems and return lists of hosts to monitor
- destinations -- a list of alerting providers, which can monitor metrics and dispatch alerts as specified in your alerts DSL files
- processes -- number of processes to run alert generation on (optional; defaults to all available cores)
For more information, see the config.example.yaml file in this repo.
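As an illustration only, a config file might be shaped roughly like the sketch below; the type names and option keys under each source and destination are assumptions, so consult config.example.yaml and the source implementations for the real ones.

```yaml
# Sketch of an interferon config file -- option keys under each entry are
# placeholders; see config.example.yaml for the authoritative layout.
verbose_logging: false
alerts_repo_path: /path/to/your/alerts/repo
processes: 4

group_sources:
  - type: filesystem            # assumed name for the YAML-file group source
    options:
      paths:
        - /path/to/your/alerts/repo/groups

host_sources:
  - type: optica                # one of the host sources listed below
    options:
      host: optica.example.com

destinations:
  - type: datadog
    options:
      api_key: YOUR_DATADOG_API_KEY
      app_key: YOUR_DATADOG_APP_KEY
```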
The Moving Parts
This repo knows about four kinds of objects:
- host_sources: these query various inventory systems and return lists of hosts or entities to alert on
- destinations: these are metric systems, which can watch metrics and alert engineers
- groups: these are groups of actual engineers who can be alerted in case of trouble
- alerts: these are Ruby DSL files which specify when and how engineers and groups are alerted about hosts via the destinations (see the sketch below)
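The method names in the following alert-file sketch are placeholders rather than the exact interferon DSL; see the alert files in https://www.github.com/airbnb/alerts for real examples. The idea is that each file ties a Datadog query to the hosts it applies to and the people or groups who get notified:

```ruby
# Hypothetical alert file -- method names are illustrative placeholders,
# not the exact DSL; see https://www.github.com/airbnb/alerts for real examples.
name "haproxy has no healthy backends for <%= role %>"

# restrict the alert to hosts returned by a particular host source (placeholder predicate)
applies do
  source == 'optica'
end

# the Datadog query to watch, in the syntax described under Destinations below
metric do
  datadog_query "max(last_5m):avg:haproxy_count_by_status{role:<%= role %>,status:up} by {host} > 0"
end

# who gets alerted, resolved through the group sources
notify do
  groups ['my-team']
end
```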
Host Sources
- optica: can read a list of AWS instances from optica
- optica_services: returns smartstack service information parsed from optica
- aws_rds: lists RDS instances
- aws_dynamo: lists DynamoDB tables
- aws_elasticache: lists ElastiCache nodes and clusters
Destinations
Datadog
Datadog is our only alerting destination at the moment. Datadog's alerting syntax rules are documented here: http://docs.datadoghq.com/api/#alerts
Here is an example Datadog metric query, followed by a breakdown of each part of the syntax:
max(last_5m):avg:haproxy_count_by_status{role:<%= role %>,status:up} by {host} > 0

- max(last_5m) together with the trailing > 0 forms the alert condition
- max is the metric condition, which can be one of:
  - max: the metric gets this high at least once during the interval
  - avg: the metric is this on average during the interval
  - min: the metric is this small at least once during the interval
  - change: the metric changes this much between a value N minutes ago and now (raw difference)
  - pct_change: the metric changes this much between a value N minutes ago and now (percentage)
- last_5m is the interval to look at; it always starts with last_ and can be 5m, 10m, 15m, 30m, 1h, 2h, or 4h
- avg is the math applied to the metric over all tags: max, min, avg, or sum
- haproxy_count_by_status is the metric to alert on
- {role:<%= role %>,status:up} is the set of tags to slice the metric by
- by {host} triggers a separate alert for each different value of these tags; the entire by {} clause can be omitted
Groups
Groups actually come from group_sources. We only have a single group source right now, which reads group definitions from YAML files on the filesystem. However, we would like to add additional group sources, such as LDAP-based ones.
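As a purely illustrative sketch (the actual schema is whatever the filesystem group source expects), a group file might simply map a group name to the people who should be alerted:

```yaml
# Hypothetical group definition read by the filesystem group source --
# key names are placeholders, not the real schema.
name: my-team
people:
  - alice
  - bob
```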