External service monitoring for Consul

Consul ESM (External Service Monitor)

This project provides a daemon to run alongside Consul in order to run health checks for external nodes and update the status of those health checks in the catalog. It can also manage updating the coordinates of these external nodes, if enabled. See Consul's External Services guide for some more information about external nodes.

Community Support

If you have questions about how consul-esm works, its capabilities, or anything other than a bug or feature request (use GitHub's issue tracker for those), please see our community support resources.

Community portal: https://discuss.hashicorp.com/tags/c/consul/29/consul-esm

Other resources: https://www.consul.io/community.html

Additionally, for issues and pull requests, we'll be using the 👍 reactions as a rough voting system to help gauge community priorities. So please add 👍 to any issue or pull request you'd like to see worked on. Thanks.

Prerequisites

Consul ESM requires at least version 1.4.1 of Consul.

ESM version        Consul version required
0.3.2 and higher   1.4.1+
0.3.1 and lower    1.0.1-1.4.0

Installation

  1. Download a pre-compiled, released version from the Consul ESM releases page.

  2. Extract the binary using unzip or tar.

  3. Move the binary into $PATH.

To compile from source, please see the instructions in the contributing section.

Usage

In order for the ESM to detect external nodes and health checks, any external nodes must be registered directly with the catalog with "external-node": "true" set in the node metadata. Health checks can also be registered with a 'Definition' field which includes the details of running the check. For example:

$ curl --request PUT --data @node.json localhost:8500/v1/catalog/register

node.json:

{
  "Datacenter": "dc1",
  "ID": "40e4a748-2192-161a-0510-9bf59fe950b5",
  "Node": "foo",
  "Address": "192.168.0.1",
  "TaggedAddresses": {
    "lan": "192.168.0.1",
    "wan": "192.168.0.1"
  },
  "NodeMeta": {
    "external-node": "true",
    "external-probe": "true"
  },
  "Service": {
    "ID": "web1",
    "Service": "web",
    "Tags": [
      "v1"
    ],
    "Address": "127.0.0.1",
    "Port": 8000
  },
  "Checks": [{
    "Node": "foo",
    "CheckID": "service:web1",
    "Name": "Web HTTP check",
    "Notes": "",
    "Status": "passing",
    "ServiceID": "web1",
    "Definition": {
      "HTTP": "http://localhost:8000/health",
      "Interval": "10s",
      "Timeout": "5s"
    }
  },{
    "Node": "foo",
    "CheckID": "service:web2",
    "Name": "Web TCP check",
    "Notes": "",
    "Status": "passing",
    "ServiceID": "web1",
    "Definition": {
      "TCP": "localhost:8000",
      "Interval": "5s",
      "Timeout": "1s",
      "DeregisterCriticalServiceAfter": "30s"
     }
  }]
}

The external-probe field determines whether the ESM will do regular pings to the node and maintain an externalNodeHealth check for the node (similar to the serfHealth check used by Consul agents).

The ESM will perform a leader election by holding a lock in Consul, and the leader will then continually watch Consul for updates to the catalog and perform health checks defined on any external nodes it discovers. This allows externally registered services and checks to access the same features as if they were registered locally on Consul agents.

Each ESM registers a health check for itself with the agent with "DeregisterCriticalServiceAfter": "30m", which is currently not configurable. This means after failing its health check, the ESM will switch from passing status to critical status. If the ESM remains in critical status for 30 minutes, then the agent will attempt to deregister the ESM. During critical status the ESM’s assigned external health checks will be reassigned to another ESM with passing status to monitor. Note: this is separate from the example JSON above for registering an external health check which has a DeregisterCriticalServiceAfter of 30 seconds.

Command Line

To run the daemon, pass the -config-file or -config-dir flag, giving the location of a config file or a directory containing .json or .hcl files.

$ consul-esm -config-file=/path/to/config.hcl -config-dir /etc/consul-esm.d
Consul ESM running!
            Datacenter: "dc1"
               Service: "consul-esm"
           Service Tag: ""
            Service ID: "consul-esm:5a6411b3-1c41-f272-b719-99b4f958fa97"
Node Reconnect Timeout: "72h"

Log data will now stream in as it occurs:

2017/10/31 21:59:41 [INFO] Waiting to obtain leadership...
2017/10/31 21:59:41 [INFO] Obtained leadership
2017/10/31 21:59:42 [DEBUG] agent: Check 'foo/service:web1' is passing
2017/10/31 21:59:42 [DEBUG] agent: Check 'foo/service:web2' is passing

Configuration

Configuration files can be provided in either JSON or HashiCorp Configuration Language (HCL) format. For more information, please see the HCL specification. The following is an example HCL config file, with the default values filled in:

// The log level to use.
log_level = "INFO"

// Controls whether to enable logging to syslog.
enable_syslog = false

// The syslog facility to use, if enabled.
syslog_facility = ""

// Whether to log in json format
log_json = false

// The unique id for this agent to use when registering itself with Consul.
// If unconfigured, a UUID will be generated for the instance id.
// Note: do not reuse the same instance id value for other agents. This id
// must be unique to disambiguate different instances on the same host.
// Failure to maintain uniqueness will result in an already-exists error.
instance_id = ""

// The service name for this agent to use when registering itself with Consul.
consul_service = "consul-esm"

// The service tag for this agent to use when registering itself with Consul.
// ESM instances that share a service name/tag combination will have the work
// of running health checks and pings for any external nodes in the catalog
// divided evenly amongst themselves.
consul_service_tag = ""

// The directory in the Consul KV store to use for storing runtime data.
consul_kv_path = "consul-esm/"

// The node metadata values used for the ESM to qualify a node in the catalog
// as an "external node".
external_node_meta {
    "external-node" = "true"
}

// The length of time to wait before reaping an external node due to failed
// pings.
node_reconnect_timeout = "72h"

// The interval to ping and update coordinates for external nodes that have
// 'external-probe' set to true. By default, ESM will attempt to ping and
// update the coordinates for all nodes it is watching every 10 seconds.
node_probe_interval = "10s"

// Controls whether or not to disable calculating and updating node coordinates
// when doing the node probe. Defaults to false i.e. coordinate updates
// are enabled.
disable_coordinate_updates = false

// The address of the local Consul agent. Can also be provided through the
// CONSUL_HTTP_ADDR environment variable.
http_addr = "localhost:8500"

// The ACL token to use when communicating with the local Consul agent. Can
// also be provided through the CONSUL_HTTP_TOKEN environment variable.
token = ""

// The Consul datacenter to use.
datacenter = "dc1"

// The CA file to use for talking to Consul over TLS. Can also be provided
// through the CONSUL_CACERT environment variable.
ca_file = ""

// The path to a directory of CA certs to use for talking to Consul over TLS.
// Can also be provided through the CONSUL_CAPATH environment variable.
ca_path = ""

// The client cert file to use for talking to Consul over TLS. Can also be
// provided through the CONSUL_CLIENT_CERT environment variable.
cert_file = ""

// The client key file to use for talking to Consul over TLS. Can also be
// provided through the CONSUL_CLIENT_KEY environment variable.
key_file = ""

// The server name to use as the SNI host when connecting to Consul via TLS.
// Can also be provided through the CONSUL_TLS_SERVER_NAME environment
// variable.
tls_server_name = ""

// The CA file to use for talking to HTTPS checks.
https_ca_file = ""

// The path to a directory of CA certs to use for talking to HTTPS checks.
https_ca_path = ""

// The client cert file to use for talking to HTTPS checks.
https_cert_file = ""

// The client key file to use for talking to HTTPS checks.
https_key_file = ""

// Client address to expose API endpoints. Required in order to expose /metrics endpoint for Prometheus. Example: "127.0.0.1:8080"
client_address = ""

// The method to use for pinging external nodes. Defaults to "udp" but can
// also be set to "socket" to use ICMP (which requires root privileges).
ping_type = "udp"

// The telemetry configuration which matches Consul's telemetry config options.
// See Consul's documentation https://www.consul.io/docs/agent/options#telemetry
// for more details on how to configure
telemetry {
    circonus_api_app = ""
    circonus_api_token = ""
    circonus_api_url = ""
    circonus_broker_id = ""
    circonus_broker_select_tag = ""
    circonus_check_display_name = ""
    circonus_check_force_metric_activation = ""
    circonus_check_id = ""
    circonus_check_instance_id = ""
    circonus_check_search_tag = ""
    circonus_check_tags = ""
    circonus_submission_interval = ""
    circonus_submission_url = ""
    disable_hostname = false
    dogstatsd_addr = ""
    dogstatsd_tags = []
    filter_default = false
    prefix_filter = []
    metrics_prefix = ""
    prometheus_retention_time = "0"
    statsd_address = ""
    statsite_address = ""
}

// The number of additional successful checks needed to trigger a status update to
// passing. Defaults to 0, meaning the status will update to passing on the
// first successful check.
passing_threshold = 0

// The number of additional failed checks needed to trigger a status update to
// critical. Defaults to 0, meaning the status will update to critical on the
// first failed check.
critical_threshold = 0

Threshold for Updating Check Status

To prevent flapping, thresholds for updating a check status can be configured by passing_threshold and critical_threshold such that a check will update and switch to be passing / critical after an additional number of consecutive or non-consecutive checks.

By default, these configurations are set to 0, which retains the original ESM behavior. If the status of a check is 'passing', then the next failed check will cause the status to update to be 'critical'. Hence, the first failed check causes the update and 0 additional checks are needed.

If a check is currently 'passing' and configuration is critical_threshold=3, then after the first failure, 3 additional consecutive failures (4 in total) are needed in order to update the status to 'critical'.

ESM also employs a counting system that allows for non-consecutive checks to aggregate and update the check status. This counting system increments when a check result is the opposite of the current status and decrements when same as the current status.

For an example of how non-consecutive checks are counted, we have a check that has the status 'passing', critical_threshold=3, and the counter is at 0 (c=0). The following pattern of pass/fail will decrement/increment the counter as such:

PASS (c=0), FAIL (c=1), FAIL (c=2), PASS (c=1), FAIL (c=2), FAIL (c=3), PASS (c=2), FAIL (c=3), FAIL (c=4)

When the counter reaches 4 (1 initial fail + 3 additional fails), the critical_threshold is met and the check status will update to 'critical' and the counter will reset.

Note: this implementation diverges from Consul's anti-flapping thresholds, which count total consecutive checks.
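The counting rules described above can be modeled in a few lines of Go. This is an illustrative sketch, not ESM's actual implementation: the checkState type and its observe method are hypothetical names. It assumes the counter never drops below zero, which matches the first PASS (c=0) step in the example sequence.

```go
package main

import "fmt"

// checkState is a hypothetical model of one check's status and counter.
type checkState struct {
	status            string // "passing" or "critical"
	counter           int
	passingThreshold  int
	criticalThreshold int
}

// observe applies one check result ("passing" or "critical") and reports
// whether the stored status changed as a consequence.
func (s *checkState) observe(result string) bool {
	if result == s.status {
		// Same as the current status: decrement, but not below zero.
		if s.counter > 0 {
			s.counter--
		}
		return false
	}
	// Opposite of the current status: increment.
	s.counter++
	threshold := s.criticalThreshold
	if result == "passing" {
		threshold = s.passingThreshold
	}
	// One initial result plus `threshold` additional results are required.
	if s.counter > threshold {
		s.status = result
		s.counter = 0
		return true
	}
	return false
}

func main() {
	s := &checkState{status: "passing", criticalThreshold: 3}
	results := []string{
		"passing", "critical", "critical", "passing", "critical",
		"critical", "passing", "critical", "critical",
	}
	for _, r := range results {
		changed := s.observe(r)
		fmt.Printf("%-8s c=%d status=%s changed=%v\n", r, s.counter, s.status, changed)
	}
}
```

Fed the pass/fail pattern from the example, the model reaches a count of 4 on the final failure, switches the status to 'critical', and resets the counter. With a threshold of 0, the first opposite result flips the status immediately, which is the default behavior described above.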

Consul ACL Policies

With the ACL system enabled on Consul agents, ESM's token may need a specific ACL policy in order for ESM to perform its functions. To narrow down the privileges required for ESM, the following ACL policy rules can be used:

agent_prefix "" {
  policy = "read"
}

key_prefix "consul-esm/" {
  policy = "write"
}

node_prefix "" {
  policy = "write"
}

service_prefix "" {
  policy = "write"
}

session_prefix "" {
   policy = "write"
}

The key_prefix rule is set to allow the consul-esm/ KV prefix, which is defined in the config file using the consul_kv_path parameter.

It is possible to have even finer-grained ACL policies if you know the name of the Consul agent that ESM is registered with and the fixed list of nodes that ESM will monitor.

  • <consul-agent-node-name>: insert the node name of the Consul agent that consul-esm is registered with
  • <monitored-node-name>: insert the name of each node that ESM will monitor
  • <consul-esm-name>: insert the name that ESM is registered with. The default value is 'consul-esm' if not defined in the config file using the consul_service parameter

agent "<consul-agent-node-name>" {
  policy = "read"
}

key_prefix "consul-esm/" {
  policy = "write"
}

node "<monitored-node-name: one acl block needed per node>" {
  policy = "write"
}

node_prefix "" {
  policy = "read"
}

service "<consul-esm-name>" {
  policy = "write"
}

session "<consul-agent-node-name>" {
   policy = "write"
}

For context on usage of each ACL:

  • agent:read - to check version compatibility and calculate network coordinates
  • key:write - to store assigned checks
  • node:write - to update the status of each node that ESM monitors
  • node:read - to retrieve nodes that need to be monitored
  • service:write - to register the ESM service
  • session:write - to acquire the ESM cluster leader lock

Consul Namespaces (Enterprise Feature)

ESM supports Consul Enterprise Namespaces. When run with Consul Enterprise servers, it scans all accessible Namespaces for external nodes and health checks to monitor. "All accessible" means every Namespace for which the Namespace ACL rules grant read-level access. In the simplest case, to access all Namespaces, add the rule below to the ESM ACL policy from the previous section:

namespace_prefix "" {
  acl = "read"
}

If an ESM instance needs to monitor only a subset of the existing Namespaces, the policy must grant access to each Namespace explicitly. For example, suppose there are three Namespaces, "foo", "bar" and "zed", and this ESM should monitor only "foo" and "bar". The policy would need to list them (a common prefix would also work):

namespace "foo" {
  acl = "read"
}
namespace "bar" {
  acl = "read"
}

Namespaces + consul_kv_path config setting:

  • If you run multiple ESMs for HA (secondary or backup ESMs), set consul_kv_path to the same value on each. (In practice these configs are identical.)

  • If you run multiple ESMs for separate Namespaces, each must use a different setting for consul_kv_path.

ESM uses the consul_kv_path to determine where to keep its metadata. This metadata will be different for each ESM monitoring different Namespaces.

Note that you can combine both: ESMs in the same HA cluster share one value, and each separate HA cluster uses a different value.

Contributing

Note if you run Linux and see socket: permission denied errors with UDP ping, you probably need to modify system permissions to allow for non-root access to the ports. Running sudo sysctl -w net.ipv4.ping_group_range="0 65535" should fix the problem (until you reboot, see sysctl man page for how to persist).

To build and install Consul ESM locally, you will need to install the Docker engine.

Clone the repository:

$ git clone https://github.com/hashicorp/consul-esm.git

To compile the consul-esm binary for your local machine:

$ make dev

This will compile the consul-esm binary into bin/consul-esm as well as your $GOPATH and run the test suite.

If you want to compile a specific binary, run make XC_OS/XC_ARCH. For example:

$ make darwin/amd64

Or run the following to generate all binaries:

$ make build

If you just want to run the tests:

$ make test

Or to run a specific test in the suite:

$ go test ./... -run SomeTestFunction_name
