NATS Surveyor

NATS Monitoring, Simplified.

NATS surveyor polls the NATS server for Statz messages to generate data for Prometheus. This allows a single exporter to connect to any NATS server and get an entire picture of a NATS deployment without requiring extra monitoring components or sidecars. Surveyor has been used extensively by Synadia.

System accounts must be enabled to use surveyor.

Usage

Usage:
  nats-surveyor [flags]

Flags:
      --accounts                            Export per account metrics
  -a, --addr string                         Network host to listen on. (default "0.0.0.0")
      --config string                       config file (default is ./nats-surveyor.yaml)
  -c, --count int                           Expected number of servers (-1 for undefined). (default 1)
      --creds string                        Credentials File
  -h, --help                                help for nats-surveyor
      --http-pass string                    Set the password for HTTP scrapes. NATS bcrypt supported.
      --http-tlscacert string               Client certificate CA for verification (used with HTTPS).
      --http-tlscert string                 Server certificate file (Enables HTTPS).
      --http-tlskey string                  Private key for server certificate (used with HTTPS).
      --http-user string                    Enable basic auth and set user name for HTTP scrapes.
      --jetstream string                    Listen for JetStream Advisories based on config files in a directory.
      --jwt string                          User JWT. Use in conjunction with --seed
      --log-level string                    Log level, one of: trace|debug|info|warn|error|fatal|panic (default "info")
      --nkey string                         Nkey Seed File
      --observe string                      Listen for observation statistics based on config files in a directory.
      --password string                     NATS user password
  -p, --port int                            Port to listen on. (default 7777)
      --prefix string                       Replace the default prefix for all the metrics.
      --seed string                         Private key (nkey seed). Use in conjunction with --jwt
      --server-discovery-timeout duration   Maximum wait time between responses from servers during server discovery. Use in conjunction with -count=-1. (default 500ms)
  -s, --servers string                      NATS Cluster url(s) (default "nats://127.0.0.1:4222")
      --timeout duration                    Polling timeout (default 3s)
      --tlscacert string                    Client certificate CA on NATS connections.
      --tlscert string                      Client certificate file for NATS connections.
      --tlskey string                       Client private key for NATS connections.
      --user string                         NATS user name or token
  -v, --version                             version for nats-surveyor

At this time, NATS 2.0 System credentials are required for meaningful usage. Those can be provided in 2 ways:

  • using --creds option to supply chained credentials file (containing JWT and NKey seed):
./nats-surveyor --creds ./test/SYS.creds
2019/10/14 21:35:40 Connected to NATS Deployment: 127.0.0.1:4222
2019/10/14 21:35:40 No certificate file specified; using http.
2019/10/14 21:35:40 Prometheus exporter listening at http://0.0.0.0:7777/metrics
  • using --jwt and --seed options to provide user JWT and NKey seed directly:
./nats-surveyor --jwt $NATS_USER_JWT --seed $NATS_NKEY_SEED
2019/10/14 21:35:40 Connected to NATS Deployment: 127.0.0.1:4222
2019/10/14 21:35:40 No certificate file specified; using http.
2019/10/14 21:35:40 Prometheus exporter listening at http://0.0.0.0:7777/metrics

Config

Config Files

Surveyor uses Viper to read configs, so it supports all file types that Viper supports (JSON, TOML, YAML, HCL, envfile, and Java properties).

To use a config file pass the --config flag. The defaults are /etc/nats-surveyor/nats-surveyor[.ext] and ./nats-surveyor[.ext] with one of the supported extensions.

The config is simple: set each flag in the config file. Example nats-surveyor.yaml:

servers: nats://127.0.0.1:4222
accounts: true
log-level: debug

Environment Variables

Environment variables are also taken into account. Any environment variable that is prefixed with NATS_SURVEYOR_ will be read.

Each flag has a matching environment variable. To derive its name, convert the flag name to uppercase, replace dashes with underscores, and add the NATS_SURVEYOR_ prefix. Example:

NATS_SURVEYOR_SERVERS=nats://127.0.0.1:4222
NATS_SURVEYOR_ACCOUNTS=true
NATS_SURVEYOR_LOG_LEVEL=debug
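
The naming rule above can be sketched in a few lines of Go. This is only an illustration of the rule as described here: envName is a hypothetical helper, not a function exported by Surveyor or Viper.

```go
package main

import (
	"fmt"
	"strings"
)

// envName maps a flag name such as "log-level" to the environment
// variable name described above: add the NATS_SURVEYOR_ prefix,
// uppercase the name, and replace dashes with underscores.
// Hypothetical helper for illustration only.
func envName(flag string) string {
	return "NATS_SURVEYOR_" + strings.ToUpper(strings.ReplaceAll(flag, "-", "_"))
}

func main() {
	for _, f := range []string{"servers", "accounts", "log-level"} {
		fmt.Println(f, "->", envName(f))
	}
}
```

Running it prints one mapping per flag, for example log-level -> NATS_SURVEYOR_LOG_LEVEL.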

Metrics

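Scrape output is in the form of nats_core_NNN_metric, where NNN is server, route, or gateway. In the sample output below, route and gateway metrics carry that infix, while server-level metrics are named nats_core_metric directly.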

To aid filtering, each metric has labels. These include nats_server_cluster, nats_server_host, and nats_server_id. Routes have an additional label, nats_server_route_id, and gateways have nats_server_gateway_id and nats_server_gateway_name.

The info metric has a nats_server_version label with the current version.

Additionally, there is a nats_up metric (exported as nats_core_nats_up in the output below) that will normally return 1, but will return 0 and no additional NATS metrics when there is no connectivity to the NATS system. This allows users to differentiate between a problem with the exporter itself and a problem with connectivity to the NATS system.

Scrape Output

nats_core_active_account_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 2
nats_core_active_account_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 2
nats_core_active_account_count{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 2
# HELP nats_core_connection_count Current number of client connections gauge
# TYPE nats_core_connection_count gauge
nats_core_connection_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 0
nats_core_connection_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 1
nats_core_connection_count{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 0
# HELP nats_core_core_count Machine cores gauge
# TYPE nats_core_core_count gauge
nats_core_core_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 8
nats_core_core_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 8
nats_core_core_count{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 8
# HELP nats_core_cpu_percentage Server cpu utilization gauge
# TYPE nats_core_cpu_percentage gauge
nats_core_cpu_percentage{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 0
nats_core_cpu_percentage{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 0
nats_core_cpu_percentage{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 0
# HELP nats_core_gateway_inbound_msg_count Number inbound messages through the gateway gauge
# TYPE nats_core_gateway_inbound_msg_count gauge
nats_core_gateway_inbound_msg_count{nats_server_cluster="region1",nats_server_gateway_id="7",nats_server_gateway_name="region2",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 0
nats_core_gateway_inbound_msg_count{nats_server_cluster="region1",nats_server_gateway_id="9",nats_server_gateway_name="region2",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 1
nats_core_gateway_inbound_msg_count{nats_server_cluster="region2",nats_server_gateway_id="4",nats_server_gateway_name="region1",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 2
# HELP nats_core_gateway_recv_bytes Number of messages sent over the gateway gauge
# TYPE nats_core_gateway_recv_bytes gauge
nats_core_gateway_recv_bytes{nats_server_cluster="region1",nats_server_gateway_id="7",nats_server_gateway_name="region2",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 0
nats_core_gateway_recv_bytes{nats_server_cluster="region1",nats_server_gateway_id="9",nats_server_gateway_name="region2",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 852
nats_core_gateway_recv_bytes{nats_server_cluster="region2",nats_server_gateway_id="4",nats_server_gateway_name="region1",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 4005
# HELP nats_core_gateway_recv_msg_count Number of messages sent over the gateway gauge
# TYPE nats_core_gateway_recv_msg_count gauge
nats_core_gateway_recv_msg_count{nats_server_cluster="region1",nats_server_gateway_id="7",nats_server_gateway_name="region2",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 0
nats_core_gateway_recv_msg_count{nats_server_cluster="region1",nats_server_gateway_id="9",nats_server_gateway_name="region2",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 1
nats_core_gateway_recv_msg_count{nats_server_cluster="region2",nats_server_gateway_id="4",nats_server_gateway_name="region1",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 5
# HELP nats_core_gateway_sent_bytes Number of messages sent over the gateway gauge
# TYPE nats_core_gateway_sent_bytes gauge
nats_core_gateway_sent_bytes{nats_server_cluster="region1",nats_server_gateway_id="7",nats_server_gateway_name="region2",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 1719
nats_core_gateway_sent_bytes{nats_server_cluster="region1",nats_server_gateway_id="9",nats_server_gateway_name="region2",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 2286
nats_core_gateway_sent_bytes{nats_server_cluster="region2",nats_server_gateway_id="4",nats_server_gateway_name="region1",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 852
# HELP nats_core_gateway_sent_msgs Number of messages sent over the gateway gauge
# TYPE nats_core_gateway_sent_msgs gauge
nats_core_gateway_sent_msgs{nats_server_cluster="region1",nats_server_gateway_id="7",nats_server_gateway_name="region2",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 2
nats_core_gateway_sent_msgs{nats_server_cluster="region1",nats_server_gateway_id="9",nats_server_gateway_name="region2",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 3
nats_core_gateway_sent_msgs{nats_server_cluster="region2",nats_server_gateway_id="4",nats_server_gateway_name="region1",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 1
# HELP nats_core_info General Server information Summary gauge
# TYPE nats_core_info gauge
nats_core_info{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW",nats_server_version="2.0.2"} 1
nats_core_info{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF",nats_server_version="2.0.2"} 1
nats_core_info{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A",nats_server_version="2.0.2"} 1
# HELP nats_core_mem_bytes Server memory gauge
# TYPE nats_core_mem_bytes gauge
nats_core_mem_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 1.2685312e+07
nats_core_mem_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 1.2992512e+07
nats_core_mem_bytes{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 1.1309056e+07
# HELP nats_core_nats_up 1 if connected to NATS, 0 otherwise.  A gauge.
# TYPE nats_core_nats_up gauge
nats_core_nats_up 1
# HELP nats_core_recv_bytes Number of messages received gauge
# TYPE nats_core_recv_bytes gauge
nats_core_recv_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 0
nats_core_recv_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 6528
nats_core_recv_bytes{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 4005
# HELP nats_core_recv_msgs_count Number of messages received gauge
# TYPE nats_core_recv_msgs_count gauge
nats_core_recv_msgs_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 7
nats_core_recv_msgs_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 15
nats_core_recv_msgs_count{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 5
# HELP nats_core_route_pending_bytes Number of bytes pending in the route gauge
# TYPE nats_core_route_pending_bytes gauge
nats_core_route_pending_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW",nats_server_route_id="4"} 0
nats_core_route_pending_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF",nats_server_route_id="4"} 0
# HELP nats_core_route_recv_bytes Number of bytes received over the route gauge
# TYPE nats_core_route_recv_bytes gauge
nats_core_route_recv_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW",nats_server_route_id="4"} 0
nats_core_route_recv_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF",nats_server_route_id="4"} 5676
# HELP nats_core_route_recv_msg_count Number of messages received over the route gauge
# TYPE nats_core_route_recv_msg_count gauge
nats_core_route_recv_msg_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW",nats_server_route_id="4"} 7
nats_core_route_recv_msg_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF",nats_server_route_id="4"} 7
# HELP nats_core_route_sent_bytes Number of bytes sent over the route gauge
# TYPE nats_core_route_sent_bytes gauge
nats_core_route_sent_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW",nats_server_route_id="4"} 5676
nats_core_route_sent_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF",nats_server_route_id="4"} 0
# HELP nats_core_route_sent_msg_count Number of messages sent over the route gauge
# TYPE nats_core_route_sent_msg_count gauge
nats_core_route_sent_msg_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW",nats_server_route_id="4"} 7
nats_core_route_sent_msg_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF",nats_server_route_id="4"} 7
# HELP nats_core_rtt_nanoseconds RTT in nanoseconds gauge
# TYPE nats_core_rtt_nanoseconds gauge
nats_core_rtt_nanoseconds{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 1.8008293e+07
nats_core_rtt_nanoseconds{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 1.3031788e+07
nats_core_rtt_nanoseconds{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 1.7976382e+07
# HELP nats_core_sent_bytes Number of messages sent gauge
# TYPE nats_core_sent_bytes gauge
nats_core_sent_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 7395
nats_core_sent_bytes{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 13661
nats_core_sent_bytes{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 852
# HELP nats_core_sent_msgs_count Number of messages sent gauge
# TYPE nats_core_sent_msgs_count gauge
nats_core_sent_msgs_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 17
nats_core_sent_msgs_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 32
nats_core_sent_msgs_count{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 2
# HELP nats_core_slow_consumer_count Number of slow consumers gauge
# TYPE nats_core_slow_consumer_count gauge
nats_core_slow_consumer_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 0
nats_core_slow_consumer_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 0
nats_core_slow_consumer_count{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 0
# HELP nats_core_start_time Server start time gauge
# TYPE nats_core_start_time gauge
nats_core_start_time{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 1.571110522019796e+18
nats_core_start_time{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 1.571110522019795e+18
nats_core_start_time{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 1.571110952301371e+18
# HELP nats_core_subs_count Current number of subscriptions gauge
# TYPE nats_core_subs_count gauge
nats_core_subs_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 17
nats_core_subs_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 17
nats_core_subs_count{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 8
# HELP nats_core_total_connection_count Total number of client connections serviced gauge
# TYPE nats_core_total_connection_count gauge
nats_core_total_connection_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDGERVW3RX7A6RAJQ34E7HPBFUD35322XRZJNTOMTFI7MHAXL2PS3OVW"} 2
nats_core_total_connection_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 5
nats_core_total_connection_count{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 0
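
As a rough illustration of how the labels and the nats_core_nats_up gauge in the output above can be consumed, the following Go sketch parses a few lines copied from that output. The sample type and parseLine function are hypothetical names, and the parser assumes label values contain no commas, braces, or escaped quotes. That holds for the output shown here but not for arbitrary Prometheus output, where a real exposition-format parser is the better choice.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sample is one parsed metric line: name, labels, and value.
type sample struct {
	name   string
	labels map[string]string
	value  float64
}

// parseLine parses a single non-comment line of scrape output.
// Assumption: label values hold no commas, braces, or escaped quotes.
func parseLine(line string) (sample, error) {
	s := sample{labels: map[string]string{}}
	sp := strings.LastIndex(line, " ")
	if sp < 0 {
		return s, fmt.Errorf("no value in %q", line)
	}
	head, raw := line[:sp], line[sp+1:]
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return s, err
	}
	s.value = v
	open := strings.Index(head, "{")
	if open < 0 {
		s.name = head // metric without labels, e.g. nats_core_nats_up
		return s, nil
	}
	s.name = head[:open]
	body := strings.TrimSuffix(head[open+1:], "}")
	for _, pair := range strings.Split(body, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) == 2 {
			s.labels[kv[0]] = strings.Trim(kv[1], `"`)
		}
	}
	return s, nil
}

// scrape holds a few lines taken from the sample output above.
const scrape = `# TYPE nats_core_nats_up gauge
nats_core_nats_up 1
nats_core_connection_count{nats_server_cluster="region1",nats_server_host="localhost",nats_server_id="NDYW2PLO6QVP2VKKUMWGWJXBMPTZKB3UAYME26BTKOGLNN55NSEK3RQF"} 1
nats_core_connection_count{nats_server_cluster="region2",nats_server_host="localhost",nats_server_id="NCBI75V5ASPJAEAR3VPS2YELXP7K6CUXXWAD5PB2SJ4BOIYQHU6JKV7A"} 0`

func main() {
	for _, line := range strings.Split(scrape, "\n") {
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		s, err := parseLine(line)
		if err != nil {
			fmt.Println("skipping line:", err)
			continue
		}
		switch s.name {
		case "nats_core_nats_up":
			fmt.Println("connected to NATS:", s.value == 1)
		case "nats_core_connection_count":
			fmt.Printf("%s: %v connections\n", s.labels["nats_server_cluster"], s.value)
		}
	}
}
```

Checking nats_core_nats_up first mirrors the intent described earlier: when it is 0, the absence of other metrics points at connectivity to NATS rather than at the exporter.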

We feel Prometheus is the right fit for this project, but it's worth noting that this is not the recommended Prometheus architecture: Surveyor prefers ease of use over installing and configuring a full monitoring infrastructure. For a more robust monitoring architecture, the prometheus-nats-exporter should be placed and configured alongside every NATS server component.

Docker Compose

An easy way to start the NATS Surveyor stack (Grafana, Prometheus, and NATS Surveyor) is through docker-compose.

Follow these links for installation instructions:

Environment Variables

The following environment variables MUST be set, either in your environment or through the .env file that is automatically read by docker-compose. There is a survey.sh script that will set them for you as a convenience.

Environment Variable        Example               Description
NATS_SURVEYOR_SERVERS       nats://hostname:4222  The URLs of any deployed NATS server(s)
NATS_SURVEYOR_CREDS         ./SYS.creds           NATS 2.0 System Account credentials
NATS_SURVEYOR_SERVER_COUNT  9                     Number of expected NATS servers
PROMETHEUS_STORAGE          ./storage/prometheus  Path to store prometheus data locally
SURVEYOR_DOCKER_TAG         latest                Surveyor docker tag to pull
PROMETHEUS_DOCKER_TAG       latest                Prometheus docker tag to pull
GRAFANA_DOCKER_TAG          latest                Grafana docker tag to pull

Note: For referencing files and paths, docker always expects volume mounts to be either a fully qualified directory or a relative directory beginning with ./.

Server URLs

You only need to connect to a single NATS server to monitor your entire NATS deployment. In configuring NATS_SURVEYOR_SERVERS, only one server is required, but it's recommended you provide a list of backup servers to connect to, e.g. nats://host1:4222,nats://host2:5222. Valid URLs are formatted as hostname (defaulting to port 4222), hostname:port, or nats://hostname:port.
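
To make the three accepted forms concrete, here is a minimal Go sketch that expands each entry of a comma-separated list to the full nats://hostname:port form. The normalize and normalizeList helpers are hypothetical and only illustrate the rules stated above. They are not the parsing code used by Surveyor or the NATS client, and they do not handle IPv6 literals.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize expands one server entry to nats://hostname:port,
// applying the default port 4222 when none is given.
// Assumption: plain hostnames or IPv4 addresses only.
func normalize(entry string) string {
	host := strings.TrimPrefix(strings.TrimSpace(entry), "nats://")
	if !strings.Contains(host, ":") {
		host += ":4222"
	}
	return "nats://" + host
}

// normalizeList applies normalize to every entry of a comma-separated list.
func normalizeList(list string) string {
	parts := strings.Split(list, ",")
	for i, p := range parts {
		parts[i] = normalize(p)
	}
	return strings.Join(parts, ",")
}

func main() {
	// prints nats://host1:4222,nats://host2:5222,nats://host3:4222
	fmt.Println(normalizeList("host1, host2:5222,nats://host3:4222"))
}
```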

Starting Up

You can start the Surveyor stack in two ways. The first is through docker-compose. Ensure the environment variables are set and that you are working from the /docker-compose directory, then run docker-compose up.

$ docker-compose up
Recreating nats-surveyor ... done
Recreating prometheus    ... done
Recreating grafana       ... done
Attaching to nats-surveyor, prometheus, grafana
...

Alternatively, you can pass variables into the survey.sh script in the docker-compose directory.

$ ./survey.sh
usage: survey.sh <url> <server count> <system credentials>

e.g.

./survey.sh nats://mydeployment:4222 24 /privatekeys/SYS.creds

If things aren't working, look in the output for any lines that contain exited with code 1 and address the problem. They are usually docker volume mount problems or connectivity problems.

Next, with your browser, navigate to http://127.0.0.1:3000, or if you are running the Surveyor stack remotely, the hostname of the host running the NATS surveyor stack, e.g. http://yourremotehost:3000.

The first time you connect, you'll need to login:

  • User: admin
  • Password: admin

After logging in, navigate to "Manage dashboards" and you'll see a dashboard available named NATS Surveyor, where you'll be able to monitor your entire NATS deployment.

Stopping (while keeping the containers)

To stop the surveyor stack but keep the containers, run: docker-compose stop

Restarting Surveyor

To restart the surveyor stack after being stopped, run: docker-compose up

Stopping and removing containers

To clean up your installation, run: docker-compose down

Running Surveyor as a service

For platforms that support systemd, surveyor.service is provided as a service definition template. Modify and save this file as /etc/systemd/system/surveyor.service.

systemctl start surveyor will launch the service.

Errors

The logs should normally contain enough information about the cause of problems or errors.

If you encounter a Prometheus error of: panic: Unable to create mmap-ed active query log, set the UID of the container to match the UID of your user in the docker-compose file.

e.g.:

  prometheus:
    image: prom/prometheus:${PROMETHEUS_DOCKER_TAG}
    user: "1000:1000"

If the above doesn't work, using root will work but may pose a security threat to the node it is running on.

  prometheus:
    image: prom/prometheus:${PROMETHEUS_DOCKER_TAG}
    user: root

More information can be found here.
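
The user: value in the first snippet above is a UID:GID pair. As a small sketch, the Go program below prints that pair for the current user so it can be pasted into the docker-compose file. userSpec is a hypothetical helper, and the program assumes a Unix-like host.

```go
package main

import (
	"fmt"
	"os"
)

// userSpec formats a UID and GID the way the docker-compose
// "user:" setting in the example above expects them.
// Hypothetical helper for illustration only.
func userSpec(uid, gid int) string {
	return fmt.Sprintf("%d:%d", uid, gid)
}

func main() {
	// On Windows, os.Getuid and os.Getgid return -1, so this is
	// only meaningful on Unix-like systems.
	fmt.Printf("user: %q\n", userSpec(os.Getuid(), os.Getgid()))
}
```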

Service Observations

Services can be observed by creating JSON files in the observations directory. The file extension must be .json. Only one authentication method needs to be provided. Example file format:

{
  "name":       "my service",
  "topic":      "email.subscribe.>",
  "jwt":        "jwt portion of creds, must include seed also",
  "seed":       "seed portion of creds, must include jwt also",
  "credential": "/path/to/file.creds",
  "nkey":       "nkey seed",
  "token":      "token",
  "username":   "username, must include password also",
  "password":   "password, must include user also",
  "tls_ca":     "/path/to/ca.pem, defaults to surveyor's ca if one exists",
  "tls_cert":   "/path/to/cert.pem, defaults to surveyor's cert if one exists",
  "tls_key":    "/path/to/key.pem, defaults to surveyor's key if one exists"
}

Files are watched and updated using fsnotify.

JetStream

JetStream can be monitored on a per-account basis by creating JSON files in the jetstream directory. The file extension must be .json. Only one authentication method needs to be provided. Be sure that you give your user access to the $JS.EVENT.> subject. Example file format:

Credentials

{
  "name":       "my account",
  "jwt":        "jwt portion of creds, must include seed also",
  "seed":       "seed portion of creds, must include jwt also",
  "credential": "/path/to/file.creds",
  "nkey":       "nkey seed",
  "token":      "token",
  "username":   "username, must include password also",
  "password":   "password, must include user also",
  "tls_ca":     "/path/to/ca.pem, defaults to surveyor's ca if one exists",
  "tls_cert":   "/path/to/cert.pem, defaults to surveyor's cert if one exists",
  "tls_key":    "/path/to/key.pem, defaults to surveyor's key if one exists"
}

Files are watched and updated using fsnotify.

TODO

  • Windows builds
  • Other events (connections, disconnects, etc)
  • Best Guess Server Count
