Google Stackdriver Prometheus Exporter

A Prometheus exporter for Google Stackdriver Monitoring metrics. It acts as a proxy that queries the Stackdriver API for the metrics' time series every time Prometheus scrapes it.

Installation

Binaries

Download the already existing binaries for your platform:

$ ./stackdriver_exporter <flags>

From source

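Using the standard go install (you must already have Go installed on your local machine). Recent Go versions require a version suffix such as @latest when installing outside a module:

$ go install github.com/prometheus-community/stackdriver_exporter@latest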
$ stackdriver_exporter <flags>

Docker

To run the stackdriver exporter as a Docker container, run:

$ docker run -p 9255:9255 prometheuscommunity/stackdriver-exporter <flags>

Kubernetes

You can find a helm chart in the prometheus-community charts repository at https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-stackdriver-exporter

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install [RELEASE_NAME] prometheus-community/prometheus-stackdriver-exporter

Cloud Foundry

The exporter can be deployed to an already existing Cloud Foundry environment:

$ git clone https://github.com/prometheus-community/stackdriver_exporter.git
$ cd stackdriver_exporter

Modify the included application manifest file to include the desired properties. Then you can push the exporter to your Cloud Foundry environment:

$ cf push

BOSH

This exporter can be deployed using the Prometheus BOSH Release.

Usage

Credentials and Permissions

The Google Stackdriver Exporter uses the Google Golang Client Library, which offers a variety of ways to provide credentials. Please refer to the Google Application Default Credentials documentation to see how the credentials can be provided.

If you are using IAM roles, the roles/monitoring.viewer IAM role contains the required permissions. See the Access Control Guide for more information.

If you are still using the legacy Access scopes, the https://www.googleapis.com/auth/monitoring.read scope is required.

Flags

| Flag | Required | Default | Description |
| --- | --- | --- | --- |
| google.project-id | No | GCloud SDK auto-discovery | Comma separated list of Google Project IDs |
| monitoring.metrics-ingest-delay | No | | Offsets metric collection by a delay appropriate for each metric type, e.g. because bigquery metrics are slow to appear |
| monitoring.drop-delegated-projects | No | No | Drop metrics from attached projects and fetch project_id only. |
| monitoring.metrics-type-prefixes | Yes | | Comma separated Google Stackdriver Monitoring Metric Type prefixes (see example and available metrics) |
| monitoring.metrics-interval | No | 5m | Metric's timestamp interval to request from the Google Stackdriver Monitoring Metrics API. Only the most recent data point is used |
| monitoring.metrics-offset | No | 0s | Offset (into the past) for the metric's timestamp interval to request from the Google Stackdriver Monitoring Metrics API, to handle latency in published metrics |
| monitoring.filters | No | | Formatted string to allow filtering on certain metrics type |
| monitoring.aggregate-deltas | No | | If enabled will treat all DELTA metrics as an in-memory counter instead of a gauge. Be sure to read what to know about aggregating DELTA metrics |
| monitoring.aggregate-deltas-ttl | No | 30m | How long should a delta metric continue to be exported and stored after GCP stops producing it. Read slow moving metrics to understand the problem this attempts to solve |
| monitoring.descriptor-cache-ttl | No | 0s | How long should the metric descriptors for a prefix be cached |
| stackdriver.max-retries | No | 0 | Max number of retries that should be attempted on 503 errors from Stackdriver. |
| stackdriver.http-timeout | No | 10s | How long should stackdriver_exporter wait for a result from the Stackdriver API. |
| stackdriver.max-backoff | No | | Max time between each request in an exponential backoff scenario. |
| stackdriver.backoff-jitter | No | 1s | The amount of jitter to introduce in an exponential backoff scenario. |
| stackdriver.retry-statuses | No | 503 | The HTTP statuses that should trigger a retry. |
| web.config.file | No | | [EXPERIMENTAL] Path to configuration file that can enable TLS or authentication. |
| web.listen-address | No | :9255 | Address to listen on for web interface and telemetry. Repeatable for multiple addresses. |
| web.systemd-socket | No | | Use systemd socket activation listeners instead of port listeners (Linux only). |
| web.stackdriver-telemetry-path | No | /metrics | Path under which to expose Stackdriver metrics. |
| web.telemetry-path | No | /metrics | Path under which to expose Prometheus metrics |

TLS and basic authentication

The Stackdriver Exporter supports TLS and basic authentication.

To use TLS and/or basic authentication, you need to pass a configuration file using the --web.config.file parameter. The format of the file is described in the exporter-toolkit repository.

Metrics

The exporter returns the following metrics:

| Metric | Description | Labels |
| --- | --- | --- |
| stackdriver_monitoring_api_calls_total | Total number of Google Stackdriver Monitoring API calls made | project_id |
| stackdriver_monitoring_scrapes_total | Total number of Google Stackdriver Monitoring metrics scrapes | project_id |
| stackdriver_monitoring_scrape_errors_total | Total number of Google Stackdriver Monitoring metrics scrape errors | project_id |
| stackdriver_monitoring_last_scrape_error | Whether the last metrics scrape from Google Stackdriver Monitoring resulted in an error (1 for error, 0 for success) | project_id |
| stackdriver_monitoring_last_scrape_timestamp | Number of seconds since 1970 since last metrics scrape from Google Stackdriver Monitoring | project_id |
| stackdriver_monitoring_last_scrape_duration_seconds | Duration of the last metrics scrape from Google Stackdriver Monitoring | project_id |

Metrics gathered from Google Stackdriver Monitoring are converted to Prometheus metrics:

  • Metric names are normalized according to the Prometheus specification using the following pattern:
    1. namespace is a constant prefix (stackdriver)
    2. subsystem is the normalized monitored resource type (e.g. gce_instance)
    3. name is the normalized metric type (e.g. compute_googleapis_com_instance_cpu_usage_time)
  • Labels attached to each metric are an aggregation of:
    1. the unit in which the metric value is reported
    2. the metric type labels (see Metrics List)
    3. the monitored resource labels (see Monitored Resource Types)
  • For each timeseries, only the most recent data point is exported.
  • Stackdriver GAUGE metric kinds are reported as Prometheus Gauge metrics.
  • Stackdriver CUMULATIVE metric kinds are reported as Prometheus Counter metrics.
  • Stackdriver DELTA metric kinds are reported as Prometheus Gauge metrics, or as an accumulating Counter if monitoring.aggregate-deltas is set.
  • Only BOOL, INT64, DOUBLE and DISTRIBUTION metric types are supported, other types (STRING and MONEY) are discarded.
  • DISTRIBUTION metric type is reported as a Prometheus Histogram, except the _sum time series is not supported.
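
The naming pattern above (namespace, subsystem, name) can be sketched in Go as follows. This is a simplified illustration, not the exporter's actual normalization code: the helpers normalize and metricName are hypothetical, and the sketch assumes that every run of non-alphanumeric characters becomes a single underscore. Any further handling the exporter may apply (for example to mixed-case names) is not covered.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invalidChars matches runs of characters outside [a-zA-Z0-9].
var invalidChars = regexp.MustCompile(`[^a-zA-Z0-9]+`)

// normalize is a hypothetical, simplified normalizer: it replaces each run
// of non-alphanumeric characters with "_" and trims leading/trailing "_".
func normalize(s string) string {
	return strings.Trim(invalidChars.ReplaceAllString(s, "_"), "_")
}

// metricName joins the constant namespace, the normalized monitored
// resource type (subsystem) and the normalized metric type (name).
func metricName(resourceType, metricType string) string {
	return strings.Join([]string{"stackdriver", normalize(resourceType), normalize(metricType)}, "_")
}

func main() {
	fmt.Println(metricName("gce_instance", "compute.googleapis.com/instance/cpu/usage_time"))
	// prints: stackdriver_gce_instance_compute_googleapis_com_instance_cpu_usage_time
}
```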

Example

If we want to get all CPU (compute.googleapis.com/instance/cpu) and Disk (compute.googleapis.com/instance/disk) metrics for all Google Compute Engine instances, we can run the exporter with the following options:

stackdriver_exporter \
  --google.project-id=my-test-project \
  --monitoring.metrics-type-prefixes "compute.googleapis.com/instance/cpu,compute.googleapis.com/instance/disk"

Using extra filters:

stackdriver_exporter \
 --google.project-id=my-test-project \
 --monitoring.metrics-type-prefixes='pubsub.googleapis.com/subscription' \
 --monitoring.filters='pubsub.googleapis.com/subscription:resource.labels.subscription_id=monitoring.regex.full_match("us-west4.*my-team-subs.*")'

Filtering enabled collectors

The stackdriver_exporter collects all metrics type prefixes by default.

For advanced uses, the collection can be filtered with a repeatable URL parameter called collect. In the Prometheus configuration, you can use this syntax under the scrape config:

params:
  collect:
  - compute.googleapis.com/instance/cpu
  - compute.googleapis.com/instance/disk

What to know about Aggregating DELTA Metrics

Treating DELTA metrics as a gauge produces data that is wildly inaccurate and not very useful (see #116). However, aggregating the DELTA metrics over time is not a perfect solution either; it is intended to produce data which mirrors GCP's data as closely as possible.

The biggest challenge to producing a correct result is that a Prometheus counter does not start at 0; it starts at the first value which is exported. This can cause inconsistencies when the exporter first starts and for slow-moving metrics, both of which are described below.

Start-up Delay

When the exporter first starts, it has no persisted counter information and the stores are empty. The first sample received for a series is, according to GCP, a change from a previous value: a delta. But the Prometheus counter is not initialized to 0, so it does not export this as a change from 0; it exports that the counter started at the sample value. Since the exported series are dynamic, it is not possible to export an initial 0 value to account for this issue. The end result is that it can take a few cycles for aggregated metrics to start showing the same rates as GCP.

As an example, consider a Prometheus query, sum by(backend_target_name) (rate(stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_bytes_count[1m])), which is aggregating 5 series. All 5 series will need to have two samples from GCP in order for the query to produce the same result as GCP.

Slow Moving Metrics

A slow-moving metric is a metric which does not change with every sample from GCP. GCP does not consistently report slow-moving DELTA metrics. If a series goes unreported for too long (default 5m), Prometheus will mark it as stale. The end result is that the next reported sample will be treated as the start of a new series and not as an increment from the previous value.

There are two features which attempt to combat this issue:

  1. monitoring.aggregate-deltas-ttl, which controls how long a metric is persisted in the data store after it is no longer being reported by GCP
  2. Metrics which were not collected during a scrape are still exported at their current counter value

The default configuration when using monitoring.aggregate-deltas gives a 30 minute buffer to slower-moving metrics, and monitoring.aggregate-deltas-ttl can be adjusted to trade memory requirements against correctness. Storing the data for longer results in a higher memory cost.

The feature which continues to export metrics that were not collected can cause the error "the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested" if your scrape config for the exporter has honor_timestamps enabled (this is the default value). This happens because it is not possible to know the difference between GCP having late-arriving data and GCP not exporting a value. The underlying counter is still incremented when this happens, so the next reported sample will show a higher rate than expected.

Contributing

Refer to the contributing guidelines.

License

Apache License 2.0, see LICENSE.
