  • Stars: 169
  • Rank: 224,453 (top 5%)
  • Language: Java
  • License: Apache License 2.0
  • Created almost 7 years ago
  • Updated over 1 year ago

Repository Details

Apache Cassandra® metrics exporter for Prometheus

Description

Cassandra exporter is a standalone application which exports Apache Cassandra® metrics through a Prometheus-friendly endpoint. This project started as a fork of JMX exporter but aims at easier integration with Apache Cassandra®.

Specifically, this project brings:

  • Exporting of EstimatedHistogram metrics specific to Apache Cassandra®
  • Filtering on MBean attributes
  • Metric naming that respects the MBean hierarchy
  • A comprehensive config file

An essential design choice of the project is to not let Prometheus drive the scraping frequency. Many Apache Cassandra® metrics are expensive to scrape and can hinder the performance of the node. To avoid this kind of situation in production, the scrape frequency is restricted via the Cassandra Exporter configuration.

Design explanation

The project has two priorities: safety and maintainability.

Every time a tradeoff had to be made, the solution that prioritized one of these points was chosen.

Why not provide the exporter as an agent for Cassandra?
  • Safety: An agent shares the same JVM as Cassandra itself, and I don't want metrics calls to be able to hammer down Cassandra nodes.
  • Safety: If there is a bug or leak in the exporter itself, it should not impact Cassandra.
  • Maintainability: Upgrading the exporter should not require restarting the Cassandra cluster.
Why cache metric results? This is not the Prometheus way.
  • Safety: JMX is a heavyweight RPC mechanism and some Cassandra metrics calls are expensive to scrape (e.g. snapshot sizes) as they trigger heavy operations on Cassandra. Not caching results means that you can bring down your nodes just by requesting the metrics page.
Why not make more use of labels, the Prometheus way?
  • Maintainability: I want the exporter to support multiple versions of Cassandra (2.2.X/3.X/4.X) without having to hand-tune the metric labels for each version. Metric paths change between Cassandra versions, and I want to avoid the hassle of maintaining the mapping.
Why is this exporter slower than jmx_exporter?
  • Maintainability: When your cluster grows in number of nodes, the cardinality of metrics starts to put too much pressure on Prometheus itself. A lot of this cardinality comes from metrics of limited usefulness, such as 999thpercentile and others. This exporter lets you choose not to export them, which is not possible with jmx_exporter, at the cost of a small runtime penalty to discover them. So this exporter lets you reach a bigger scale before you have to rely on metric aggregation in order to scale further.

Unless you have hundreds of tables, the scrape time will stay below 10 seconds.

Why is the exporter not written in Go?
  • Cassandra metrics are only available through JMX, which in turn is only accessible from Java.

How to use

To start the application

java -jar cassandra_exporter.jar config.yml

The Cassandra exporter needs to run on every Cassandra node to get all the information regarding the whole cluster.
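
A minimal config.yml to get started could look like the sketch below. The values mirror the defaults from the full example further down; the host, credentials and scrape frequency are the settings you will most likely want to adjust.

host: localhost:7199
ssl: False
user:
password:
listenAddress: 0.0.0.0
listenPort: 8080
blacklist:
  # don't scrape the exporter itself
  - com:criteo:nosql:cassandra:exporter:.*
maxScrapFrequencyInSec:
  50:
    - .*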

You can have a look at a full configuration file here. The two main parts are:

  1. blacklist
  2. maxScrapFrequencyInSec

In the blacklist block, you specify the metrics you don't want the exporter to scrape. This is important as JMX is an RPC mechanism and you don't want to trigger some of those RPCs. For example, MBeans under org:apache:cassandra:db:.* do not expose any metrics but are used to trigger actions on Cassandra nodes.

In the maxScrapFrequencyInSec block, you specify at which frequency each metric should be scraped. Starting from the set of all MBeans, the blacklist is applied first to filter this set, and then maxScrapFrequencyInSec is applied as a whitelist to filter the resulting set.

As an example, if we take the metrics {a, b, c} as the input set and the config file is

blacklist:
  - a
maxScrapFrequencyInSec:
  50:
    - .*
  3600:
    - b

Cassandra Exporter will have the following behavior:

  1. The metrics matching the blacklisted entries will never be scraped; here the metric a won't be available
  2. In reverse order of frequency, the metrics matching maxScrapFrequencyInSec will be scraped:
    1. Metric b will be scraped every hour
    2. Remaining metrics will be scraped every 50s, here only c

Resulting in:

Metric   Scrape frequency
a        never
b        every hour
c        every 50 seconds

Once started, the Prometheus endpoint will be available at localhost:listenPort/ or localhost:listenPort/metrics, and the metrics format will look like the one below:

cassandra_stats{name="org:apache:cassandra:metrics:table:biggraphite:datapoints_5760p_3600s_aggr:writelatency:50thpercentile",} 35.425000000000004
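
On the Prometheus side, a standard scrape job pointed at the exporter's listenPort is all that is needed. The snippet below is only an illustrative sketch (job name, targets and interval are assumptions); remember that the effective refresh rate is capped by maxScrapFrequencyInSec, so scraping more often than that only returns cached values.

scrape_configs:
  - job_name: 'cassandra'
    scrape_interval: 1m
    static_configs:
      - targets:
          - 'cassandra-node-1:8080'
          - 'cassandra-node-2:8080'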

How to debug

Run the program with the following options:

java -Dorg.slf4j.simpleLogger.defaultLogLevel=trace -jar cassandra_exporter.jar config.yml --oneshot

You will get the duration of how long it took to scrape each individual MBean, which is useful to understand which metrics are expensive to scrape.

Good sources of information to understand what the MBeans are doing and to create your dashboards are:

  1. https://cassandra.apache.org/doc/latest/operating/metrics.html
  2. https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/metrics
  3. http://thelastpickle.com/blog/2017/12/05/datadog-tlp-dashboards.html
  4. https://www.youtube.com/watch?v=Q9AAR4UQzMk

Config file example

host: localhost:7199
ssl: False
user:
password:
listenAddress: 0.0.0.0
listenPort: 8080
# Regular expression to match environment variables that will be added
# as labels to all data points. The name of the label will be either
# $1 from the regex below, or the entire environment variable name if no match groups are defined
#
# Example:
# additionalLabelsFromEnvvars: "^ADDL_(.*)$"
additionalLabelsFromEnvvars:
blacklist:
   # Inaccessible metrics (not enough privileges)
   - java:lang:memorypool:.*usagethreshold.*

   # Leaf attributes not interesting for us but present in many paths (reduces metric cardinality)
   - .*:999thpercentile
   - .*:95thpercentile
   - .*:fifteenminuterate
   - .*:fiveminuterate
   - .*:durationunit
   - .*:rateunit
   - .*:stddev
   - .*:meanrate
   - .*:mean
   - .*:min

   # Paths present in many metrics but uninteresting
   - .*:viewlockacquiretime:.*
   - .*:viewreadtime:.*
   - .*:cas[a-z]+latency:.*
   - .*:colupdatetimedeltahistogram:.*

   # Mostly for RPC, do not scrape them
   - org:apache:cassandra:db:.*

   # columnfamily is an alias for Table metrics in cassandra 3.x
   # https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/metrics/TableMetrics.java#L162
   - org:apache:cassandra:metrics:columnfamily:.*

   # Should we export metrics for system keyspaces/tables?
   - org:apache:cassandra:metrics:[^:]+:system[^:]*:.*

   # Don't scrape us
   - com:criteo:nosql:cassandra:exporter:.*

maxScrapFrequencyInSec:
  50:
    - .*

  # Refresh these metrics only every hour as they are costly for Cassandra to retrieve
  3600:
    - .*:snapshotssize:.*
    - .*:estimated.*
    - .*:totaldiskspaceused:.*

Docker

You can pull an image directly from Docker Hub:

docker pull criteord/cassandra_exporter:latest

Run Docker in read-only mode (/tmp must be mounted as a tmpfs to allow sed to edit config.yml when the dedicated env variables are used):

docker run -e CASSANDRA_EXPORTER_CONFIG_host=localhost:7198 --read-only --tmpfs=/tmp criteord/cassandra_exporter:latest
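
For local testing, a docker-compose service along these lines may be convenient. This is a sketch only: it reuses the image and the CASSANDRA_EXPORTER_CONFIG_host override shown above, while the service name and port mapping are illustrative assumptions.

version: "3"
services:
  cassandra-exporter:
    image: criteord/cassandra_exporter:latest
    read_only: true
    tmpfs:
      - /tmp
    environment:
      CASSANDRA_EXPORTER_CONFIG_host: "cassandra:7199"
    ports:
      - "8080:8080"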

Kubernetes

To get an idea of how to integrate Cassandra Exporter in Kubernetes, you can look at this Helm chart. A rough sidecar sketch is shown below.
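
As an illustration only (not taken from the chart), the exporter can run as a sidecar container next to Cassandra in the same pod, so each node exposes its own metrics endpoint. The image tags and ports below are assumptions.

containers:
  - name: cassandra
    image: cassandra:3.11
    ports:
      - containerPort: 7199   # JMX
  - name: cassandra-exporter
    image: criteord/cassandra_exporter:latest
    env:
      - name: CASSANDRA_EXPORTER_CONFIG_host
        value: "localhost:7199"
    ports:
      - containerPort: 8080   # Prometheus endpoint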

Grafana

Dedicated dashboards can be found here
