  • Stars: 165
  • Rank: 221,886 (Top 5 %)
  • Language: Java
  • License: Apache License 2.0
  • Created over 6 years ago
  • Updated over 1 year ago

Apache Cassandra® metrics exporter for Prometheus

Description

Cassandra exporter is a standalone application which exports Apache Cassandra® metrics through a Prometheus-friendly endpoint. This project is originally a fork of JMX exporter but aims at an easier integration with Apache Cassandra®.

Specifically, this project brings:

  • Exporting EstimatedHistogram metrics specific to Apache Cassandra®
  • Filtering on MBean attributes
  • Metrics naming that respects the MBean hierarchy
  • Comprehensive config file

An essential design choice the project makes is to not let Prometheus drive the scraping frequency. This decision was taken because many Apache Cassandra® metrics are expensive to scrape and can hinder the performance of the node. As we don't want this kind of situation to happen in production, the scrape frequency is restricted via the configuration of Cassandra Exporter.
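
The idea of decoupling the scrape frequency from Prometheus can be illustrated with a small sketch. This is not the exporter's actual code (which is written in Java and may refresh metrics on its own schedule); the class name CachedScraper, the fetch callback, and the injected clock are all hypothetical names used only for illustration. It shows an expensive call being served from a cache and refreshed at most once per configured period, however often the value is requested.

```python
import time


class CachedScraper:
    """Serve a cached value and refresh it at most once per `period_sec`.

    Hypothetical illustration only; not the exporter's implementation.
    """

    def __init__(self, fetch, period_sec, clock=time.monotonic):
        self._fetch = fetch            # the expensive call (e.g. a JMX read)
        self._period_sec = period_sec  # minimum delay between two fetches
        self._clock = clock            # injectable so the sketch is testable
        self._value = None
        self._last_refresh = None

    def get(self):
        now = self._clock()
        stale = (
            self._last_refresh is None
            or now - self._last_refresh >= self._period_sec
        )
        if stale:
            self._value = self._fetch()
            self._last_refresh = now
        return self._value


# Usage with a fake clock, so no real waiting is needed.
fake_now = [0.0]
calls = []


def expensive_fetch():
    calls.append(fake_now[0])  # record when the expensive call happened
    return len(calls)


scraper = CachedScraper(expensive_fetch, period_sec=50, clock=lambda: fake_now[0])
first = scraper.get()    # first request triggers a fetch
fake_now[0] = 10.0
second = scraper.get()   # 10s later: served from the cache
fake_now[0] = 60.0
third = scraper.get()    # 60s later: the period has elapsed, fetch again
```

With this shape, a client polling every few seconds only ever costs the node one expensive call per period.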

Design explanation

The project has two focuses: safety and maintainability.

Every time a tradeoff had to be made, the solution that prioritizes one of those points got the advantage.

Why not provide the exporter as an agent for Cassandra?
  • Safety: An agent shares the same JVM as Cassandra itself, and I don't want metrics calls to be able to hammer Cassandra nodes.
  • Safety: If there is a bug or leak in the exporter itself, it should not impact Cassandra.
  • Maintainability: Upgrading the exporter should not require restarting the Cassandra cluster.
Why cache metrics results? This is not the Prometheus way.
  • Safety: JMX is a heavyweight RPC mechanism, and some Cassandra metrics calls are expensive to scrape (e.g. snapshots size) because they trigger heavy operations in Cassandra. Not caching results means that you can bring down your nodes just by requesting the metrics page.
Why not make more use of labels, in the Prometheus way?
  • Maintainability: I want the exporter to be able to support multiple versions of Cassandra (2.2.X/3.X/4.X) without having to hand-tune the metric labels for each version. Metric paths change between versions of Cassandra, and I want to avoid the hassle of maintaining the mapping.
Why is this exporter slower than jmx_exporter?
  • Maintainability: When your cluster grows in number of nodes, the cardinality of metrics starts to put too much pressure on Prometheus itself. Much of this cardinality comes from metrics of limited usefulness, such as 999thpercentile. This exporter lets you choose not to export them, which is not possible with jmx_exporter, at the cost of a small runtime penalty to discover them. As a result, this exporter lets you reach a larger scale before you have to rely on metric aggregation.

Unless you have hundreds of tables, the scrape time will stay below 10 seconds.

Why is the exporter not written in Go?
  • Cassandra metrics are only available through JMX, which in turn is only accessible with Java.

How to use

To start the application

java -jar cassandra_exporter.jar config.yml

The Cassandra exporter needs to run on every Cassandra node to get all the information regarding the whole cluster.

You can have a look at a full configuration file here. The two main parts are:

  1. blacklist
  2. maxScrapFrequencyInSec

In the blacklist block, you specify the metrics you don't want the exporter to scrape. This is important as JMX is an RPC mechanism and you don't want to trigger some of those RPCs. For example, the MBean endpoints matching org:apache:cassandra:db:.* do not expose any metrics but are used to trigger actions on Cassandra nodes.

In the maxScrapFrequencyInSec block, you specify which metrics should be scraped and at which frequency. Basically, starting from the set of all MBeans, the blacklist is applied first to filter this set, and then maxScrapFrequencyInSec is applied as a whitelist to filter the resulting set.

As an example, if we take as input set the metrics {a, b, c} and the config file is

blacklist:
  - a
maxScrapFrequencyInSec:
  50:
    - .*
  3600:
    - b

Cassandra Exporter will have the following behavior:

  1. Metrics matching the blacklist entries will never be scraped; here, metric a won't be available.
  2. The maxScrapFrequencyInSec entries are then applied from the longest period to the shortest:
    1. Metric b will be scraped every hour.
    2. The remaining metrics will be scraped every 50s; here, only c.

Resulting in :

Metric   Scrape frequency
a        never
b        every hour
c        every 50 seconds
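
The resolution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exporter's code: the function name scrape_periods is hypothetical, and full-string regex matching (re.fullmatch) is an assumption about how patterns are compared. The blacklist is applied first, then the periods are tried from the longest to the shortest and the first match wins.

```python
import re


def scrape_periods(metrics, blacklist, max_scrap_frequency_in_sec):
    """Map each metric name to its scrape period in seconds, or None if never scraped.

    Hypothetical helper; assumes patterns must match the whole metric name.
    """
    periods = {}
    for metric in metrics:
        # 1. The blacklist is applied first.
        if any(re.fullmatch(pattern, metric) for pattern in blacklist):
            periods[metric] = None
            continue
        # 2. Periods are tried from the longest to the shortest; first match wins.
        periods[metric] = None  # not whitelisted unless a pattern matches below
        for period in sorted(max_scrap_frequency_in_sec, reverse=True):
            patterns = max_scrap_frequency_in_sec[period]
            if any(re.fullmatch(pattern, metric) for pattern in patterns):
                periods[metric] = period
                break
    return periods


# The example configuration from the text, written as Python data.
config = {
    "blacklist": ["a"],
    "maxScrapFrequencyInSec": {50: [".*"], 3600: ["b"]},
}

result = scrape_periods(
    ["a", "b", "c"], config["blacklist"], config["maxScrapFrequencyInSec"]
)
print(result)  # {'a': None, 'b': 3600, 'c': 50}
```

Trying the longest period first is what keeps b at one hour even though it also matches the catch-all .* entry under 50.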

Once started, the Prometheus endpoint will be available at localhost:listenPort/ or localhost:listenPort/metrics, and the metrics format will look like the example below:

cassandra_stats{name="org:apache:cassandra:metrics:table:biggraphite:datapoints_5760p_3600s_aggr:writelatency:50thpercentile",} 35.425000000000004
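
Because the MBean hierarchy is carried in a single name label, consumers may want to split it back into its components. The sketch below is a minimal illustration, assuming lines shaped exactly like the sample above (one name label, optional trailing comma); the helper parse_line and its regular expression are hypothetical and would not handle lines that carry additional labels.

```python
import re

# Assumed shape of one line: metric{name="<colon-separated path>",} <value>
# This only handles a single `name` label, as in the sample above.
LINE_RE = re.compile(r'^(\w+)\{name="([^"]*)",?\}\s+(\S+)$')


def parse_line(line):
    """Return (metric, path components, value) for one exposition line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized metrics line: {line!r}")
    metric, name, value = match.groups()
    return metric, name.split(":"), float(value)


sample = (
    'cassandra_stats{name="org:apache:cassandra:metrics:table:biggraphite:'
    'datapoints_5760p_3600s_aggr:writelatency:50thpercentile",} 35.425000000000004'
)
metric, path, value = parse_line(sample)
print(metric)     # cassandra_stats
print(path[-2:])  # ['writelatency', '50thpercentile']
print(value)      # 35.425000000000004
```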

How to debug

Run the program with the following options:

java -Dorg.slf4j.simpleLogger.defaultLogLevel=trace -jar cassandra_exporter.jar config.yml --oneshot

You will get the time it took to scrape each MBean, which is useful for understanding which metrics are expensive to scrape.

Good sources of information for understanding what the MBeans are doing and for creating your dashboards are:

  1. https://cassandra.apache.org/doc/latest/operating/metrics.html
  2. https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/metrics
  3. http://thelastpickle.com/blog/2017/12/05/datadog-tlp-dashboards.html
  4. https://www.youtube.com/watch?v=Q9AAR4UQzMk

Config file example

host: localhost:7199
ssl: False
user:
password:
listenAddress: 0.0.0.0
listenPort: 8080
# Regular expression to match environment variables that will be added
# as labels to all data points. The name of the label will be either
# $1 from the regex below, or the entire environment variable name if no match groups are defined
#
# Example:
# additionalLabelsFromEnvvars: "^ADDL_(.*)$"
additionalLabelsFromEnvvars:
blacklist:
   # Inaccessible metrics (not enough privileges)
   - java:lang:memorypool:.*usagethreshold.*

   # Leaf attributes that are not interesting for us but are present in many paths (reduces the cardinality of metrics)
   - .*:999thpercentile
   - .*:95thpercentile
   - .*:fifteenminuterate
   - .*:fiveminuterate
   - .*:durationunit
   - .*:rateunit
   - .*:stddev
   - .*:meanrate
   - .*:mean
   - .*:min

   # Paths present in many metrics but uninteresting
   - .*:viewlockacquiretime:.*
   - .*:viewreadtime:.*
   - .*:cas[a-z]+latency:.*
   - .*:colupdatetimedeltahistogram:.*

   # Mostly for RPC, do not scrape them
   - org:apache:cassandra:db:.*

   # columnfamily is an alias for Table metrics in cassandra 3.x
   # https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/metrics/TableMetrics.java#L162
   - org:apache:cassandra:metrics:columnfamily:.*

   # Should we export metrics for system keyspaces/tables?
   - org:apache:cassandra:metrics:[^:]+:system[^:]*:.*

   # Don't scrape us
   - com:criteo:nosql:cassandra:exporter:.*

maxScrapFrequencyInSec:
  50:
    - .*

  # Refresh those metrics only every hour as it is costly for cassandra to retrieve them
  3600:
    - .*:snapshotssize:.*
    - .*:estimated.*
    - .*:totaldiskspaceused:.*
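
The comment on additionalLabelsFromEnvvars in the config above describes how label names are derived from environment variable names. The following sketch illustrates that rule under assumptions: the helper labels_from_env and the variable ADDL_datacenter are hypothetical, and using the variable's value as the label value is an assumption not stated in the config comment.

```python
import re


def labels_from_env(pattern, environ):
    """Build labels from environment variables whose names match `pattern`.

    Label name: group 1 of the regex if it defines a group, else the whole
    variable name. Label value: the variable's value (an assumption).
    """
    regex = re.compile(pattern)
    labels = {}
    for name, value in environ.items():
        match = regex.search(name)
        if match is None:
            continue  # variable does not match, no label added
        label = match.group(1) if regex.groups >= 1 else name
        labels[label] = value
    return labels


# Hypothetical environment, for illustration only.
env = {"ADDL_datacenter": "dc1", "PATH": "/usr/bin"}

with_group = labels_from_env(r"^ADDL_(.*)$", env)
without_group = labels_from_env(r"^ADDL_.*$", env)
print(with_group)     # {'datacenter': 'dc1'}
print(without_group)  # {'ADDL_datacenter': 'dc1'}
```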

Docker

You can pull an image directly from Dockerhub:

docker pull criteord/cassandra_exporter:latest

Run Docker in read-only mode (/tmp must be mounted as tmpfs to allow sed to modify config.yml when using the dedicated environment variables):

docker run -e CASSANDRA_EXPORTER_CONFIG_host=localhost:7198 --read-only --tmpfs=/tmp criteord/cassandra_exporter:latest

Kubernetes

To get an idea of how to integrate Cassandra Exporter in Kubernetes, you can look at this Helm chart.

Grafana

Dedicated dashboards can be found here.
