  • Language: Go
  • License: Apache License 2.0

Exporter for machine metrics

Node exporter

Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.

The Windows exporter is recommended for Windows users. To expose NVIDIA GPU metrics, prometheus-dcgm can be used.

Installation and Usage

If you are new to Prometheus and node_exporter, there is a simple step-by-step guide.

The node_exporter listens on HTTP port 9100 by default. See the --help output for more options.

Ansible

For automated installs with Ansible, there is the Prometheus Community role.

Docker

The node_exporter is designed to monitor the host system. It's not recommended to deploy it as a Docker container because it requires access to the host system.

For situations where Docker deployment is needed, some extra flags must be used to allow the node_exporter access to the host namespaces.

Be aware that any non-root mount points you want to monitor will need to be bind-mounted into the container.

If you start a container for host monitoring, specify the path.rootfs argument. This argument must match the path used in the bind mount of the host root. The node_exporter will use path.rootfs as a prefix to access the host filesystem.

docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  quay.io/prometheus/node-exporter:latest \
  --path.rootfs=/host

For Docker compose, similar flag changes are needed.

---
version: '3.8'

services:
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

On some systems, the timex collector requires an additional Docker flag, --cap-add=SYS_TIME, in order to access the required syscalls.

Collectors

There is varying support for collectors on each operating system. The tables below list all existing collectors and the supported systems.

Collectors are enabled by providing a --collector.<name> flag. Collectors that are enabled by default can be disabled by providing a --no-collector.<name> flag. To enable only some specific collector(s), use --collector.disable-defaults --collector.<name> ....
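As a rough illustration of the flag pattern above, the following shell sketch assembles an argument list that disables the defaults and re-enables a chosen set. The helper name only_collectors is hypothetical and not part of node_exporter; the collector names come from the tables in this document.

```shell
# Hypothetical helper (not part of node_exporter): print the flags that
# enable only the collectors named as arguments.
only_collectors() {
  printf '%s' '--collector.disable-defaults'
  for name in "$@"; do
    printf ' --collector.%s' "$name"
  done
  printf '\n'
}

only_collectors cpu meminfo filesystem
```

The printed flags can then be passed to the binary, for example ./node_exporter $(only_collectors cpu meminfo filesystem).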

Include & Exclude flags

A few collectors can be configured to include or exclude certain patterns using dedicated flags. The exclude flags are used to indicate "all except", while the include flags are used to say "none except". Note that these flags are mutually exclusive on collectors that support both.

Example:

--collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
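To preview which mount points such a pattern would exclude, one option is to test it against sample paths first. The sketch below uses grep -E as a stand-in. node_exporter is written in Go and evaluates the pattern with Go's regular-expression syntax, which can differ from POSIX ERE in edge cases, so treat the result as an approximation. The function name and the sample paths are made up for illustration.

```shell
# Pattern copied from the example above.
pattern='^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)'

# Hypothetical helper: read mount points on stdin and print those that do
# NOT match the exclude pattern, i.e. those that would still be monitored.
kept_mount_points() {
  grep -Ev "$pattern"
}

# Sample paths are invented for this demonstration.
printf '%s\n' / /home /dev/shm /proc /var/lib/docker/overlay2/abc | kept_mount_points
```

With these sample paths, only / and /home remain. On a real host, the list of mount points could be taken from /proc/mounts instead.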

List:

Collector | Scope | Include Flag | Exclude Flag
arp | device | --collector.arp.device-include | --collector.arp.device-exclude
cpu | bugs | --collector.cpu.info.bugs-include | N/A
cpu | flags | --collector.cpu.info.flags-include | N/A
diskstats | device | --collector.diskstats.device-include | --collector.diskstats.device-exclude
ethtool | device | --collector.ethtool.device-include | --collector.ethtool.device-exclude
ethtool | metrics | --collector.ethtool.metrics-include | N/A
filesystem | fs-types | N/A | --collector.filesystem.fs-types-exclude
filesystem | mount-points | N/A | --collector.filesystem.mount-points-exclude
hwmon | chip | --collector.hwmon.chip-include | --collector.hwmon.chip-exclude
netdev | device | --collector.netdev.device-include | --collector.netdev.device-exclude
qdisk | device | --collector.qdisk.device-include | --collector.qdisk.device-exclude
sysctl | all | --collector.sysctl.include | N/A
systemd | unit | --collector.systemd.unit-include | --collector.systemd.unit-exclude

Enabled by default

Name | Description | OS
arp | Exposes ARP statistics from /proc/net/arp. | Linux
bcache | Exposes bcache statistics from /sys/fs/bcache/. | Linux
bonding | Exposes the number of configured and active slaves of Linux bonding interfaces. | Linux
btrfs | Exposes btrfs statistics | Linux
boottime | Exposes system boot time derived from the kern.boottime sysctl. | Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris
conntrack | Shows conntrack statistics (does nothing if no /proc/sys/net/netfilter/ present). | Linux
cpu | Exposes CPU statistics | Darwin, Dragonfly, FreeBSD, Linux, Solaris, OpenBSD
cpufreq | Exposes CPU frequency statistics | Linux, Solaris
diskstats | Exposes disk I/O statistics. | Darwin, Linux, OpenBSD
dmi | Expose Desktop Management Interface (DMI) info from /sys/class/dmi/id/ | Linux
edac | Exposes error detection and correction statistics. | Linux
entropy | Exposes available entropy. | Linux
exec | Exposes execution statistics. | Dragonfly, FreeBSD
fibrechannel | Exposes fibre channel information and statistics from /sys/class/fc_host/. | Linux
filefd | Exposes file descriptor statistics from /proc/sys/fs/file-nr. | Linux
filesystem | Exposes filesystem statistics, such as disk space used. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
hwmon | Expose hardware monitoring and sensor data from /sys/class/hwmon/. | Linux
infiniband | Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. | Linux
ipvs | Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats. | Linux
loadavg | Exposes load average. | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris
mdadm | Exposes statistics about devices in /proc/mdstat (does nothing if no /proc/mdstat present). | Linux
meminfo | Exposes memory statistics. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netclass | Exposes network interface info from /sys/class/net/ | Linux
netdev | Exposes network interface statistics such as bytes transferred. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netisr | Exposes netisr statistics | FreeBSD
netstat | Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. | Linux
nfs | Exposes NFS client statistics from /proc/net/rpc/nfs. This is the same information as nfsstat -c. | Linux
nfsd | Exposes NFS kernel server statistics from /proc/net/rpc/nfsd. This is the same information as nfsstat -s. | Linux
nvme | Exposes NVMe info from /sys/class/nvme/ | Linux
os | Expose OS release info from /etc/os-release or /usr/lib/os-release | any
powersupplyclass | Exposes Power Supply statistics from /sys/class/power_supply | Linux
pressure | Exposes pressure stall statistics from /proc/pressure/. | Linux (kernel 4.20+ and/or CONFIG_PSI)
rapl | Exposes various statistics from /sys/class/powercap. | Linux
schedstat | Exposes task scheduler statistics from /proc/schedstat. | Linux
selinux | Exposes SELinux statistics. | Linux
sockstat | Exposes various statistics from /proc/net/sockstat. | Linux
softnet | Exposes statistics from /proc/net/softnet_stat. | Linux
stat | Exposes various statistics from /proc/stat. This includes boot time, forks and interrupts. | Linux
tapestats | Exposes statistics from /sys/class/scsi_tape. | Linux
textfile | Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. | any
thermal | Exposes thermal statistics like pmset -g therm. | Darwin
thermal_zone | Exposes thermal zone & cooling device statistics from /sys/class/thermal. | Linux
time | Exposes the current system time. | any
timex | Exposes selected adjtimex(2) system call stats. | Linux
udp_queues | Exposes UDP total lengths of the rx_queue and tx_queue from /proc/net/udp and /proc/net/udp6. | Linux
uname | Exposes system information as provided by the uname system call. | Darwin, FreeBSD, Linux, OpenBSD
vmstat | Exposes statistics from /proc/vmstat. | Linux
xfs | Exposes XFS runtime statistics. | Linux (kernel 4.4+)
zfs | Exposes ZFS performance statistics. | FreeBSD, Linux, Solaris

Disabled by default

node_exporter also implements a number of collectors that are disabled by default. Reasons for this vary by collector, and may include:

  • High cardinality
  • Prolonged runtime that exceeds the Prometheus scrape_interval or scrape_timeout
  • Significant resource demands on the host

You can enable additional collectors as desired by adding them to your init system's or service supervisor's startup configuration for node_exporter but caution is advised. Enable at most one at a time, testing first on a non-production system, then by hand on a single production node. When enabling additional collectors, you should carefully monitor the change by observing the scrape_duration_seconds metric to ensure that collection completes and does not time out. In addition, monitor the scrape_samples_post_metric_relabeling metric to see the changes in cardinality.
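Alongside the Prometheus-side metrics mentioned above, node_exporter should also report per-collector timing on its own metrics endpoint as node_scrape_collector_duration_seconds (worth verifying on your version). The sketch below filters that output for collectors slower than a threshold. The helper name, the 0.5 second threshold, and the sample lines are assumptions for illustration, not measured data.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# the per-collector duration samples that exceed the limit in $1 (seconds).
slow_collectors() {
  awk -v limit="$1" '
    /^node_scrape_collector_duration_seconds[{]/ {
      # The sample value is the last whitespace-separated field.
      if ($NF + 0 > limit + 0) print
    }'
}

# Invented sample input; only the systemd line exceeds 0.5 seconds.
slow_collectors 0.5 <<'EOF'
node_scrape_collector_duration_seconds{collector="cpu"} 0.002
node_scrape_collector_duration_seconds{collector="systemd"} 1.7
node_scrape_collector_success{collector="systemd"} 1
EOF
```

Against a running exporter on the default port, the same helper could be fed with curl -s http://localhost:9100/metrics | slow_collectors 0.5.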

Name | Description | OS
buddyinfo | Exposes statistics of memory fragments as reported by /proc/buddyinfo. | Linux
cgroups | A summary of the number of active and enabled cgroups | Linux
cpu_vulnerabilities | Exposes CPU vulnerability information from sysfs. | Linux
devstat | Exposes device statistics | Dragonfly, FreeBSD
drm | Expose GPU metrics using sysfs / DRM, amdgpu is the only driver which exposes this information through DRM | Linux
drbd | Exposes Distributed Replicated Block Device statistics (to version 8.4) | Linux
ethtool | Exposes network interface information and network driver statistics equivalent to ethtool, ethtool -S, and ethtool -i. | Linux
interrupts | Exposes detailed interrupts statistics. | Linux, OpenBSD
ksmd | Exposes kernel and system statistics from /sys/kernel/mm/ksm. | Linux
lnstat | Exposes stats from /proc/net/stat/. | Linux
logind | Exposes session counts from logind. | Linux
meminfo_numa | Exposes memory statistics from /proc/meminfo_numa. | Linux
mountstats | Exposes filesystem statistics from /proc/self/mountstats. Exposes detailed NFS client statistics. | Linux
network_route | Exposes the routing table as metrics | Linux
perf | Exposes perf based metrics (Warning: Metrics are dependent on kernel configuration and settings). | Linux
processes | Exposes aggregate process statistics from /proc. | Linux
qdisc | Exposes queuing discipline statistics | Linux
slabinfo | Exposes slab statistics from /proc/slabinfo. Note that permission of /proc/slabinfo is usually 0400, so set it appropriately. | Linux
softirqs | Exposes detailed softirq statistics from /proc/softirqs. | Linux
sysctl | Expose sysctl values from /proc/sys. Use --collector.sysctl.include(-info) to configure. | Linux
systemd | Exposes service and system status from systemd. | Linux
tcpstat | Exposes TCP connection status information from /proc/net/tcp and /proc/net/tcp6. (Warning: the current version has potential performance issues in high load situations.) | Linux
wifi | Exposes WiFi device and station statistics. | Linux
zoneinfo | Exposes NUMA memory zone metrics. | Linux

Deprecated

These collectors are deprecated and will be removed in the next major release.

Name | Description | OS
ntp | Exposes local NTP daemon health to check time | any
runit | Exposes service status from runit. | any
supervisord | Exposes service status from supervisord. | any

Perf Collector

The perf collector may not work out of the box on some Linux systems due to kernel configuration and security settings. To allow access, set the following sysctl parameter:

sysctl -w kernel.perf_event_paranoid=X
  • 2 allow only user-space measurements (default since Linux 4.6).
  • 1 allow both kernel and user measurements (default before Linux 4.6).
  • 0 allow access to CPU-specific data but not raw tracepoint samples.
  • -1 no restrictions.

Depending on the configured value, different metrics will be available; in most cases, 0 will provide the most complete set. For more information, see man 2 perf_event_open.

By default, the perf collector will only collect metrics of the CPUs that node_exporter is running on (i.e. runtime.NumCPU). If this is insufficient (e.g. if you run node_exporter with its CPU affinity set to specific CPUs), you can specify a list of alternate CPUs by using the --collector.perf.cpus flag. For example, to collect metrics on CPUs 2-6, you would specify: --collector.perf --collector.perf.cpus=2-6. The CPU configuration is zero indexed and can also take a stride value; e.g. --collector.perf --collector.perf.cpus=1-10:5 would collect on CPUs 1, 5, and 10.

The perf collector is also able to collect tracepoint counts when using the --collector.perf.tracepoint flag. Tracepoints can be found using perf list or from debugfs. An example usage of this would be --collector.perf.tracepoint="sched:sched_process_exec".

Sysctl Collector

The sysctl collector can be enabled with --collector.sysctl. It supports exposing numeric sysctl values as metrics using the --collector.sysctl.include flag and string values as info metrics by using the --collector.sysctl.include-info flag. The flags can be repeated. For sysctl with multiple numeric values, an optional mapping can be given to expose each value as its own metric. Otherwise an index label is used to identify the different fields.

Examples

Numeric values

Single values

Using --collector.sysctl.include=vm.user_reserve_kbytes:

vm.user_reserve_kbytes = 131072 -> node_sysctl_vm_user_reserve_kbytes 131072

Multiple values

A sysctl can contain multiple values, for example:

net.ipv4.tcp_rmem = 4096	131072	6291456

Using --collector.sysctl.include=net.ipv4.tcp_rmem the collector will expose:

node_sysctl_net_ipv4_tcp_rmem{index="0"} 4096
node_sysctl_net_ipv4_tcp_rmem{index="1"} 131072
node_sysctl_net_ipv4_tcp_rmem{index="2"} 6291456

If the indexes have a defined meaning, as in this case, the values can be mapped to multiple metrics by appending the mapping to the --collector.sysctl.include flag. Using --collector.sysctl.include=net.ipv4.tcp_rmem:min,default,max, the collector will expose:

node_sysctl_net_ipv4_tcp_rmem_min 4096
node_sysctl_net_ipv4_tcp_rmem_default 131072
node_sysctl_net_ipv4_tcp_rmem_max 6291456
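
The naming scheme shown above can be mimicked in a few lines of shell, which may help when predicting metric names before enabling the collector. This illustrates the documented output format only and is not node_exporter's implementation; the function name is hypothetical.

```shell
# Hypothetical helper: $1 is the sysctl name, $2 an optional comma-separated
# mapping. The values are read from stdin as one whitespace-separated line.
sysctl_metric_names() {
  awk -v name="$1" -v mapping="${2:-}" '{
    gsub(/[.]/, "_", name)               # net.ipv4.tcp_rmem -> net_ipv4_tcp_rmem
    n = split(mapping, suffix, ",")      # n is 0 when no mapping was given
    for (i = 1; i <= NF; i++) {
      if (i <= n)       printf "node_sysctl_%s_%s %s\n", name, suffix[i], $i
      else if (NF == 1) printf "node_sysctl_%s %s\n", name, $i
      else              printf "node_sysctl_%s{index=\"%d\"} %s\n", name, i - 1, $i
    }
  }'
}

printf '4096\t131072\t6291456\n' | sysctl_metric_names net.ipv4.tcp_rmem min,default,max
```

Calling the helper without the second argument falls back to the index label form described earlier.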

String values

String values need to be exposed as info metrics. The user selects them by using the --collector.sysctl.include-info flag.

Single values

kernel.core_pattern = core -> node_sysctl_info{key="kernel.core_pattern_info", value="core"} 1

Multiple values

Given the following sysctl:

kernel.seccomp.actions_avail = kill_process kill_thread trap errno trace log allow

Setting --collector.sysctl.include-info=kernel.seccomp.actions_avail will yield:

node_sysctl_info{key="kernel.seccomp.actions_avail", index="0", value="kill_process"} 1
node_sysctl_info{key="kernel.seccomp.actions_avail", index="1", value="kill_thread"} 1
...

Textfile Collector

The textfile collector is similar to the Pushgateway, in that it allows exporting of statistics from batch jobs. It can also be used to export static metrics, such as what role a machine has. The Pushgateway should be used for service-level metrics. The textfile module is for metrics that are tied to a machine.

To use it, set the --collector.textfile.directory flag on the node_exporter command line. The collector will parse all files in that directory matching the glob *.prom using the text format. Note: Timestamps are not supported.

To atomically push completion time for a cron job:

echo my_batch_job_completion_time $(date +%s) > /path/to/directory/my_batch_job.prom.$$
mv /path/to/directory/my_batch_job.prom.$$ /path/to/directory/my_batch_job.prom

To statically set roles for a machine using labels:

echo 'role{role="application_server"} 1' > /path/to/directory/role.prom.$$
mv /path/to/directory/role.prom.$$ /path/to/directory/role.prom
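
The write-then-rename pattern above can be wrapped in a small helper so that every batch job uses it the same way. This is a sketch with a hypothetical function name. It assumes mktemp is available and creates the temporary file in the target directory, so that mv is a rename on the same filesystem.

```shell
# Hypothetical helper: $1 is the textfile directory, $2 the file name without
# the .prom suffix. The metric lines are read from stdin.
write_prom() {
  # The temporary name does not end in .prom, so the *.prom glob skips it
  # while it is still being written.
  tmp=$(mktemp "$1/$2.prom.XXXXXX") || return 1
  # mktemp creates the file with mode 0600; widen it in case node_exporter
  # runs as a different user than the job.
  if cat > "$tmp" && chmod 644 "$tmp"; then
    mv "$tmp" "$1/$2.prom"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

A cron job could then end with echo "my_batch_job_completion_time $(date +%s)" | write_prom /path/to/directory my_batch_job.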

Filtering enabled collectors

The node_exporter will expose all metrics from enabled collectors by default. This is the recommended way to collect metrics to avoid errors when comparing metrics of different families.

For advanced use the node_exporter can be passed an optional list of collectors to filter metrics. The collect[] parameter may be used multiple times. In Prometheus configuration you can use this syntax under the scrape config.

  params:
    collect[]:
      - foo
      - bar

This can be useful for having different Prometheus servers collect specific metrics from nodes.
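
For a quick manual check outside Prometheus, the same collect[] parameters can be placed in the query string of the metrics URL. The sketch below builds such a URL. The helper name is hypothetical, and the host, port, and /metrics path in the example assume a local exporter with default settings.

```shell
# Hypothetical helper: $1 is the exporter base URL, the remaining arguments
# are the collector names to request.
filtered_metrics_url() {
  url="$1/metrics"
  shift
  sep='?'
  for name in "$@"; do
    url="${url}${sep}collect[]=${name}"
    sep='&'
  done
  printf '%s\n' "$url"
}

filtered_metrics_url http://localhost:9100 cpu meminfo
```

When fetching the result with curl, the -g (--globoff) option keeps curl from interpreting the square brackets, for example curl -sg "$(filtered_metrics_url http://localhost:9100 cpu meminfo)".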

Development building and running

Prerequisites:

Building:

git clone https://github.com/prometheus/node_exporter.git
cd node_exporter
make build
./node_exporter <flags>

To see all available configuration flags:

./node_exporter -h

Running tests

make test

TLS endpoint

** EXPERIMENTAL **

The exporter supports TLS via a new web configuration file.

./node_exporter --web.config.file=web-config.yml

See the exporter-toolkit https package for more details.
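
A minimal web configuration file might look like the one written by the snippet below. The tls_server_config, cert_file, and key_file keys follow the exporter-toolkit web configuration format as it is commonly documented. The certificate and key file names are placeholders, and the exporter-toolkit documentation remains the authority on the available options.

```shell
# Write a minimal TLS web configuration. The file names are placeholders;
# point cert_file and key_file at your own certificate and private key.
cat > web-config.yml <<'EOF'
tls_server_config:
  cert_file: node_exporter.crt
  key_file: node_exporter.key
EOF
```

The resulting file is the one passed to --web.config.file in the command above.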
