• Stars
    star
    217
  • Rank 182,446 (Top 4 %)
  • Language
    Go
  • License
    Apache License 2.0
  • Created over 6 years ago
  • Updated over 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

NVIDIA GPU Prometheus Exporter

NVIDIA GPU Prometheus Exporter

This is a Prometheus Exporter for exporting NVIDIA GPU metrics. It uses the Go bindings for NVIDIA Management Library (NVML) which is a C-based API that can be used for monitoring NVIDIA GPU devices. Unlike some other similar exporters, it does not call the nvidia-smi binary.

Building

The repository includes nvml.h, so there are no special requirements from the build environment. go get should be able to build the exporter binary.

go get github.com/mindprince/nvidia_gpu_prometheus_exporter

Running

The exporter requires the following:

  • access to NVML library (libnvidia-ml.so.1).
  • access to the GPU devices.

To make sure that the exporter can access the NVML libraries, either add them to the search path for shared libraries. Or set LD_LIBRARY_PATH to point to their location.

By default the metrics are exposed on port 9445. This can be updated using the -web.listen-address flag.

Running inside a container

There's a docker image available on Docker Hub at mindprince/nvidia_gpu_prometheus_exporter

If you are running the exporter inside a container, you will need to do the following to give the container access to NVML library:

-e LD_LIBRARY_PATH=<path-where-nvml-is-present>
--volume <above-path>:<above-path>

And you will need to do one of the following to give it access to the GPU devices:

  • Run with --privileged
  • If you are on docker v17.04.0-ce or above, run with --device-cgroup-rule 'c 195:* mrw'
  • Run with --device /dev/nvidiactl:/dev/nvidiactl /dev/nvidia0:/dev/nvidia0 /dev/nvidia1:/dev/nvidia1 <and-so-on-for-all-nvidia-devices>

If you don't want to do the above, you can run it using nvidia-docker.

Running using nvidia-docker

nvidia-docker run -p 9445:9445 -ti mindprince/nvidia_gpu_prometheus_exporter:0.1