• Stars
    star
    175
  • Rank 218,059 (Top 5 %)
  • Language
    Perl
  • License
    PostgreSQL License
  • Created about 10 years ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Nagios remote agent
=head1 NAME

check_pgactivity - PostgreSQL plugin for Nagios

=head1 SYNOPSIS

  check_pgactivity {-w|--warning THRESHOLD} {-c|--critical THRESHOLD} [-s|--service SERVICE ] [-h|--host HOST] [-U|--username ROLE] [-p|--port PORT] [-d|--dbname DATABASE] [-S|--dbservice SERVICE_NAME] [-P|--psql PATH] [--debug] [--status-file FILE] [--path PATH] [-t|--timemout TIMEOUT]
  check_pgactivity [-l|--list]
  check_pgactivity [--help]

=head1 DESCRIPTION

check_pgactivity is designed to monitor PostgreSQL clusters from Nagios. It
offers many options to measure and monitor useful performance metrics.

=head1 COMPATIBILITY

Each service is available from a different PostgreSQL version,
from 7.4, as documented below.
The psql client must be 8.3 at least. It can be used with an older server.
Please report any undocumented incompatibility.

=over

=item B<-s>, B<--service> SERVICE

The Nagios service to run. See section SERVICES for a description of
available services or use C<--list> for a short service and description
list.

=item B<-h>, B<--host> HOST

Database server host or socket directory (default: $PGHOST or "localhost")

See section C<CONNECTIONS> for more informations.

=item B<-U>, B<--username> ROLE

Database user name (default: $PGUSER or "postgres").

See section C<CONNECTIONS> for more informations.

=item B<-p>, B<--port> PORT

Database server port (default: $PGPORT or "5432").

See section C<CONNECTIONS> for more informations.

=item B<-d>, B<--dbname> DATABASE

Database name to connect to (default: $PGDATABASE or "template1").

B<WARNING>! This is not necessarily one of the database that will be
checked. See C<--dbinclude> and C<--dbexclude> .

See section C<CONNECTIONS> for more informations.

=item B<-S>, B<--dbservice> SERVICE_NAME

The connection service name from pg_service.conf to use.

See section C<CONNECTIONS> for more informations.

=item B<--dbexclude> REGEXP

Some services automatically check all the databases of your
cluster (note: that does not mean they always need to connect on all
of them to check them though). C<--dbexclude> excludes any
database whose name matches the given Perl regular expression.
Repeat this option as many time as needed.

See C<--dbinclude> as well. If a database match both dbexclude and
dbinclude arguments, it is excluded.

=item B<--dbinclude> REGEXP

Some services automatically check all the databases of your
cluster (note: that does not imply that they always need to connect to all
of them though). Some always exclude the 'postgres'
database and templates. C<--dbinclude> checks B<ONLY>
databases whose names match the given Perl regular expression.
Repeat this option as many time as needed.

See C<--dbexclude> as well. If a database match both dbexclude and
dbinclude arguments, it is excluded.

=item B<-w>, B<--warning> THRESHOLD

The Warning threshold.

=item B<-c>, B<--critical> THRESHOLD

The Critical threshold.

=item B<-F>, B<--format> OUTPUT_FORMAT

The output format. Supported output are: C<binary>, C<debug>, C<human>,
C<nagios>, C<nagios_strict>, C<json> and C<json_strict>.

Using the C<binary> format, the results are written in a binary file (using
perl module C<Storable>) given in argument C<--output>. If no output is given,
defaults to file C<check_pgactivity.out> in the same directory as the script.

The C<nagios_strict> and C<json_strict> formats are equivalent to the C<nagios>
and C<json> formats respectively. The only difference is that they enforce the
units to follow the strict Nagios specs: B, c, s or %. Any unit absent from
this list is dropped (Bps, Tps, etc).

=item B<--tmpdir> DIRECTORY

Path to a directory where the script can create temporary files. The
script relies on the system default temporary directory if possible.

=item B<-P>, B<--psql> FILE

Path to the C<psql> executable (default: "psql").
It should be version 8.3 at least, but the server can be older.

=item B<--status-file> PATH

Path to the file where service status information is kept between successive
calls. Default is to save a file called C<check_pgactivity.data> in the same
directory as the script.

Note that this file is protected from concurrent writes using a lock file
located in the same directory, having the same name than the status file, but
with the extension C<.lock>.

On some plateform, network filesystems may not be supported correctly by the
locking mechanism. See C<perldoc -f flock> for more information.

=item B<--dump-status-file>

Dump the content of the status file and exit. This is useful for debugging
purpose.

=item B<--dump-bin-file> [PATH]

Dump the content of the given binary file previously created using
C<--format binary>. If no path is given, defaults to file
C<check_pgactivity.out> in the same directory as the script.

=item B<-t>, B<--timeout> TIMEOUT

Timeout (default: "30s"), as raw (in seconds) or as
an interval. This timeout will be used as C<statement_timeout> for psql and URL
timeout for C<minor_version> service.

=item B<-l>, B<--list>

List available services.

=item B<-V>, B<--version>

Print version and exit.

=item B<--debug>

Print some debug messages.

=item B<-?>, B<--help>

Show this help page.

=back

=head2 THRESHOLDS

THRESHOLDS provided as warning and critical values can be raw numbers,
percentages, intervals or sizes. Each available service supports one or more
formats (eg. a size and a percentage).

=over

=item B<Percentage>

If THRESHOLD is a percentage, the value should end with a '%' (no space).
For instance: 95%.

=item B<Interval>

If THRESHOLD is an interval, the following units are accepted (not case
sensitive): s (second), m (minute), h (hour), d (day). You can use more than
one unit per given value. If not set, the last unit is in seconds.
For instance: "1h 55m 6" = "1h55m6s".

=pod

=item B<Size>

If THRESHOLD is a size, the following units are accepted (not case sensitive):
b (Byte), k (KB), m (MB), g (GB), t (TB), p (PB), e (EB) or Z (ZB). Only
integers are accepted. Eg. C<1.5MB> will be refused, use C<1500kB>.

The factor between units is 1024 bytes. Eg. C<1g = 1G = 1024*1024*1024.>

=back

=head2 CONNECTIONS

check_pgactivity allows two different connection specifications: by service or
by specifying values for host, user, port, and database.
Some services can run on multiple hosts, or needs to connect to multiple hosts.

You might specify one of the parameters below to connect to your PostgreSQL
instance.  If you don't, no connection parameters are given to psql: connection
relies on binary defaults and environment.

The format for connection parameters is:

=over

=item B<Parameter> C<--dbservice SERVICE_NAME>

Define a new host using the given service. Multiple hosts can be defined by
listing multiple services separated by a comma. Eg.

  --dbservice service1,service2

For more information about service definition, see:
L<https://www.postgresql.org/docs/current/libpq-pgservice.html>

=item B<Parameters> C<--host HOST>, C<--port PORT>, C<--user ROLE> or C<--dbname DATABASE>

One parameter is enough to define a new host. Usual environment variables
(PGHOST, PGPORT, PGDATABASE, PGUSER, PGSERVICE, PGPASSWORD) or default values
are used for missing parameters.

As for usual PostgreSQL tools, there is no command line argument to set the
password, to avoid exposing it. Use PGPASSWORD, .pgpass or a service file
(recommended).

If multiple values are given, define as many host as maximum given values.

Values are associated by position. Eg.:

  --host h1,h2 --port 5432,5433

Means "host=h1 port=5432" and "host=h2 port=5433".

If the number of values is different between parameters, any host missing a
parameter will use the first given value for this parameter. Eg.:

  --host h1,h2 --port 5433

Means: "host=h1 port=5433" and "host=h2 port=5433".

=item B<Services are defined first>

For instance:

  --dbservice s1 --host h1 --port 5433

means: use "service=s1" and "host=h1 port=5433" in this order. If the service
supports only one host, the second host is ignored.

=item B<Mutual exclusion between both methods>

You can not overwrite services connections variables with parameters C<--host HOST>,
C<--port PORT>, C<--user ROLE> or C<--dbname DATABASE>

=back

=head2 SERVICES

Descriptions and parameters of available services.

=over


=item B<archive_folder>

Check if all archived WALs exist between the oldest and the latest WAL in the
archive folder and make sure they are 16MB. The given folder must have archived
files from ONE cluster. The version of PostgreSQL that created the archives is
only checked on the last one, for performance consideration.

This service requires the argument C<--path> on the command line to specify the
archive folder path to check. Obviously, it must have access to this
folder at the filesystem level: you may have to execute it on the archiving
server rather than on the PostgreSQL instance.

The optional argument C<--suffix> defines the suffix of your archived
WALs; this is useful for compressed WALs (eg. .gz, .bz2, ...).
Default is no suffix.

This service needs to read the header of one of the archives to define how many
segments a WAL owns. Check_pgactivity automatically handles files with
extensions .gz, .bz2, .xz, .zip or .7z using the following commands:

  gzip -dc
  bzip2 -dc
  xz -dc
  unzip -qqp
  7z x -so

If needed, provide your own command that writes the uncompressed file
to standard output with the C<--unarchiver> argument.

Optional argument C<--ignore-wal-size> skips the WAL size check. This is useful
if your archived WALs are compressed and check_pgactivity is unable to guess
the original size. Here are the commands check_pgactivity uses to guess the
original size of .gz, .xz or .zip files:

  gzip -ql
  xz -ql
  unzip -qql

Default behaviour is to check the WALs size.

Perfdata contains the number of archived WALs and the age of the most recent
one.

Critical and Warning define the max age of the latest archived WAL as an
interval (eg. 5m or 300s ).

Required privileges: unprivileged role; the system user needs read access
to archived WAL files.

Sample commands:

  check_pgactivity -s archive_folder --path /path/to/archives -w 15m -c 30m
  check_pgactivity -s archive_folder --path /path/to/archives --suffix .gz -w 15m -c 30m
  check_pgactivity -s archive_folder --path /path/to/archives --ignore-wal-size --suffix .bz2 -w 15m -c 30m
  check_pgactivity -s archive_folder --path /path/to/archives --unarchiver "unrar p" --ignore-wal-size --suffix .rar -w 15m -c 30m

=item B<archiver> (8.1+)

Check if the archiver is working properly and the number of WAL files ready to
archive.

Perfdata returns the number of WAL files waiting to be archived.

Critical and Warning thresholds are optional. They apply on the number of files
waiting to be archived. They only accept a raw number of files.

Whatever the given threshold, a critical alert is raised if the archiver
process did not archive the oldest waiting WAL to be archived since last call.

Required privileges: superuser (<v11), grant execute on function pg_stat_file(text) for v11+.

=item B<autovacuum> (8.1+)

Check the autovacuum activity on the cluster.

Perfdata contains the age of oldest running autovacuum and the number of
workers by type (VACUUM, VACUUM ANALYZE, ANALYZE, VACUUM FREEZE).

Thresholds, if any, are ignored.

Required privileges: unprivileged role.

=item B<backends> (all)

Check the total number of connections in the PostgreSQL cluster.

Perfdata contains the number of connections per database.

Critical and Warning thresholds accept either a raw number or a percentage (eg.
80%). When a threshold is a percentage, it is compared to the difference
between the cluster parameters C<max_connections> and
C<superuser_reserved_connections>.

Required privileges: an unprivileged user only sees its own queries;
a pg_monitor (10+) or superuser (<10) role is required to see all queries.

=item B<backends_status> (8.2+)

Check the status of all backends. Depending on your PostgreSQL version,
statuses are: C<idle>, C<idle in transaction>, C<idle in transaction (aborted)>
(>=9.0 only), C<fastpath function call>, C<active>, C<waiting for lock>,
C<undefined>, C<disabled> and C<insufficient privilege>.
B<insufficient privilege> appears when you are not allowed to see the statuses
of other connections.

This service supports the argument C<--exclude REGEX> to exclude queries
matching the given regular expression.

You can use multiple C<--exclude REGEX> arguments.

Critical and Warning thresholds are optional. They accept a list of
'status_label=value' separated by a comma. Available labels are C<idle>,
C<idle_xact>, C<aborted_xact>, C<fastpath>, C<active> and C<waiting>. Values
are raw numbers or time units and empty lists are forbidden. Here is an example:

    -w 'waiting=5,idle_xact=10' -c 'waiting=20,idle_xact=30,active=1d'

Perfdata contains the number of backends for each status and the oldest one for
each of them, for 8.2+.

Note that the number of backends reported in Nagios message B<includes>
excluded backends.

Required privileges: an unprivileged user only sees its own queries;
a pg_monitor (10+) or superuser (<10) role is required to see all queries.

=item B<checksum_errors> (12+)

Check for data checksums error, reported in pg_stat_database.

This service requires that data checksums are enabled on the target instance.
UNKNOWN will be returned if that's not the case.

Critical and Warning thresholds are optional. They only accept a raw number of
checksums errors per database.  If the thresholds are not provided, a default
value of `1` will be used for both thresholds.

Checksums errors are CRITICAL issues, so it's highly recommended to keep
default threshold, as immediate action should be taken as soon as such a
problem arises.

Perfdata contains the number of error per database.

Required privileges: unprivileged user.

=item B<backup_label_age> (8.1+)

Check the age of the backup label file.

Perfdata returns the age of the backup_label file, -1 if not present.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s).

Required privileges: unprivileged role (9.3+); superuser (<9.3)

=item B<bgwriter> (8.3+)

Check the percentage of pages written by backends since last check.

This service uses the status file (see C<--status-file> parameter).

Perfdata contains the ratio per second for each C<pg_stat_bgwriter> counter
since last execution. Units Nps for checkpoints, max written clean and fsyncs
are the number of "events" per second.

Critical and Warning thresholds are optional. If set, they I<only> accept a
percentage.

Required privileges: unprivileged role.

=item B<btree_bloat>

Estimate bloat on B-tree indexes.

Warning and critical thresholds accept a comma-separated list of either
raw number(for a size), size (eg. 125M) or percentage. The thresholds apply to
B<bloat> size, not object size. If a percentage is given, the threshold will
apply to the bloat size compared to the total index size. If multiple threshold
values are passed, check_pgactivity will choose the largest (bloat size) value.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
The 'postgres' database and templates are always excluded.

It also supports a C<--exclude REGEX> parameter to exclude relations matching
a regular expression. The regular expression applies to
"database.schema_name.relation_name". This enables you to filter either on a
relation name for all schemas and databases, on a qualified named relation
(schema + relation) for all databases or on a qualified named relation in
only one database.

You can use multiple C<--exclude REGEX> parameters.

Perfdata will return the number of indexes of concern, by warning and critical
threshold per database.

A list of the bloated indexes will be returned after the
perfdata. This list contains the fully qualified bloated index name, the
estimated bloat size, the index size and the bloat percentage.

Required privileges: superuser (<10) able to log in all databases, or at least
those in C<--dbinclude>; superuser (<10);
on PostgreSQL 10+, a user with the role pg_monitor suffices,
provided that you grant SELECT on the system table pg_statistic
to the pg_monitor role, in each database of the cluster:
C<GRANT SELECT ON pg_statistic TO pg_monitor;>

=item B<session_stats> (14+)

Gather miscellaneous session statistics.

This service uses the status file (see --status-file parameter).

Perfdata contains the session / active / idle-in-transaction times
for each database since last call, as well as the number of sessions per second,
and the number of sessions killed / abandoned / terminated by fatal errors.

Required privileges: unprivileged role.

=item B<commit_ratio> (all)

Check the commit and rollback rate per second since last call.

This service uses the status file (see --status-file parameter).

Perfdata contains the commit rate, rollback rate, transaction rate and rollback
ratio for each database since last call.

Critical and Warning thresholds are optional. They accept a list of comma
separated 'label=value'. Available labels are B<rollbacks>, B<rollback_rate>
and B<rollback_ratio>, which will be compared to the number of rollbacks, the
rollback rate and the rollback ratio of each database. Warning or critical will
be raised if the reported value is greater than B<rollbacks>, B<rollback_rate>
or B<rollback_ratio>.

Required privileges: unprivileged role.

=item B<configuration> (8.0+)

Check the most important settings.

Warning and Critical thresholds are ignored.

Specific parameters are :
C<--work_mem>, C<--maintenance_work_mem>, C<--shared_buffers>,C<--wal_buffers>,
C<--checkpoint_segments>, C<--effective_cache_size>, C<--no_check_autovacuum>,
C<--no_check_fsync>, C<--no_check_enable>, C<--no_check_track_counts>.

Required privileges: unprivileged role.

=item B<connection> (all)

Perform a simple connection test.

No perfdata is returned.

This service ignores critical and warning arguments.

Required privileges: unprivileged role.

=item B<custom_query> (all)

Perform the given user query.

Specify the query with C<--query>. The first column will be
used to perform the test for the status if warning and critical are provided.

The warning and critical arguments are optional. They can be of format integer
(default), size or time depending on the C<--type> argument.
Warning and Critical will be raised if they are greater than the first column,
or less if the C<--reverse> option is used.

All other columns will be used to generate the perfdata. Each field name is
used as the name of the perfdata. The field value must contain your perfdata
value and its unit appended to it. You can add as many fields as needed. Eg.:

  SELECT pg_database_size('postgres'),
         pg_database_size('postgres')||'B' AS db_size

Required privileges: unprivileged role (depends on the query).

=item B<database_size> (8.1+)

B<Check the variation> of database sizes, and B<return the size> of every
databases.

This service uses the status file (see C<--status-file> parameter).

Perfdata contains the size of each database and their size delta since last call.

Critical and Warning thresholds are optional. They are a list of optional 'label=value'
separated by a comma. It allows to fine tune the alert based on the
absolute C<size> and/or the C<delta> size. Eg.:

    -w 'size=500GB' -c 'size=600GB'
    -w 'delta=1%' -c 'delta=10%'
    -w 'size=500GB,delta=1%' -c 'size=600GB,delta=10GB'

The C<size> label accepts either a raw number or a size and checks the total database size.
The C<delta> label accepts either a raw number, a percentage, or a size.
The aim of the delta parameter is to detect unexpected database size variations.
Delta thresholds are absolute value, and delta percentages are computed against
the previous database size.
A same label must be filled for both warning and critical.

For backward compatibility, if a single raw number or percentage or size is given with no
label, it applies on the size difference for each database since the last execution.
Both threshold bellow are equivalent:

    -w 'delta=1%' -c 'delta=10%'
    -w '1%' -c '10%'

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Required privileges: unprivileged role.

=item B<extensions_versions> (9.1+)

Check all extensions installed in all databases (including templates)
and raise a critical alert if the current version is not the default
version available on the instance (according to pg_available_extensions).

Typically, it is used to detect forgotten extension upgrades after package
upgrades or a pg_upgrade.

Perfdata returns the number of outdated extensions in each database.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
Schemas are ignored, as an extension cannot be installed more than once
in a database.

This service supports multiple C<--exclude> argument to exclude one or
more extensions from the check. To ignore an extension only in a particular database,
use  'dbname/extension_name' syntax.

Examples:

    --dbexclude 'devdb' --exclude 'testdb/postgis' --exclude 'testdb/postgis_topology'
    --dbinclude 'proddb' --dbinclude 'testdb'  --exclude 'powa'

Required privileges: unprivileged role able to log in all databases

=item B<hit_ratio> (all)

Check the cache hit ratio on the cluster.

This service uses the status file (see C<--status-file> parameter).

Perfdata returns the cache hit ratio per database. Template databases and
databases that do not allow connections will not be checked, nor will the
databases which have never been accessed.

Critical and Warning thresholds are optional. They only accept a percentage.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Required privileges: unprivileged role.

=item B<hot_standby_delta> (9.0)

Check the data delta between a cluster and its hot standbys.

You must give the connection parameters for two or more clusters.

Perfdata returns the data delta in bytes between the master and each hot
standby cluster listed.

Critical and Warning thresholds are optional. They can take one or two values
separated by a comma. If only one value given, it applies to both received and
replayed data.
If two values are given, the first one applies to received data, the second one
to replayed ones. These thresholds only accept a size (eg. 2.5G).

This service raises a Critical if it doesn't find exactly ONE valid master
cluster (ie. critical when 0 or 2 and more masters).

Required privileges: unprivileged role.

=item B<is_hot_standby> (9.0+)

Checks if the cluster is in recovery and accepts read only queries.

This service ignores critical and warning arguments.

No perfdata is returned.

Required privileges: unprivileged role.

=item B<is_master> (all)

Checks if the cluster accepts read and/or write queries. This state is reported
as "in production" by pg_controldata.

This service ignores critical and warning arguments.

No perfdata is returned.

Required privileges: unprivileged role.

=item B<invalid_indexes> (8.2+)

Check if there are invalid indexes in a database.

A critical alert is raised if an invalid index is detected.

This service supports both C<--dbexclude>  and C<--dbinclude> parameters.
The 'postgres' database and templates are always excluded.

This service supports a C<--exclude REGEX>  parameter to exclude indexes
matching a regular expression. The regular expression applies to
"database.schema_name.index_name". This enables you to filter either on a
relation name for all schemas and databases, on a qualified named
index (schema + index) for all databases or on a qualified named
index in only one database.

You can use multiple C<--exclude REGEX>  parameters.

Perfdata will return the number of invalid indexes per database.

A list of invalid indexes will be returned after the
perfdata. This list contains the fully qualified index name. If
excluded index is set, the number of exclude indexes is returned.

Required privileges: unprivileged role able to log in all databases.

=item B<is_replay_paused> (9.1+)

Checks if the replication is paused. The service will return UNKNOWN if
executed on a master server.

Thresholds are optional. They must be specified as interval. OK will always be
returned if the standby is not paused, even if replication delta time hits the
thresholds.

Critical or warning are raised if last reported replayed timestamp is greater
than given threshold AND some data received from the master are not applied
yet.  OK will always be returned if the standby is paused, or if the standby
has already replayed everything from master and until some write activity
happens on the master.

Perfdata returned:
  * paused status (0 no, 1 yes, NaN if master)
  * lag time (in second)
  * data delta with master (0 no, 1 yes)

Required privileges: unprivileged role.

=item B<last_analyze> (8.2+)

Check on each databases that the oldest C<analyze> (from autovacuum or not) is
not older than the given threshold.

This service uses the status file (see C<--status-file> parameter) with
PostgreSQL 9.1+.

Perfdata returns oldest C<analyze> per database in seconds. With PostgreSQL
9.1+, the number of [auto]analyses per database since last call is also
returned.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s)
and apply to the oldest execution of analyse.

Tables that were never analyzed, or whose analyze date was lost due to a crash,
will raise a critical alert.

B<NOTE>: this service does not raise alerts if the database had strictly
no writes since last call. In consequence, a read-only database can have
its oldest analyze reported in perfdata way after your thresholds, but not
raise any alerts.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
The 'postgres' database and templates are always excluded.

Required privileges: unprivileged role able to log in all databases.

=item B<last_vacuum> (8.2+)

Check that the oldest vacuum (from autovacuum or otherwise) in each database
in the cluster is not older than the given threshold.

This service uses the status file (see C<--status-file> parameter) with
PostgreSQL 9.1+.

Perfdata returns oldest vacuum per database in seconds. With PostgreSQL
9.1+, it also returns the number of [auto]vacuums per database since last
execution.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s)
and apply to the oldest vacuum.

Tables that were never vacuumed, or whose vacuum date was lost due to a crash,
will raise a critical alert.

B<NOTE>: this service does not raise alerts if the database had strictly
no writes since last call. In consequence, a read-only database can have
its oldest vacuum reported in perfdata way after your thresholds, but not
raise any alerts.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
The 'postgres' database and templates are always excluded.

Required privileges: unprivileged role able to log in all databases.

=item B<locks> (all)

Check the number of locks on the hosts.

Perfdata returns the number of locks, by type.

Critical and Warning thresholds accept either a raw number of locks or a
percentage. For percentage, it is computed using the following limits
for 7.4 to 8.1:

  max_locks_per_transaction * max_connections

for 8.2+:

  max_locks_per_transaction * (max_connections + max_prepared_transactions)

for 9.1+, regarding lockmode :

  max_locks_per_transaction * (max_connections + max_prepared_transactions)
or max_pred_locks_per_transaction * (max_connections + max_prepared_transactions)

Required privileges: unprivileged role.

=item B<longest_query> (all)

Check the longest running query in the cluster.

Perfdata contains the max/avg/min running time and the number of queries per
database.

Critical and Warning thresholds only accept an interval.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

It also supports argument C<--exclude REGEX> to exclude queries matching the
given regular expression from the check.

Above 9.0, it also supports C<--exclude REGEX> to filter out application_name.

You can use multiple C<--exclude REGEX> parameters.

Required privileges: an unprivileged role only checks its own queries;
a pg_monitor (10+) or superuser (<10) role is required to check all queries.

=item B<max_freeze_age> (all)

Checks oldest database by transaction age.

Critical and Warning thresholds are optional. They accept either a raw number
or percentage for PostgreSQL 8.2 and more. If percentage is given, the
thresholds are computed based on the "autovacuum_freeze_max_age" parameter.
100% means that some table(s) reached the maximum age and will trigger an
autovacuum freeze. Percentage thresholds should therefore be greater than 100%.

Even with no threshold, this service will raise a critical alert if a database
has a negative age.

Perfdata returns the age of each database.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Required privileges: unprivileged role.

=item B<minor_version> (all)

Check if the cluster is running the most recent minor version of PostgreSQL.

Latest versions of PostgreSQL can be fetched from PostgreSQL official
website if check_pgactivity has access to it, or must be given as a parameter.

Without C<--critical> or C<--warning> parameters, this service attempts
to fetch the latest version numbers online. A critical alert is raised if the
minor version is not the most recent.

You can optionally set the path to your prefered retrieval tool using
the C<--path> parameter (eg. C<--path '/usr/bin/wget'>). Supported programs are:
GET, wget, curl, fetch, lynx, links, links2.

If you do not want to (or cannot) query the PostgreSQL website,
provide the expected versions using either C<--warning> OR
C<--critical>, depending on which return value you want to raise.

The given string must contain one or more MINOR versions separated by anything
but a '.'. For instance, the following parameters are all equivalent:

  --critical "10.1 9.6.6 9.5.10 9.4.15 9.3.20 9.2.24 9.1.24 9.0.23 8.4.22"
  --critical "10.1, 9.6.6, 9.5.10, 9.4.15, 9.3.20, 9.2.24, 9.1.24, 9.0.23, 8.4.22"
  --critical "10.1,9.6.6,9.5.10,9.4.15,9.3.20,9.2.24,9.1.24,9.0.23,8.4.22"
  --critical "10.1/9.6.6/9.5.10/9.4.15/9.3.20/9.2.24/9.1.24/9.0.23/8.4.22"

Any other value than 3 numbers separated by dots (before version 10.x)
or 2 numbers separated by dots (version 10 and above) will be ignored.
If the running PostgreSQL major version is not found, the service raises an
unknown status.

Perfdata returns the numerical version of PostgreSQL.

Required privileges: unprivileged role; access to http://www.postgresql.org
required to download version numbers.

=item B<oldest_2pc> (8.1+)

Check the oldest I<two-phase commit transaction> (aka. prepared transaction) in
the cluster.

Perfdata contains the max/avg age time and the number of prepared
transactions per databases.

Critical and Warning thresholds only accept an interval.

Required privileges: unprivileged role.

=item B<oldest_idlexact> (8.3+)

Check the oldest I<idle> transaction.

Perfdata contains the max/avg age and the number of idle transactions
per databases.

Critical and Warning thresholds only accept an interval.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.

Above 9.2, it supports C<--exclude> to filter out connections. Eg., to
filter out pg_dump and pg_dumpall, set this to 'pg_dump,pg_dumpall'.

Before 9.2, this services checks for idle transaction with their start time.
Thus, the service can mistakenly take account of transaction transiently in
idle state.
From 9.2 and up, the service checks for transaction that really had no activity
since the given thresholds.

Required privileges: an unprivileged role checks only its own queries;
a pg_monitor (10+) or superuser (<10) role is required to check all queries.

=item B<oldest_xmin> (8.4+)

Check the xmin I<horizon> from distinct sources of xmin retention.

Per default, Perfdata outputs the oldest known xmin age for each database among
running queries, opened or idle transactions, pending prepared transactions,
replication slots and walsender. For versions prior to 9.4, only C<2pc> source
of xmin retention is checked.

Using C<--detailed>, Perfdata contains the oldest xmin and maximum age for the
following source of xmin retention: C<query> (a running query), C<active_xact>
(an opened transaction currently executing a query), C<idle_xact> (an opened
transaction being idle), C<2pc> (a pending prepared transaction), C<repslot> (a
replication slot) and C<walwender> (a WAL sender replication process), for each
connectable database.  If a source doesn't retain any transaction for a
database, NaN is returned.  For versions prior to 9.4, only C<2pc> source of
xmin retention is available, so other sources won't appear in the perfdata.
Note that xmin retention from walsender is only set if C<hot_standby_feedback>
is enabled on remote standby.

Critical and Warning thresholds are optional. They only accept a raw number of
transaction.

This service supports both C<--dbexclude>" and C<--dbinclude>" parameters.

Required privileges: a pg_read_all_stats (10+) or superuser (<10) role is
required to check pg_stat_replication.  2PC, pg_stat_activity, and replication
slots don't require special privileges.

=item B<pg_dump_backup>

Check the age and size of backups.

This service uses the status file (see C<--status-file> parameter).

The C<--path> argument contains the location to the backup folder. The
supported format is a glob pattern matching every folder or file that you need
to check.

The C<--pattern> is required, and must contain a regular expression matching
the backup file name, extracting the database name from the first matching
group.

Optionally, a C<--global-pattern> option can be supplied to check for an
additional global file.

Examples:

To monitor backups like:

    /var/lib/backups/mydb-20150803.dump
    /var/lib/backups/otherdb-20150803.dump
    /var/lib/backups/mydb-20150804.dump
    /var/lib/backups/otherdb-20150804.dump

you must set:

    --path    '/var/lib/backups/*'
    --pattern '(\w+)-\d+.dump'

If the path contains the date, like this:

   /var/lib/backups/2015-08-03-daily/mydb.dump
   /var/lib/backups/2015-08-03-daily/otherdb.dump

then you can set:

    --path    '/var/lib/backups/*/*.dump'
    --pattern '/\d+-\d+-\d+-daily/(.*).dump'

For compatibility with pg_back (https://github.com/orgrim/pg_back),
you should use:

   --path '/path/*{dump,sql}'
   --pattern '(\w+)_[0-9-_]+.dump'
   --global-pattern 'pg_global_[0-9-_]+.sql'

The C<--critical> and C<--warning> thresholds are optional. They accept a list
of 'metric=value' separated by a comma. Available metrics are C<oldest> and
C<newest>, respectively the age of the oldest and newest backups, and C<size>,
which must be the maximum variation of size since the last check, expressed as
a size or a percentage. C<mindeltasize>, expressed in B, is the minimum
variation of size needed to raise an alert.

This service supports the C<--dbinclude> and C<--dbexclude> arguments, to
respectively test for the presence of include or exclude files.

The argument C<--exclude> enables you to exclude files younger than an
interval. This is useful to ignore files from a backup in progress. Eg., if
your backup process takes 2h, set this to '125m'.

Perfdata returns the age of the oldest and newest backups, as well as the size
of the newest backups.

Required privileges: unprivileged role; the system user needs read access
on the directory containing the dumps (but not on the dumps themselves).

=item B<pga_version>

Check if this script is running the given version of check_pgactivity.
You must provide the expected version using either C<--warning> OR
C<--critical>.

No perfdata is returned.

Required privileges: none.

=item B<pgdata_permission> (8.2+)

Check that the instance data directory rights are 700, and belongs
to the system user currently running postgresql.

The check on rights works on all Unix systems.

Checking the user only works on Linux systems (it uses /proc to avoid
dependencies). Before 9.3, you need to provide the expected owner using the
C<--uid> argument, or the owner will not be checked.

Required privileges:
 <11:superuser
 v11: user with pg_monitor or pg_read_all_setting
The system user must also be able to read the folder containing
PGDATA: B<the service has to be executed locally on the monitored server.>

=item B<replication_slots> (9.4+)

Check the number of WAL files retained and spilled files for each replication
slots.

Perfdata returns the number of WAL kept for each slot and the number of spilled
files in pg_replslot for each logical replication slot. Since v13, if
C<max_slot_wal_keep_size> is greater or equal to 0, perfdata reports the size
of WAL to produce before each slot becomes C<unreserved> or C<lost>. Note that
this size can become negative if the WAL status for the limited time where the
slot becomes C<unreserved>. It is set to zero as soon as the last checkpoint
finished and the status becomes C<lost>.

This service needs superuser privileges to obtain the number of spill files or
returns 0 in last resort.

Critical and Warning thresholds are optional. They accept either a raw number
(for backward compatibility, only wal threshold will be used) or a list of
'wal=value' and/or 'spilled=value' and/or 'remaining=size'. Respectively number
of kept wal files, number of spilled files in pg_replslot for each logical slot
and remaining bytes before a slot becomes C<unreserved> or C<lost>.

Moreover, with v13 and after, the service raises a warning alert if a slot
becomes C<unreserved>. It raises a critical alert if the slot becomes C<lost>.

Required privileges:
 v9.4: unprivileged role, or superuser to monitor spilled files for logical replication
 v11+: unprivileged user with GRANT EXECUTE on function pg_ls_dir(text)

Here is somes examples:

    -w 'wal=50,spilled=20' -c 'wal=100,spilled=40'
    -w 'spilled=20,remaining=160MB' -c 'spilled=40,remaining=48MB'

=item B<settings> (9.0+)

Check if the current settings have changed since they were stored in the
service file.

The "known" settings are recorded during the very first call of the service.
To update the known settings after a configuration change, call this service
again with the argument C<--save>.

No perfdata.

Critical and Warning thresholds are ignored.

A Critical is raised if at least one parameter changed.

Required privileges: unprivileged role.

=item B<sequences_exhausted> (7.4+)

Check all sequences and raise an alarm if the column or sequences gets too
close to the maximum value. The maximum value is calculated from the maxvalue
of the sequence or from the column type when the sequence is owned by a
column (the smallserial, serial and bigserial types).

Perfdata returns the sequences that trigger the alert.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
The 'postgres' database and templates are always excluded.

Critical and Warning thresholds accept a percentage of the sequence filled.

Required privileges: unprivileged role able to log in all databases

=item B<stat_snapshot_age> (9.5+)

Check the age of the statistics snapshot (statistics collector's statistics).
This probe helps to detect a frozen stats collector process.

Perfdata returns the statistics snapshot age.

Critical and Warning thresholds only accept an interval (eg. 1h30m25s).

Required privileges: unprivileged role.

=item B<streaming_delta> (9.1+)

Check the data delta between a cluster and its standbys in streaming
replication.

Optional argument C<--slave> allows you to specify some slaves that MUST be
connected. This argument can be used as many times as desired to check multiple
slave connections, or you can specify multiple slaves connections at one time,
using comma separated values. Both methods can be used in a single call. The
provided values must be of the form "APPLICATION_NAME IP".
Both following examples will check for the presence of two slaves:

  --slave 'slave1 192.168.1.11' --slave 'slave2 192.168.1.12'
  --slave 'slave1 192.168.1.11','slave2 192.168.1.12'

This service supports a C<--exclude REGEX> parameter to exclude every result
matching a regular expression on application_name or IP address fields.

You can use multiple C<--exclude REGEX>  parameters.

Perfdata returns the data delta in bytes between the master and every standbies
found, the number of standbies connected and the number of excluded standbies.

Critical and Warning thresholds are optional. They can take one or two values
separated by a comma. If only one value is supplied, it applies to both flushed
and replayed data. If two values are supplied, the first one applies to flushed
data, the second one to replayed data. These thresholds only accept a size
(eg. 2.5G).

Required privileges: unprivileged role.

=item B<table_unlogged> (9.5+)

Check if tables are changed to unlogged. In 9.5, you can switch between logged
and unlogged.

Without C<--critical>  or C<--warning> parameters, this service attempts to
fetch all unlogged tables.

A critical alert is raised if an unlogged table is detected.

This service supports both C<--dbexclude>  and C<--dbinclude> parameters.
The 'postgres' database and templates are always excluded.

This service supports a C<--exclude REGEX>  parameter to exclude relations
matching a regular expression. The regular expression applies to
"database.schema_name.relation_name". This enables you to filter either on a
relation name for all schemas and databases, on a qualified named relation
(schema + relation) for all databases or on a qualified named relation in
only one database.

You can use multiple C<--exclude REGEX>  parameters.

Perfdata will return the number of unlogged tables per database.

A list of the unlogged tables will be returned after the
perfdata. This list contains the fully qualified table name. If
C<--exclude REGEX> is set, the number of excluded tables is returned.

Required privileges: unprivileged role able to log in all databases,
or at least those in C<--dbinclude>.

=item B<table_bloat>

Estimate bloat on tables.

Warning and critical thresholds accept a comma-separated list of either
raw number(for a size), size (eg. 125M) or percentage. The thresholds apply to
B<bloat> size, not object size. If a percentage is given, the threshold will
apply to the bloat size compared to the table + TOAST size.
If multiple threshold values are passed, check_pgactivity will choose the
largest (bloat size) value.

This service supports both C<--dbexclude> and C<--dbinclude> parameters.
The 'postgres' database and templates are always excluded.

This service supports a C<--exclude REGEX> parameter to exclude relations
matching the given regular expression. The regular expression applies to
"database.schema_name.relation_name". This enables you to filter either on a
relation name for all schemas and databases, on a qualified named relation
(schema + relation) for all databases or on a qualified named relation in
only one database.

You can use multiple C<--exclude REGEX> parameters.

B<Warning>: With a non-superuser role, this service can only check the tables
that the given role is granted to read!

Perfdata will return the number of tables matching the warning and critical
thresholds, per database.

A list of the bloated tables will be returned after the
perfdata. This list contains the fully qualified bloated table name, the
estimated bloat size, the table size and the bloat percentage.

Required privileges: superuser (<10) able to log in all databases, or at least
those in C<--dbinclude>; superuser (<10);
on PostgreSQL 10+, a user with the role pg_monitor suffices,
provided that you grant SELECT on the system table pg_statistic
to the pg_monitor role, in each database of the cluster:
C<GRANT SELECT ON pg_statistic TO pg_monitor;>

=item B<temp_files> (8.1+)

Check the number and size of temp files.

This service uses the status file (see C<--status-file> parameter) for 9.2+.

Perfdata returns the number and total size of temp files found in
C<pgsql_tmp> folders. They are aggregated by database until 8.2, then
by tablespace (see GUC temp_tablespaces).

Starting with 9.2, perfdata returns as well the number of temp files per
database since last run, the total size of temp files per database since last
run and the rate at which temp files were generated.

Critical and Warning thresholds are optional. They accept either a number
of file (raw value), a size (unit is B<mandatory> to define a size) or both
values separated by a comma.

Thresholds are applied on current temp files being created AND the number/size
of temp files created since last execution.

Required privileges:
 <10: superuser
 v10: an unprivileged role is possible but it will not monitor databases
that it cannot access, nor live temp files
 v11: an unprivileged role is possible but must be granted EXECUTE
on functions pg_ls_dir(text), pg_read_file(text), pg_stat_file(text, boolean);
the same restrictions than on v10 will still apply
 v12+: a role with pg_monitor privilege.

=item B<uptime> (8.1+)

Returns time since postmaster start ("uptime", from 8.1),
since configuration reload (from 8.4),
and since shared memory initialization (from 10).

Please note that the uptime is unaffected when the postmaster resets
all its children (for example after a kill -9 on a process or a failure).

From 10+, the 'time since shared memory init' aims at detecting this situation:
in fact we use the age of the oldest non-client child process (usually
checkpointer, writer or startup). This needs pg_monitor access to read
pg_stat_activity.

Critical and Warning thresholds are optional. If both are set, Critical is
raised when the postmaster uptime or the time since shared memory
initialization is less than the critical threshold.

Warning is raised when the time since configuration reload is less than the
warning threshold.  If only a warning or critical threshold is given, it will
be used for both cases.  Obviously these alerts will disappear from themselves
once enough time has passed.

Perfdata contain the three values (when available).

Required privileges: pg_monitor on PG10+; otherwise unprivileged role.

=item B<wal_files> (8.1+)

Check the number of WAL files.

Perfdata returns the total number of WAL files, current number of written WAL,
the current number of recycled WAL, the rate of WAL written to disk since the
last execution on the master cluster and the current timeline.

Critical and Warning thresholds accept either a raw number of files or a
percentage. In case of percentage, the limit is computed based on:

  100% = 1 + checkpoint_segments * (2 + checkpoint_completion_target)

For PostgreSQL 8.1 and 8.2:

  100% = 1 + checkpoint_segments * 2

If C<wal_keep_segments> is set for 9.0 to 9.4, the limit is the greatest
of the following formulas:

  100% = 1 + checkpoint_segments * (2 + checkpoint_completion_target)
  100% = 1 + wal_keep_segments + 2 * checkpoint_segments

For 9.5 to 12, the limit is:

  100% =  max_wal_size      (as a number of WAL)
        + wal_keep_segments (if set)

For 13 and above:

  100% =  max_wal_size + wal_keep_size (as numbers of WAL)

Required privileges:
 <10:superuser (<10)
 v10:unprivileged user with pg_monitor
 v11+ :unprivileged user with pg_monitor, or with grant EXECUTE on function
pg_ls_waldir

=back

=head2 EXAMPLES

=over

=item Execute service "last_vacuum" on host "host=localhost port=5432":

  check_pgactivity -h localhost -p 5432 -s last_vacuum -w 30m -c 1h30m

=item Execute service "hot_standby_delta" between hosts "service=pg92" and "service=pg92s":

  check_pgactivity --dbservice pg92,pg92s --service hot_standby_delta -w 32MB -c 160MB

=item Execute service "streaming_delta" on host "service=pg92" to check its slave "stby1" with the IP address "192.168.1.11":

  check_pgactivity --dbservice pg92 --slave "stby1 192.168.1.11" --service streaming_delta -w 32MB -c 160MB

=item Execute service "hit_ratio" on host "slave" port "5433, excluding database matching the regexps "idelone" and "(?i:sleep)":

  check_pgactivity -p 5433 -h slave --service hit_ratio --dbexclude idelone --dbexclude "(?i:sleep)" -w 90% -c 80%

=item Execute service "hit_ratio" on host "slave" port "5433, only for databases matching the regexp "importantone":

  check_pgactivity -p 5433 -h slave --service hit_ratio --dbinclude importantone -w 90% -c 80%

=back

=head1 VERSION

check_pgactivity version 2.6, released on Fri Jul 08 2022.

=head1 LICENSING

This program is open source, licensed under the PostgreSQL license.
For license terms, see the LICENSE provided with the sources.

=head1 AUTHORS

S<Author: Open PostgreSQL Monitoring Development Group>
S<Copyright: (C) 2012-2022 Open PostgreSQL Monitoring Development Group>