• Stars
    star
    2,846
  • Rank 15,978 (Top 0.4 %)
  • Language
    C
  • License
    PostgreSQL License
  • Created about 8 years ago
  • Updated 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Run periodic jobs in PostgreSQL

Citus Banner

What is pg_cron?

pg_cron is a simple cron-based job scheduler for PostgreSQL (10 or higher) that runs inside the database as an extension. It uses the same syntax as regular cron, but it allows you to schedule PostgreSQL commands directly from the database. You can also use '[1-59] seconds' to schedule a job based on an interval.

-- Delete old data on Saturday at 3:30am (GMT)
SELECT cron.schedule('30 3 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$);
 schedule
----------
       42

-- Vacuum every day at 10:00am (GMT)
SELECT cron.schedule('nightly-vacuum', '0 10 * * *', 'VACUUM');
 schedule
----------
       43

-- Change to vacuum at 3:00am (GMT)
SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM');
 schedule
----------
       43

-- Stop scheduling jobs
SELECT cron.unschedule('nightly-vacuum' );
 unschedule 
------------
 t

SELECT cron.unschedule(42);
 unschedule
------------
          t

-- Vacuum every Sunday at 4:00am (GMT) in a database other than the one pg_cron is installed in
SELECT cron.schedule_in_database('weekly-vacuum', '0 4 * * 0', 'VACUUM', 'some_other_database');
 schedule
----------
       44

-- Call a stored procedure every 5 seconds
SELECT cron.schedule('process-updates', '5 seconds', 'CALL process_updates()'); 

pg_cron can run multiple jobs in parallel, but it runs at most one instance of a job at a time. If a second run is supposed to start before the first one finishes, then the second run is queued and started as soon as the first run completes.

The schedule uses the standard cron syntax, in which * means "run every time period", and a specific number means "but only at this time":

 ┌───────────── min (0 - 59)
 │ ┌────────────── hour (0 - 23)
 │ │ ┌─────────────── day of month (1 - 31)
 │ │ │ ┌──────────────── month (1 - 12)
 │ │ │ │ ┌───────────────── day of week (0 - 6) (0 to 6 are Sunday to
 │ │ │ │ │                  Saturday, or use names; 7 is also Sunday)
 │ │ │ │ │
 │ │ │ │ │
 * * * * *

An easy way to create a cron schedule is: crontab.guru.

The code in pg_cron that handles parsing and scheduling comes directly from the cron source code by Paul Vixie, hence the same options are supported.

Installing pg_cron

Install on Red Hat, CentOS, Fedora, Amazon Linux with PostgreSQL 15 using PGDG:

# Install the pg_cron extension
sudo yum install -y pg_cron_15

Install on Debian, Ubuntu with PostgreSQL 15 using apt.postgresql.org:

# Install the pg_cron extension
sudo apt-get -y install postgresql-15-cron

You can also install pg_cron by building it from source:

git clone https://github.com/citusdata/pg_cron.git
cd pg_cron
# Ensure pg_config is in your path, e.g.
export PATH=/usr/pgsql-15/bin:$PATH
make && sudo PATH=$PATH make install

Setting up pg_cron

To start the pg_cron background worker when PostgreSQL starts, you need to add pg_cron to shared_preload_libraries in postgresql.conf. Note that pg_cron does not run any jobs as a long a server is in hot standby mode, but it automatically starts when the server is promoted.

# add to postgresql.conf

# required to load pg_cron background worker on start-up
shared_preload_libraries = 'pg_cron'

By default, the pg_cron background worker expects its metadata tables to be created in the "postgres" database. However, you can configure this by setting the cron.database_name configuration parameter in postgresql.conf.

# add to postgresql.conf

# optionally, specify the database in which the pg_cron background worker should run (defaults to postgres)
cron.database_name = 'postgres'

pg_cron may only be installed to one database in a cluster. If you need to run jobs in multiple databases, use cron.schedule_in_database().

Previously pg_cron could only use GMT time, but now you can adapt your time by setting cron.timezone in postgresql.conf.

# add to postgresql.conf

# optionally, specify the timezone in which the pg_cron background worker should run (defaults to GMT). E.g:
cron.timezone = 'PRC'

After restarting PostgreSQL, you can create the pg_cron functions and metadata tables using CREATE EXTENSION pg_cron.

-- run as superuser:
CREATE EXTENSION pg_cron;

-- optionally, grant usage to regular users:
GRANT USAGE ON SCHEMA cron TO marco;

Ensuring pg_cron can start jobs

Important: By default, pg_cron uses libpq to open a new connection to the local database, which needs to be allowed by pg_hba.conf. It may be necessary to enable trust authentication for connections coming from localhost in for the user running the cron job, or you can add the password to a .pgpass file, which libpq will use when opening a connection.

You can also use a unix domain socket directory as the hostname and enable trust authentication for local connections in pg_hba.conf, which is normally safe:

# Connect via a unix domain socket:
cron.host = '/tmp'

# Can also be an empty string to look for the default directory:
cron.host = ''

Alternatively, pg_cron can be configured to use background workers. In that case, the number of concurrent jobs is limited by the max_worker_processes setting, so you may need to raise that.

# Schedule jobs via background workers instead of localhost connections
cron.use_background_workers = on
# Increase the number of available background workers from the default of 8
max_worker_processes = 20

For security, jobs are executed in the database in which the cron.schedule function is called with the same permissions as the current user. In addition, users are only able to see their own jobs in the cron.job table.

Viewing job run details

You can view the status of running and recently completed job runs in the cron.job_run_details:

select * from cron.job_run_details order by start_time desc limit 5;
┌───────┬───────┬─────────┬──────────┬──────────┬───────────────────┬───────────┬──────────────────┬───────────────────────────────┬───────────────────────────────┐
│ jobid │ runid │ job_pid │ database │ username │      command      │  status   │  return_message  │          start_time           │           end_time            │
├───────┼───────┼─────────┼──────────┼──────────┼───────────────────┼───────────┼──────────────────┼───────────────────────────────┼───────────────────────────────┤
│    1043282610 │ postgres │ marco    │ select process()  │ succeeded │ SELECT 12023-02-07 09:30:00.098164+012023-02-07 09:30:00.130729+01 │
│    1043272609 │ postgres │ marco    │ select process()  │ succeeded │ SELECT 12023-02-07 09:29:00.015168+012023-02-07 09:29:00.832308+01 │
│    1043212603 │ postgres │ marco    │ select process()  │ succeeded │ SELECT 12023-02-07 09:28:00.011965+012023-02-07 09:28:01.420901+01 │
│    1043202602 │ postgres │ marco    │ select process()  │ failed    │ server restarted │ 2023-02-07 09:27:00.011833+012023-02-07 09:27:00.72121+01  │
│     943202602 │ postgres │ marco    │ select do_stuff() │ failed    │ job canceled     │ 2023-02-07 09:26:00.011833+012023-02-07 09:26:00.22121+01  │
└───────┴───────┴─────────┴──────────┴──────────┴───────────────────┴───────────┴──────────────────┴───────────────────────────────┴───────────────────────────────┘
(10 rows)

The records in cron.job_run_details are not cleaned automatically, but every user that can schedule cron jobs also has permission to delete their own cron.job_run_details records.

Especially when you have jobs that run every few seconds, it can be a good idea to clean up regularly, which can easily be done using pg_cron itself:

-- Delete old cron.job_run_details records of the current user every day at noon
SELECT  cron.schedule('delete-job-run-details', '0 12 * * *', $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '7 days'$$);

If you do not want to use cron.job_run_details at all, then you can add cron.log_run = off to postgresql.conf.

Example use cases

Articles showing possible ways of using pg_cron:

Managed services

The following table keeps track of which of the major managed Postgres services support pg_cron.

Service Supported
Aiven ✔️
Alibaba Cloud ✔️
Amazon RDS ✔️
Azure ✔️
Crunchy Bridge ✔️
DigitalOcean ✔️
Google Cloud ✔️
Heroku
ScaleGrid ✔️
Scaleway ✔️
Supabase ✔️

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

More Repositories

1

citus

Distributed PostgreSQL as an extension
C
10,512
star
2

cstore_fdw

Columnar storage extension for Postgres built as a foreign data wrapper. Check out https://github.com/citusdata/citus for a modernized columnar storage implementation built as a table access method.
C
1,758
star
3

postgresql-hll

PostgreSQL extension adding HyperLogLog data structures as a native data type
C
1,116
star
4

pg_shard

ATTENTION: pg_shard is superseded by Citus, its more powerful replacement
C
1,058
star
5

pg_auto_failover

Postgres extension and service for automated failover and high-availability
C
852
star
6

django-multitenant

Python/Django support for distributed multi-tenant databases like Postgres+Citus
Python
722
star
7

activerecord-multi-tenant

Rails/ActiveRecord support for distributed multi-tenant databases like Postgres+Citus
Ruby
717
star
8

postgres_vectorization_test

Vectorized executor to speed up PostgreSQL
C
331
star
9

docker

🚢 Docker images and configuration for Citus
Dockerfile
241
star
10

postgresql-topn

TopN is an open source PostgreSQL extension that returns the top values in a database according to some criteria
C
238
star
11

mongo_fdw

DEPRECATED, moved to
C
154
star
12

podyn

DynamoDB to PostgreSQL & Citus continuous replication tool
Java
85
star
13

citus-example-ad-analytics

Reference App for Ad Analytics, using Ruby on Rails.
CSS
75
star
14

citus_docs

Documentation for Citus. Distributed PostgreSQL as an extension.
CSS
58
star
15

pgconfsv-tutorial

Files for the PGConf SV tutorial on real-time analytics
Python
54
star
16

citus-benchmark

Tools for running benchmarks against Citus
Python
38
star
17

postgres-analytics-tutorial

Exercises for the Architecting Real-Time Analytics for your Customers tutorial
PLpgSQL
28
star
18

membership-manager

🚢 Docker image for managing Citus membership via docker-py
Python
21
star
19

packaging

Packaging scripts for Citus
Dockerfile
19
star
20

test-automation

Tools for making our tests easier to run
C
14
star
21

tools

Tools and config used in Citus Data projects
Python
14
star
22

pgconfus-tutorial-multi-tenant

Files for the tutorial on Citus & Multi-Tenant Models @ PGConf US 2017
Ruby
10
star
23

pg_octopus

A health checker for PostgreSQL
C
9
star
24

python-citus-rebalancer

A python command line tool to recommend shard moves to customers with unbalanced clusters
Python
9
star
25

pgmasq

Transparently forward transactions from a hot standby to a primary in PostgreSQL (failed experiment)
C
8
star
26

citus-django-example-ad-analytics

Reference App for Ad Analytics, using Django
CSS
8
star
27

pg_intpair

C
7
star
28

the-process

Trust it.
Dockerfile
6
star
29

interactive-tutorials

Shell
5
star
30

realtime-dashboards-resources

Some useful scripts
PLpgSQL
5
star
31

pg_shard-old

Repo used for pg_shard development before release.
C
4
star
32

PathToCitusCon

This repo stores show notes for our PathToCitusCon series of events on Discord
3
star
33

citus-example-microservices

Python
3
star
34

workerlist-gen

🚢 Docker image for generating Citus workerlist file using docker-gen
Shell
3
star
35

real-time-analytics-Hands-On-Lab-Hyperscale-Citus

2
star
36

Nationwide-Hands-On-Session

1
star