rotate-backups: Simple command line interface for backup rotation

Backups are good for you. Most people learn this the hard way (including me). Nowadays my Linux laptop automatically creates a full system snapshot every four hours by pushing changed files to an rsync daemon running on the server in my home network and creating a snapshot afterwards using the cp -al command (the article Easy Automated Snapshot-Style Backups with Linux and Rsync explains the basic technique). The server has a second disk attached which asynchronously copies from the main disk so that a single disk failure doesn't wipe all of my backups (the "time delayed replication" aspect has also proven to be very useful).

Okay, cool, now I have backups of everything, up to date and going back in time! But I'm running through disk space like crazy... A proper deduplicating filesystem would be awesome but I'm running crappy consumer grade hardware and e.g. ZFS has not been a good experience in the past. So I'm going to have to delete backups...

Deleting backups is never nice, but an easy and proper rotation scheme can help a lot. I wanted to keep things manageable so I wrote a Python script to do it for me. Over the years I actually wrote several variants. Because I kept copy/pasting these scripts around I decided to bring the main features together in a properly documented Python package and upload it to the Python Package Index.

The rotate-backups package is currently tested on CPython 2.7, 3.5+ and PyPy (2.7). It's tested on Linux and Mac OS X and may work on other UNIX-like systems, but it definitely won't work on Windows right now.

Features

Dry run mode
Use it. I'm serious. If you don't and rotate-backups eats more backups than intended you have no right to complain ;-)
Flexible rotation
Rotation with any combination of hourly, daily, weekly, monthly and yearly retention periods.
Fuzzy timestamp matching in filenames

Timestamps are extracted from the names of the files and/or directories; their modification times are not relevant. If you speak Python regular expressions, here is how the fuzzy matching works:

# Required components.
(?P<year>\d{4}) \D?
(?P<month>\d{2}) \D?
(?P<day>\d{2}) \D?
(
   # Optional components.
   (?P<hour>\d{2}) \D?
   (?P<minute>\d{2}) \D?
   (?P<second>\d{2})?
)?
All actions are logged
Log messages are saved to the system log (e.g. /var/log/syslog) so you can retrace what happened when something seems to have gone wrong.
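To see how the fuzzy timestamp matching described above behaves on real filenames, here is a minimal Python sketch that compiles the pattern with re.VERBOSE and turns a match into a datetime. The helper name parse_backup_timestamp is hypothetical, and searching anywhere in the name (instead of anchoring at the start) and treating missing time components as zero are assumptions of this sketch, not necessarily what the package does internally. It uses a non-capturing group for the optional part, like the default pattern shown under "Supported configuration options".

```python
import re
from datetime import datetime

# The fuzzy pattern from the feature description. re.VERBOSE makes the
# regex engine ignore whitespace and '#' comments inside the pattern.
TIMESTAMP_PATTERN = re.compile(r"""
    # Required components.
    (?P<year>\d{4}) \D?
    (?P<month>\d{2}) \D?
    (?P<day>\d{2}) \D?
    (?:
        # Optional components.
        (?P<hour>\d{2}) \D?
        (?P<minute>\d{2}) \D?
        (?P<second>\d{2})?
    )?
""", re.VERBOSE)


def parse_backup_timestamp(filename):
    """Return a datetime for the first timestamp in filename, or None."""
    match = TIMESTAMP_PATTERN.search(filename)
    if match is None:
        return None
    # Optional components that did not match are None; treat them as zero.
    parts = {name: int(value) for name, value in match.groupdict().items()
             if value is not None}
    return datetime(parts["year"], parts["month"], parts["day"],
                    parts.get("hour", 0), parts.get("minute", 0),
                    parts.get("second", 0))


print(parse_backup_timestamp("laptop-2020-03-14_15-30-00"))  # 2020-03-14 15:30:00
print(parse_backup_timestamp("notes.txt"))                   # None
```

Separators are optional (\D?), so names like backup-20200314 or backup-2020-03-14_1530 are recognized as well.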

Installation

The rotate-backups package is available on PyPI which means installation should be as simple as:

$ pip install rotate-backups

There's actually a multitude of ways to install Python packages (e.g. the per user site-packages directory, virtual environments or just installing system wide) and I have no intention of getting into that discussion here, so if this intimidates you then read up on your options before returning to these instructions ;-).

Usage

There are two ways to use the rotate-backups package: As the command line program rotate-backups and as a Python API. For details about the Python API please refer to the API documentation available on Read the Docs. The command line interface is described below.

Command line

Usage: rotate-backups [OPTIONS] [DIRECTORY, ..]

Easy rotation of backups based on the Python package by the same name.

To use this program you specify a rotation scheme via (a combination of) the --hourly, --daily, --weekly, --monthly and/or --yearly options and the directory (or directories) containing backups to rotate as one or more positional arguments.

You can rotate backups on a remote system over SSH by prefixing a DIRECTORY with an SSH alias and separating the two with a colon (similar to how rsync accepts remote locations).

Instead of specifying directories and a rotation scheme on the command line you can also add them to a configuration file. For more details refer to the online documentation (see also the --config option).

Please use the --dry-run option to test the effect of the specified rotation scheme before letting this program loose on your precious backups! If you don't test the results using the dry run mode and this program eats more backups than intended you have no right to complain ;-).

Supported options:

Option Description
-M, --minutely=COUNT In a literal sense this option sets the number of "backups per minute" to preserve during rotation. For most use cases that doesn't make a lot of sense :-) but you can combine the --minutely and --relaxed options to preserve more than one backup per hour. Refer to the usage of the -H, --hourly option for details about COUNT.
-H, --hourly=COUNT

Set the number of hourly backups to preserve during rotation:

  • If COUNT is a number it gives the number of hourly backups to preserve, starting from the most recent hourly backup and counting back in time.
  • Alternatively you can provide an expression that will be evaluated to get a number (e.g. if COUNT is "7 * 2" the result would be 14).
  • You can also pass "always" for COUNT, in this case all hourly backups are preserved.
  • By default no hourly backups are preserved.
-d, --daily=COUNT Set the number of daily backups to preserve during rotation. Refer to the usage of the -H, --hourly option for details about COUNT.
-w, --weekly=COUNT Set the number of weekly backups to preserve during rotation. Refer to the usage of the -H, --hourly option for details about COUNT.
-m, --monthly=COUNT Set the number of monthly backups to preserve during rotation. Refer to the usage of the -H, --hourly option for details about COUNT.
-y, --yearly=COUNT Set the number of yearly backups to preserve during rotation. Refer to the usage of the -H, --hourly option for details about COUNT.
-t, --timestamp-pattern=PATTERN Customize the regular expression pattern that is used to match and extract timestamps from filenames. PATTERN is expected to be a Python compatible regular expression that must define the named capture groups 'year', 'month' and 'day' and may define 'hour', 'minute' and 'second'.
-I, --include=PATTERN Only process backups that match the shell pattern given by PATTERN. This argument can be repeated. Make sure to quote PATTERN so the shell doesn't expand the pattern before it's received by rotate-backups.
-x, --exclude=PATTERN Don't process backups that match the shell pattern given by PATTERN. This argument can be repeated. Make sure to quote PATTERN so the shell doesn't expand the pattern before it's received by rotate-backups.
-j, --parallel

Remove backups in parallel, one backup per mount point at a time. The idea behind this approach is that parallel rotation is most useful when the files to be removed are on different disks and so multiple devices can be utilized at the same time.

Because mount points are per system the -j, --parallel option will also parallelize over backups located on multiple remote systems.

-p, --prefer-recent By default the first (oldest) backup in each time slot is preserved. If you'd prefer to keep the most recent backup in each time slot instead then this option is for you.
-r, --relaxed

By default the time window for each rotation scheme is enforced (this is referred to as strict rotation) but the -r, --relaxed option can be used to alter this behavior. The easiest way to explain the difference between strict and relaxed rotation is using an example:

  • When using strict rotation and the number of hourly backups to preserve is three, only backups created in the relevant time window (the hour of the most recent backup and the two hours leading up to that) will match the hourly frequency.
  • When using relaxed rotation the three most recent backups will all match the hourly frequency (and thus be preserved), regardless of the calculated time window.

If the explanation above is not clear enough, here's a simple way to decide whether you want to customize this behavior or not:

  • If your backups are created at regular intervals and you never miss an interval then strict rotation (the default) is probably the best choice.
  • If your backups are created at irregular intervals then you may want to use the -r, --relaxed option in order to preserve more backups.
-i, --ionice=CLASS Use the "ionice" program to set the I/O scheduling class and priority of the "rm" invocations used to remove backups. CLASS is expected to be one of the values "idle" (3), "best-effort" (2) or "realtime" (1). Refer to the man page of the "ionice" program for details about these values. The numeric values are required by the 'busybox' implementation of 'ionice'.
-c, --config=FILENAME

Load configuration from FILENAME. If this option isn't given the following default locations are searched for configuration files:

  • /etc/rotate-backups.ini and /etc/rotate-backups.d/*.ini
  • ~/.rotate-backups.ini and ~/.rotate-backups.d/*.ini
  • ~/.config/rotate-backups.ini and ~/.config/rotate-backups.d/*.ini

Any available configuration files are loaded in the order given above, so that sections in user-specific configuration files override sections by the same name in system-wide configuration files. For more details refer to the online documentation.

-C, --removal-command=CMD

Change the command used to remove backups. The value of CMD defaults to rm -fR. This choice was made because it works regardless of whether "backups to be rotated" are files or directories or a mixture of both.

As an example of why you might want to change this, CephFS snapshots are represented as regular directory trees that can be deleted at once with a single 'rmdir' command (even though according to POSIX semantics this command should refuse to remove nonempty directories, but I digress).

-u, --use-sudo Enable the use of "sudo" to rotate backups in directories that are not readable and/or writable for the current user (or the user logged in to a remote system over SSH).
-S, --syslog=CHOICE Explicitly enable or disable system logging instead of letting the program figure out what to do. The values '1', 'yes', 'true' and 'on' enable system logging whereas the values '0', 'no', 'false' and 'off' disable it.
-f, --force If a sanity check fails an error is reported and the program aborts. You can use --force to continue with backup rotation instead. Sanity checks are done to ensure that the given DIRECTORY exists, is readable and is writable. If the --removal-command option is given then the last sanity check (that the given location is writable) is skipped (because custom removal commands imply custom semantics).
-n, --dry-run Don't make any changes, just print what would be done. This makes it easy to evaluate the impact of a rotation scheme without losing any backups.
-v, --verbose Increase logging verbosity (can be repeated).
-q, --quiet Decrease logging verbosity (can be repeated).
-h, --help Show this message and exit.
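The idea behind -j, --parallel (one removal at a time per mount point, different mount points in parallel) can be sketched in Python as follows. This is an illustration under assumptions, not the package's implementation: find_mount_point, group_by_mount_point and remove_in_parallel are hypothetical names, only local paths are handled (the real option also spreads work over remote systems), and remove stands in for whatever actually deletes a backup.

```python
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def find_mount_point(path):
    """Walk up from path until os.path.ismount() reports a mount point."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def group_by_mount_point(paths):
    """Group backup paths so that each group lives on a single mount point."""
    groups = defaultdict(list)
    for path in paths:
        groups[find_mount_point(path)].append(path)
    return dict(groups)


def remove_in_parallel(paths, remove):
    """Run one worker per mount point; each worker removes its backups serially."""
    groups = group_by_mount_point(paths)

    def worker(group):
        for path in group:
            remove(path)

    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        # list() forces evaluation so exceptions from workers are re-raised.
        list(pool.map(worker, groups.values()))
```

Serializing removals within a mount point avoids several workers competing for the same disk, which matches the rationale given in the option's description.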

Configuration files

Instead of specifying directories and rotation schemes on the command line you can also add them to a configuration file.

Configuration files are text files in the subset of ini syntax supported by Python's configparser module. They can be located in the following places:

Directory   Main configuration file         Modular configuration files
/etc        /etc/rotate-backups.ini         /etc/rotate-backups.d/*.ini
~           ~/.rotate-backups.ini           ~/.rotate-backups.d/*.ini
~/.config   ~/.config/rotate-backups.ini    ~/.config/rotate-backups.d/*.ini

The available configuration files are loaded in the order given above, so that user-specific configuration files override system-wide configuration files.

You can load a configuration file in a nonstandard location using the command line option --config; in this case the default locations mentioned above are ignored.
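The load order described above can be sketched with Python's configparser module. This is a minimal sketch, not the package's own loader: candidate_files and load_sections are hypothetical helpers, sorting the modular *.ini files alphabetically is an assumption, and a later section is assumed to replace an earlier section of the same name entirely (rather than being merged option by option).

```python
import configparser
import glob
import os

# Default search locations from the table above, in load order.
DEFAULT_LOCATIONS = [
    ("/etc/rotate-backups.ini", "/etc/rotate-backups.d/*.ini"),
    ("~/.rotate-backups.ini", "~/.rotate-backups.d/*.ini"),
    ("~/.config/rotate-backups.ini", "~/.config/rotate-backups.d/*.ini"),
]


def candidate_files(locations=DEFAULT_LOCATIONS):
    """Return the existing configuration files in load order."""
    found = []
    for main_file, modular_pattern in locations:
        main_file = os.path.expanduser(main_file)
        if os.path.isfile(main_file):
            found.append(main_file)
        # Alphabetical order of the modular files is an assumption.
        found.extend(sorted(glob.glob(os.path.expanduser(modular_pattern))))
    return found


def load_sections(paths):
    """Load files in order; a later section replaces an earlier one entirely."""
    merged = {}
    for path in paths:
        # interpolation=None so '%' in patterns or commands is taken literally.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(path)
        for name in parser.sections():
            merged[name] = dict(parser.items(name))
    return merged


# With --config the default locations would be skipped, e.g. (config_file
# is a hypothetical variable holding the option's value):
# sections = load_sections([config_file]) if config_file else load_sections(candidate_files())
```

Values come back as strings (e.g. "7 * 2"), so retention periods still need to be evaluated afterwards.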

Each section in the configuration defines a directory that contains backups to be rotated. The options in each section define the rotation scheme and other options. Here's an example based on how I use rotate-backups to rotate the backups of the Linux installations that I make regular backups of:

# /etc/rotate-backups.ini:
# Configuration file for the rotate-backups program that specifies
# directories containing backups to be rotated according to specific
# rotation schemes.

[/backups/laptop]
hourly = 24
daily = 7
weekly = 4
monthly = 12
yearly = always
ionice = idle

[/backups/server]
daily = 7 * 2
weekly = 4 * 2
monthly = 12 * 4
yearly = always
ionice = idle

[/backups/mopidy]
daily = 7
weekly = 4
monthly = 2
ionice = idle

[/backups/xbmc]
daily = 7
weekly = 4
monthly = 2
ionice = idle

As the retention periods of the directory /backups/server in the example above show, you are allowed to use expressions that evaluate to a number instead of having to write out the literal number.
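One way to turn COUNT values such as "24", "7 * 2" or "always" into numbers is sketched below. It is an assumption-based illustration rather than the package's code: parse_count is a hypothetical name, it only accepts integer literals combined with +, -, * and //, and it walks the Python AST (Python 3.8+) instead of calling eval(), so arbitrary code in a configuration file cannot run. The package itself may accept a wider range of expressions.

```python
import ast
import math
import operator

# Operators this sketch accepts in COUNT expressions (an assumption).
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


def _evaluate(node):
    """Recursively evaluate an integer-only arithmetic AST node."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_evaluate(node.left),
                                         _evaluate(node.right))
    raise ValueError("unsupported COUNT expression: %s" % ast.dump(node))


def parse_count(value):
    """Turn a COUNT such as '12', '7 * 2' or 'always' into a number."""
    value = value.strip()
    if value.lower() == "always":
        return math.inf  # preserve every backup for this frequency
    return _evaluate(ast.parse(value, mode="eval").body)


print(parse_count("12 * 4"))  # 48
```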

Here's an example of a configuration for two remote directories:

# SSH as a regular user and use `sudo' to elevate privileges.
[server:/backups/laptop]
use-sudo = yes
hourly = 24
daily = 7
weekly = 4
monthly = 12
yearly = always
ionice = idle

# SSH as the root user (avoids sudo passwords).
[server:/backups/server]
ssh-user = root
hourly = 24
daily = 7
weekly = 4
monthly = 12
yearly = always
ionice = idle

As this example shows, you can either connect as the root user or connect as a regular user and use sudo to elevate privileges.
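Section names like server:/backups/laptop combine an SSH alias and a directory, much like the remote locations accepted on the command line. A minimal sketch of splitting such a location follows; parse_location is a hypothetical helper, and treating the colon as a separator only when no slash precedes it (so a local path such as /mnt/a:b stays local) is an assumption of this sketch, not documented behavior of the package.

```python
def parse_location(value):
    """Split 'ALIAS:DIRECTORY' into (ssh_alias, directory).

    Local directories come back as (None, value). Treating a colon as a
    separator only when no slash precedes it is an assumption of this sketch.
    """
    ssh_alias, separator, directory = value.partition(":")
    if separator and ssh_alias and "/" not in ssh_alias:
        return ssh_alias, directory
    return None, value


print(parse_location("server:/backups/laptop"))  # ('server', '/backups/laptop')
print(parse_location("/backups/laptop"))         # (None, '/backups/laptop')
```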

Customizing the rotation algorithm

Since publishing rotate-backups I've found that the default rotation algorithm is not to everyone's satisfaction. Because the suggested alternatives were just as valid as the choices I initially made, options were added to expose the alternative behaviors:

  • Default: strict rotation (the time window for each rotation frequency is enforced).
    Alternative: relaxed rotation (time windows are not enforced), enabled by the -r, --relaxed option.
  • Default: the oldest backup in each time slot is preserved and newer backups in the time slot are removed.
    Alternative: the newest backup in each time slot is preserved and older backups in the time slot are removed, enabled by the -p, --prefer-recent option.
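The difference between these behaviors can be made concrete with a simplified Python model of a single frequency. This is not the package's actual algorithm: select_preserved is a hypothetical name, the sketch only handles fixed-length slots (hourly or daily; weekly, monthly and yearly slots would need calendar-aware grouping), and it expects an integer COUNT. The strict window follows the description of --relaxed: the slot of the most recent backup plus the COUNT - 1 slots leading up to it.

```python
from datetime import datetime, timedelta


def select_preserved(timestamps, count, slot_length=timedelta(hours=1),
                     relaxed=False, prefer_recent=False):
    """Return the backups preserved for one fixed-length frequency.

    Simplified model (not the package's code): backups are bucketed into
    slots of slot_length and at most one backup per slot is kept.
    """
    if not timestamps or count <= 0:
        return set()

    def slot_of(ts):
        # Floor the timestamp to the start of its slot (e.g. the full hour).
        return datetime.min + ((ts - datetime.min) // slot_length) * slot_length

    slots = {}
    for ts in sorted(timestamps):
        slots.setdefault(slot_of(ts), []).append(ts)

    if not relaxed:
        # Strict: only slots inside the window that ends with the slot of the
        # most recent backup (and the count - 1 slots before it) qualify.
        window_start = max(slots) - (count - 1) * slot_length
        slots = {s: group for s, group in slots.items() if s >= window_start}

    # Keep the `count` most recent remaining slots.
    kept_slots = sorted(slots)[-count:]
    # Oldest backup per slot by default, newest with prefer_recent.
    pick = max if prefer_recent else min
    return {pick(slots[s]) for s in kept_slots}


backups = [datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 12, 0),
           datetime(2020, 1, 1, 15, 0), datetime(2020, 1, 1, 15, 30)]
print(sorted(select_preserved(backups, 3)))                # strict: 15:00 only
print(sorted(select_preserved(backups, 3, relaxed=True)))  # 10:00, 12:00, 15:00
```

With irregular backups and three hourly slots, strict rotation keeps only the 15:00 backup because 10:00 and 12:00 fall outside the window from 13:00 to 15:00, whereas relaxed rotation keeps one backup from each of the three most recent hours that have backups. Adding prefer_recent=True swaps 15:00 for 15:30.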

Supported configuration options

  • Rotation schemes are defined using the minutely, hourly, daily, weekly, monthly and yearly options, these options support the same values as documented for the command line interface.

  • The timestamp-pattern option can be used to customize the regular expression that's used to extract timestamps from filenames. The value is expected to be a Python compatible regular expression that must contain the named capture groups 'year', 'month' and 'day' and may contain the groups 'hour', 'minute' and 'second'. As an example here is the default regular expression:

    # Required components.
    (?P<year>\d{4} ) \D?
    (?P<month>\d{2}) \D?
    (?P<day>\d{2}  ) \D?
    (?:
        # Optional components.
        (?P<hour>\d{2}  ) \D?
        (?P<minute>\d{2}) \D?
        (?P<second>\d{2})?
    )?
    

    Note how this pattern spans multiple lines: Regular expressions are compiled using the re.VERBOSE flag which means whitespace (including newlines) is ignored.

  • The include-list and exclude-list options define a comma-separated list of filename patterns to include or exclude, respectively:

    • Make sure not to quote the patterns in the configuration file, just provide them literally.
    • If an include or exclude list is defined in the configuration file it overrides the include or exclude list given on the command line.
  • The prefer-recent, strict and use-sudo options expect a boolean value (yes, no, true, false, 1 or 0).

  • The removal-command option can be used to customize the command that is used to remove backups.

  • The ionice option expects one of the I/O scheduling class names idle, best-effort or realtime (or the corresponding numbers).

  • The ssh-user option can be used to override the name of the remote SSH account that's used to connect to a remote system.
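The use-sudo, ionice and removal-command options all affect how a backup is eventually removed. The Python sketch below shows one plausible way to assemble such a command line; build_removal_command is a hypothetical name and the order sudo, then ionice, then the removal command is an assumption, not a description of the package's internals. It passes the numeric I/O scheduling class, since the --ionice help text notes that the busybox ionice requires numbers (the util-linux ionice accepts them too).

```python
import shlex

# I/O scheduling classes as listed in the --ionice help text.
IONICE_CLASSES = {"idle": 3, "best-effort": 2, "realtime": 1}


def build_removal_command(path, removal_command="rm -fR", ionice=None,
                          use_sudo=False):
    """Assemble an argv list for removing one backup (illustrative only)."""
    argv = []
    if use_sudo:
        argv.append("sudo")
    if ionice is not None:
        # Accept a class name or a number; always pass the number on.
        argv.extend(["ionice", "-c", str(IONICE_CLASSES.get(ionice, ionice))])
    argv.extend(shlex.split(removal_command))
    argv.append(path)
    return argv


print(build_removal_command("/backups/laptop/2020-01-01", ionice="idle", use_sudo=True))
# ['sudo', 'ionice', '-c', '3', 'rm', '-fR', '/backups/laptop/2020-01-01']
```

A CephFS setup, as mentioned for --removal-command, would simply pass removal_command="rmdir".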

How it works

The basic premise of rotate-backups is fairly simple:

  1. You point rotate-backups at a directory containing timestamped backups.

  2. It will scan the directory for entries (it doesn't matter whether they are files or directories) with a recognizable timestamp in the name.

    Note

    All of the matched directory entries are considered to be backups of the same data source, i.e. there's no filename similarity logic to distinguish unrelated backups that are located in the same directory. If this presents a problem consider using the --include and/or --exclude options.

  3. The user-defined rotation scheme is applied to the entries. If this doesn't do what you'd expect, you can try the --relaxed and/or --prefer-recent options.

  4. The entries to rotate are removed (or only printed, in dry run mode).
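Because all matched entries in a directory are treated as backups of the same data source, the --include and --exclude shell patterns are how unrelated backups are kept apart. A minimal sketch of that filtering with Python's fnmatch module follows; select_entries is a hypothetical name, and the rule that an entry must match at least one include pattern (when any are given) and no exclude pattern is an assumption about how the two options combine.

```python
import fnmatch


def select_entries(names, include=(), exclude=()):
    """Apply shell-style include/exclude patterns to directory entry names."""
    selected = []
    for name in names:
        # With include patterns given, at least one must match.
        if include and not any(fnmatch.fnmatchcase(name, p) for p in include):
            continue
        # Any matching exclude pattern drops the entry.
        if any(fnmatch.fnmatchcase(name, p) for p in exclude):
            continue
        selected.append(name)
    return selected


names = ["laptop-2020-01-01", "server-2020-01-01", "laptop-2020-01-02.tmp"]
print(select_entries(names, include=["laptop-*"], exclude=["*.tmp"]))
# ['laptop-2020-01-01']
```

A comma-separated include-list from a configuration file could be fed in as, for example, "laptop-*,desktop-*".split(",").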

Contact

The latest version of rotate-backups is available on PyPI and GitHub. The documentation is hosted on Read the Docs and includes a changelog. For bug reports please create an issue on GitHub. If you have questions, suggestions, etc. feel free to send me an e-mail at [email protected].

License

This software is licensed under the MIT license.

© 2020 Peter Odding.
