  • Language
    Python
  • License
    BSD 2-Clause "Sim...
  • Created over 13 years ago
  • Updated over 3 years ago

Uses inotify to monitor Cassandra SSTables and upload them to S3

MAINTAINERS WANTED

Tablesnap

Theory of Operation

Tablesnap is a script that uses inotify to monitor a directory for IN_MOVED_TO events and reacts to them by spawning a new thread to upload that file to Amazon S3, along with a JSON-formatted list of what other files were in the directory at the time of the copy.
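The JSON-formatted listing described above can be pictured with a short sketch. This is illustrative only: the function name and the exact JSON layout (one key, the directory path, mapping to a list of file names) are assumptions, not a statement of what tablesnap stores.

```python
import json
import os


def directory_listing_json(directory):
    """Return a JSON document naming the files currently in `directory`."""
    names = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    # Assumed layout: the directory path maps to the list of its files.
    return json.dumps({directory: names})
```

A listing like this, stored next to each uploaded file, records which files belonged together at the moment of the upload, which is what makes a point-in-time restore possible.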

When running a Cassandra cluster, this behavior is useful because it allows automated point-in-time backups of SSTables. In theory, tablesnap should work for any application that writes files to a temporary location and then moves them into their final location once the data is written to disk. Tablesnap also assumes that files are immutable once written.
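An application that follows the write-then-move pattern might look like the sketch below. The function name is hypothetical, and the "-tmp" suffix is chosen only so that the temporary file matches the default exclusion described in the options further down; none of this is taken from tablesnap or Cassandra.

```python
import os
import tempfile


def write_then_move(final_path, data):
    """Write `data` to a temporary file, then move it into place in one step."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Keep the temporary file in the same directory (and so on the same
    # filesystem) so that the final rename is atomic rather than a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix="-tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        # The rename is the moment an inotify watcher sees IN_MOVED_TO.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Because the file only appears under its final name once it is complete, a watcher reacting to the move never uploads a half-written file.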

Installation

The simplest way to install tablesnap is from the Python Package Index, PyPI. https://pypi.python.org/pypi/tablesnap

pip install tablesnap

This distribution provides a debian/ source directory, allowing it to be built as a standard Debian/Ubuntu package and stored in a repository. The Debian package includes an init script that can run and daemonize tablesnap for you. Tablesnap does not daemonize itself. This is best left to tools like init, supervisord, daemontools, etc.

We do not currently maintain binary packages of tablesnap. To build the Debian package from source, assuming you have a working pbuilder environment:

git checkout debian
git-buildpackage --git-upstream-branch=master --git-debian-branch=debian --git-builder='pdebuild'

The daemonized version of the Debian/Ubuntu package uses syslog for logging. The messages are sent to the DAEMON logging facility and tagged with tablesnap. If you want to redirect the log output to a log file other than /var/log/daemon.log, you can filter by this tag. For example, if you are using syslog-ng, you could add

# tablesnap
filter f_tablesnap { filter(f_daemon) and match("tablesnap" value("PROGRAM")); };
destination d_tablesnap { file("/var/log/tablesnap.log"); };
log { source(s_src); filter(f_tablesnap); destination(d_tablesnap); flags(final); };

to /etc/syslog-ng/syslog-ng.conf.
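For readers who want to see what the sending side of such a setup can look like, here is a minimal sketch using Python's standard logging module. It is an assumption about how a script could log to the DAEMON facility with a tablesnap tag, not a copy of tablesnap's own logging code; the function name and the message prefix are hypothetical.

```python
import logging
import logging.handlers


def build_syslog_handler(address="/dev/log"):
    """Return a handler that logs to the DAEMON facility, tagged 'tablesnap'."""
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_DAEMON,
    )
    # Prefixing every message with "tablesnap: " lets the syslog daemon
    # parse "tablesnap" as the program name, which the filter can match on.
    handler.setFormatter(logging.Formatter("tablesnap: %(message)s"))
    return handler
```

Attaching the handler with logging.getLogger().addHandler(build_syslog_handler()) would route messages through the local syslog socket, where a tag-based filter like the one above could pick them up.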

If you are not a Debian/Ubuntu user or do not wish to install the tablesnap package, you may copy the tablesnap script anywhere you'd like and run it from there. Tablesnap depends on the pyinotify and boto Python packages. These are available via "pip install pyinotify boto", or as packages from most common Linux distributions.

Configuration

All configuration for tablesnap happens on the command line. If you are using the Debian package, you'll set these options in the DAEMON_OPTS variable in /etc/default/tablesnap.

usage: tablesnap [-h] -k AWS_KEY -s AWS_SECRET [-r] [-a] [-B] [-p PREFIX]
                 [--without-index] [--keyname-separator KEYNAME_SEPARATOR]
                 [-t THREADS] [-n NAME] [-e EXCLUDE | -i INCLUDE]
                 [--listen-events {IN_MOVED_TO,IN_CLOSE_WRITE,IN_CREATE}]
                 [--max-upload-size MAX_UPLOAD_SIZE]
                 [--multipart-chunk-size MULTIPART_CHUNK_SIZE]
                 bucket paths [paths ...]

Tablesnap is a script that uses inotify to monitor a directory for events and
reacts to them by spawning a new thread to upload that file to Amazon S3,
along with a JSON-formatted list of what other files were in the directory at
the time of the copy.

positional arguments:
  bucket                S3 bucket
  paths                 Paths to be watched

optional arguments:
  -h, --help            show this help message and exit
  -k AWS_KEY, --aws-key AWS_KEY
  -s AWS_SECRET, --aws-secret AWS_SECRET
  -r, --recursive       Recursively watch the given path(s) for new SSTables
  -a, --auto-add        Automatically start watching new subdirectories within
                        path(s)
  -B, --backup          Backup existing files to S3 if they are not already
                        there
  -p PREFIX, --prefix PREFIX
                        Set a string prefix for uploaded files in S3
  --without-index       Do not store a JSON representation of the current
                        directory listing in S3 when uploading a file to S3.
  --keyname-separator KEYNAME_SEPARATOR
                        Separator for the keyname between name and path.
  -t THREADS, --threads THREADS
                        Number of writer threads
  -n NAME, --name NAME  Use this name instead of the FQDN to identify the
                        files from this host
  -e EXCLUDE, --exclude EXCLUDE
                        Exclude files matching this regular expression from
                        upload. WARNING: If neither exclude nor include is
                        defined, then all files matching "-tmp" are excluded.
  -i INCLUDE, --include INCLUDE
                        Include only files matching this regular expression
                        in the upload. WARNING: If neither exclude nor include
                        is defined, then all files matching "-tmp" are
                        excluded.
  --listen-events {IN_MOVED_TO,IN_CLOSE_WRITE,IN_CREATE}
                        Which events to listen on, can be specified multiple
                        times. Values: IN_MOVED_TO, IN_CLOSE_WRITE, IN_CREATE
                        (default: IN_MOVED_TO, IN_CLOSE_WRITE)
  --max-upload-size MAX_UPLOAD_SIZE
                        Max size for files to be uploaded before doing
                        multipart (default 5120M)
  --multipart-chunk-size MULTIPART_CHUNK_SIZE
                        Chunk size for multipart uploads (default: 256M or 10%
                        of free memory if default is not available)

For example:

$ tablesnap -k AAAAAAAAAAAAAAAA -s BBBBBBBBBBBBBBBB me.synack.sstables /var/lib/cassandra/data/GiantKeyspace

This would cause tablesnap to use the given Amazon Web Services credentials to back up the SSTables for my GiantKeyspace to the S3 bucket named me.synack.sstables.
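Since the example passes neither -e nor -i, the default "-tmp" exclusion from the option descriptions applies. The rule can be sketched as follows. The function name is hypothetical, and the use of re.search (a match anywhere in the file name) is an assumption; the help text only says "matching".

```python
import re


def should_upload(filename, include=None, exclude=None):
    """Decide whether a file passes the include/exclude rule."""
    if include is not None and exclude is not None:
        # The usage line shows -e and -i as mutually exclusive.
        raise ValueError("include and exclude are mutually exclusive")
    if include is not None:
        return re.search(include, filename) is not None
    if exclude is None:
        # Neither option given: fall back to excluding "-tmp" files.
        exclude = "-tmp"
    return re.search(exclude, filename) is None
```

Note that supplying an explicit exclude pattern replaces the default, so "-tmp" files are uploaded unless the new pattern also covers them.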

Questions, Comments, and Help

The fine folks in #cassandra-ops on irc.freenode.net are an excellent resource for getting tablesnap up and running, and also for solving more general Cassandra issues.
