minimalist file based workflow

Introduction

workflow.py is a minimalist file-based workflow engine. It runs as a background process and can automate tasks such as deleting old files, emailing you when new files are created, or running a script to process new files.

License

3-clause BSD License

Copyright (c) 2012, Massimo Di Pierro All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the <organization> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Configuring and Starting the workflow

  • create a file workflow.config using the syntax below
  • run workflow.py in that folder

Workflow options

  • -f <path> the folder to monitor and process
  • -s <seconds> the time interval between checks for new files
  • -n <name> the placeholder that stands for the current filename in commands, defaults to $0
  • -x <path> the config file to use (workflow.config)
  • -y <path> the cache file to use (workflow.cache.db)
  • -l <path> the output logfile (else console output)
  • -d daemonizes the workflow process
  • -c <rulename> does not start the workflow but clears a rule (see below)

workflow.config syntax

workflow.config consists of a series of rules with the following syntax

rulename: pattern [dt]: command

where

  • rulename is the name of the rule (cannot contain spaces).
  • pattern is a glob pattern for files to monitor. Avoid using *.*, since it would also match the <filename>.<rulename>.pid, .out and .err files that workflow.py creates (see Details).
  • dt is a time interval, written in brackets as in [1s], [1h] or [1d] (default is 1 second). Only files last modified more than dt ago will be considered.
  • command is the command to execute for each file that matches pattern, was last modified more than dt ago, and has not already been processed. If the command ends in &, it is executed in the background; otherwise it blocks the workflow until completion. The name of the matching file can be referred to in the command as $0. Multiline commands can be continued with \.
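The selection step described above (glob match plus a minimum age) can be sketched in a few lines of Python. This is only an illustration and not the code used by workflow.py; the function name matching_files and its parameters are hypothetical.

```python
import glob
import os
import time


def matching_files(folder, pattern, dt=1, now=None):
    """Return files in `folder` matching `pattern` last modified more than `dt` seconds ago.

    Hypothetical helper, not part of workflow.py. `now` can be passed in
    to make the age check reproducible.
    """
    now = time.time() if now is None else now
    paths = glob.glob(os.path.join(folder, pattern))
    return sorted(
        p for p in paths
        if os.path.isfile(p) and now - os.path.getmtime(p) > dt
    )
```

For example, matching_files(".", "*.log", dt=86400) would list the *.log files in the current folder that were last modified more than one day ago.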

Lines starting with # are interpreted as comments and ignored.
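To make the rule syntax concrete, here is a minimal parsing sketch. It is not the parser used by workflow.py: the names parse_rule, RULE_RE and UNIT_SECONDS are hypothetical, only the s, h and d suffixes seen in the examples are handled, a bare number is assumed to mean seconds, and multiline commands continued with \ are not supported.

```python
import re

# Seconds per unit suffix. Only s, h and d appear in the README examples;
# treating a bare number as seconds is an assumption of this sketch.
UNIT_SECONDS = {"s": 1, "h": 3600, "d": 86400}

RULE_RE = re.compile(
    r"^(?P<name>\S+?):\s*"                      # rulename, up to the first colon
    r"(?P<pattern>\S+)"                         # glob pattern
    r"(?:\s*\[(?P<dt>\d+)(?P<unit>[shd]?)\])?"  # optional [dt]
    r"\s*:\s*(?P<command>.+)$"                  # command to execute
)


def parse_rule(line):
    """Parse one 'rulename: pattern [dt]: command' line.

    Returns None for blank lines and # comments, a dict otherwise.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = RULE_RE.match(line)
    if m is None:
        raise ValueError("not a valid rule: %r" % line)
    dt = 1  # default interval stated in the README: 1 second
    if m.group("dt") is not None:
        dt = int(m.group("dt")) * UNIT_SECONDS[m.group("unit") or "s"]
    return {
        "name": m.group("name"),
        "pattern": m.group("pattern"),
        "dt": dt,
        "command": m.group("command"),
    }
```

With this sketch, the line delete_old_logs: *.log [1d]: rm $0 parses to the name delete_old_logs, the pattern *.log, a dt of 86400 seconds and the command rm $0.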

Examples of rules

Delete all *.log files older than one day

delete_old_logs: *.log [1d]: rm $0

Move all *.txt files older than one hour to other folder

move_old_txt: *.txt [1h]: mv $0 otherfolder/$0

Email me when a new *.doc file is created

email_me_on_new_doc: *.doc: mail -s 'new file: $0' [email protected] < /dev/null

Process new *.dat files using a Python script

process_dat: *.dat: python process.py $0

Create a finite state machine for each *.src file

rule1: *.src [1s]: echo > $0.state.1
rule2: *.state.1 [1s]: mv $0 `expr "$0" : '\(.*\).1'`.2
rule3: *.state.2 [1s]: mv $0 `expr "$0" : '\(.*\).2'`.3
rule4: *.state.3 [1s]: rm $0

Details

When a file matches a pattern, a new process is created to execute the corresponding command. The pid of the process is saved in <filename>.<rulename>.pid, and this file is deleted when the process completes. If the process fails, its output and error log are saved in <filename>.<rulename>.err. If it succeeds, its output is stored in <filename>.<rulename>.out.
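The bookkeeping described above can be sketched as follows. This is a simplified illustration, not the actual implementation: run_rule is a hypothetical name, the intermediate .tmp file is an invention of this sketch, the command always blocks (the trailing & case is not modelled), and a non-zero exit status is assumed to be what "fails" means.

```python
import os
import subprocess


def run_rule(filename, rulename, command):
    """Run `command` for `filename`, leaving a .out or .err file behind.

    Hypothetical sketch. $0 is substituted textually, so unusual
    filenames are not quoted for the shell.
    """
    base = "%s.%s" % (filename, rulename)
    shell_command = command.replace("$0", filename)
    tmp_log = base + ".tmp"  # assumption: temporary log name used only here
    with open(tmp_log, "wb") as log:
        proc = subprocess.Popen(
            shell_command, shell=True, stdout=log, stderr=subprocess.STDOUT
        )
        # Record the pid while the command is running.
        with open(base + ".pid", "w") as pid_file:
            pid_file.write(str(proc.pid))
        returncode = proc.wait()
    os.remove(base + ".pid")
    # Success keeps the log as .out, failure keeps it as .err.
    os.replace(tmp_log, base + (".out" if returncode == 0 else ".err"))
    return returncode
```

Calling run_rule("data.dat", "process_dat", "python process.py $0") would leave either data.dat.process_dat.out or data.dat.process_dat.err next to the input file.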

If a file has already been processed according to a certain rule, this information is stored in the file workflow.cache and the file is not processed again unless:

  • the mtime of the file changes (for example you edit or touch the file)
  • the rule is cleared.

You can clear a rule with

python workflow.py -c rulename

This has the effect of creating a file .workflow.rulename.clear which the running workflow.py picks up and uses to clear the entry identified by rulename in workflow.cache, after which the rule will run again.

You can also delete the workflow.cache file. In this case all rules will run again when you restart workflow.py.
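The reprocessing logic can be illustrated with a small sketch in which an in-memory dict stands in for workflow.cache. The on-disk format of the real cache is not described here, so the (rulename, filename) key layout and the function names below are assumptions.

```python
import os


def needs_processing(cache, rulename, filename):
    """True if this (rule, file) pair is new or the file's mtime has changed."""
    return cache.get((rulename, filename)) != os.path.getmtime(filename)


def mark_processed(cache, rulename, filename):
    """Remember the mtime at which `filename` was processed by `rulename`."""
    cache[(rulename, filename)] = os.path.getmtime(filename)


def clear_rule(cache, rulename):
    """Forget every entry of one rule, so the rule runs again on all files."""
    for key in [k for k in cache if k[0] == rulename]:
        del cache[key]
```

Touching a file changes its mtime, so needs_processing returns True again for every rule; clear_rule has the same effect for a single rule across all files, and emptying the dict corresponds to deleting the cache file.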

If the main workflow.py process is killed or crashes while some commands are being executed, those commands are killed too. You can find which files and rules were being processed by looking for <filename>.<rulename>.pid files. If you restart workflow.py, those pid files are deleted.

If a rule results in an error and a <filename>.<rulename>.err file is created, the file is not processed again according to that rule unless the error file is deleted.

If a file is edited or touched and the rule runs again, the <filename>.<rulename>.out will be overwritten.

Unless otherwise specified, each file is processed 1s after it was last modified. A different process may still be writing the file while pausing for more than 1s between writes (for example, when the file is being downloaded over a slow connection). In this case it is best to download the file under a name that does not match the pattern and rename it to its proper name once the write is complete. This must be handled outside of workflow.py, which has no way of knowing whether a file is complete.
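On the producer side, the write-then-rename approach can be sketched like this. The helper name write_then_rename and the .part suffix are hypothetical; any temporary name works as long as it does not match a rule pattern.

```python
import os


def write_then_rename(final_path, chunks, suffix=".part"):
    """Write `chunks` (an iterable of bytes) to a temporary name, then rename.

    The temporary name, e.g. data.dat.part, does not match a pattern such
    as *.dat, so the file only becomes visible to the rule once complete.
    """
    tmp_path = final_path + suffix
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    # The rename is a single step as long as both names are on the same filesystem.
    os.replace(tmp_path, final_path)
```

A slow download would then call write_then_rename("data.dat", chunks) with the chunks arriving at whatever pace the connection allows.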

If the workflow.config file is edited or changed, it is reloaded without the need to re-start workflow.py.
