  • Stars: 443
  • Rank: 98,504 (Top 2 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 12 years ago
  • Updated over 9 years ago

Repository Details

A simple distributed queue designed for handling large sets of one-off tasks

Taskmaster

Taskmaster is a simple distributed queue designed for handling large numbers of one-off tasks.

We built this at DISQUS to handle tasks that come up frequently but are each one of a kind, like "migrate this data to a new schema".

Why?

You might ask, "Why not use Celery?" Well, the answer is simply that normal queueing requires you to buffer all tasks into a central location (not literally, but it would be painful otherwise). This becomes a problem when you have a large number of tasks, especially when each one carries a large amount of data.

Imagine you have 1 billion tasks, each weighing in at 5k. That's roughly 5 terabytes of uncompressed storage just to keep them around, and it gains you very little.
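
The estimate above can be checked with a little arithmetic. This is a minimal sketch that assumes "5k" means 5 KiB (5 * 1024 bytes) per task; the variable names are illustrative only.

```python
# Back-of-the-envelope storage estimate for buffering every task up front.
# Assumption: "5k" is read as 5 KiB (5 * 1024 bytes) per task.
TASKS = 1_000_000_000
BYTES_PER_TASK = 5 * 1024

total_bytes = TASKS * BYTES_PER_TASK
total_tb = total_bytes / 10**12   # decimal terabytes
total_tib = total_bytes / 2**40   # binary tebibytes

print("%.2f TB (%.2f TiB)" % (total_tb, total_tib))  # 5.12 TB (4.66 TiB)
```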

Taskmaster, on the other hand, is designed to take a resumable iterator and pull in only a bounded number of jobs at a time (using standard Python Queues). This keeps memory usage consistent no matter how many jobs the iterator produces, and lets the system scale linearly.
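
The idea of feeding an iterator through a bounded queue can be sketched with the Python 3 standard library alone. This is not Taskmaster's actual implementation (which uses gevent and ZeroMQ); the `run` helper and its `max_pending`, `num_workers`, and `stop` parameters are hypothetical names chosen for illustration.

```python
import queue
import threading


def get_jobs(last=0, stop=20):
    # Hypothetical resumable iterator: `last` is where a previous run stopped.
    for i in range(last, stop):
        yield i


def run(jobs, handle_job, max_pending=4, num_workers=2):
    """Feed jobs through a bounded queue and return the handled jobs, sorted."""
    pending = queue.Queue(maxsize=max_pending)  # put() blocks when the queue is full
    done = []
    lock = threading.Lock()
    sentinel = object()  # tells a worker to stop

    def worker():
        while True:
            job = pending.get()
            if job is sentinel:
                break
            handle_job(job)
            with lock:
                done.append(job)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for job in jobs:
        # At most `max_pending` jobs wait in memory; the iterator is only
        # advanced when a worker has freed a slot.
        pending.put(job)
    for _ in workers:
        pending.put(sentinel)
    for w in workers:
        w.join()
    return sorted(done)


results = run(get_jobs(last=5), handle_job=lambda i: None)
print(results[0], results[-1], len(results))  # 5 19 15
```

Because the producer blocks on a full queue, memory use depends on `max_pending` rather than on the total number of jobs the iterator can yield.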

Requirements

Requirements should be handled by setuptools, but if they are not, you will need the following Python packages:

  • progressbar
  • pyzmq (zeromq)
  • gevent
  • gevent_zeromq

A note on Gevent

Because Taskmaster uses gevent for both its iterator task (master) and its consumers, your application will need to use non-blocking, gevent-compatible callers. In most cases this won't be a problem, but if you're using the network you'll need to look for a compatible library for your adapter. For example, there is an alternative version of psycopg2 designed for gevent called gevent-psycopg2.

Usage

Create an iterator and a callback:

# taskmaster/example.py
def get_jobs(last=0):
    # last would be sent if state was resumed
    # from a previous run
    for i in xrange(last, 100000000):
        # jobs yielded must be serializable with pickle
        yield i

def handle_job(i):
    # this **must** be idempotent, as resuming the process may execute a job
    # that had already been run
    print "Got %r!" % i
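
To see why the callback has to be idempotent, here is a small Python 3 sketch that replays part of a run, as a resume might. The `migrated` dict, the `appended` list, and `handle_job_unsafe` are hypothetical stand-ins for a real data store and a careless handler; none of them are part of Taskmaster.

```python
# Hypothetical in-memory "destination"; a real job would write to a database.
migrated = {}
appended = []


def handle_job(i):
    # Idempotent: writing by key means a replayed job overwrites the same
    # entry instead of creating a second one.
    migrated[i] = "row-%d" % i


def handle_job_unsafe(i):
    # Not idempotent: every replay adds another copy.
    appended.append(i)


# First run handles jobs 0..4, then a resumed run starts again at 3 and
# continues to 6, so jobs 3 and 4 are executed twice.
for i in list(range(0, 5)) + list(range(3, 7)):
    handle_job(i)
    handle_job_unsafe(i)

print(len(migrated), len(appended))  # 7 9
```

The idempotent handler ends up with one entry per job, while the unsafe one records the replayed jobs twice.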

Spawn a master:

$ tm-master taskmaster.example

You can also pass keyword arguments for the master:

$ tm-master taskmaster.example argument=value

Spawn a slave:

$ tm-slave taskmaster.example

Or spawn 8 slaves (each containing a thread pool):

$ tm-spawn taskmaster.example 8

Don't like the magical function discovery for master/slave? Specify your own targets:

$ tm-master taskmaster.example:get_jobs
$ tm-slave taskmaster.example:handle_job
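
One way a `module:function` target could be resolved is sketched below in Python 3. This is an assumption about the general technique, not Taskmaster's actual code: `resolve_target` and its `default_name` parameter are hypothetical, and the fallback to a default function name only mirrors the `get_jobs` / `handle_job` convention seen in the example above.

```python
import importlib


def resolve_target(spec, default_name):
    """Resolve 'package.module' or 'package.module:function' to a callable.

    Hypothetical helper: when no ':function' part is given, fall back to
    `default_name` inside the module.
    """
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    func = getattr(module, attr or default_name)
    if not callable(func):
        raise TypeError("%r is not callable" % (spec,))
    return func


# Standard-library modules stand in for taskmaster.example here.
join = resolve_target("os.path:join", default_name="get_jobs")
dumps = resolve_target("json", default_name="dumps")

print(dumps([1]))  # [1]
```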

Maybe you simply need to run things on the same server?

$ tm-run taskmaster/example.py 8

Note

All arguments are optional, and the address will default to tcp://0.0.0.0:3050.
