• This repository has been archived on 09/May/2021
  • Stars: 479
  • Rank: 91,143 (Top 2 %)
  • Language: Python
  • License: MIT License
  • Created over 13 years ago
  • Updated over 3 years ago


Repository Details

Lightweight MapReduce in Python

mincemeat.py: MapReduce on Python

Introduction

mincemeat.py is a Python implementation of the MapReduce distributed computing framework.

mincemeat.py is:

  • Lightweight - All of the code is contained in a single Python file (currently weighing in at <13kB) that depends only on the Python Standard Library. Any computer with Python and mincemeat.py can be a part of your cluster.
  • Fault tolerant - Workers (clients) can join and leave the cluster at any time without affecting the entire process.
  • Secure - mincemeat.py authenticates both ends of every connection, ensuring that only authorized code is executed.
  • Open source - mincemeat.py is distributed under the MIT License, and consequently is free for all use, including commercial, personal, and academic, and can be modified and redistributed without restriction.

Download

  • Just mincemeat.py (v 0.1.4)
  • The full 0.1.4 release (includes documentation and examples)
  • Clone this git repository: git clone https://github.com/michaelfairley/mincemeatpy.git

Example

Let's look at the canonical MapReduce example, word counting:

example.py:

#!/usr/bin/env python
import mincemeat

data = ["Humpty Dumpty sat on a wall",
        "Humpty Dumpty had a great fall",
        "All the King's horses and all the King's men",
        "Couldn't put Humpty together again",
        ]
# The data source can be any dictionary-like object
datasource = dict(enumerate(data))

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    result = sum(vs)
    return result

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print(results)

Execute this script on the server:

python example.py

Run mincemeat.py as a worker on a client:

python mincemeat.py -p changeme [server address]

And the server will print out:

{'a': 2, 'on': 1, 'great': 1, 'Humpty': 3, 'again': 1, 'wall': 1, 'Dumpty': 2, 'men': 1, 'had': 1, 'all': 1, 'together': 1, "King's": 2, 'horses': 1, 'All': 1, "Couldn't": 1, 'fall': 1, 'and': 1, 'the': 2, 'put': 1, 'sat': 1}
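To see where these counts come from, the map, group, and reduce phases can be simulated in a single process without mincemeat.py. This is only an illustrative sketch: run_local is a hypothetical helper, not part of mincemeat.py, and it ignores networking, authentication, and fault tolerance entirely.

```python
from collections import defaultdict

data = ["Humpty Dumpty sat on a wall",
        "Humpty Dumpty had a great fall",
        "All the King's horses and all the King's men",
        "Couldn't put Humpty together again",
        ]
datasource = dict(enumerate(data))

def mapfn(k, v):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    # Add up all the 1s emitted for a given word.
    return sum(vs)

def run_local(datasource, mapfn, reducefn):
    """Hypothetical single-process stand-in for the server/worker round trip."""
    grouped = defaultdict(list)
    # Map phase: run mapfn on every (key, value) pair and group by output key.
    for key in datasource:
        for out_key, out_value in mapfn(key, datasource[key]):
            grouped[out_key].append(out_value)
    # Reduce phase: collapse each key's list of values into one result.
    return {k: reducefn(k, vs) for k, vs in grouped.items()}

results = run_local(datasource, mapfn, reducefn)
```

The resulting dictionary should contain the same word counts as the server output above, although the key order may differ.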

This example is deliberately simple, but the same code works when the datasource is a collection of large files and the client runs on multiple machines. In fact, mincemeat.py has been used to produce word frequency lists for many gigabytes of text using a slightly modified version of this code.
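One way to treat a directory of files as a datasource is a small dictionary-like class that reads each file only when its key is looked up. The sketch below is an assumption-laden illustration, not code from mincemeat.py: FileDataSource is a hypothetical name, and it assumes the server only needs to iterate over the keys and index into the object.

```python
import os

class FileDataSource(object):
    """Hypothetical dictionary-like view of a directory.

    Keys are file names; values are file contents, read lazily on lookup.
    """

    def __init__(self, directory):
        self.directory = directory

    def _names(self):
        # Regular files only, sorted so the iteration order is stable.
        return sorted(
            name for name in os.listdir(self.directory)
            if os.path.isfile(os.path.join(self.directory, name))
        )

    def __iter__(self):
        return iter(self._names())

    def keys(self):
        return self._names()

    def __len__(self):
        return len(self._names())

    def __getitem__(self, name):
        # The whole file is read into memory only when it is requested.
        with open(os.path.join(self.directory, name)) as f:
            return f.read()
```

Under those assumptions it would be assigned in place of the dictionary in example.py, for instance s.datasource = FileDataSource("texts"), where "texts" is a placeholder directory name.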

Clients

You can run the client manually from within other Python scripts (rather than running mincemeat.py directly):

import mincemeat

client = mincemeat.Client()
client.password = "changeme"
client.conn("localhost", mincemeat.DEFAULT_PORT)

Shepherd.py provides more sophisticated ways to run clients, including clients that poll or are forked on the same machine.

Imports

One potential gotcha when using mincemeat.py: Your mapfn and reducefn functions don't have access to their enclosing environment, including imported modules. If you need to use an imported module in one of these functions, be sure to include import whatever in the functions themselves.
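As a minimal sketch of this pattern, here is a variant of the word-count mapfn that needs the standard re module and therefore imports it inside the function body. The regular expression and the lowercasing are illustrative choices, not part of the original example.

```python
def mapfn(k, v):
    # Import inside the function: a module-level "import re" would not be
    # available when this function runs on a worker.
    import re
    # Treat runs of letters and apostrophes as words; normalize case.
    for w in re.findall(r"[A-Za-z']+", v):
        yield w.lower(), 1

def reducefn(k, vs):
    # No imports needed here, so nothing extra is required.
    return sum(vs)
```

With this variant, "All" and "all" are counted as the same word.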

Python 3 support

ziyuang has a fork of mincemeat.py that's compatible with Python 3: ziyuang/mincemeatpy
