• Stars: 243
• Rank: 166,489 (Top 4 %)
• Language: C
• License: GNU General Publi...
• Created about 15 years ago
• Updated almost 9 years ago

Repository Details

Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.
Brandyn White <[email protected]>
Andrew Miller <[email protected]>

Source  https://github.com/bwhite/hadoopy/
Issues  https://github.com/bwhite/hadoopy/issues
Docs    http://bwhite.github.com/hadoopy/

IRC: #hadoopy @ freenode.net

Requirements
Python development headers (python-dev), build tools (build-essential)

Optional
Cython (>= 0.13); without it, the build falls back to the pregenerated .c files

Features
- Oozie support
- Automated job parallelization ('auto-oozie') available in the hadoopy_flow project (maintained out of branch)
- TypedBytes support (very fast)
- Local execution of unmodified MapReduce jobs with launch_local
- Read/write sequence files of TypedBytes directly to HDFS from Python (readtb, writetb)
- Works on OS X
- Allows printing to stdout and stderr in Hadoop tasks without causing problems (uses the 'pipe hopping' technique; both streams are available in the task's stderr)
- Critical path is in Cython
- Works on clusters without any extra installation, Python, or Python libraries (uses the PyInstaller copy included in this source tree)
- Simple HDFS access (readtb and ls) inside Python, even inside running jobs
- Unit test interface
- Reporting using status and counters (and print statements! no need to be scared of them in Hadoopy)
- Supports design patterns in the Lin/Dyer book (http://www.umiacs.umd.edu/~jimmylin/book.html)
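
To make the mapper/reducer model behind these features concrete, here is a minimal word-count sketch. It assumes the generator-style interface described in the Hadoopy docs, where a mapper takes (key, value) and a reducer takes (key, values) and both yield output pairs. The run_locally helper is hypothetical and is not part of Hadoopy; it is a pure-Python stand-in for the map, shuffle/sort, and reduce stages so the functions can be checked without a cluster.

```python
"""Word count written as a pair of plain Python generators."""
from collections import defaultdict


def mapper(key, value):
    # key is ignored here; value is assumed to be one line of text.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # values is an iterable of all counts emitted for this word.
    yield key, sum(values)


def run_locally(kvs):
    """Hypothetical helper (not a Hadoopy function): map, group by key, reduce."""
    groups = defaultdict(list)
    for k, v in kvs:
        for out_k, out_v in mapper(k, v):
            groups[out_k].append(out_v)
    out = []
    for k in sorted(groups):  # mimic the sorted order reducers see
        out.extend(reducer(k, groups[k]))
    return out

# In an actual job script, the Hadoopy docs end the file with a call such as
# hadoopy.run(mapper, reducer); that call is omitted here so the sketch runs
# without Hadoopy installed.
```

Because the two functions are ordinary generators, they can be exercised directly in a unit test before being submitted to a cluster or run through launch_local.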

Limitations
- Hadoop Local is currently unsupported due to a bug in Hadoop's handling of the distributed cache in this mode. Use pseudo-distributed mode instead for now. (#40)

Used in
- A Case for Query by Image and Text Content: Searching Computer Help using Screenshots and Keywords (to appear in WWW'11)
- Web-Scale Computer Vision using MapReduce for Multimedia Data Mining (at KDD'10)
- Vitrieve: Visual Search engine
- Picarus: Hadoop computer vision toolbox

Ubuntu Install (others are similar)
sudo apt-get install python-dev build-essential
sudo python setup.py install

More Repositories

1. hadoop_vision (Python, 49 stars): Example code for "Web-Scale Computer Vision using MapReduce for Multimedia Data Mining"
2. picarus (JavaScript, 42 stars): Computer vision in the cloud: CV + ML + Hadoop + HBase + REST.
3. imfeat (Python, 26 stars): Image Feature Descriptors
4. classipy (Python, 18 stars): A collection of classifiers with a standardized interface. Has an HTTP server interface that allows any language to access it.
5. dfs (C, 9 stars): This is a distributed FUSE filesystem I wrote for a class. It supports capability based authentication, public key handshake, symmetric session encryption, extent server, and log server (similar to GFS)
6. kinectfs (Python, 7 stars): ZeroMQ based project for using Pub-Sub for the Kinect. Dumps can be mounted and accessed using a FUSE filesystem.
7. vision_data (Python, 7 stars)
8. distpy (C, 6 stars): Python distance metrics
9. image_server (Python, 6 stars): Simple image server for visualization on headless boxes
10. fpga-image-registration (5 stars)
11. hadoopy_hbase (Python, 5 stars): Library that adds hbase support to Hadoopy
12. imseg (C, 4 stars)
13. .emacs.d (Emacs Lisp, 4 stars): My emacs stuff
14. hadoopy_tutorial (Python, 4 stars)
15. hadoop_clustering (Python, 4 stars)
16. opennpy (C, 4 stars): OpenNI python wrapper with a libfreenect-esque interface
17. crawlers (Python, 4 stars)
18. vision_results (Python, 4 stars): Collection of simple result visualizations for vision tasks, readily hackable for your own use.
19. image_search (Python, 4 stars)
20. hadoop_log (Python, 4 stars): Hadoop Jobtracker webserver scraper
21. jewel-thief (Python, 3 stars)
22. dv_tp_integration (Java, 3 stars)
23. dv_bench (Python, 3 stars)
24. pyram (Python, 3 stars): Python parameter selection library
25. hadoopy_helper (Python, 3 stars): Useful tools that complement hadoopy
26. texas_pete (Python, 3 stars)
27. hadoopy-goodies (Python, 3 stars): Extra tools and helper scripts using the Hadoopy library
28. impoint (C, 3 stars)
29. keyframe (Python, 3 stars)
30. openeyes (Python, 3 stars)
31. hadoopy_flow (Python, 3 stars): Hadoopy monkey patch library that adds parallel job execution automatically
32. hadoopy-picnic (Python, 3 stars): Hadoopy-based collage maker (still under development, watch this space over the next week)
33. viderator (Python, 3 stars)
34. graphical_models (Python, 3 stars): Binary CRF experiments
35. white-knight (C++, 3 stars): A background subtraction, tracking, and classification program written in C++ and Python. It was bothering me that good code was going stale in my backups for about a year, so I decided to work on it in my spare time.
36. hadoopy_utils (Python, 2 stars)
37. dv_hadoop_tests (Python, 2 stars)
38. annotation (Python, 2 stars): Python tools to annotate images
39. python_examples (Python, 2 stars): Example python tasks
40. pywxopengl-fun (Python, 2 stars)
41. kernels (C, 2 stars)
42. pythonrc (Python, 2 stars)
43. vidfeat (Python, 2 stars)
44. interactive_learning (Python, 2 stars)
45. mturk_vision (Python, 2 stars): Mechanical turk vision scripts
46. camera_geometry (Python, 2 stars)
47. opencv-examples (2 stars)
48. puppet_config (Puppet, 2 stars): Puppet config for Hadoop and Picarus
49. python_templates (Python, 2 stars): Boiler plate python headers, etc
50. pkgtest (Python, 1 star): A basic distutils package that shows a possible way to use ctypes modules
51. rest-examples (1 star): REST client/server examples
52. coq-confuse (Verilog, 1 star)
53. picarus_takeout (C++, 1 star)
54. upload_server (Python, 1 star): Simple server to allow uploading files into a local directory
55. patch_classifier (Python, 1 star)
56. .ipython (Python, 1 star)
57. pycassa_server (Python, 1 star): HTML display server for Cassandra using pycassa
58. meme_hunter (1 star)
59. nn_bench (C, 1 star)
60. gmaps_annotations (JavaScript, 1 star)
61. project_status (1 star)
62. filter_fun (1 star): Workspace for image filter design
63. hadoopy_rt (Python, 1 star)
64. data_sources (Python, 1 star): Common interface for column oriented read-only data sources (used in a few of my projects)