• This repository has been archived on 09/Nov/2017

Library to aid writing Hadoop jobs in Clojure.
A more recently maintained fork is available at:
https://github.com/alexott/clojure-hadoop


clojure-hadoop

A library to assist in writing Hadoop MapReduce jobs in Clojure.

by Stuart Sierra
http://stuartsierra.com/

For stable releases, see
http://stuartsierra.com/software/clojure-hadoop

For more information
on Clojure, http://clojure.org/
on Hadoop, http://hadoop.apache.org/

Also see my presentation about this library at
http://vimeo.com/7669741


Copyright (c) Stuart Sierra, 2009. All rights reserved.  The use and
distribution terms for this software are covered by the Eclipse Public
License 1.0 (http://opensource.org/licenses/eclipse-1.0.php) which can
be found in the file LICENSE.html at the root of this distribution.
By using this software in any fashion, you are agreeing to be bound by
the terms of this license.  You must not remove this notice, or any
other, from this software.



DEPENDENCIES

This library requires the Java 6 JDK, http://java.sun.com/

Building from source requires Apache Maven 2, http://maven.apache.org/



BUILDING

If you downloaded the library distribution as a .zip or .tar file,
everything is pre-built and there is nothing you need to do.

If you downloaded the sources from Git, then you need to run the build
with Maven. In the top-level directory of this project, run:

    mvn assembly:assembly

This compiles and builds the JAR files.

You can find these files in the "target" directory (replace ${VERSION}
with the current version number of this library):

    clojure-hadoop-${VERSION}-examples.jar :

        This JAR contains all dependencies, including all of Hadoop
        0.18.3.  You can use this JAR to run the example MapReduce
        jobs from the command line.  This file is ONLY for running the
        examples.


    clojure-hadoop-${VERSION}-job.jar :

        This JAR contains the clojure-hadoop libraries and Clojure
        1.0.  It is suitable for inclusion in the "lib" directory of a
        JAR file submitted as a Hadoop job.


    clojure-hadoop-${VERSION}.jar :

        This JAR contains ONLY the clojure-hadoop libraries.  It can
        be placed in the "lib" directory of a JAR file submitted as a
        Hadoop job; that JAR must also include the Clojure 1.0 JAR.



RUNNING THE EXAMPLES

After building, copy the file

    target/clojure-hadoop-${VERSION}-examples.jar

to a file with a short name, such as "examples.jar".  Each of the
*.clj files in the src/examples directory contains instructions for
running that example.
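
The copy step above can be sketched as a small shell function.  This
is only an illustration, assuming a POSIX shell and exactly one
examples JAR in the target directory; the function name
copy_examples_jar is hypothetical and is not part of this library.

```shell
#!/bin/sh
# Hypothetical helper (not part of clojure-hadoop): copy the built
# examples JAR from a target directory to a short name.
# Usage: copy_examples_jar [TARGET_DIR] [DEST]
copy_examples_jar() {
    target_dir="${1:-target}"
    dest="${2:-examples.jar}"
    # The version number varies, so match any version with a glob.
    set -- "$target_dir"/clojure-hadoop-*-examples.jar
    # Fail if the glob matched nothing or matched more than one file.
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one examples JAR in $target_dir" >&2
        return 1
    fi
    cp "$1" "$dest"
}
```

Called with no arguments from the top-level directory, the function
reads from "target" and writes "examples.jar" in the current
directory.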



USING THE LIBRARY IN HADOOP

After building, include the "clojure-hadoop-${VERSION}-job.jar" file
in the lib/ directory of the JAR you submit as your Hadoop job.
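
One possible way to arrange the lib/ directory is sketched below.  It
assumes a POSIX shell; the function name stage_job_dir, the staging
path, and the output name "my-job.jar" are hypothetical placeholders,
not something this library provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of clojure-hadoop): stage a directory
# whose lib/ subdirectory holds the clojure-hadoop job JAR, ready to be
# packaged as the JAR you submit to Hadoop.
# Usage: stage_job_dir LIBRARY_JAR STAGING_DIR
stage_job_dir() {
    library_jar="$1"
    staging_dir="$2"
    if [ ! -f "$library_jar" ]; then
        echo "library JAR not found: $library_jar" >&2
        return 1
    fi
    mkdir -p "$staging_dir/lib" || return 1
    cp "$library_jar" "$staging_dir/lib/"
}

# Example (all paths are placeholders):
#   stage_job_dir "target/clojure-hadoop-${VERSION}-job.jar" build/job
#   # add your own compiled classes under build/job, then package the
#   # directory with the JDK's jar tool:
#   jar cf my-job.jar -C build/job .
```

The staging directory keeps the library JAR separate from your own
classes until the final packaging step.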



DEPENDING ON THE LIBRARY WITH MAVEN

You can depend on clojure-hadoop in your Maven 2 projects by adding
the following lines to your pom.xml:

    <dependencies>
      ...

      <dependency>
        <groupId>com.stuartsierra</groupId>
        <artifactId>clojure-hadoop</artifactId>
        <version>${VERSION}</version>
      </dependency>

      ...
    </dependencies>
    ...
    <repositories>
      ...
      <!-- For released versions: -->
      <repository>
        <id>stuartsierra-releases</id>
        <name>Stuart Sierra's personal Maven 2 release repository</name>
        <url>http://stuartsierra.com/maven2</url>
      </repository>

      <!-- For SNAPSHOT versions: -->
      <repository>
        <id>stuartsierra-snapshots</id>
        <name>Stuart Sierra's personal Maven 2 SNAPSHOT repository</name>
        <url>http://stuartsierra.com/m2snapshots</url>
      </repository>
      ...
    </repositories>



USING THE LIBRARY

This library provides different layers of abstraction away from the
raw Hadoop API.

Layer 1: clojure-hadoop.imports

    Provides convenience functions for importing the many classes and
    interfaces in the Hadoop API.

Layer 2: clojure-hadoop.gen

    Provides gen-class macros to generate the multiple classes needed
    for a MapReduce job.  See the example file "wordcount1.clj" for a
    demonstration of these macros.

Layer 3: clojure-hadoop.wrap

    Provides wrapper functions that automatically convert between
    Hadoop Text objects and Clojure data structures.  See the example
    file "wordcount2.clj" for a demonstration of these wrappers.

Layer 4: clojure-hadoop.job

    Provides a complete implementation of a Hadoop MapReduce job that
    can be dynamically configured to use any Clojure functions in the
    map and reduce phases.  See the example file "wordcount3.clj" for
    a demonstration of this usage.

Layer 5: clojure-hadoop.defjob

    A convenient macro to configure MapReduce jobs with Clojure code.
    See the example files "wordcount4.clj" and "wordcount5.clj" for
    demonstrations of this macro.
