
Data processing on Hadoop without the hassle.

Cascalog

Cascalog is a fully-featured data processing and querying library for Clojure or Java. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer. Cascalog is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools.

Follow the getting started steps, check out the tutorial, and you'll be running Cascalog queries on your local computer within 5 minutes.

Getting Started with JCascalog

To get started with JCascalog, Cascalog's pure-Java API, see this wiki page. The jcascalog.Playground class has in-memory datasets that you can play with to learn the basics.

Latest Version

The latest release version of Cascalog is hosted on Clojars.

Getting started with Clojure Cascalog

The best way to get started with Cascalog is to experiment with the toy datasets that ship with the project. These datasets are served from memory and can be queried directly from the REPL. Just follow these steps and you'll be on your way:

  1. Install leiningen
  2. Make sure you have Java 1.6 (run java -version)
  3. Start a new leiningen project with lein new <project name>, replacing <project name> with the name of your project
  4. Add a dependency on Cascalog by putting [cascalog/cascalog-core "2.1.0"] in the :dependencies vector of your project's project.clj file.
  5. Work through the examples in the Getting Started Guide.
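
Once the dependency is in place, a first REPL session might look like the sketch below. It assumes the cascalog.playground namespace and its bootstrap macro are available in the version you installed, and that the in-memory age dataset holds (name, age) pairs; treat it as an illustration rather than a replacement for the guide.

```clojure
;; Run inside `lein repl` from the project directory.
;; Assumption: cascalog.playground provides `bootstrap` and the toy datasets.
(use 'cascalog.playground)
(bootstrap)   ; pulls cascalog.api into the current namespace

;; Print the name of every person in the `age` dataset whose age is 25.
;; ?<- defines the query and runs it right away; (stdout) is the output sink.
(?<- (stdout)
     [?person]
     (age ?person 25))
```

The constant 25 in the predicate acts as a filter: only tuples whose second field equals 25 bind ?person.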

Using Cascalog within a project

Cascalog is hosted at Clojars, and some of its dependencies are hosted at Conjars. Both Clojars and Conjars are Maven repositories that are easy to use with Maven or leiningen.

To include Cascalog in your leiningen or cake project, add the following to your project.clj:

General

[cascalog/cascalog-core "3.0.0"] ;; under :dependencies
[org.apache.hadoop/hadoop-core "1.2.1"] ;; under :dev-dependencies

Leiningen 2.0

:repositories {"conjars" "http://conjars.org/repo"}
:dependencies [[cascalog/cascalog-core "3.0.0"]]
:profiles {:provided {:dependencies [[org.apache.hadoop/hadoop-core "1.2.1"]]}}
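
Assembled into a complete file, the Leiningen 2 settings could look roughly like this. The project name, project version, and the choice of Clojure 1.5.1 are placeholders picked for illustration. Note that :dependencies is written as a vector of dependency vectors, which is the form Leiningen expects.

```clojure
;; project.clj -- a minimal sketch; the project name and versions are placeholders
(defproject my-cascalog-project "0.1.0-SNAPSHOT"
  ;; Some of Cascalog's dependencies are hosted on Conjars
  :repositories {"conjars" "http://conjars.org/repo"}
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [cascalog/cascalog-core "3.0.0"]]
  ;; Hadoop goes in the :provided profile: it is on the classpath during
  ;; development but is left out of the uberjar
  :profiles {:provided {:dependencies [[org.apache.hadoop/hadoop-core "1.2.1"]]}})
```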

Leiningen < 2.0

:dependencies [[cascalog/cascalog-core "3.0.0"]]
:dev-dependencies [[org.apache.hadoop/hadoop-core "1.2.1"]]

Note that Cascalog is compatible with Clojure 1.2.0, 1.2.1, 1.3.0, 1.4.0, and 1.5.1.

Documentation and Issue Tracker

Come chat with us in the Google group: cascalog-user

Or in the #cascalog or #cascading rooms on freenode!

Priorities for Cascalog development

  1. Replicated and bloom joins
  2. Cross query optimization: push constants and filters down into subqueries when possible

Acknowledgements

YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.

Cascalog is based on a very early branch of the cascading-clojure project (http://github.com/clj-sys/cascading-clojure). Special thanks to Bradford Cross and Mark McGranaghan for their work on that project. Much of that code appears within Cascalog in either its original or a modified form.
