  • Stars: 1,481
  • Rank: 31,741 (Top 0.7%)
  • Language: Clojure
  • License: Other
  • Created almost 12 years ago
  • Updated over 2 years ago

Repository Details

Data workflow tool, like a "Make for data"

Drake

Drake is a simple-to-use, extensible, text-based data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs and Drake automatically resolves their dependencies and calculates:

  • which commands to execute (based on file timestamps)
  • in what order to execute the commands (based on dependencies)
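
The two calculations above can be illustrated with a minimal Python sketch of Make-style planning. This is not Drake's actual algorithm; the `STEPS` workflow, the `plan` function, and the use of plain numbers in place of real file timestamps are all assumptions made for illustration. A step is scheduled when an output is missing, when an input is newer than its oldest output, or when a step it depends on is itself scheduled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each step lists the files it reads and writes.
STEPS = {
    "clean": {"inputs": ["raw.csv"], "outputs": ["clean.csv"]},
    "stats": {"inputs": ["clean.csv"], "outputs": ["stats.csv"]},
    "report": {"inputs": ["clean.csv", "stats.csv"], "outputs": ["report.txt"]},
}


def plan(steps, mtimes):
    """Return the names of the steps to run, in dependency order.

    `mtimes` maps a file name to its modification time; a file that
    does not exist is simply absent from the mapping.
    """
    # Which step produces each file.
    producer = {out: name for name, s in steps.items() for out in s["outputs"]}
    # step -> set of steps whose outputs it consumes.
    graph = {
        name: {producer[i] for i in s["inputs"] if i in producer}
        for name, s in steps.items()
    }

    to_run = []
    for name in TopologicalSorter(graph).static_order():
        inputs, outputs = steps[name]["inputs"], steps[name]["outputs"]
        if any(o not in mtimes for o in outputs):
            stale = True  # an output is missing
        else:
            oldest_output = min(mtimes[o] for o in outputs)
            stale = any(mtimes.get(i, 0) > oldest_output for i in inputs)
        # A step also reruns when any step it depends on reruns.
        if stale or any(dep in to_run for dep in graph[name]):
            to_run.append(name)
    return to_run
```

For example, `plan(STEPS, {"raw.csv": 5, "clean.csv": 3, "stats.csv": 4, "report.txt": 6})` schedules all three steps, because `raw.csv` is newer than `clean.csv` and everything downstream depends on it.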

Drake is similar to GNU Make, but designed especially for data workflow management. It has HDFS support, allows multiple inputs and outputs, and includes a host of features designed to help you bring sanity to your otherwise chaotic data processing workflows.

Drake walk-through

If you like screencasts, check out the Drake walk-through video recorded by Artem Boytsov, Drake's primary designer.

Installation

Drake has been tested under Linux, Mac OS X and Windows 8. We've not tested it on other operating systems.

Drake installs itself on the first run of the drake shell script; there is no separate install script. Follow these instructions to install Drake manually:

  1. Make sure you have Java version 6 or later.
  2. Download the drake script from the master branch of this project.
  3. Place the drake script on your $PATH. (~/bin is a good choice if it is on your path.)
  4. Set it to be executable. (chmod 755 ~/bin/drake)
  5. Run it. (drake)

Homebrew

If you're on a Mac you can alternatively use Homebrew to install Drake:

brew install drake

Upgrade Drake

Starting with Drake version 1.0.0, once you have Drake installed you can easily upgrade your version of Drake by running drake --upgrade. The latest version of Drake will be downloaded and installed for you.

Download or build the uberjar

You can build Drake from source or run it from a prebuilt jar. See the detailed instructions.

Use Drake as a Clojure library

You can programmatically use Drake from your Clojure project by using Drake's Clojure front end. Your project.clj dependencies should include the latest Drake library, e.g.:

[factual/drake "1.0.3"]

Faster startup time

The JVM startup time can be a nuisance. To reduce startup time, we recommend using the way cool Drip. Please see the Drake with Drip wiki page.

Basic Usage

The wiki is the home for Drake's documentation, but here are simple notes on usage:

To build a specific target (and any out-of-date dependencies, if necessary):

$ drake mytarget

To build a target and everything that depends on it (a.k.a. "down-tree" mode):

$ drake ^mytarget

To build a specific target only, without any dependencies, up or down the tree:

$ drake =mytarget

To force build a target:

$ drake +mytarget

To force build a target and all its downtree dependencies:

$ drake +^mytarget

To force build the entire workflow:

$ drake +...

To exclude targets:

$ drake ... -sometarget -anothertarget
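
To make the prefix conventions above concrete, here is a small Python sketch that decodes a target argument into its parts. It is only a reading aid based on the examples in this section, not Drake's own parser; `TargetSpec` and `parse_target` are hypothetical names, and the sketch covers only the forms shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetSpec:
    name: str              # target name, or "..." for the entire workflow
    mode: str = "up"       # "up": target plus out-of-date dependencies
                           # "down": target plus everything that depends on it
                           # "only": the target alone
    force: bool = False    # "+" prefix: build even if up to date
    exclude: bool = False  # "-" prefix: leave this target out


def parse_target(arg):
    """Decode one command-line target such as "+^mytarget"."""
    force = exclude = False
    if arg.startswith("+"):
        force, arg = True, arg[1:]
    elif arg.startswith("-"):
        exclude, arg = True, arg[1:]

    mode = "up"
    if arg.startswith("^"):
        mode, arg = "down", arg[1:]
    elif arg.startswith("="):
        mode, arg = "only", arg[1:]

    if not arg:
        raise ValueError("empty target name")
    return TargetSpec(arg, mode, force, exclude)
```

With this sketch, `parse_target("+^mytarget")` yields a forced, down-tree spec for `mytarget`, and `parse_target("-sometarget")` yields an exclusion.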

By default, Drake will look for ./Drakefile. The simplest way to run your workflow is to name your workflow file Drakefile, and make sure you're in the same directory. Then, simply:

$ drake

To specify the workflow file explicitly, use -w or --workflow. E.g.:

$ drake -w /myworkflow/my-workflow.drake

Use drake --help for the full list of options.
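
If you drive Drake from a Python script, the invocations above can be assembled as an argument list. The sketch below is a hypothetical helper, not part of Drake; it uses only the `--workflow` option and the target syntax shown in this section, and it assumes that options may be placed before targets on the command line.

```python
import subprocess


def drake_command(targets=(), workflow=None):
    """Build an argv list for the drake CLI (hypothetical helper).

    With no `workflow`, Drake falls back to ./Drakefile as described above.
    """
    argv = ["drake"]
    if workflow is not None:
        argv += ["--workflow", str(workflow)]
    argv += list(targets)
    return argv


def run_drake(targets=(), workflow=None):
    """Run drake and raise CalledProcessError on a non-zero exit status."""
    # An argv list avoids shell quoting issues with characters like "^".
    return subprocess.run(drake_command(targets, workflow), check=True)
```

Calling `run_drake(["+^mytarget"], workflow="/myworkflow/my-workflow.drake")` requires the drake script to be on your `$PATH`.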

Documentation, etc.

The wiki is the home for Drake's documentation.

A lot of work went into designing and specifying Drake. To prove it, here's the 60 page specification and user manual. It's stored in Google Docs, and we encourage everyone to use its superb commenting feature to provide feedback. Just select the text you want to comment on, and click Insert -> Comment (Ctrl + Alt + M on Windows, Cmd + Option + M on Mac). It can also be downloaded as a PDF.

There are annotated workflow examples in the demos directory.

There's a Google Group for Drake where you can ask questions. And if you found a bug or want to submit a feature request, go to Drake's GitHub issues page.

Visualize your workflow

See more detail

Asynchronous Execution of Steps

Please see the wiki page on async.

Plugins

Drake has a plugin mechanism, allowing developers to publish and use custom plugins that extend Drake. See the Plugin wiki page for details.

HDFS Compatibility

Drake provides HDFS support by allowing you to specify inputs and outputs like hdfs:/my/big_file.txt.

If you plan to use Drake with HDFS, please see the wiki page on HDFS Compatibility.

Amazon S3 Compatibility

Thanks to Chris Howe, Drake now has basic compatibility with Amazon S3 by allowing you to specify inputs and outputs like s3://bucket/path/to/object.

If you plan to use Drake with S3, please see the wiki doc on S3 Compatibility.

Drake on the REPL

You can use Drake from your Clojure REPL, via drake.core/run-workflow. Please see the Drake on the REPL wiki page for more details.

Stuff outside this repo

Thanks to Lars Yencken, we now have Vim syntax support for Drake.

Also thanks to Lars Yencken, there are utilities for making life easier in Python with Drake workflows.

Courtesy of @daguar, an alternative approach to installing Drake on Mac OS X.

Original blog post announcing Drake's open source release

An epic knock-down-drag-out set of threads on Hacker News discussing the design merits of Drake

License

Source Copyright © 2012-2015 Factual, Inc.

Distributed under the Eclipse Public License, the same as Clojure uses. See the file COPYING.

More Repositories

1

skuld

Distributed task tracking system.
Clojure
300
star
2

geo

Clojure library for working with geohashes, polygons, and other world geometry
Clojure
294
star
3

riffle

write-once key/value storage engine
Clojure
136
star
4

s3-journal

stable, high-throughput journalling to S3
Clojure
100
star
5

clj-leveldb

Clojure bindings for LevelDB
Clojure
75
star
6

timely

Timely: A clojure dsl for cron and scheduling library
Clojure
35
star
7

open-dockerfiles

Factual's open source dockerfiles
Shell
28
star
8

parquet-rewriter

A library to mutate parquet files
Java
18
star
9

beercode-open

Open-source code backed by the Factual Beer Guarantee
Java
17
star
10

clj-helix

Clojure bindings for Apache Helix
Clojure
13
star
11

c4

Convenience features for handling record files the Clojure way
Clojure
9
star
12

eliza

Clojure
6
star
13

sosueme

A collection of Clojure functions for things we like to do.
Clojure
6
star
14

patchwork

Factual Dependency Management Tool
Python
5
star
15

docker-mariadb-10.0-galera

Shell
5
star
16

solr-mapreduce-indexer

Partial copy of solr/lucene contrib mapreduce indexer tool that works on Solr 6.x with some bug fixes and dependencies compiled in
Java
4
star
17

smaker

Smaker extends the standard Snakemake library by 1) supporting arbitrary snakefile aggregation/re-use through 2) middleware that parses generic wildcards in snakefiles.
Python
4
star
18

torpedo

Lets you torpedo complex functional expressions
Clojure
2
star
19

drake-interface

Defines Drake interfaces
Clojure
2
star
20

factual-android-sdk-demo

Java
2
star
21

engine-segment-integration-android

Factual Engine / Segment Analytics Android Integration
Java
1
star
22

docker-collins

Dockerfile
1
star
23

marathon-apps-exporter

Marathon Apps Exporter For Prometheus
HTML
1
star
24

kudos

Ruby
1
star
25

docker-osm2pgsql

Docker image for running osm2pgsql.
Shell
1
star
26

sdk-examples

Factual SDK Examples
Java
1
star
27

jackalope

Github integration service to support custom release planning processes
Clojure
1
star