  • Stars: 169
  • Rank 223,011 (Top 5 %)
  • Language: Ruby
  • Created over 12 years ago
  • Updated over 9 years ago

Repository Details

A Seriously Fun guide to Big Data Analytics in Practice

Big Data for Chimps: A Seriously Fun guide to Terabyte-scale data processing

This is the work-in-progress version of the upcoming O'Reilly book, Big Data for Chimps: A Seriously Fun guide to Hadoop and Terabyte-scale data processing.

Our intent is to provide the best guide for exploratory data analytics using Hadoop -- for data science in practice. We use high-level languages (Pig and Ruby) that make Hadoop a tool, not a framework, allowing re-use and rapid development. We'll cover enough Hadoop internals to save you from diving into the source code, and enough tuning advice to let you know where to drill deep.

In all cases, the focus is on maximizing your time and creativity -- on helping you uncover what question to ask and the right way to ask it.

O'Reilly has courageously agreed to release the book under a http://creativecommons.org/licenses/by-nc-sa/3.0/[CC-BY-NC-SA] license. To buy a physical copy of the book, or a Kindle (.mobi) or iOS/Nook (.epub) edition, visit the early release http://shop.oreilly.com[O'Reilly bookstore] (TODO: link to early release page). Buy it now, and you'll get frequently updated access and the final version once available.

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Code is Apache licensed unless specifically labeled otherwise.

More Repositories

1. ironfan (Ruby, 501 stars)
   Chef orchestration layer -- your system diagram come to life. Provision EC2, OpenStack or Vagrant without changes to cookbooks or configuration

2. wukong (Ruby, 497 stars)
   Ruby on Hadoop: Efficient, effective Hadoop streaming & bulk data processing. Write micro scripts for terabyte-scale data

3. wonderdog (Java, 186 stars)
   Bulk loading for elastic search

4. configliere (Ruby, 122 stars)
   Wise, discreet configuration for ruby scripts: integrate config files, environment variables and command line with no fuss

5. ironfan-pantry (Python, 99 stars)
   Battle-hardened Ironfan-ready big data chef cookbooks, laden with best practices and love from your friends at Infochimps

6. ironfan-homebase (Ruby, 34 stars)
   Skeleton homebase for Ironfan and Chef -- use this to hold your clusters, cookbooks and stacks

7. data_science_fun_pack (Java, 29 stars)
   Meta-repository of big data tools -- source and essential plugins for hadoop, pig, wukong, storm, kafka etc.

8. gorillib (Ruby, 17 stars)
   Gorillib: infochimps lightweight subset of ruby convenience methods

9. ironfan-ci (Ruby, 17 stars)
   Continuous Integration testing of ironfan clusters and chef cookbooks. Pass your system diagram into iron law,

10. wukong-hadoop (Ruby, 13 stars)
    Execute Wukong code within the Hadoop framework.

11. ironfan-repoman (Ruby, 13 stars)
    Rake tasks to syndicate out 50 cookbooks from ironfan-pantry into distinct isolated repos. Don't look in the trunk, repo man.

12. chimpstation-homebase (Ruby, 10 stars)
    it's like rocket fuel for cookbook development

13. icss (Ruby, 9 stars)
    Infochimps Stupid Schema library: an avro-compatible data description standard. ICSS completely describes a collection of data (and associated assets) in a way that is expressive, scalable and sufficient to drive remarkably complex downstream processes.

14. vayacondios (Ruby, 8 stars)
    Data goes in. The right thing happens.

15. swineherd-fs (Ruby, 5 stars)
    Filesystem Abstraction for S3, HDFS and normal filesystem

16. senor_armando (Ruby, 5 stars)
    Skeleton Goliath (http://goliath.io) App layout

17. wukong-storm (Java, 5 stars)
    Storm plugin for Wukong

18. iron_cuke (Ruby, 4 stars)
    Integration tests for the cloud, done right.

19. wukong-load (Ruby, 4 stars)
    Plugin that makes it easy to load, dump, and sync data between Wukong and various data stores

20. community-pantry (Ruby, 3 stars)
    Ironfan cookbooks not currently maintained by Infochimps

21. wukong-deploy (Ruby, 2 stars)
    Deploy pack framework for the Infochimps Platform

22. dotfiles (Ruby, 2 stars)
    dotfiles (.zshrc, .emacs.d, etc) for an infochimps standard workstation

23. pigsy (Java, 2 stars)
    UDFs and Loaders for Apache Pig -- geodata and more on hadoop

24. infochimps-labs.github.com (JavaScript, 2 stars)
    infochimps open source reference pages

25. dark_siphon (Ruby, 2 stars)
    A Goliath web app for duplicating production traffic to another system.

26. bare-pantry (Ruby, 1 star)
    An empty Ironfan pantry

27. goliath-chimp (Ruby, 1 star)
    Collection of Chimp-inspired Goliath/Rack utility classes

28. wukong_for_r (R, 1 star)
    How to use Wukong to run R scripts in Hadoop as well as locally

29. kibana-es-plugin (CSS, 1 star)