  • Stars: 158
  • Rank 235,757 (Top 5 %)
  • Language: JavaScript
  • Created over 11 years ago
  • Updated over 10 years ago

Repository Details

Chapter-by-chapter code for Agile Data, the O'Reilly book

Agile Data the Book

You can buy the book here. You can read the book on O'Reilly OFPS now. Work through the chapter code examples as you go. Don't forget to initialize your Python environment. If any of the requirements fail to install in your virtualenv, try installing them from Linux (apt-get, yum) or OS X (brew, port) packages instead.

Agile Data Code Examples

Set Up Your Python Virtual Environment

# From project root

# Set up the Python virtualenv
virtualenv -p `which python2.7` venv --distribute
source venv/bin/activate
pip install -r requirements.txt
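
Before working through the chapter examples, it can help to confirm that the interpreter you are running really is the one inside the virtualenv. The following is a minimal sketch, not part of the book's code; the helper name `in_virtualenv` is hypothetical. It assumes the classic `virtualenv` tool sets `sys.real_prefix`, and it also handles newer environments where `sys.base_prefix` differs from `sys.prefix`.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    # Classic virtualenv releases set sys.real_prefix inside the environment.
    if hasattr(sys, "real_prefix"):
        return True
    # Python 3 venv and newer virtualenv releases make sys.base_prefix
    # differ from sys.prefix inside the environment.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtualenv active: " + sys.prefix)
    else:
        print("No virtualenv active; run 'source venv/bin/activate' first.")
```

Running the script after `source venv/bin/activate` should report the `venv` directory as the active prefix.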

Download your Gmail Inbox!

# From ch3

# Download your gmail inbox
cd gmail
./gmail.py -m automatic -u [email protected] -p 'my_password_' -s ./email.avro.schema -f '[Gmail]/All Mail' -o /tmp/test_mbox 2>&1 &
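
The same invocation can be launched from Python. The sketch below is an illustration only, not part of the repository: the function names `build_gmail_command` and `start_download` are hypothetical, and it uses only the flags shown in the command above. Prompting with `getpass` keeps the password out of your shell history, although it is still passed to `gmail.py` as a command-line argument.

```python
import getpass
import subprocess


def build_gmail_command(user, password, schema="./email.avro.schema",
                        folder="[Gmail]/All Mail", output="/tmp/test_mbox"):
    """Assemble the argument list for ./gmail.py from the flags shown above."""
    return [
        "./gmail.py",
        "-m", "automatic",
        "-u", user,
        "-p", password,
        "-s", schema,
        "-f", folder,
        "-o", output,
    ]


def start_download(user, workdir="gmail"):
    """Prompt for the password, then start the download without waiting for it."""
    password = getpass.getpass("Gmail password: ")
    # stderr is merged into stdout, mirroring `2>&1`; Popen returns
    # immediately, which mirrors the trailing `&` in the shell command.
    return subprocess.Popen(
        build_gmail_command(user, password),
        cwd=workdir,
        stderr=subprocess.STDOUT,
    )
```

Calling `start_download` with your Gmail address from the chapter directory returns a `Popen` handle; call its `wait()` method if you want to block until the download finishes.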

Chapter 2: Data

An example spreadsheet is available at ch02/Email Analysis.xlsb. Example Pig code is available at ch02/probability.pig.

Chapter 3: Agile Tools

Full tutorial in Chapter 3 README.

Chapter 4: To the Cloud!

Chapter 4 tutorial

Chapter 7: Collecting and Displaying Atomic Records

Chapter 7 tutorial

Chapter 8: Creating Charts

Chapter 8 tutorial

Chapter 9: Building Interactive Reports

Chapter 9 tutorial

Chapter 10: Making Predictions

Chapter 10 tutorial

Chapter 11: Driving Actions

Chapter 11 tutorial
