Agile Data the Book
You can buy the book here. You can also read the book on O'Reilly OFPS now. Work through each chapter's code examples as you go, and initialize your Python environment before you start. If any of the requirements fail to install in your virtualenv, try installing them from Linux (apt-get, yum) or OS X (brew, port) packages instead.
Agile Data Code Examples
Set Up Your Python Virtual Environment
# From project root
# Set up the Python virtualenv
virtualenv -p `which python2.7` venv --distribute
source venv/bin/activate
pip install -r requirements.txt
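Before working the chapter examples, it can help to confirm that the virtualenv is actually active in the current shell. The following is a minimal sketch, not part of the book's code: the function name in_virtualenv is hypothetical, and it relies only on the standard sys module (older virtualenv releases set sys.real_prefix, while newer tooling sets sys.base_prefix).

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs inside a virtual environment."""
    # Older virtualenv releases record the system prefix in sys.real_prefix;
    # venv and newer virtualenv releases record it in sys.base_prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


if __name__ == "__main__":
    # String formatting keeps the output identical on Python 2.7 and 3.
    print("virtualenv active: %s" % in_virtualenv())
    print("python version: %s" % sys.version.split()[0])
```

Run it with the python on your PATH after sourcing venv/bin/activate. If it reports False, the activate step probably ran in a different shell.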
Download your Gmail Inbox!
# From ch3
# Download your Gmail inbox (substitute your own address and password)
cd gmail
./gmail.py -m automatic -u your_address@gmail.com -p 'my_password_' -s ./email.avro.schema -f '[Gmail]/All Mail' -o /tmp/test_mbox 2>&1 &
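Because the command above ends with &, the download runs in the background and gives little feedback. A minimal sketch for checking progress follows. It assumes only that gmail.py writes a file or a directory of files at the -o path (/tmp/test_mbox); it does not parse the contents, and the helper name summarize_output is hypothetical.

```python
import os


def summarize_output(path):
    """Return (file_count, total_bytes) for a file or directory tree at path."""
    if os.path.isfile(path):
        return 1, os.path.getsize(path)
    count, total = 0, 0
    # os.walk yields nothing for a missing path, so the result is (0, 0).
    for root, _dirs, files in os.walk(path):
        for name in files:
            count += 1
            total += os.path.getsize(os.path.join(root, name))
    return count, total


if __name__ == "__main__":
    # Path taken from the -o argument of the gmail.py command.
    files, size = summarize_output("/tmp/test_mbox")
    print("%d file(s), %d bytes so far" % (files, size))
```

Running it a few times while the download is in progress should show the byte count growing. A result of 0 files means nothing has been written at that path yet.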
Chapter 2: Data
An example spreadsheet is available at ch02/Email Analysis.xlsb. Example Pig code is available at ch02/probability.pig.
Chapter 3: Agile Tools
The full tutorial is in the Chapter 3 README.
Highlight: downloading your Gmail inbox with gmail.py, using the command under "Download your Gmail Inbox!" above.
Chapter 4: To the Cloud!