This repository has been archived on 09/Nov/2019.


Parallel Data Analysis with Dask

Materials for the Dask tutorial at PyCon 2018.

First Time Setup

If you don't have git installed, you can download a ZIP copy of the repository using the green button above. Note that the extracted folder will be called dask-tutorial-pycon-2018-master instead of dask-tutorial-pycon-2018, so adjust the commands below accordingly.

Install Miniconda or ensure you have Python 3.6 installed on your system.

# Update conda
conda update conda

# Clone the repository. Or download the ZIP and add `-master` to the name.
git clone https://github.com/TomAugspurger/dask-tutorial-pycon-2018

# Enter the repository
cd dask-tutorial-pycon-2018

# Create the environment
conda env create

# Activate the environment
conda activate dask-pycon

# Download data
python prep_data.py

# Start jupyterlab
jupyter lab

If you aren't using conda

# Clone the repository. Or download the ZIP and add `-master` to the name.
git clone https://github.com/TomAugspurger/dask-tutorial-pycon-2018

# Enter the repository
cd dask-tutorial-pycon-2018

# Create a virtualenv
python3 -m venv .env

# Activate the env
# See https://docs.python.org/3/library/venv.html#creating-virtual-environments
# For bash it's
source .env/bin/activate

# Install the dependencies
python -m pip install -r requirements.txt

# Download data
python prep_data.py

# Start jupyterlab
jupyter lab
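Whichever setup path you follow, a short script can confirm that the environment is usable before you open JupyterLab. The following is a minimal sketch that is not part of the tutorial repository: the package list is an assumption (the authoritative list is in the repository's conda environment file and requirements.txt), and it treats Python 3.6 as the minimum version. Save it under any name you like (for example, a hypothetical check_env.py) and run it with the environment activated.

```python
"""Sanity check for the tutorial environment (a sketch, not part of the repo)."""
import importlib.util
import sys

# Assumption: the core packages the tutorial relies on. The authoritative
# list lives in the repository's environment file / requirements.txt.
REQUIRED = ("dask", "distributed", "pandas", "numpy", "jupyterlab")


def missing_packages(names):
    """Return the names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(names=REQUIRED, min_version=(3, 6)):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info < min_version:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        wanted = f"{min_version[0]}.{min_version[1]}"
        problems.append(f"Python {wanted} or newer expected, found {found}")
    for name in missing_packages(names):
        problems.append(f"cannot import {name!r}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("problem:", issue)
    print("environment looks OK" if not issues else "fix the problems above")
```

The script only checks that each package can be located by the import system; it does not import them, so it runs quickly even when the heavier libraries are installed.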

Connect to the Cluster

We have a Pangeo deployment running that provides everyone with their own cluster for trying out Dask on some larger problems. You can log into the cluster by going to:
