• Stars: 107
• Rank: 323,587 (Top 7%)
• Language: Python
• Created over 7 years ago
• Updated 10 months ago

Repository Details

Parallelized GeoPandas with Dask

Dask Geopandas

Parallel GeoPandas with Dask

Status

UPDATE: current efforts are concentrated in a new repo at https://github.com/jsignell/dask-geopandas

This project is not in a functional state and should not be relied upon. No guarantee of support is provided.

This project was originally implemented to demonstrate speedups from parallelism alongside an experimental Cythonized branch of GeoPandas. That Cythonized branch has since evolved to the point where the code here no longer works with its latest version.

If you really want to get this to work, check out the geopandas-cython branch of geopandas as of about 2017-09-21 and build it from source (this may not be fun). The better solution is probably to wait until everything settles, although there is no known timeline for this.

If you would like to see this project in a more stable state then you might consider pitching in with developer time or with financial support from you or your company.

Example

Given a GeoPandas dataframe

import geopandas as gpd
df = gpd.read_file('...')

We can repartition it into a Dask-GeoPandas dataframe naively by rows. This does not provide spatial partitioning, so it won't gain the efficiencies of spatial reasoning, but it still provides basic multi-core parallelism.

import dask_geopandas as dg
ddf = dg.from_pandas(df, npartitions=4)
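
To make "naively by rows" concrete, here is a minimal pure-Python sketch of contiguous row partitioning. It is not the dask_geopandas implementation; the helper name `split_rows` and the sample records are hypothetical and exist only for illustration.

```python
def split_rows(rows, npartitions):
    """Split a sequence of rows into `npartitions` contiguous chunks of near-equal size."""
    if npartitions < 1:
        raise ValueError("npartitions must be at least 1")
    size, extra = divmod(len(rows), npartitions)
    chunks = []
    start = 0
    for i in range(npartitions):
        # The first `extra` chunks take one additional row each.
        stop = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


# Hypothetical records of the form (name, x, y).
rows = [("a", 0.0, 0.0), ("b", 5.0, 5.0), ("c", 0.5, 0.2), ("d", 5.1, 4.9), ("e", 9.0, 9.0)]
partitions = split_rows(rows, npartitions=2)
```

Each chunk can be processed on a separate core, but nearby points such as "a" and "c" may land in different chunks, which is why this layout gives no spatial advantage.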

We can also repartition by a set of known regions. This suffers an upfront cost of a spatial join, but enables spatial-aware computations in the future to be faster.

regions = gpd.read_file('boundaries.shp')
ddf = dg.repartition(df, regions)
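
The trade-off described above can be sketched in plain Python under simplifying assumptions: regions are axis-aligned bounding boxes rather than arbitrary polygons, and every name here (`contains`, `partition_by_region`, `intersects`, the sample regions) is hypothetical and not part of dask_geopandas.

```python
from collections import defaultdict


def contains(bbox, x, y):
    """Return True if the point (x, y) lies inside bbox = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bbox
    return minx <= x <= maxx and miny <= y <= maxy


def partition_by_region(points, regions):
    """Group (name, x, y) points by the first region whose bounding box contains them.

    Points that fall in no region are collected under the key None.
    """
    parts = defaultdict(list)
    for name, x, y in points:
        key = next((r for r, bbox in regions.items() if contains(bbox, x, y)), None)
        parts[key].append(name)
    return dict(parts)


def intersects(a, b):
    """Return True if two bounding boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Upfront cost: every point is tested against the regions once.
regions = {"west": (0, 0, 4, 10), "east": (5, 0, 10, 10)}
points = [("a", 0.0, 0.0), ("b", 5.0, 5.0), ("c", 0.5, 0.2), ("d", 5.1, 4.9), ("e", 12.0, 1.0)]
parts = partition_by_region(points, regions)

# Later benefit: a spatial query only needs the partitions whose region overlaps it.
query = (6, 4, 7, 6)
to_scan = [r for r, bbox in regions.items() if intersects(bbox, query)]
```

Here the query box overlaps only the "east" region, so the "west" partition can be skipped without inspecting any of its rows.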

Additionally, if you have a distributed dask.dataframe you can pass columns of x-y points to the set_geometry method. Currently this only supports point data.

import dask.dataframe as dd
import dask_geopandas as dg

df = dd.read_csv('...')

df = df.set_geometry(df[['latitude', 'longitude']])
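
As a rough illustration of what turning x-y columns into point data involves, the sketch below pairs two coordinate columns into (x, y) tuples using only the standard library. The helper `make_points` and the sample table are hypothetical. In the usual convention x is longitude and y is latitude; the source does not state which column order set_geometry expects, so that ordering is an assumption here.

```python
def make_points(xs, ys):
    """Pair x and y coordinate columns into a list of (x, y) point tuples."""
    if len(xs) != len(ys):
        raise ValueError("x and y columns must have the same length")
    return [(float(x), float(y)) for x, y in zip(xs, ys)]


# Hypothetical column-oriented table, as might be read from a CSV file.
table = {"latitude": [40.7, 51.5], "longitude": [-74.0, -0.1]}

# Assumption: x = longitude, y = latitude.
points = make_points(table["longitude"], table["latitude"])
```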

More Repositories

1. multipledispatch (Python, 805 stars): Multiple dispatch
2. unification (Python, 71 stars)
3. multipolyfit (Python, 55 stars): A multivariate polynomial regression function in Python
4. pydata-toolz (48 stars): Tutorial for Functional Python at PyData-NYC 2013
5. dasklearn (Python, 42 stars): Dask-powered grid search and pipeline a la scikit-learn
6. ShallowWater (Python, 42 stars): Simple Python implementation of Shallow Water Equations
7. pymarkdown (HTML, 42 stars): Evaluate code in markdown
8. heft (Python, 41 stars): A static scheduling heuristic
9. arxiv-matplotlib (Jupyter Notebook, 37 stars)
10. pydata-nyc-2018-tutorial (Jupyter Notebook, 37 stars)
11. fakestockdata (Python, 31 stars): Generate fake stock data for testing or teaching
12. dask-workshop (Jupyter Notebook, 28 stars)
13. matrix-algebra (23 stars): An algebra for Matrix Expressions written in Maude
14. slides (Python, 21 stars): A template for slides using markdown
15. dask-spark (Python, 21 stars): Dask and Spark interactions
16. chest (Python, 18 stars): Simple spill-to-disk dictionary
17. dask-tutorial (Jupyter Notebook, 17 stars)
18. tompkins (Python, 16 stars): A static DAG scheduling algorithm for heterogeneous systems using Mixed Integer Linear Programming. Implementation of "Optimization Techniques for Task Allocation and Scheduling in Distributed Multi-Agent Operations."
19. blaze-tutorial (12 stars)
20. blog (HTML, 11 stars): Matthew Rocklin's technical blog
21. dask-marathon (Python, 10 stars): Deploy Dask on Marathon
22. itertoolz (Python, 9 stars): More tools for iterators
23. tutorials (Python, 8 stars): A collection of executable tutorials from SciPy and PyData conferences
24. dask-gpu-benchmarks (Jupyter Notebook, 7 stars)
25. classtoolz (Python, 7 stars): A collection of mixin classes
26. cv (TeX, 7 stars): Curriculum Vitae / Resume
27. mrocklin.github.io (HTML, 6 stars): Professional webpage
28. functoolz (Python, 6 stars): More tools for functions
29. ape (Python, 6 stars): Research project on scheduling array primitives
30. blaze-scipy-2014 (6 stars): Slides for SciPy 2014 conference
31. computations (Python, 5 stars)
32. intrograph (Python, 5 stars): Organize computations through introspection of function argument names
33. ctypes-example (Python, 5 stars): A very simple example calling C code from within Python
34. dask-mesos (Python, 4 stars)
35. symbolic-array (Python, 4 stars): Symbolic NumPy Arrays
36. cmsc15200 (C, 3 stars): A course webpage for CMSC15200 Summer 2012
37. shmarray (Python, 3 stars)
38. dist (Python, 3 stars)
39. thesis (TeX, 3 stars)
40. cise-sympy-stats (Python, 3 stars): Scripts to generate figures and equations in CiSE article "Symbolic Statistics with SymPy"
41. skimage-dask-blog (2 stars): Blogpost describing scikit-image and dask.array interaction
42. dask-demo (Jupyter Notebook, 2 stars): Live Demonstrations of Using Dask on Kubernetes
43. immigrants-are-awesome (HTML, 2 stars): Static site with information about American immigrants
44. seg-2019 (Jupyter Notebook, 2 stars): Notebooks for a talk at SEG
45. dask-crossval (Python, 2 stars)
46. unify (Python, 2 stars): Unification algorithm for list-based trees
47. termpy (Python, 2 stars)
48. gha (Python, 2 stars): GitHub Analysis
49. dask-webinar-2018-11 (Jupyter Notebook, 2 stars)
50. opcounts (Python, 2 stars): FLOPs for sequential BLAS and LAPACK libraries
51. pangeo-binder-test (Jupyter Notebook, 1 star)
52. tprofile (Python, 1 star)
53. conda-forge-dependencies (Jupyter Notebook, 1 star)
54. cmsc10200 (Python, 1 star): Class Teaching Materials for CMSC10200
55. demo-binder (Jupyter Notebook, 1 star)
56. hpfrisbee (1 star): A website for a community Ultimate Frisbee game.
57. alettertofaye (1 star): A letter about basic computer security
58. test-project (1 star)
59. nyc-taxi (Jupyter Notebook, 1 star): Fooling around with NYC Taxi data
60. raw-host (HTML, 1 star)
61. zdict (1 star): Useful Mutable Mappings
62. dask-webpage (HTML, 1 star): Temporary hosting while screwing around with a basic webpage
63. mailFeed (Python, 1 star): Python code to access incoming mail from a Gmail account
64. helm-charts (1 star)
65. py32test (Python, 1 star): A demonstration using Cython to backport a Python 3 codebase
66. cva-example (Python, 1 star)