mrocklin/dask-geopandas

Stars
107
Rank 323,587 (Top 7 %)
Language
Python
Created over 7 years ago
Updated 10 months ago

Parallelized GeoPandas with Dask

Dask Geopandas

Parallel GeoPandas with Dask

Status

UPDATE: current efforts are concentrated in a new repo at https://github.com/jsignell/dask-geopandas

This project is not in a functional state and should not be relied upon. No guarantee of support is provided.

This was originally implemented to demonstrate speedups from parallelism alongside an experimental Cythonized branch of GeoPandas. That Cythonized branch has since evolved to the point where the code here no longer works with its latest version.

If you really want to get this to work, you should check out the geopandas-cython branch of geopandas as of about 2017-09-21 and build it from source (this may not be fun). The better option is probably to wait until everything settles. There is no known timeline for this.

If you would like to see this project in a more stable state then you might consider pitching in with developer time or with financial support from you or your company.

Example

Given a GeoPandas dataframe

import geopandas as gpd
df = gpd.read_file('...')

We can repartition it into a Dask-GeoPandas dataframe naively by rows. This does not provide spatial partitioning, so it won't gain the efficiencies of spatial reasoning, but it still provides basic multi-core parallelism.

import dask_geopandas as dg
ddf = dg.from_pandas(df, npartitions=4)
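To make the idea of row-wise partitioning concrete, here is a minimal pure-Python sketch. It does not use dask_geopandas; the helper names (`split_rows`, `count_in_box`) and the grid of points are hypothetical. It splits rows into contiguous chunks and applies the same function to each chunk independently. Threads are used only to show the split, apply, combine shape; on a standard CPython build they would not speed up pure-Python work like this.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, npartitions):
    """Split rows into npartitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), npartitions)
    chunks, start = [], 0
    for i in range(npartitions):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


def count_in_box(chunk, box):
    """Count the points of one chunk that fall inside an axis-aligned box."""
    xmin, ymin, xmax, ymax = box
    return sum(1 for x, y in chunk if xmin <= x <= xmax and ymin <= y <= ymax)


# Hypothetical data: a 10 x 10 grid of points.
points = [(float(x), float(y)) for x in range(10) for y in range(10)]
chunks = split_rows(points, npartitions=4)
box = (0.0, 0.0, 4.0, 4.0)

# Every chunk must be scanned: row order says nothing about location.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(lambda c: count_in_box(c, box), chunks))

total = sum(partial_counts)
```

Because the chunks are cut by row position rather than by location, a spatial query has to visit all of them.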

We can also repartition by a set of known regions. This incurs the upfront cost of a spatial join, but makes later spatially aware computations faster.

regions = gpd.read_file('boundaries.shp')
ddf = dg.repartition(df, regions)
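The trade-off described above can be sketched in plain Python, under the simplifying assumption that regions are axis-aligned bounding boxes rather than real polygons. The function names and the quadrant layout are hypothetical and are not part of dask_geopandas. Each point is assigned to a region once, and a later query only scans the partitions whose region overlaps it.

```python
def assign_to_regions(points, regions):
    """Group points by the first region whose bounding box contains them."""
    partitions = {name: [] for name in regions}
    for x, y in points:
        for name, (xmin, ymin, xmax, ymax) in regions.items():
            if xmin <= x < xmax and ymin <= y < ymax:
                partitions[name].append((x, y))
                break
    return partitions


def boxes_overlap(a, b):
    """Return True if two (xmin, ymin, xmax, ymax) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# Hypothetical regions: the four quadrants of a 10 x 10 square.
regions = {
    "sw": (0, 0, 5, 5),
    "se": (5, 0, 10, 5),
    "nw": (0, 5, 5, 10),
    "ne": (5, 5, 10, 10),
}
points = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]

# Upfront cost: every point is tested against the regions once.
partitions = assign_to_regions(points, regions)

# Later payoff: a query only scans partitions whose region overlaps it.
query = (1, 1, 4, 4)
touched = [name for name, box in regions.items() if boxes_overlap(box, query)]
hits = [
    (x, y)
    for name in touched
    for x, y in partitions[name]
    if query[0] <= x < query[2] and query[1] <= y < query[3]
]
```

In this toy setup the query touches one partition out of four, so three quarters of the points are never examined.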

Additionally, if you have a distributed dask.dataframe you can pass columns of x-y points to the set_geometry method. Currently this only supports point data.

import dask.dataframe as dd
import dask_geopandas as dg

df = dd.read_csv('...')

df = df.set_geometry(df[['latitude', 'longitude']])
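As a small illustration of turning coordinate columns into point data, the sketch below uses only the standard library. The CSV content and the `to_xy` helper are hypothetical. It follows the common geospatial convention that x is longitude and y is latitude; the README does not state which column order `set_geometry` expects, so treat the ordering here as an assumption.

```python
import csv
import io

# Hypothetical CSV content with the same column names as the example above.
raw = io.StringIO(
    "name,latitude,longitude\n"
    "Reykjavik,64.1466,-21.9426\n"
    "Lisbon,38.7223,-9.1393\n"
)


def to_xy(row, x_col="longitude", y_col="latitude"):
    """Build an (x, y) pair from one row; longitude is x, latitude is y."""
    return (float(row[x_col]), float(row[y_col]))


points = [to_xy(row) for row in csv.DictReader(raw)]
```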

multipledispatch

Multiple dispatch

unification

multipolyfit

A multivariate polynomial regression function in python

pydata-toolz

Functional Python tutorial at PyData-NYC 2013

dasklearn

Dask powered gridsearch and pipeline a la scikit-learn

ShallowWater

Simple Python implementation of Shallow Water Equations

pymarkdown

Evaluate code in markdown

heft

A static scheduling heuristic

arxiv-matplotlib

Jupyter Notebook

pydata-nyc-2018-tutorial

Jupyter Notebook

fakestockdata

Generate fake stock data for testing or teaching

dask-workshop

Jupyter Notebook

matrix-algebra

An algebra for Matrix Expressions written in Maude

slides

A template for slides using markdown

dask-spark

Dask and Spark interactions

chest

Simple spill-to-disk dictionary

dask-tutorial

Jupyter Notebook

tompkins

A static DAG scheduling algorithm for heterogeneous systems using Mixed Integer Linear Programming. Implementation of "Optimization Techniques for Task Allocation and Scheduling in Distributed Multi-Agent Operations."

blaze-tutorial

blog

Matthew Rocklin's technical blog

dask-marathon

Deploy Dask on Marathon

itertoolz

More tools for iterators

tutorials

A collection of executable tutorials from SciPy and PyData conferences

dask-gpu-benchmarks

Jupyter Notebook

classtoolz

A collection of mixin classes

cv

Curriculum Vitae / Resume

mrocklin.github.io

Professional webpage

functoolz

More tools for functions

ape

Research project on scheduling array primitives

blaze-scipy-2014

Slides for scipy 2014 conference

computations

intrograph

Organize computations through introspection of function argument names

ctypes-example

A very simple example calling C code from within Python

dask-mesos

symbolic-array

Symbolic Numpy Arrays

cmsc15200

A course webpage for CMSC15200 Summer 2012

shmarray

dist

thesis

cise-sympy-stats

Scripts to generate figures and equations in CiSE article "Symbolic Statistics with SymPy"

skimage-dask-blog

Blogpost describing scikit-image and dask.array interaction

dask-demo

Live Demonstrations of Using Dask on Kubernetes

Jupyter Notebook

immigrants-are-awesome

Static site with information about American immigrants

seg-2019

Notebooks for a talk at SEG

Jupyter Notebook

dask-crossval

unify

Unification algorithm for list-based trees

termpy

gha

GitHub Analysis

dask-webinar-2018-11

Jupyter Notebook

opcounts

FLOPs for sequential BLAS and LAPACK libraries

pangeo-binder-test

Jupyter Notebook

tprofile

conda-forge-dependencies

Jupyter Notebook

cmsc10200

Class Teaching Materials for CMSC10200

demo-binder

Jupyter Notebook

hpfrisbee

A website for a community Ultimate Frisbee game.

alettertofaye

A letter about basic computer security

test-project

nyc-taxi

Fooling around with NYC Taxi data

Jupyter Notebook

raw-host

zdict

Useful Mutable Mappings

dask-webpage

Temporary hosting while experimenting with a basic webpage

mailFeed

Python code to access incoming mail from a gmail account

helm-charts

py32test

A demonstration using Cython to backport a Python 3 codebase

cva-example