  • Stars
    105
  • Rank 328,196 (Top 7 %)
  • Language
    Python
  • License
    MIT License
  • Created over 10 years ago
  • Updated over 2 years ago

Repository Details

Cython implementation of jenks breaks

Fast python library for jenks natural breaks

The history and intent of the Jenks natural breaks algorithm is well covered by

There are even a few python implementations:

However, it has been noted that the python implementations are tediously slow. There are two obvious reasons for this...

  1. All the data is stored in python lists rather than optimized data structures. The fact that the variables are named "matrices" is perhaps some sort of practical joke given how bad lists are for matrix/array like data structures. Numpy arrays are your friend and I can't imagine doing any numeric computation in python without them.

  2. There is a lot of looping: nested loops over the data, repeated for every class. The algorithm makes this somewhat inevitable. Python sucks at iterating over simple math, exploding the runtime very quickly. Cython, through its variable typing, allows us to write the algorithm in python-like syntax, compile it to a shared library that can be imported as a python module and run at near-C speeds.
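To make the looping concrete, here is a minimal pure-Python sketch of a natural-breaks search written as a dynamic program. It is not the code from this repository or from the implementations mentioned above: the function name `jenks_breaks` is hypothetical, and the convention that each inner break is the largest value of the class below it is an assumption of this sketch. The point is the three nested loops, which are exactly the kind of work that is slow in plain Python.

```python
def jenks_breaks(values, n_classes):
    """Hypothetical sketch: return n_classes + 1 break values (min, inner breaks, max)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the sum of squared deviations of data[i:j] in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for idx, x in enumerate(data):
        prefix[idx + 1] = prefix[idx] + x
        prefix_sq[idx + 1] = prefix_sq[idx] + x * x

    def ssd(i, j):
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total ssd when data[:j] is split into c classes.
    # start[c][j]: index at which the last of those c classes begins.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0

    # Three nested loops: classes x end positions x split points.
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back from the last class to recover the breaks.
    # Assumed convention: an inner break is the largest value of the lower class.
    breaks = [data[-1]]
    j = n
    for c in range(n_classes, 1, -1):
        i = start[c][j]
        breaks.append(data[i - 1])
        j = i
    breaks.append(data[0])
    breaks.reverse()
    return breaks


print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [1, 3, 12, 22]
```

Even with constant-time variance lookups, the innermost statement runs on the order of classes times the square of the number of values, which is why static typing and contiguous arrays pay off so much here.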

So I set forth to make good use of my son's afternoon nap time and port the existing python implementation (which is, in turn, based on the wonderfully documented javascript implementation) to Cython with numpy arrays.

Here's the benchmark against the jenks2.py implementation:

In [1]: from jenks2 import jenks

In [2]: %timeit jenks(data, 5)
1 loops, best of 3: 8.16 s per loop

In [3]: from jenks import jenks

In [4]: %timeit jenks(data, 5)
10 loops, best of 3: 69.2 ms per loop

Yep, that's 118X faster for just a little bit of static typing and using arrays instead of lists. It even makes the logic easier to read (for those of us who work with matrices/arrays often).
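The speedup figure follows directly from the two timings in the benchmark. The sketch below redoes that arithmetic and adds a small helper for taking a "best of N" timing outside IPython; the names `speedup` and `best_of` are hypothetical and only the standard-library `timeit` module is used.

```python
import timeit


def speedup(slow_seconds, fast_seconds):
    """Ratio of the slow runtime to the fast runtime."""
    return slow_seconds / fast_seconds


# Timings reported by %timeit in the benchmark: 8.16 s vs 69.2 ms per loop.
ratio = speedup(8.16, 69.2e-3)
print(round(ratio))  # 118


def best_of(func, repeat=3, number=1):
    """Best per-call time in seconds, similar in spirit to %timeit's 'best of 3'."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number
```

To compare two implementations on your own data, something like `best_of(lambda: jenks(data, 5))` could be called once per implementation and the two results passed to `speedup`.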

The only cost is that you need Cython and a C compiler to get it working.

sudo apt-get install build-essential cython python-numpy
pip install -e "git+https://github.com/perrygeo/jenks.git#egg=jenks"

And then test it:

In [1]: import json

In [2]: from jenks import jenks

In [3]: # data should be 1-dimensional array, python list or iterable 

In [4]: data = json.load(open('test.json')) 

In [5]: print(jenks(data, 5))
[0.002810962, 2.0935481, 4.2054954, 6.1781483, 8.0917587, 9.997983]
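The result holds six numbers for five classes, which appears to be the minimum, four inner breaks, and the maximum. As a sketch of how such a list might be used, the snippet below assigns a value to a class with the standard-library `bisect` module. The helper name `classify` is hypothetical, and treating each inner break as the inclusive upper bound of its class is an assumption rather than documented behaviour.

```python
from bisect import bisect_left

# Break values printed above: minimum, four inner breaks, maximum.
breaks = [0.002810962, 2.0935481, 4.2054954, 6.1781483, 8.0917587, 9.997983]


def classify(value, breaks):
    """Return the 0-based class index of value.

    Assumes each inner break is the inclusive upper bound of its class.
    """
    if not breaks[0] <= value <= breaks[-1]:
        raise ValueError("value lies outside the range covered by the breaks")
    inner = breaks[1:-1]
    return bisect_left(inner, value)


print(len(breaks) - 1)        # 5 classes
print(classify(1.5, breaks))  # 0
print(classify(5.0, breaks))  # 2
print(classify(9.5, breaks))  # 4
```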
