  • Language: Python
  • License: BSD 3-Clause "New...
  • Created almost 11 years ago
  • Updated over 1 year ago

Spatial classification and regression using Scikit-learn and Rasterio

Python module for geospatial prediction using scikit-learn and rasterio

pyimpute provides high-level Python functions that bridge the gap between spatial data formats and machine learning software, making it easier to run supervised classification and regression on geospatial data. This allows you to create landscape-scale predictions based on sparse observations.

The observations, known as the training data, consist of:

  • response variables: what we are trying to predict
  • explanatory variables: variables which explain the spatial patterns of responses

The target data consists of explanatory variables represented by raster datasets. There are no response variables available for the target data; the goal is to predict a raster surface of responses. The responses can either be discrete (classification) or continuous (regression).
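In scikit-learn terms, both kinds of data end up as 2-D arrays with one row per sample and one column per explanatory variable, while the responses form a 1-D array. The NumPy sketch below illustrates that layout with made-up values. The names `train_xs`, `train_y` and `target_xs` match the basic example, but the shapes and numbers are illustrative assumptions, not output from pyimpute.

```python
import numpy as np

# Hypothetical training data: 5 point observations, 2 explanatory variables
# (e.g. temperature and precipitation sampled at each point).
train_xs = np.array([
    [12.1, 640.0],
    [14.3, 520.0],
    [9.8, 810.0],
    [15.0, 480.0],
    [11.2, 700.0],
])
# One response per observation; discrete classes mean classification.
train_y = np.array([1, 0, 1, 0, 1])

# Hypothetical target data: two 3 x 4 explanatory rasters stacked as bands.
rows, cols = 3, 4
temperature = np.linspace(9.0, 16.0, rows * cols).reshape(rows, cols)
precipitation = np.linspace(850.0, 450.0, rows * cols).reshape(rows, cols)
bands = np.stack([temperature, precipitation])   # shape (2, 3, 4)

# Flatten each raster so every cell becomes one row of features.
target_xs = bands.reshape(bands.shape[0], -1).T  # shape (12, 2)

print(train_xs.shape, train_y.shape, target_xs.shape)
```

Each prediction made on `target_xs` corresponds to one cell, so an array of 12 predictions can be reshaped back to `(rows, cols)` to form the response surface.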

Pyimpute Functions

  • load_training_vector: Loads training data where responses are vector data (explanatory variables are always raster)
  • load_training_raster: Loads training data where responses are raster data
  • stratified_sample_raster: Randomly samples raster cells, stratified by discrete class
  • evaluate_clf: Performs cross-validation and prints metrics to help tune your scikit-learn classifiers
  • load_targets: Loads target raster data into the data structures required by scikit-learn
  • impute: Takes target data and your scikit-learn classifier, makes predictions, and writes them out as GeoTIFFs
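The idea behind stratified sampling of a class raster can be sketched in a few lines of NumPy: for each discrete class value, pick a fixed number of random cells carrying that value. This only illustrates the technique. The helper name `stratified_sample`, its parameters, and the choice to sample without replacement are assumptions, not the signature or behaviour of pyimpute's `stratified_sample_raster`.

```python
import numpy as np

def stratified_sample(class_raster, n_per_class, seed=None):
    """Hypothetical helper: sample up to n_per_class cells from each class.

    Returns (row_indices, col_indices) of the selected cells.
    """
    rng = np.random.default_rng(seed)
    rows_out, cols_out = [], []
    for value in np.unique(class_raster):  # classes in sorted order
        # Positions of every cell belonging to this class.
        rows, cols = np.nonzero(class_raster == value)
        # Take at most n_per_class cells, without replacement.
        k = min(n_per_class, rows.size)
        chosen = rng.choice(rows.size, size=k, replace=False)
        rows_out.append(rows[chosen])
        cols_out.append(cols[chosen])
    return np.concatenate(rows_out), np.concatenate(cols_out)

classes = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 1],
])
r, c = stratified_sample(classes, n_per_class=2, seed=42)
print(classes[r, c])  # → [0 0 1 1 2 2]
```

Drawing the same number of cells per class keeps rare classes from being swamped by common ones in the training set.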

These functions don't provide any ground-breaking new functionality; they merely save lots of tedious data wrangling that would otherwise bog your analysis down in low-level details. In other words, pyimpute provides a high-level Python workflow for spatial prediction, making it easier to:

  • explore new variables
  • frequently update predictions with new information (e.g. new Landsat imagery as it becomes available)
  • bring the technique to other disciplines and geographies

Basic example

Here's what a pyimpute workflow might look like. In this example, we have two explanatory variables as rasters (temperature and precipitation) and a GeoJSON file with point observations of habitat suitability for a plant species. Our goal is to predict habitat suitability across the entire region based only on the explanatory variables.
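Conceptually, turning point observations into training rows means reading each explanatory raster at every point's location. The sketch below shows that step for in-memory arrays with a simple north-up grid (upper-left origin and square pixels). The helper `sample_at_points`, the arrays, and the coordinates are hypothetical; the sketch ignores nodata values and coordinate reference systems, and it is not how pyimpute's `load_training_vector` is implemented.

```python
import numpy as np

def sample_at_points(bands, origin_x, origin_y, pixel_size, points):
    """Hypothetical helper: read stacked rasters at (x, y) point locations.

    bands: array of shape (n_vars, rows, cols), north-up, square pixels.
    origin_x, origin_y: coordinates of the raster's upper-left corner.
    Returns an array of shape (n_points, n_vars).
    """
    xs = np.array([p[0] for p in points], dtype=float)
    ys = np.array([p[1] for p in points], dtype=float)
    # Columns grow eastward from the origin; rows grow southward.
    cols = np.floor((xs - origin_x) / pixel_size).astype(int)
    rows = np.floor((origin_y - ys) / pixel_size).astype(int)
    n_rows, n_cols = bands.shape[1:]
    outside = (rows < 0) | (rows >= n_rows) | (cols < 0) | (cols >= n_cols)
    if outside.any():
        raise ValueError("point falls outside the raster extent")
    # Fancy indexing yields (n_vars, n_points); transpose to one row per point.
    return bands[:, rows, cols].T

temperature = np.array([[10.0, 11.0, 12.0],
                        [13.0, 14.0, 15.0]])
precipitation = np.array([[800.0, 700.0, 600.0],
                          [750.0, 650.0, 550.0]])
bands = np.stack([temperature, precipitation])

# Hypothetical observations: an (x, y) location plus a suitability class.
observations = [((5.0, 95.0), 1), ((25.0, 85.0), 0)]
train_xs = sample_at_points(bands, origin_x=0.0, origin_y=100.0,
                            pixel_size=10.0,
                            points=[xy for xy, _ in observations])
train_y = np.array([label for _, label in observations])
print(train_xs)
```

Each observation contributes one row of explanatory values plus one response, which is exactly the shape scikit-learn's `fit` expects.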

import os
from pyimpute import load_training_vector, load_targets, impute, evaluate_clf
from sklearn.ensemble import RandomForestClassifier

Load some training data

explanatory_rasters = ['temperature.tif', 'precipitation.tif']
response_data = 'point_observations.geojson'

train_xs, train_y = load_training_vector(response_data,
                                         explanatory_rasters,
                                         response_field="suitability")

Train a scikit-learn classifier

clf = RandomForestClassifier(n_estimators=10, n_jobs=1)
clf.fit(train_xs, train_y)

Evaluate the classifier using several validation metrics, then inspect the output manually

evaluate_clf(clf, train_xs, train_y)

Load target raster data

target_xs, raster_info = load_targets(explanatory_rasters)

Make predictions, outputting GeoTIFFs

impute(target_xs, clf, raster_info, outdir='/tmp',
       linechunk=400, class_prob=True, certainty=True)

assert os.path.exists("/tmp/responses.tif")
assert os.path.exists("/tmp/certainty.tif")
assert os.path.exists("/tmp/probability_0.tif")
assert os.path.exists("/tmp/probability_1.tif")
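To give a feel for the work a prediction step like this involves, here is a hedged sketch of chunked prediction: the flattened target cells are passed to the classifier a block of raster lines at a time, and the predicted classes plus a per-cell certainty are reshaped back into raster grids. It rests on assumptions. The helper `predict_in_chunks` and the `StubClassifier` stand-in are hypothetical, nothing is written to GeoTIFF, and reading `linechunk` as "raster rows per batch" and certainty as "probability of the winning class" are interpretations, not documented pyimpute behaviour.

```python
import numpy as np

def predict_in_chunks(clf, target_xs, shape, linechunk):
    """Hypothetical helper: predict a raster surface a few lines at a time.

    clf: fitted classifier exposing scikit-learn's predict_proba() and classes_.
    target_xs: (rows * cols, n_vars) array, cells in row-major order.
    shape: (rows, cols) of the target rasters.
    linechunk: number of raster lines (rows) predicted per batch.
    """
    rows, cols = shape
    n_cells = rows * cols
    classes = np.asarray(clf.classes_)
    responses = np.empty(n_cells, dtype=classes.dtype)
    certainty = np.empty(n_cells, dtype=float)
    step = linechunk * cols  # number of cells in one chunk of lines
    for start in range(0, n_cells, step):
        stop = min(start + step, n_cells)
        proba = clf.predict_proba(target_xs[start:stop])  # (cells, classes)
        responses[start:stop] = classes[proba.argmax(axis=1)]
        # Assumed definition: certainty = probability of the winning class.
        certainty[start:stop] = proba.max(axis=1)
    return responses.reshape(rows, cols), certainty.reshape(rows, cols)


class StubClassifier:
    """Hypothetical stand-in for a fitted scikit-learn classifier (demo only)."""
    classes_ = np.array([0, 1])

    def predict_proba(self, X):
        # Made-up rule: probability of class 1 rises with the first feature.
        p1 = 1.0 / (1.0 + np.exp(-(X[:, 0] - 12.0)))
        return np.column_stack([1.0 - p1, p1])


rows, cols = 5, 4
target_xs = np.column_stack([np.linspace(8.0, 16.0, rows * cols),
                             np.zeros(rows * cols)])
responses, certainty = predict_in_chunks(StubClassifier(), target_xs,
                                         (rows, cols), linechunk=2)
print(responses)
```

With real data you would pass the fitted `clf` and the `target_xs` from the example above, taking `shape` from the target rasters. Bounding the number of lines per batch bounds the memory that `predict_proba` needs on large rasters, and the chunked result matches a single full-raster pass.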

Installation

Assuming you have libgdal and the scipy system dependencies installed, you can install with pip:

pip install pyimpute

Alternatively, install from the source code:

git clone https://github.com/perrygeo/pyimpute.git
cd pyimpute
pip install -e .

See the .travis.yml file for a working example on Ubuntu systems.

Other resources

For an overview, watch my presentation at FOSS4G 2014, "Spatial-Temporal Prediction of Climate Change Impacts using pyimpute, scikit-learn and GDAL" by Matthew Perry.

Also, check out the examples and the wiki.
