  • Stars: 169
  • Rank: 219,960 (Top 5 %)
  • Language: JavaScript
  • License: ISC License
  • Created: about 8 years ago
  • Updated: over 5 years ago


Repository Details

[DEPRECATED] Data pipeline for machine learning with OpenStreetMap

skynet-data

A pipeline to simplify building a set of training data for machine learning based on aerial imagery and OpenStreetMap. The idea is to use OSM QA Tiles to generate "ground truth" images in which each color represents a category derived from OSM features. Because these label images are map tiles, it is then straightforward to match them up with the desired input imagery.
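Matching label tiles to imagery works because both are addressed with the same Web Mercator z/x/y tile scheme. The sketch below (modern Node.js syntax, not the v4.6.2 pinned in the install steps) shows the standard conversion from longitude/latitude to a tile address, plus how an overzoomed tile maps to its ancestor at the data tiles' max zoom, which is the general idea behind the ZOOM_LEVEL note in the variables list. The function names are illustrative and not part of skynet-data.

```javascript
// Standard Web Mercator ("slippy map") tile addressing.
// A label tile and an imagery tile with the same z/x/y cover the same area.
function lonLatToTile(lon, lat, z) {
  const n = Math.pow(2, z); // tiles along each axis at zoom z
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so lon = 180 and extreme latitudes stay inside the tile grid.
  const clamp = (v) => Math.min(n - 1, Math.max(0, v));
  return { z, x: clamp(x), y: clamp(y) };
}

// For a tile deeper than the data's max zoom, find the ancestor data tile
// whose contents would be overzoomed to render it.
function ancestorTile({ z, x, y }, maxZoom) {
  if (z <= maxZoom) return { z, x, y };
  const shift = z - maxZoom; // each zoom level halves x and y
  return { z: maxZoom, x: x >> shift, y: y >> shift };
}
```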

This repository is no longer under active development. We recommend using Label Maker to prepare data instead. That repo contains utility scripts which can be used to replicate the workflow needed to prepare data for skynet-train.

Quick Start

Pre-built docker image

The easiest way to use this is via the developmentseed/skynet-data docker image:

First, create a docker.env file containing your MapboxAccessToken:

MapboxAccessToken=YOUR_TOKEN

Then run:

docker run -v /path/to/output/dir:/workdir/data --env-file docker.env developmentseed/skynet-data download-osm-tiles

docker run -v /path/to/output/dir:/workdir/data --env-file docker.env developmentseed/skynet-data

The first command downloads the OSM QA tiles to /path/to/output/dir/osm/planet.mbtiles. If you already have that file on your machine, you can skip this step.

The second command builds a training set using the default options (road features from OSM QA tiles, images from Mapbox Satellite). To change the data sources, training set size, and other options, add the relevant environment variables to the docker.env file, one per line.

Local docker image

You can also build the docker image yourself using docker-compose. As in the quick start above, make sure your docker.env file has your MapboxAccessToken and any other environment variables you want to set. Then run:

docker-compose build

to build your local docker image, and

docker-compose run data download-osm-tiles
docker-compose run data 

to download the OSM QA tiles and run the data collection as specified in docker.env. By default the collected data is saved into the data directory, but you can override this by placing -v /path/to/output/dir:/workdir/data between docker-compose run and the service name data (docker-compose run -v /path/to/output/dir:/workdir/data data), similar to the pre-built instructions above.

Variables

The make commands below work off the following variables (with defaults as listed):

# location of image files
IMAGE_TILES ?= "tilejson+https://a.tiles.mapbox.com/v4/mapbox.satellite.json?access_token=$(MapboxAccessToken)"
# which osm-qa tiles extract to download; e.g. united_states_of_america
QA_TILES=planet
# location of data tiles to use for rendering labels; defaults to osm-qa tiles extract specified by QA_TILES
DATA_TILES ?= mbtiles://./data/osm/$(QA_TILES).mbtiles
# filter to this bbox
BBOX ?= '-180,-85,180,85'
# number of images (tiles) to sample
TRAIN_SIZE=1000
# define label classes output
CLASSES=classes/roads-buildings.json
# Filter out tiles whose ratio of labeled to unlabeled pixels is less than or
# equal to the given ratio.  Useful for excluding images that are all background, for example.
LABEL_RATIO ?= 0
# set this to a zoom higher than the data tiles' max zoom to get overzoomed label images
ZOOM_LEVEL ?= 17
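The LABEL_RATIO comment above describes a simple per-tile filter. As a hedged sketch, assuming a grayscale label image flattened into an array of class values in which 0 marks unlabeled (background) pixels, the rule could look like this; labelRatio and keepTile are hypothetical names.

```javascript
// Assumption: pixels holds per-pixel class values; `background` marks
// unlabeled pixels (0 by default).
function labelRatio(pixels, background = 0) {
  if (pixels.length === 0) return 0; // treat an empty tile as unlabeled
  let labeled = 0;
  for (const p of pixels) {
    if (p !== background) labeled++;
  }
  const unlabeled = pixels.length - labeled;
  // Every pixel labeled: report an infinite ratio instead of dividing by 0.
  return unlabeled === 0 ? Infinity : labeled / unlabeled;
}

// Drop tiles whose ratio is less than or equal to the threshold,
// i.e. keep only tiles whose ratio is strictly greater.
function keepTile(pixels, labelRatioThreshold) {
  return labelRatio(pixels) > labelRatioThreshold;
}
```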

You can override any of these parameters in your docker.env and make a full training set using the instructions above.
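Because overrides go into docker.env as KEY=VALUE lines, a small helper can generate that file and sanity-check BBOX before a long run. This is a sketch only: toDockerEnv and parseBbox are hypothetical helpers, and writing values unquoted (unlike the Makefile's quoted BBOX default) is an assumption about how Docker reads the env file.

```javascript
// Validate a BBOX string of the form "west,south,east,north".
function parseBbox(text) {
  const parts = text.split(",").map(Number);
  if (parts.length !== 4 || parts.some(Number.isNaN)) {
    throw new Error(`BBOX must be four comma-separated numbers: ${text}`);
  }
  const [west, south, east, north] = parts;
  if (west >= east || south >= north) {
    throw new Error(`BBOX corners are out of order: ${text}`);
  }
  return { west, south, east, north };
}

// Render overrides as docker.env text: one unquoted KEY=VALUE per line.
function toDockerEnv(vars) {
  const lines = Object.entries(vars).map(([key, value]) => {
    const text = String(value);
    if (/[\r\n]/.test(text)) throw new Error(`${key} must be a single line`);
    return `${key}=${text}`;
  });
  return lines.length ? lines.join("\n") + "\n" : "";
}
```

For example, toDockerEnv({ MapboxAccessToken: "YOUR_TOKEN", TRAIN_SIZE: 5000 }) produces the two lines MapboxAccessToken=YOUR_TOKEN and TRAIN_SIZE=5000.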

Details

Install

  • Install NodeJS v4.6.2
  • Install tippecanoe
  • Install GNU Parallel
  • Install shuf
  • Clone this repo and run npm install. (Note that this includes a node-mapnik install, which sometimes has trouble building on bleeding-edge versions of Node.)

Sample available tiles

make data/sample.txt

This does a simple random sample of the available tiles in the given mbtiles set, using tippecanoe-enumerate. For more intelligent filtering, consider using tippecanoe-decode to examine the (GeoJSON) contents of each tile.
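A minimal sketch of the sampling idea, assuming each line of tippecanoe-enumerate output has the form "<mbtiles file> <z> <x> <y>". The README lists shuf as a dependency, which suggests the Makefile shuffles that list; here a partial Fisher-Yates shuffle stands in for it. sampleTiles and parseEnumerateLine are hypothetical names, and the optional random argument exists only to make results reproducible.

```javascript
// Parse one line of enumerate output: "<file> <z> <x> <y>".
function parseEnumerateLine(line) {
  const [file, z, x, y] = line.trim().split(/\s+/);
  return { file, z: Number(z), x: Number(x), y: Number(y) };
}

// Pick up to `size` tiles uniformly at random without replacement.
function sampleTiles(lines, size, random = Math.random) {
  const tiles = lines.filter((l) => l.trim() !== "").map(parseEnumerateLine);
  const k = Math.min(size, tiles.length);
  // Partial Fisher-Yates: after step i, tiles[0..i] is a uniform sample.
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(random() * (tiles.length - i));
    [tiles[i], tiles[j]] = [tiles[j], tiles[i]];
  }
  return tiles.slice(0, k);
}
```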

Labels

Build label images: make data/labels/color or make data/labels/grayscale. This uses the CLASSES JSON file to set up the rendering of OSM data into images that represent per-pixel category labels. See classes/water-roads-buildings.json for an example. Rendering is done with Mapnik; see the Mapnik docs for more on filter syntax.
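The README does not spell out the CLASSES file format, so the following is a hypothetical sketch: it assumes an array of objects with a name and a "#rrggbb" color, and shows one way the color and grayscale label encodings could correspond, giving each class a grayscale value of its index plus one and reserving 0 for unlabeled pixels. Check classes/water-roads-buildings.json for the real structure before relying on any of this.

```javascript
// Hypothetical CLASSES shape: [{ name: "Road", color: "#ffffff" }, ...].
function buildPalette(classes) {
  return classes.map((cls, i) => {
    const m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(cls.color);
    if (!m) throw new Error(`class ${cls.name}: expected a #rrggbb color`);
    return {
      name: cls.name,
      rgb: m.slice(1).map((hex) => parseInt(hex, 16)),
      gray: i + 1, // assumption: 0 is left for unlabeled pixels
    };
  });
}

// Map one color-label pixel to its grayscale class value (0 if unknown).
function rgbToGray(palette, [r, g, b]) {
  const hit = palette.find(({ rgb }) => rgb[0] === r && rgb[1] === g && rgb[2] === b);
  return hit ? hit.gray : 0;
}
```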

Images

Download aerial images from a tiled source: make data/images

Heads up: the default source, Mapbox Satellite, requires you to set the MapboxAccessToken variable, and will cost you map views!
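IMAGE_TILES defaults to a tilejson+https:// URL. Assuming the tilejson+ prefix only selects a TileJSON-based source and the remainder is the TileJSON document's URL, it is the TileJSON "tiles" array of {z}/{x}/{y} URL templates that turns a sampled tile into an image request. The helpers and example URLs below are hypothetical, and nothing is fetched.

```javascript
// Strip the loader prefix to get the plain TileJSON URL (assumption).
function tileJsonUrl(imageTiles) {
  return imageTiles.replace(/^tilejson\+/, "");
}

// Fill a TileJSON "tiles" template with a tile's z/x/y.
function tileUrl(tilejson, { z, x, y }) {
  const templates = tilejson.tiles;
  if (!Array.isArray(templates) || templates.length === 0) {
    throw new Error("TileJSON has no tiles templates");
  }
  // Spread requests across mirror hosts deterministically by tile position.
  const template = templates[(x + y) % templates.length];
  return template
    .replace(/\{z\}/g, String(z))
    .replace(/\{x\}/g, String(x))
    .replace(/\{y\}/g, String(y));
}
```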

Preview

Preview the generated data by opening preview.html?accessToken=<mapbox access token>&prefix=/path/to/data from a local web server.
