  • Stars: 169
  • Rank: 224,453 (Top 5%)
  • Language: JavaScript
  • License: ISC License
  • Created: over 8 years ago
  • Updated: about 6 years ago

Repository Details

[DEPRECATED] Data pipeline for machine learning with OpenStreetMap

skynet-data

A pipeline to simplify building a set of training data for aerial-imagery- and OpenStreetMap-based machine learning. The idea is to use OSM QA Tiles to generate "ground truth" images where each color represents some category derived from OSM features. Because these label images are map tiles, it is then easy to match them up with the desired input imagery.
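To illustrate why tiles make the matching easy, here is a minimal sketch of the standard Web Mercator ("slippy map") tile arithmetic. It is not code from this repository, and the function names are hypothetical. The point is that one z/x/y index identifies the same patch of ground in both the label tiles and the imagery tiles.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_bbox(x, y, zoom):
    """Return (west, south, east, north) in degrees for tile x/y at zoom."""
    n = 2 ** zoom

    def lat_of(row):
        # Latitude of the top edge of a given tile row.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return west, lat_of(y + 1), east, lat_of(y)


# The same (zoom, x, y) triple addresses both the label tile and the image tile.
tile = lonlat_to_tile(-77.03, 38.90, 17)
print(tile, tile_to_bbox(*tile, 17))
```

A label image rendered for a given tile index and an aerial image downloaded for the same index cover the same bounding box, so pairing them needs no extra georeferencing.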

This repository is no longer under active development. We recommend using Label Maker to prepare data instead. That repo contains utility scripts which can be used to replicate the workflow needed to prepare data for skynet-train.

Quick Start

Pre-built docker image

The easiest way to use this is via the developmentseed/skynet-data docker image:

First, create a docker.env file containing your MapboxAccessToken:

MapboxAccessToken=YOUR_TOKEN

Then run:

docker run -v /path/to/output/dir:/workdir/data --env-file docker.env developmentseed/skynet-data download-osm-tiles

docker run -v /path/to/output/dir:/workdir/data --env-file docker.env developmentseed/skynet-data

The first command downloads the OSM QA tiles to /path/to/output/dir/osm/planet.mbtiles. If you already have that file on your machine, you can skip this step.

The second command builds a training set using the default options (Roads features from OSM QA tiles, images from Mapbox Satellite). To change the data sources, training set size, and other options, add the relevant environment variables to the docker.env file, one per line.
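As a rough illustration of the "one variable per line" format, the sketch below renders a settings dictionary as KEY=VALUE lines and can write them to docker.env. The helper names are hypothetical and the values are only examples; the variable names themselves are the ones this README documents.

```python
from pathlib import Path


def render_env(settings):
    """Format a dict of settings as KEY=VALUE lines, one per line."""
    lines = []
    for key, value in settings.items():
        text = str(value)
        if "\n" in text:
            raise ValueError(f"{key} must fit on a single line")
        lines.append(f"{key}={text}")
    return "\n".join(lines) + "\n"


def write_env(path, settings):
    """Write the rendered settings to an env file such as docker.env."""
    Path(path).write_text(render_env(settings))


# Example values only; replace the token placeholder with your own.
settings = {
    "MapboxAccessToken": "YOUR_TOKEN",
    "QA_TILES": "united_states_of_america",
    "TRAIN_SIZE": 5000,
    "ZOOM_LEVEL": 17,
}
print(render_env(settings), end="")
```

Calling write_env("docker.env", settings) would then produce a file that can be passed to docker run with --env-file, as in the commands above.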

Local docker image

You can also create the docker images yourself using docker-compose. Similarly to the quick-start above, make sure your docker.env file has your MapboxAccessToken and any other environment variables you want to set. Then run:

docker-compose build

to build your local docker image, and

docker-compose run data download-osm-tiles
docker-compose run data 

to download the OSM QA tiles and run the data collection as specified in docker.env. By default, the collected data is saved into the data directory, but you can override this by adding -v /path/to/output/dir:/workdir/data after docker-compose run data, as in the pre-built image instructions above.

Variables

The make commands below work off the following variables (with defaults as listed):

# location of image files
IMAGE_TILES ?= "tilejson+https://a.tiles.mapbox.com/v4/mapbox.satellite.json?access_token=$(MapboxAccessToken)"
# which osm-qa tiles extract to download; e.g. united_states_of_america
QA_TILES=planet
# location of data tiles to use for rendering labels; defaults to osm-qa tiles extract specified by QA_TILES
DATA_TILES ?= mbtiles://./data/osm/$(QA_TILES).mbtiles
# filter to this bbox
BBOX ?= '-180,-85,180,85'
# number of images (tiles) to sample
TRAIN_SIZE=1000
# define label classes output
CLASSES=classes/roads-buildings.json
# Filter out tiles whose ratio of labeled to unlabeled pixels is less than or
# equal to the given ratio.  Useful for excluding images that are all background, for example.
LABEL_RATIO ?= 0
# set this to a zoom higher than the data tiles' max zoom to get overzoomed label images
ZOOM_LEVEL ?= 17

You can override any of these parameters in your docker.env and make a full training set using the instructions above.
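The LABEL_RATIO filter described in the variable list can be sketched as follows. This is not the repository's implementation. It assumes a label image is available as a grid of class indices and that index 0 means background (unlabeled); both are assumptions made for illustration.

```python
def label_ratio(label_grid, background=0):
    """Ratio of labeled to unlabeled pixels in a grid of class indices."""
    total = sum(len(row) for row in label_grid)
    labeled = sum(1 for row in label_grid for value in row if value != background)
    unlabeled = total - labeled
    if unlabeled == 0:
        # Every pixel is labeled, so the ratio is unbounded.
        return float("inf")
    return labeled / unlabeled


def keep_tile(label_grid, min_ratio=0.0):
    """Keep a tile only if its ratio is strictly greater than min_ratio."""
    return label_ratio(label_grid) > min_ratio


mostly_background = [[0, 0], [0, 1]]   # 1 labeled pixel, 3 unlabeled
all_background = [[0, 0], [0, 0]]
print(label_ratio(mostly_background), keep_tile(all_background))
```

Under this reading of "less than or equal to", a tile that is entirely background has a ratio of 0 and is dropped even at the default LABEL_RATIO of 0.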

Details

Install

  • Install NodeJS v4.6.2
  • Install tippecanoe
  • Install GNU Parallel
  • Install shuf
  • Clone this repo and run npm install. (Note that this includes a node-mapnik install, which sometimes has trouble building in bleeding-edge versions of node.)

Sample available tiles

make data/sample.txt

This takes a simple random sample of the available tiles in the given mbtiles set, using tippecanoe-enumerate. For more intelligent filtering, consider using tippecanoe-decode to examine the GeoJSON contents of each tile.
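The sampling step can be pictured with the sketch below. It assumes that each line of tile-listing output ends with the zoom, x, and y of one tile (which is how tippecanoe-enumerate output is understood here), and it uses Python's random module in place of shuf. The function names and the sample lines are hypothetical.

```python
import random


def parse_tile_line(line):
    """Parse a line ending in 'z x y' into a (z, x, y) tuple of ints."""
    parts = line.split()
    z, x, y = (int(part) for part in parts[-3:])
    return z, x, y


def sample_tiles(lines, k, seed=None):
    """Draw up to k distinct tiles uniformly at random from listing lines."""
    tiles = [parse_tile_line(line) for line in lines if line.strip()]
    rng = random.Random(seed)
    return rng.sample(tiles, min(k, len(tiles)))


# Made-up listing lines, for illustration only.
listing = [
    "planet.mbtiles 12 654 1583",
    "planet.mbtiles 12 655 1583",
    "planet.mbtiles 12 656 1584",
    "planet.mbtiles 12 657 1585",
]
print(sample_tiles(listing, 2, seed=0))
```

Passing TRAIN_SIZE as k mirrors the "number of images (tiles) to sample" setting; a fixed seed makes the sample reproducible.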

Labels

Build label images with make data/labels/color or make data/labels/grayscale. This uses the CLASSES JSON file to set up the rendering of OSM data to images that represent per-pixel category labels. See classes/water-roads-buildings.json for an example. Rendering is done with Mapnik; see the docs for more on filter syntax.
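To show how color and grayscale label images relate, here is a small sketch that maps each class color to a class index. The class list is a simplified, hypothetical schema with only name and color fields; the real classes files may carry additional fields such as filters, and the index numbering (0 for background, classes from 1) is an assumption rather than something this README specifies.

```python
import json

# Hypothetical, simplified class definitions for illustration.
CLASSES_JSON = """
[
  {"name": "Roads", "color": "#ffffff"},
  {"name": "Buildings", "color": "#ff0000"}
]
"""


def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def build_lookup(classes):
    """Map each class color to an index; 0 is reserved for background."""
    return {hex_to_rgb(c["color"]): i for i, c in enumerate(classes, start=1)}


def color_to_index(pixels, lookup):
    """Turn a grid of RGB pixels into a grid of class indices."""
    return [[lookup.get(tuple(p), 0) for p in row] for row in pixels]


classes = json.loads(CLASSES_JSON)
lookup = build_lookup(classes)
color_tile = [[(0, 0, 0), (255, 255, 255)], [(255, 0, 0), (255, 0, 0)]]
print(color_to_index(color_tile, lookup))
```

The color form is convenient for visual inspection, while an index grid of this kind is the sort of per-pixel target a segmentation model would train against.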

Images

Download aerial images from a tiled source: make data/images

Heads up: the default, Mapbox Satellite, will need you to set the MapboxAccessToken variable, and will cost you map views!

Preview

Preview the generated data by opening preview.html?accessToken=<mapbox access token>&prefix=/path/to/data through a local web server.
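A small sketch of assembling that preview URL is shown below. The host and port are assumptions (for example, a server started with python -m http.server 8000 from the directory containing preview.html), and whether the page expects the prefix to be percent-encoded has not been checked here.

```python
from urllib.parse import urlencode


def preview_url(access_token, prefix, host="localhost", port=8000):
    """Build the preview.html URL with its two query parameters."""
    query = urlencode({"accessToken": access_token, "prefix": prefix})
    return f"http://{host}:{port}/preview.html?{query}"


# Placeholder token and path, as in the text above.
print(preview_url("YOUR_TOKEN", "/path/to/data"))
```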
