  • Stars: 794
  • Rank 56,957 (Top 2 %)
  • Language: Python
  • License: Apache License 2.0
  • Created about 3 years ago
  • Updated 27 days ago

Repository Details

Specification for storing geospatial vector data (point, line, polygon) in Parquet

GeoParquet

About

This repository defines a specification for how to store geospatial vector data (points, lines, polygons) in Apache Parquet, a popular columnar storage format for tabular data - see this vendor explanation for more on what that means. Our goal is to standardize how geospatial data is represented in Parquet to further geospatial interoperability among tools using Parquet today, and hopefully help push forward what's possible with 'cloud-native geospatial' workflows. There are now more than 10 different tools and libraries in 6 different languages that support GeoParquet; you can learn more at geoparquet.org.

Note: This specification is currently in 1.0 'release candidate' status, which means the community is proposing the current version to be 1.0.0, and if no blocking negative feedback is raised by the end of August 2023 then it will become 1.0.0. This means breaking changes are still possible, but quite unlikely - see the versioning section below for more info.

Early contributors include developers from GeoPandas, GeoTrellis, OpenLayers, Vis.gl, Voltron Data, Microsoft, Carto, Azavea, Planet & Unfolded. Anyone is welcome to join the project by building implementations, trying it out, giving feedback through issues and contributing to the spec via pull requests. Initial work started in the geo-arrow-spec GeoPandas repository, which will continue the Arrow work in a compatible way, while this specification focuses solely on Parquet. We are in the process of becoming an official OGC Standards Working Group and are on the path to becoming a full OGC standard.

Goals

There are a few core goals driving the initial development.

  • Establish a great geospatial format for workflows that excel with columnar data - Most data science and 'business intelligence' workflows have been moving towards columnar data, but current geospatial formats cannot be loaded as efficiently as other data. So we aim to bring geospatial data best practices to one of the most popular formats, and hopefully establish a good pattern for how to do so.
  • Introduce columnar data formats to the geospatial world - Most of the geospatial world is not yet benefiting from all the breakthroughs in data analysis in the broader IT world, so we are excited to enable interesting geospatial analysis with a wider range of tools.
  • Enable interoperability among cloud data warehouses - BigQuery, Snowflake, Redshift and others all support spatial operations but importing and exporting data with existing formats can be problematic. All support and often recommend Parquet, so defining a solid GeoParquet can help enable interoperability.
  • Persist geospatial data from Apache Arrow - GeoParquet is developed in parallel with a GeoArrow spec, to enable cross-language in-memory analytics of geospatial information with Arrow. Parquet is already well-supported by Arrow as the key on-disk persistence format.

Our broader goal is to innovate with 'cloud-native vector', providing a stable base to try out new ideas for cloud-native & streaming workflows.

Features

A quick overview of what GeoParquet supports (or at least plans to support).

  • Multiple spatial reference systems - Many tools will use GeoParquet for high-performance analysis, so it's important to be able to use data in its native projection. But we do provide a clear default recommendation to better enable interoperability, giving a clear target for implementations that don't want to worry about projections.
  • Multiple geometry columns - There is a default geometry column, but additional geometry columns can be included.
  • Great compression / small files - Parquet is designed to compress very well, so data benefits by taking up less disk space & being more efficient over the network.
  • Work with both planar and spherical coordinates - Most cloud data warehouses support spherical coordinates, and so GeoParquet aims to help persist those and be clear about what is supported.
  • Great at read-heavy analytic workflows - Columnar formats enable cheap reading of a subset of columns, and Parquet in particular enables efficient filtering of chunks based on column statistics, so the format will perform well in a variety of modern analytic workflows.
  • Support for data partitioning - Parquet has a nice ability to partition data into different files for efficiency, and we aim to enable geospatial partitions.
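Several of the features above (a default geometry column, additional geometry columns, planar versus spherical coordinates) come down to per-file metadata. The following is a minimal sketch of what such metadata could look like, not an authoritative rendering of the spec. The key names (`version`, `primary_column`, `columns`, `encoding`, `geometry_types`, `edges`) are assumptions that should be checked against the spec itself, and the column names and version string are hypothetical placeholders.

```python
import json

# Hypothetical table with two geometry columns. Key names are assumptions
# to verify against the spec; column names are illustrative only.
geo_metadata = {
    "version": "1.0.0",  # placeholder version string
    "primary_column": "geometry",  # the default geometry column
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Polygon", "MultiPolygon"],
            "edges": "planar",
        },
        "centroid": {
            "encoding": "WKB",
            "geometry_types": ["Point"],
            "edges": "spherical",
        },
    },
}


def additional_geometry_columns(meta):
    """Return the names of geometry columns other than the primary one."""
    return sorted(name for name in meta["columns"] if name != meta["primary_column"])


# Key/value file metadata is stored as strings, so the dict is serialized to JSON
# and parsed back by a reader.
serialized = json.dumps(geo_metadata)
restored = json.loads(serialized)

print(restored["primary_column"])
print(additional_geometry_columns(restored))
```

A reader that only cares about one geometry can go straight to the primary column, while tools that understand more can inspect the remaining entries.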

It is also worth noting what GeoParquet is less suited for. Above all, it is not a good choice for write-heavy workloads: a row-based format will work much better when it backs a system that is constantly updating existing data and adding new data.
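The trade-off between cheap column reads and expensive writes can be illustrated with a toy model. This is only a sketch using plain Python lists: real Parquet files add row groups, pages, encodings and compression, none of which is modeled here, and the records and function names are hypothetical.

```python
# Three hypothetical records in a row-oriented layout.
rows = [
    {"id": 1, "name": "park", "geometry": "POINT (0 0)"},
    {"id": 2, "name": "lake", "geometry": "POINT (1 1)"},
    {"id": 3, "name": "road", "geometry": "POINT (2 2)"},
]

# Column-oriented layout: one contiguous list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def read_column(table, name):
    """Reading one column touches a single list, however many columns exist."""
    return table[name]


def append_record(table, record):
    """Appending one record touches every column; returns how many were written."""
    for key, values in table.items():
        values.append(record[key])
    return len(table)


print(read_column(columns, "id"))
touched = append_record(columns, {"id": 4, "name": "hill", "geometry": "POINT (3 3)"})
print(touched)
```

An analytic query over `id` reads one list and ignores the rest, whereas every single-record append has to modify all of them, which is the pattern a row-based format handles better.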

Roadmap

Our aim is to get to a 1.0.0 final by the end of August 2023. The goal of 1.0.0 is to establish a baseline of interoperability for geospatial information in Parquet. For 1.0.0 the only geometry encoding option is Well Known Binary, but the encoding is declared as an option so that other encodings can be added later. The main goal of 1.1.0 will be to incorporate a more columnar-oriented geometry format, which is currently being worked on as part of the GeoArrow spec. Once that gets finalized we will add the option to GeoParquet. In general 1.1.0 will further explore spatial optimization, spatial indices and spatial partitioning to improve GeoParquet's performance.
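To make the Well Known Binary encoding concrete, here is a small sketch that encodes and decodes a 2D point by hand using only the standard library. The helper names `point_to_wkb` and `wkb_to_point` are hypothetical; in practice a geometry library would produce the bytes stored in the geometry column, and only the simplest geometry type is handled here.

```python
import struct

WKB_POINT = 1  # geometry type code for a 2D Point


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # 1 byte: byte order (1 = little-endian), 4 bytes: geometry type,
    # then x and y as 64-bit floats.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode a WKB 2D point, honoring the byte-order flag in the first byte."""
    byte_order = "<" if data[0] == 1 else ">"
    geometry_type, x, y = struct.unpack(byte_order + "Idd", data[1:21])
    if geometry_type != WKB_POINT:
        raise ValueError(f"not a 2D point: geometry type {geometry_type}")
    return x, y


encoded = point_to_wkb(1.0, 2.0)
print(encoded.hex())
print(wkb_to_point(encoded))
```

Because each value is an opaque byte string, a WKB column stores geometries row by row inside a columnar file, which is part of the motivation for the more columnar-oriented encoding planned for 1.1.0.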

Versioning

After we reach version 1.0 we will follow SemVer, so from that point on any breaking change will require the spec to go to 2.0.0. Currently implementors should expect breaking changes. At some point, hopefully relatively soon (0.4?), we will declare that we don't expect any more breaking changes, though the full commitment to that won't be made until 1.0.0.
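An implementation could turn this policy into a simple compatibility check. The sketch below is one possible reading, not part of the spec: `parse_version` and `may_break` are hypothetical helpers, pre-release suffixes such as `-rc.1` are simply dropped, and any change between pre-1.0 versions is treated as potentially breaking.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' (ignoring any '-prerelease' suffix) into integers."""
    core = text.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def may_break(old, new):
    """Return True if moving from spec version `old` to `new` may break readers."""
    old_version, new_version = parse_version(old), parse_version(new)
    if old_version[0] == 0 or new_version[0] == 0:
        # Before 1.0 there is no compatibility promise.
        return old_version != new_version
    # From 1.0 on, SemVer reserves breaking changes for major version bumps.
    return old_version[0] != new_version[0]


print(may_break("1.0.0", "1.1.0"))
print(may_break("1.1.0", "2.0.0"))
print(may_break("0.3.0", "0.4.0"))
```

A reader written against 1.0.0 could use such a check to accept 1.x files while warning about anything with a different major version.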

Current Implementations & Examples

Examples of GeoParquet files following the current spec can be found in the examples/ folder. For information on all the tools and libraries implementing GeoParquet, as well as sample data, see the implementations section of the website.
