  • Stars: 124
  • Rank: 288,207 (Top 6%)
  • Language: C
  • License: Other
  • Created almost 3 years ago
  • Updated about 2 years ago

Repository Details

Extension types for geospatial data for use with 'Arrow'

geoarrow

The goal of geoarrow is to leverage the features of the arrow package and the larger Apache Arrow ecosystem for geospatial data. The geoarrow package provides an R implementation of the GeoParquet file format and of the draft geoarrow data specification, which defines extension array types for vector geospatial data.

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("paleolimbot/geoarrow")

Read and Write GeoParquet

Parquet is a compact binary file format that supports fast reads and efficient compression; its geospatial extension, ‘GeoParquet’, defines how to encode geospatial data in it. You can write geospatial data (e.g., sf objects) to Parquet using write_geoparquet() and read it back using read_geoparquet(); the example below uses read_geoparquet_sf(), which returns an sf object.

library(geoarrow)

nc <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))
write_geoparquet(nc, "nc.parquet")
read_geoparquet_sf("nc.parquet")
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS:  NAD27
#> # A tibble: 100 × 15
#>     AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS  FIPSNO CRESS_ID BIR74 SID74 NWBIR74
#>    <dbl>     <dbl> <dbl>   <dbl> <chr> <chr>  <dbl>    <int> <dbl> <dbl>   <dbl>
#>  1 0.114      1.44  1825    1825 Ashe  37009  37009        5  1091     1      10
#>  2 0.061      1.23  1827    1827 Alle… 37005  37005        3   487     0      10
#>  3 0.143      1.63  1828    1828 Surry 37171  37171       86  3188     5     208
#>  4 0.07       2.97  1831    1831 Curr… 37053  37053       27   508     1     123
#>  5 0.153      2.21  1832    1832 Nort… 37131  37131       66  1421     9    1066
#>  6 0.097      1.67  1833    1833 Hert… 37091  37091       46  1452     7     954
#>  7 0.062      1.55  1834    1834 Camd… 37029  37029       15   286     0     115
#>  8 0.091      1.28  1835    1835 Gates 37073  37073       37   420     0     254
#>  9 0.118      1.42  1836    1836 Warr… 37185  37185       93   968     4     748
#> 10 0.124      1.43  1837    1837 Stok… 37169  37169       85  1612     1     160
#> # … with 90 more rows, and 4 more variables: BIR79 <dbl>, SID79 <dbl>,
#> #   NWBIR79 <dbl>, geometry <MULTIPOLYGON [°]>

You can also combine arrow::open_dataset() with geoarrow_collect_sf() to run queries with the Arrow compute engine on datasets of one or more files:

library(arrow)
library(dplyr)

(query <- open_dataset("nc.parquet") %>%
  filter(grepl("^A", NAME)) %>%
  select(NAME, geometry))
#> FileSystemDataset (query)
#> NAME: string
#> geometry: wkb GEOGCS["NAD27",DATUM["North...
#> 
#> * Filter: if_else(is_null(match_substring_regex(NAME, {pattern="^A", ignore_case=false}), {nan_is_null=true}), false, match_substring_regex(NAME, {pattern="^A", ignore_case=false}))
#> See $.data for the source Arrow object

query %>%
  geoarrow_collect_sf()
#> Simple feature collection with 6 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -82.07776 ymin: 34.80792 xmax: -79.23799 ymax: 36.58965
#> Geodetic CRS:  NAD27
#> # A tibble: 6 × 2
#>   NAME                                                                  geometry
#>   <chr>                                                       <MULTIPOLYGON [°]>
#> 1 Ashe      (((-81.47276 36.23436, -81.54084 36.27251, -81.56198 36.27359, -81.…
#> 2 Alleghany (((-81.23989 36.36536, -81.24069 36.37942, -81.26284 36.40504, -81.…
#> 3 Avery     (((-81.94135 35.95498, -81.9614 35.93922, -81.94495 35.91861, -81.9…
#> 4 Alamance  (((-79.24619 35.86815, -79.23799 35.83725, -79.54099 35.83699, -79.…
#> 5 Alexander (((-81.10889 35.7719, -81.12728 35.78897, -81.1414 35.82332, -81.32…
#> 6 Anson     (((-79.91995 34.80792, -80.32528 34.81476, -80.27512 35.19311, -80.…
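
The query above reads a single file. As a minimal sketch of the "one or more files" case, the following writes a small partitioned dataset and queries it with the same dplyr verbs. It uses only arrow and dplyr; the toy table, the `part` column, and the temporary directory are assumptions made for illustration and are not part of geoarrow. With a geometry column written by write_geoparquet(), the final step would presumably be geoarrow_collect_sf() rather than collect().

```r
library(arrow)
library(dplyr)

# Toy attribute table (hypothetical values, no geometry column), so the
# sketch runs with only arrow and dplyr installed.
counties <- data.frame(
  NAME = c("Ashe", "Alleghany", "Surry", "Avery"),
  part = c("a", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Write one Parquet file per value of `part` into a fresh directory.
dataset_dir <- tempfile("counties_dataset")
write_dataset(counties, dataset_dir, format = "parquet", partitioning = "part")

# The dataset is now a directory tree containing several Parquet files.
parquet_files <- list.files(dataset_dir, pattern = "\\.parquet$", recursive = TRUE)

# open_dataset() treats the whole directory as one table; the filter is
# evaluated by Arrow before any rows are pulled into R.
result <- open_dataset(dataset_dir) %>%
  filter(grepl("^A", NAME)) %>%
  select(NAME, part) %>%
  collect()
```

Row order in `result` is not guaranteed across files, so sort explicitly if the order matters.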

More Repositories

1. ggspatial (R, 353 stars): Enhancing spatial visualization in ggplot2
2. rbbt (R, 134 stars): R Interface to the Better BiBTex Zotero Connector
3. geos (R, 60 stars): Open Source Geometry Engine ('GEOS') R API
4. wk (R, 43 stars): Lightweight Well-Known Geometry Parsing
5. exifr (R, 33 stars): Read EXIF data in R using ExifTool
6. tidypaleo (R, 32 stars): Tidy tools for paleoenvironmental archives
7. narrow (C, 30 stars): An R interface to the 'Apache Arrow' C API
8. s2geography (C++, 27 stars): Simple features (ish) for s2geometry
9. geovctrs (R, 25 stars): Common Classes and Data Structures for Geometry Vectors
10. mudata2 (R, 24 stars): Interchange Tools for Multi-Parameter Spatiotemporal Data
11. rosm (R, 24 stars): Plot Open Street Map and Other Tiles in R
12. prettymapr (R, 22 stars): Scale Bar, North Arrow, and Pretty Margins in R
13. CanadaWeather (Java, 20 stars): Canadian Weather from Environment Canada for Android
14. tidyphreeqc (R, 20 stars): Tidy geochemical modeling using PHREEQC
15. libgeos (C++, 18 stars): Open Source Geometry Engine (GEOS) C API
16. rclimateca (R, 16 stars): An R Package to fetch climate data from Environment Canada
17. chemr (R, 16 stars): data structures for chemistry in R
18. libproj (C, 16 stars): Generic Coordinate Transformation Library ('PROJ') C API
19. crs2crs (R, 14 stars): Generic Coordinate System Transforms
20. shp (C, 11 stars): Read Shapefiles
21. geoarrow-data (R, 11 stars)
22. ggr6 (R, 11 stars): An Implementation of the Grammar of Graphics in R6
23. s2plot (R, 10 stars): Plot spatial objects on a sphere
24. JupyterQt (Python, 10 stars)
25. grd (R, 9 stars): Raster and Grid Geometry
26. gpkg (C, 9 stars): Proof of Concept 'GeoPackage' to Arrow Converter
27. ggdebug (R, 8 stars): Debug functions and ggproto methods in ggplot2
28. tf (R, 8 stars): 'TensorFlow' C API Wrapper
29. rcanvec (R, 8 stars): Access and plot CanVec and CanVec+ data for rapid basemap creation in Canada
30. geoarrow-cpp-old (C++, 8 stars)
31. xrftools (HTML, 8 stars): XRF tools for R
32. pb210 (R, 7 stars): Lead-210 dating utilities
33. pkd (R, 7 stars): Compact Integer and Float Vectors
34. carbon14 (R, 7 stars): Tidy Radiocarbon Dating Tools
35. rstudioconf2020 (R, 7 stars): Source + slides for rstudio::conf(2020) presentation
36. rtree (C++, 7 stars): Experimental 'RTree' Spatial Indexing
37. r4transform2021 (7 stars)
38. metal (C++, 6 stars): See If We Can Use Apple Metal In R
39. datafusion (Rust, 6 stars): Experimental R Bindings to Datafusion
40. dflite (Python, 6 stars): Lightweight DataFrame for data science.
41. geocrs (R, 6 stars): Create and Validate Coordinate Reference System Definitions
42. physical-geology (HTML, 6 stars): A bookdown version of "Physical Geology" by Karla Panchuk and Steven Earle
43. grib (C, 5 stars): Read Gridded Binary ('GRIB') Files
44. wkutils (C++, 5 stars): Utilities for Well-Known Geometry Vectors
45. minimal-thesis-bookdown (TeX, 5 stars)
46. demoadbcplyr (R, 5 stars): Demonstrate 'dbplyr' through 'ADBC' via FlightSQL
47. rlibpal (C++, 5 stars): Label placement using libpal
48. rfc86 (R, 5 stars): Tests GDAL RFC 86 Columnar API
49. easyphreeqc (R, 4 stars): A slightly easier R interface to phreeqc modeling
50. radbc (C, 4 stars): Experiments with ADBC
51. nanoarrow (C, 4 stars): Helpers for Arrow C Data & Arrow C Stream interfaces
52. ggoce (R, 4 stars): Plot 'oce' objects using 'ggplot2'
53. electionca (R, 4 stars): Canadian Elections Data
54. tidystats (R, 4 stars): Tidy data port of the stats package
55. geoproj (C++, 4 stars): Low-level Access to the PROJ Library
56. ggpy (Python, 3 stars): python ggplot without matplotlib
57. ggverbs (R, 3 stars): Verbifying noun functions in the ggplot2 package
58. r2d3globe (HTML, 3 stars): Intractive Globes for 'Rmarkdown' and 'shiny' using 'D3'
59. shinyex_enfr (R, 3 stars)
60. r4ags (3 stars): Workshop materials for the R+ggplot2 for geoscience course
61. rparquet (Rust, 3 stars): (Experiment) Read and Write 'Parquet' Files
62. rproj (R, 3 stars): Generic Coordinate Transformation Library ('PROJ') R API
63. sbe (R, 2 stars): Process and Convert Files Created by 'Seabird' Products
64. 2021-04-30_dfo-git (2 stars)
65. hydatr (R, 2 stars): An R interface to the HYDAT database
66. JavaUnits (Java, 2 stars): Flexible java unit library
67. anrpackageusingc (R, 2 stars): Demonstrates Calling 'C' Code
68. r4paleolim (TeX, 2 stars): R for Paleolimnology
69. minigpkg (C, 1 star): Proof-of-concept minimal GeoPackage IO
70. oceandf (R, 1 star): Read 'ODF' (Ocean Data File) Files
71. pyosmroute (Python, 1 star)
72. plotwp (PHP, 1 star): Wordpress Plugin to attach data to posts
73. 2021-10-27_dfo-gh-actions (1 star)
74. qosm (Python, 1 star): OpenStreetMaps tiles for QGIS
75. ggstereo (R, 1 star): Stereonets in R
76. dfoxaringan (CSS, 1 star): Provides a DFO 'rmarkdown' template for 'xaringan' presentations
77. arce00 (C, 1 star): Read E00 Files
78. docker-images (Dockerfile, 1 star)
79. landsatutils (R, 1 star): R package with convenience functions for Landsat workflows
80. testpackageusingarrowcpp (R, 1 star): Experimenting With Various Ways to Do Stuff With 'Arrow'
81. swmm (C, 1 star): Cross-platform access to the USEPA Stormwater Management Model (SWMM) in R
82. edwards97 (R, 1 star): Implementation of a Langmuir Semi-Empirical Coagulation Model
83. cwrs_poster_template (CSS, 1 star)
84. bsrto (R, 1 star): Access Data from the Barrow Strait Real Time Observatory
85. 2023-11-21_arrow-over-http-scratchpad (Python, 1 star)
86. nmea (R, 1 star): Parse 'NMEA' Sentences