• Stars
    star
    399
  • Rank 104,267 (Top 3 %)
  • Language
    Java
  • License
    Apache License 2.0
  • Created almost 11 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Geohash utitlies in java

geo


codecov
Maven Central

Java utility methods for geohashing.

Features

  • simple api
  • encodes geohashes from latitude, longitude to arbitrary length (GeoHash.encodeHash)
  • decodes latitude, longitude from geohashes (GeoHash.decodeHash)
  • finds adjacent hash in any direction (GeoHash.adjacentHash), works on borders including the poles too
  • finds all 8 adjacent hashes to a hash (GeoHash.neighbours)
  • calculates hash length to enclose a bounding box (GeoHash.hashLengthToCoverBoundingBox)
  • calculates geohashes of given length to cover a bounding box. Returns coverage ratio as well (GeoHash.coverBoundingBox)
  • calculates height and width of geohashes in degrees (GeoHash.heightDegrees and GeoHash.widthDegrees)
  • encodes and decodes long values from geohashes (Base32.encodeBase32 and Base32.decodeBase32)
  • good performance (~3 million GeoHash.encodeHash calls per second on an I7, single thread)
  • no mutable types exposed by api
  • threadsafe
  • 100% unit test coverage (for what that's worth of course!)
  • Apache 2.0 licence
  • Published to Maven Central

Status: production, available on Maven Central

Maven site reports are here including javadoc.

Getting started

Add this to your pom:

<dependency>
    <groupId>com.github.davidmoten</groupId>
    <artifactId>geo</artifactId>
    <version>VERSION_HERE</version>
</dependency>

Bounding box searches using geohashing

What is the problem?

Databases of events at specific times occurring at specific places on the earth's surface are likely to be queried in terms of ranges of time and position. One such query is a bounding box query involving a time range and position constraint defined by a bounding lat-long box.

The challenge is to make your database run these queries quickly.

Some databases may either not support or suffer significant performance degradation when large datasets are queried with inequality conditions on more than one variable.

For example, a search for all ship reports within a time range and within a bounding box could be achieved with a range condition on time combined with a range condition on latitude combined with a range condition on longitude ( combined with = logical AND). This type of query can perform badly on many database types, SQL and NoSQL. On Google App Engine Datastore for instance only one variable with inequality conditions is allowed per query. This is a sensible step to take to meet scalability guarantees.

What is a solution?

The bounding box query with a time range can be rewritten using geohashes so that only one variable is subject to a range condition: time. The method is:

  • store geohashes of all lengths (depends on the indexing strategies available, a single full length hash may be enough) in indexed fields against each lat long position in the database. Note that storing hashes as a single long integer value may be advantageous (see Base32.decodeBase32 to convert a hash to a long).
  • calculate a set of geohashes that wholly covers the bounding box
  • perform the query using the time range and equality against the geohashes. For example:
(startTime <= t < finishTime) and (hash3='drt' or hash3='dr2')
  • filter the results of the query to include only those results within the bounding box

The last step is necessary because the set of geohashes contains the bounding box but may be larger than it.

What hash length to use?

So how long should the hashes be that we try to cover the bounding box with? This will depend on your aims which might be one or more of minimizing: cpu, url fetch time, financial cost, total data transferred from datastore, database load, 2nd tier load, or a heap of other possible metrics.

Calling GeoHash.coverBoundingBox with just the bounding points and no additional parameters will return hashes of a length such that the number of hashes is as many as possible but less than or equal to GeoHash.DEFAULT_MAX_HASHES (12).

You can explicitly control maxHashes by calling GeoHash.coverBoundingBoxMaxHashes.

As a quick example, for a bounding box proportioned more a less like a screen with Schenectady NY and Hartford CT in USA at the corners:

Here are the hash counts for different hash lengths:

m is the size in square degrees of the total hashed area and a is the area of the bounding box.

length  numHashes m/a    
1           1     1694   
2           1       53     
3           4        6.6    
4          30        1.6    
5         667        1.08   
6       20227        1.02   

Only testing against your database and your preferrably real life data will determine what the optimal maxHashes value is. In the benchmarks section below a test with H2 database found that optimal query time was when maxHashes is about 700. I doubt that this would be the case for many other databases.

A rigorous exploration of this topic would be fun to do or see. Let me know if you've done it or have a link and I'll update this page!

Hash height and width formulas

This is the relationship between a hash of length n and its height and width in degrees:

First define this function:

    parity(n) = 0 if n is even otherwise 1

Then

    width = 180 / 2(5n+parity(n)-2)/2 degrees

    height = 180 / 2(5n-parity(n))/2 degrees

The height and width in kilometres will be dependent on what part of the earth the hash is on and can be calculated using Position.getDistanceToKm. For example at (lat,lon):

double distancePerDegreeWidth =
     new Position(lat,lon).getDistanceToKm(new Position(lat, lon+1));

Benchmarks

Inserted 10,000,000 records into an embedded H2 filesystem database which uses B-tree indexes. The records were geographically randomly distributed across a region then a bounding box of 1/50th the area of the region was chosen. Query performed as follows (time is the time to run the query and iterate the results):

hashLength numHashes  found   from  time(s) 
2          2          200K    10m   56.0    
3          6          200k    1.2m  10.5
4          49         200k    303k   4.5
5          1128       200k    217K   3.6
none       none       200k    200k  31.1 (multiple range query)

I was pleasantly surprised that H2 allowed me to put over 1000 conditions in the where clause. I tried with the next higher hash length as well with over 22,000 hashes but H2 understandably threw a StackOverFlowError.

To run the benchmark:

mvn clean test -Dn=10000000

Running with n=1,000,000 is much quicker to run and yields the same primary result:

multiple range query is 10X slower than geohash lookup if the hash length is chosen judiciously

Links

More Repositories

1

rtree

Immutable in-memory R-tree and R*-tree implementations in Java with reactive api
Java
1,038
star
2

rxjava-jdbc

Efficient execution and functional composition of database calls using jdbc and RxJava Observables
Java
804
star
3

rxjava2-jdbc

RxJava2 integration with JDBC including Non-blocking Connection Pools
Java
386
star
4

rxjava-extras

Utilities for use with rxjava
Java
269
star
5

rxjava2-extras

Utilities for use with RxJava 2
Java
167
star
6

xsd-forms

Generates web forms from xml schema documents (xsd)
HTML
134
star
7

state-machine

Finite state machine class generator for java, exports graphml, supports immutability!
Java
124
star
8

hilbert-curve

Java utilities for transforming distance along N-dimensional Hilbert Curve to a point and back. Also supports range splitting queries on the Hilbert Curve.
Java
93
star
9

rxjava-file

RxJava observables for files including NIO events
Java
83
star
10

big-sorter

Java library that sorts very large files of records by splitting into smaller sorted files and merging
Java
74
star
11

rtree2

Immutable in-memory R-Tree and R*-Tree for Java with Iterable API
Java
71
star
12

openapi-to-plantuml

Converts OpenAPI 3.0 definitions to Plant UML text for visualisation of your API.
Java
56
star
13

jenkins-ec2-https

How to setup Jenkins CI on EC2 with https access
Shell
53
star
14

flatbuffers

Maven artifacts containing compiled flatbuffers binaries and flatbuffers-java runtime library
Java
53
star
15

predict4java

java library for satellite position prediction
Java
44
star
16

rtree-multi

Java library implementing immutable R-tree and R*-tree for n dimensions
Java
43
star
17

sparse-hilbert-index

Java library to create and search random access files (including in S3) using the space-filling hilbert index (sparse)
Java
40
star
18

java-builder-pattern-tricks

Tricks to use with the java builder pattern
40
star
19

cake-pattern

Examples of cake pattern in scala for injecting singleton and non-singleton dependencies
Scala
34
star
20

bplustree

B+-tree in java that stores to disk using memory mapped files, supports range queries and duplicate keys
Java
33
star
21

rtree-3d

3D R-Tree in java
Java
32
star
22

jax-maven-plugin

maven plugin support for xjc, wsimport, wsgen, schemagen for Java 8,9,10,11+
Java
31
star
23

odata-client

Java client generator for a service described by OData CSDL 4.0 metadata. Includes Microsoft Graph clients (v1.0 and Beta), Graph Explorer client, Analytics for DevOps, Dynamics CRM clients
Java
28
star
24

websockets-log-tail

Follow a stream (like a log file) from a server in the browser.
Java
28
star
25

grumpy

OGC WMS server allowing custom rendered layers in java
Java
28
star
26

word-wrap

Java library for word wrapping text including streaming and custom stringWidth
Java
27
star
27

aws-maven-plugin

Deploys resources to AWS using maven
Java
27
star
28

rxjava-slf4j

Logging utilities for use with RxJava
Java
25
star
29

aws-lightweight-client-java

A lightweight java client for the AWS API. Signs requests with AWS Version 4 and offers helpful builders.
Java
25
star
30

rxjava2-http

Transmit RxJava2 Flowable over http with non-blocking backpressure
Java
18
star
31

xuml-tools

Executable UML tools (xml schema, java model compiler, java + javascript model viewer) based on miUML metamodels
Java
16
star
32

reels

Actor framework for Java, non-blocking, performant
Java
16
star
33

audio-recognition

Matches audio to small vocabulary using fast fourier transforms
Java
15
star
34

rxjava2-aws

RxJava 2 utilities for use with AWS especially SQS, S3
Java
13
star
35

rxjava2-file

Java
13
star
36

kool

j.u.s.Stream alternative (synchronous only), reusable, faster, more operators, easier to use.
Java
13
star
37

guava-mini

Optional, Preconditions, Objects, Lists, Sets classes taken from guava
Java
10
star
38

jns

3D Navier-stokes solver for incompressible fluids using java 8 for regions including obstacles and surface
Java
10
star
39

io-extras

IO java utilities, OutputStream as InputStream, BoundedBufferedReader
Java
9
star
40

ppk

Concise Public Private Key (PKCS) encryption utilities in java
Java
9
star
41

rxjava-web-server

playing around with using Observables in a simple web server
Java
9
star
42

functional-jpa

Functional style java helpers for jpa and guava
Java
9
star
43

rxjava-aws

RxJava 1.x utilities for AWS (SQS, S3, ...)
Java
9
star
44

xjc-maven-plugin

Supports Java 8,9,10,11+, generates code from DTD or XSD
Java
8
star
45

bigsort

Uses RxJava to sort an arbitrarily large stream by serializing to temporary files and merging
Java
7
star
46

space-invaders-opengl

Runs the space invaders LWJGL demo as a main or an applet
Java
7
star
47

rxjava3-jdbc

Java
7
star
48

davidmoten.github.io

apidocs and other documentation
HTML
6
star
49

openapi-to-plantuml-aws-api

HTML
6
star
50

big-sorter-example

Demo maven project with big-sorter dependency and sample csv sort
Java
5
star
51

viem

Volatile Identifier Entity Matching (VIEM) algorithm and java library
Java
5
star
52

logan

Java webapp for time series analysis of log files
JavaScript
5
star
53

java-script-template

Template for a bash script that compiles and runs java commands
Shell
5
star
54

aws-helper

Type-safety additions for Java AWS Lambda in API Gateway context
Java
5
star
55

tile-joiner

Renders map service tiles to a BufferedImage in java and thence to a PNG for instance
Java
5
star
56

entity-tracking-in-memory

Matches timestamped geospatial position reports to entities in an in-memory dataset and maintains identifier uniqueness
Java
5
star
57

openapi-codegen

Java code generator from OpenAPI definition file
Java
5
star
58

one-time-link

Java webapp for creating one-time read links to encrypted information stored on the server file system
Java
4
star
59

http-test-server

Java
4
star
60

maven-s3-repo

Read from an S3-backed maven repository using standard http wagon authentication and serverless architecture
Java
4
star
61

plantuml-maven-plugin

Maven plugin for generating diagram images from PlantUML files
Java
4
star
62

decrypt-maven-plugin

Decrypts server passwords read from .m2/settings.xml
Java
4
star
63

java-data-structures

Practice implementations of some common data structures and algorithms in java
Java
4
star
64

low-mem

How to create low memory usage classes in java
Java
4
star
65

microsoft-dynamics-finance-client

Java client for Microsoft Dynamcis Finance and Operations API
Java
4
star
66

sonatype-parent

Parent pom.xml to ease deployment to Maven Central
4
star
67

java-builder2

Generate complex builder code using java code
Java
3
star
68

java-builder

Generate java builder pattern from a list of variable declarations
Java
3
star
69

rxjava3-pool

Java
3
star
70

api-gateway-java-lambda-cf-example

Example of integration of api gateway and java lambda using cloud-formation
Java
3
star
71

embedded-queue

Java
3
star
72

rxjava2-json

RxJava2 utitilies for consuming streaming json
Java
3
star
73

more-executors

More performant Java Executors
Java
3
star
74

timesheet

timesheet web application
Java
3
star
75

gedcom

Scala library to parse GEDCOM files (common genealogy format)
Scala
3
star
76

rxjava-extras-java-8

Utilities for use with RxJava 1.x and Java 8
Java
2
star
77

rxjava-parallel

implements a ParallelObservable as an experiment
Java
2
star
78

junit-extras

Utilities for use with junit
Java
2
star
79

rxjava-marble-template

Inkscape svg template mimicking rxjava style marble diagrams
2
star
80

beanstalk-template

Beanstalk java servlet application that supports client certificate authentication (load-balanced)
Java
2
star
81

git-properties-maven-plugin

Maven plugin to write a git.properties file to an output directory and to set maven properties for use in pom.xml
Java
2
star
82

jetty-demo

A demonstration webapp that can be started using mvn jetty:run
JavaScript
2
star
83

jks-util

Utilities for manipulating JKS files
Java
2
star
84

mp4-splicer

Java based tool for chopping and concatenating h264 video in mp4 containers
Java
2
star
85

c-vs-java

Performance comparison on 2D array of Java and C
Java
2
star
86

entity-tracking

Geopositional entity tracking using geohashing for queries
Java
1
star
87

android-scala-sample

Android app using scala built with maven
Scala
1
star
88

log-analysis

superseded by logan
JavaScript
1
star
89

log-metrics

Detects changes to log files and parses logs to extract and publish metrics
Java
1
star
90

as-none-before

ASN.1 java compiler
GAP
1
star
91

pulley

Fiddling around with reactive pull in Java
Java
1
star
92

ets

Entity Tracking System
1
star
93

school-class-partitions

Algorithm discussion on splitting a group into classes while optimizing friend preferences, gender split, and exclusions
1
star
94

state-diagram-viewer

playing with GraphStream library for graph visualisation particularly a UML State Diagram
1
star
95

practice

miscellaneous algorithm practice
Java
1
star
96

authentication-util

Hmac and other authentication utilities including AWS basic authentication using s3
Java
1
star
97

mandelbrot

Generates a movie of a mandelbrot set zoom-in
Java
1
star
98

github-stars

Service deployed to AWS API Gateway and Lambda using CF to cache github star counts
Java
1
star
99

aws-sdk-java

The official AWS SDK for Java.
Java
1
star
100

mapdb

MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
Java
1
star