• Stars
    star
    303
  • Rank 137,655 (Top 3 %)
  • Language
    JavaScript
  • License
    MIT License
  • Created over 7 years ago
  • Updated 12 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

stand-alone coarse geocoder

A modular, open-source search engine for our world.

Pelias is a geocoder powered completely by open data, available freely to everyone.

Local Installation · Cloud Webservice · Documentation · Community Chat

What is Pelias?
Pelias is a search engine for places worldwide, powered by open data. It turns addresses and place names into geographic coordinates, and turns geographic coordinates into places and addresses. With Pelias, you’re able to turn your users’ place searches into actionable geodata and transform your geodata into real places.

We think open data, open source, and open strategy win over proprietary solutions at any part of the stack and we want to ensure the services we offer are in line with that vision. We believe that an open geocoder improves over the long-term only if the community can incorporate truly representative local knowledge.

Pelias coarse geocoder

This repository provides all the code & geographic data you'll need to run your own coarse geocoder.

Read our An (almost) one line coarse geocoder with Docker blog post for a quick start guide and check out our demo.

This service is intended to be run as part of the Pelias Gecoder but can just as easily be run independently as it has no external dependencies.

Natural language parser for geographic text

The engine takes unstructured input text, such as 'Neutral Bay North Sydney New South Wales' and attempts to deduce the geographic area the user is referring to.

Human beings (familiar with Australian geography) are able to quickly scan the text and establish that there 3 distinct token groups: 'Neutral Bay', 'North Sydney' & 'New South Wales'.

The engine uses a similar technique to our brains, scanning across the text, cycling through a dictionary of learned terms and then trying to establish logical token groups.

Once token groups have been established, a reductive algorithm is used to ensure that the token groups are logical in a geographic context. We don't want to return New York City for a term such as 'nyc france', so we need to only return things called 'nyc' inside places called 'france'.

The engine starts from the rightmost group, and works to the left, ensuring token groups represent geographic entities contained within those which came before. This process is repeated until it either runs out of groups, or would return 0 results.

The best estimation is then returned, either as a set of integers representing the ids of those regions, or as a JSON structure which also contains additional information such as population counts etc.

The data is sourced from the whosonfirst project, this project also includes different language translations of place names.

Placeholder supports searching on and retrieving tokens in different languages and also offers support for synonyms and abbreviations.

The engine includes a rudimentary language detection algorithm which attempts to detect right-to-left languages and languages which write their addresses in major-to-minor format. It will then reverse the tokens to re-order them in to minor-to-major ordering.


Requirements

Placeholder requires Node.js and SQLite

See Pelias software requirements for required and recommended versions.

Install

$ git clone [email protected]:pelias/placeholder.git && cd placeholder
$ npm install

Download the required database files

Data hosting is provided by Geocode Earth. Other Pelias related downloads are available at https://geocode.earth/data.

$ mkdir data
$ curl -s https://data.geocode.earth/placeholder/store.sqlite3.gz | gunzip > data/store.sqlite3;

Confirm the build was successful

$ npm test
$ npm run cli -- san fran

> [email protected] cli
> node cmd/cli.js "san" "fran"

san fran

took: 3ms
 - 85922583	locality 	San Francisco

Run server

$ PORT=6100 npm start;

Configuration via Environment Variables

The service supports additional environment variables that affect its operation:

Environment Variable Default Description
HOST undefined The network address that the placeholder service will bind to. Defaults to whatever the current Node.js default is, which is currently to listen on 0.0.0.0 (all interfaces). See the Node.js Net documentation for more information.
PORT 3000 The TCP port that the placeholder service will use for incoming network connections
PLACEHOLDER_DATA ../data/ Path to the directory where the placeholder service will find the store.sqlite3 database file.

Open browser

the server should now be running and you should be able to access the http API:

http://localhost:6100/

try the following paths:

/demo
/parser/search?text=london
/parser/findbyid?ids=101748479
/parser/query?text=london
/parser/tokenize?text=sydney new south wales

Changing languages

the /parser/search endpoint accepts a ?lang=xxx property which can be used to vary the language of data returned.

for example, the following urls will return strings in Japanese / Russian where available:

/parser/search?text=germany&lang=jpn
/parser/search?text=germany&lang=rus

documents returned by /parser/search contain a boolean property named languageDefaulted which indicates if the service was able to find a translation in the language you request (false) or whether it returned the default language (true).

The /parser/findbyid endpoint also accepts a ?lang=xxx property which will return the selected lang if the translation exists and all translations otherwise.

for example, the following url will return strings in French / Korean where available:

/parser/findbyid?ids=85633147,102191581,85862899&lang=fra
/parser/findbyid?ids=85633147,102191581,85862899&lang=kor

the demo is also able to serve responses in different languages by providing the language code in the URL anchor:

/demo#jpn
/demo#chi
/demo#eng
/demo#fra
... etc.

Filtering by placetype

the /parser/search endpoint accepts a ?placetype=xxx parameter which can be used to control the placetype of records which are returned.

the API does not provide any performance benefits, it is simply a convenience API to filter by a whitelist.

you may specify multiple placetypes using a comma to separate them, such as ?placetype=xxx,yyy, these are matched as OR conditions. eg: (xxx OR yyy)

for example:

the query search?text=luxemburg will return results for the country, region, locality etc.

you can use the placetype filter to control which records are returned:

# all matching results
search?text=luxemburg

# only return matching country records
search?text=luxemburg&placetype=country

# return matching country or region records
search?text=luxemburg&placetype=country,region

Live mode (BETA)

the /parser/search endpoint accepts a ?mode=live parameter pair which can be used to enable an autocomplete-style API.

in this mode the final token of each input text is considered as 'incomplete', meaning that the user has potentially only typed part of a token.

this mode is currently in BETA, the interface and behaviour may change over time.

Configuring the rtree threshold

the default matching strategy uses the lineage table to ensure that token pairs represent a valid child->parent relationship. this ensures that queries like 'London France' do not match, because there is no entry in the lineage table linking those two places together.

in some cases it's preferable to fall back to a matching strategy which considers geographically nearby places with a matching name, even if that relationship does not explicitly exist in the lineage table.

for example, 'Basel France' will return 'Basel Switzerland'. this is useful for handling user input errors and errors and omissions from the lineage table.

in the example above, 'Basel France' only matches because the bounding box of 'Basel' overlaps the bounding box of 'France' and no other valid entry for 'Basel France' exists.

the definition of what is 'nearby' is configurable, the bbox for the minor term (left token) is expanded by a threshold (the threshold is added or subtracted to each of the bbox vertices).

by default the threshold is set as 0.2 (degrees), any float value between 0 and 1 may be specified via the enviornment variable RTREE_THRESHOLD.

a setting of less than 0 will disable the rtree functionality completely. disabling the rtree will result in nearby queries such as 'Basel France' returning 'France' instead of 'Basel Switzerland'.


Run the interactive shell

$ npm run repl

> [email protected] repl
> node cmd/repl.js

placeholder >

try the following commands:

placeholder > london on
 - 101735809	locality 	London

placeholder > search london on
 - 101735809	locality 	London

placeholder > tokenize sydney new south wales
 [ [ 'sydney', 'new south wales' ] ]

placeholder > token kelburn
 [ 1729339019 ]

placeholder > id 1729339019
 { name: 'Kelburn',
   placetype: 'neighbourhood',
   lineage:
    { continent_id: 102191583,
      country_id: 85633345,
      county_id: 102079339,
      locality_id: 101915529,
      neighbourhood_id: 1729339019,
      region_id: 85687233 },
   names: { eng: [ 'Kelburn' ] } }

Configuration for pelias API

While Placeholder can be used as a stand-alone application or included with other geographic software / search engines, it is designed for the Pelias geocoder.

To connect Placeholder service to the Pelias API, configure the pelias config file with the port that placeholder is running on.


Tests

run the test suite

$ npm test

Run the functional cases

there are more exhaustive test cases included in test/cases/.

to run all the test cases:

$ npm run funcs

Generate a ~500,000 line test file

this command requires the data/wof.extract file mentioned below in the 'building the database' section.

$ npm run gentests

once complete you can find the generated test cases in test/cases/generated.txt.


Docker

Build the service image

$ docker-compose build

Run the service in the background

$ docker-compose up -d

Building the database

Prerequisites

  • jq 1.5+ must be installed
    • on ubuntu: sudo apt-get install jq
    • on mac: brew install jq
  • Who's on First data download

Steps

the database is created from geographic data sourced from the whosonfirst project.

the whosonfirst project is distributed as geojson files, so in order to speed up development we first extract the relevant data in to a file: data/wof.extract.

the following command will iterate over all the geojson files under the WOF_DIR path, extracting the relevant properties in to the file data/wof.extract.

this process can take 30-60 minutes to run and consumes ~350MB of disk space, you will only need to run this command once, or when your local whosonfirst-data files are updated.

$ WOF_DIR=/data/whosonfirst-data/data npm run extract

now you can rebuild the data/store.json file with the following command:

this should take 2-3 minutes to run:

$ npm run build

Using the Docker image

Rebuild the image

you can rebuild the image on any system with the following command:

$ docker build -t pelias/placeholder .

Download pre-built image

Up to date Docker images are built and automatically pushed to Docker Hub from our continuous integration pipeline

You can pull the latest stable image with

$ docker pull pelias/placeholder

Download custom image tags

We publish each commit and the latest of each branch to separate tags

A list of all available tags to download can be found at https://hub.docker.com/r/pelias/placeholder/tags/

More Repositories

1

pelias

Pelias is a modular open-source geocoder using Elasticsearch.
Twig
3,052
star
2

docker

Run the Pelias geocoder in docker containers, including example projects.
Shell
287
star
3

documentation

All things documentation for Pelias
217
star
4

api

HTTP API for Pelias Geocoder
JavaScript
209
star
5

leaflet-plugin

Add Pelias geocoding to your Leaflet map.
JavaScript
189
star
6

pbf2json

An OpenStreetMap pbf parser which exports json, allows you to cherry-pick tags and handles denormalizing ways and relations. Available as a standalone binary and comes with a convenient npm wrapper.
Go
129
star
7

openstreetmap

Import pipeline for OSM in to Pelias
JavaScript
107
star
8

polygon-lookup

Fast point-in-polygon intersection for large numbers of polygons.
JavaScript
71
star
9

interpolation

global street address interpolation service (beta)
JavaScript
54
star
10

openaddresses

Pelias import pipeline for OpenAddresses.
JavaScript
46
star
11

geonames

Import pipeline for geonames in to Pelias
JavaScript
43
star
12

parser

natural language classification engine for geocoding
JavaScript
41
star
13

schema

elasticsearch schema files and tooling
JavaScript
38
star
14

libpostal-service

Dockerfile for libpostal-service based on the Who's on First implementation
Dockerfile
33
star
15

spatial

ALPHA: geographic data service backed by spatialite
JavaScript
29
star
16

whosonfirst

Importer for Who's on First gazetteer
JavaScript
26
star
17

pelias-android-sdk

Android sdk for pelias
Java
20
star
18

csv-importer

Import arbitrary data in CSV format to Pelias
JavaScript
17
star
19

polylines

Pelias import pipeline for polyline (road network) data.
JavaScript
17
star
20

pip-service

Pelias point-in-polygon-service
JavaScript
15
star
21

query

geospatial queries used by the pelias api
JavaScript
12
star
22

terraform-elasticsearch

Terraform scripts for running an Elasticsearch cluster
HCL
10
star
23

pelias-ios-sdk

Interact with Mapzen's search & geocoding service
Swift
9
star
24

wof-admin-lookup

Who's on First Admin Lookup for the Pelias Geocoder
JavaScript
9
star
25

config

Configuration file for Pelias
JavaScript
8
star
26

dashboard

Pelias dashboard built with the Dashing framework
JavaScript
7
star
27

scripts-batch-search

JavaScript
6
star
28

model

Pelias data models
JavaScript
6
star
29

transit

Load transit landmarks into the Pelias geocoder
JavaScript
6
star
30

acceptance-tests

Pelias API acceptance tests
4
star
31

presentation

Pelias related talks and presentations.
JavaScript
4
star
32

postal-cities

Scripts to generate mappings of postal codes to 'last line' postal localities (postal cities)
JavaScript
4
star
33

labels

Pelias Label generation
JavaScript
4
star
34

fuzzy-tester

A fuzzy testing library for geocoding
JavaScript
4
star
35

microservice-wrapper

JavaScript
4
star
36

docker-baseimage

Pelias Docker Baseimage
Dockerfile
3
star
37

wof

WhosOnFirst tools
JavaScript
3
star
38

dbclient

Database client for Pelias import pipelines
JavaScript
3
star
39

design

Branding & graphic design guidelines and assets
2
star
40

sorting

JavaScript
2
star
41

loadtest

Scripts for loadtesting pelias
JavaScript
2
star
42

woflint

WhosOnFirst document/collection linter
JavaScript
1
star
43

mars-importer

Importer for Martian data
JavaScript
1
star
44

docker-valhalla-baseimage

Pelias Docker Baseimage with Valhalla additionally installed
Shell
1
star
45

blacklist-stream

Pelias document blacklist stream
JavaScript
1
star
46

analysis

text analysis libraries (work in progress)
JavaScript
1
star
47

ci-tools

Tools for manging CI builds used in other repositories
Shell
1
star