• Stars: 107
• Rank: 323,587 (Top 7 %)
• Language: JavaScript
• License: MIT License
• Created over 10 years ago
• Updated 11 months ago

Repository Details

Import pipeline for OSM into Pelias

A modular, open-source search engine for our world.

Pelias is a geocoder powered completely by open data, available freely to everyone.

What is Pelias?
Pelias is a search engine for places worldwide, powered by open data. It turns addresses and place names into geographic coordinates, and turns geographic coordinates into places and addresses. With Pelias, you’re able to turn your users’ place searches into actionable geodata and transform your geodata into real places.

We think open data, open source, and open strategy win over proprietary solutions at any part of the stack and we want to ensure the services we offer are in line with that vision. We believe that an open geocoder improves over the long-term only if the community can incorporate truly representative local knowledge.

Pelias OpenStreetMap importer

Overview

The OpenStreetMap importer handles importing data from OpenStreetMap into Elasticsearch for use by Pelias.

It includes logic for filtering to select only data relevant for geocoding, transforming it to match the Pelias data model, and augmenting the data as required.

Prerequisites

See Pelias software requirements

Clone and Install dependencies

For instructions on setting up Pelias as a whole, see our getting started guide. Further instructions here pertain to the OSM importer only.

$ git clone https://github.com/pelias/openstreetmap.git && cd openstreetmap;
$ npm install

Download data

The importer accepts any valid PBF extract, such as a full planet file (50GB+) from planet.openstreetmap.org or an extract from download.geofabrik.de. You can use the included download script to obtain the desired PBF files. Specify which files to download in the configuration file; they will all be downloaded to the imports.openstreetmap.datapath directory.

If no download sources are specified in the configuration, the entire planet file will be downloaded. Keep in mind this file is quite large.

$ PELIAS_CONFIG=<path-to-config> npm run download
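
The fallback described above can be sketched as follows. This is an illustration only, not the importer's actual download script: resolveDownloadSources is a hypothetical helper, and PLANET_URL is an assumed location for the planet file.

```javascript
// Assumed planet file location; the URL the real download script uses may differ.
const PLANET_URL = 'https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf';

// Hypothetical helper: list the URLs a download run would fetch for a config.
function resolveDownloadSources(config) {
  const osm = (config.imports && config.imports.openstreetmap) || {};
  const sources = Array.isArray(osm.download) ? osm.download : [];
  const urls = sources.map((entry) => entry.sourceURL).filter(Boolean);
  // No sources configured: fall back to the (very large) planet file.
  return urls.length > 0 ? urls : [PLANET_URL];
}

// With an empty openstreetmap section, only the planet file is listed.
console.log(resolveDownloadSources({ imports: { openstreetmap: {} } }));
```

Listing the resolved URLs before running the download is a cheap way to avoid fetching the planet file by accident.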

Configuration

To tell the importer the location of your downloads, temp space, and environmental settings, you will first need to create a ~/pelias.json file.

See the config documentation for details on the structure of this file. Your relevant config info for the openstreetmap module might look something like this:

{
  "imports": {
    "openstreetmap": {
      "download": [{
        "sourceURL": "https://s3.amazonaws.com/metro-extracts.nextzen.org/portland_oregon.osm.pbf"
      }],
      "datapath": "/mnt/pelias/openstreetmap",
      "leveldbpath": "/tmp",
      "import": [{
        "filename": "portland_oregon.osm.pbf"
      }]
    }
  }
}
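
To show how these settings fit together, here is a minimal sketch that combines datapath with each import entry's filename to get the files an import would read. It works on a plain object rather than the real Pelias config loader, and importFilePaths is a hypothetical helper.

```javascript
const path = require('path');

// Settings copied from the example configuration above.
const config = {
  imports: {
    openstreetmap: {
      datapath: '/mnt/pelias/openstreetmap',
      leveldbpath: '/tmp',
      import: [{ filename: 'portland_oregon.osm.pbf' }],
    },
  },
};

// Hypothetical helper: join datapath with every configured import filename.
function importFilePaths(cfg) {
  const osm = cfg.imports.openstreetmap;
  return osm.import.map((entry) => path.posix.join(osm.datapath, entry.filename));
}

console.log(importFilePaths(config));
// prints [ '/mnt/pelias/openstreetmap/portland_oregon.osm.pbf' ]
```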

Configuration Settings

imports.openstreetmap.datapath

This is the directory where the OSM importer will look for files to import. If configured it will also download files to this location.

imports.openstreetmap.download[0].sourceURL

A URL to download when the download script (in ./bin/download) is run. The file will be downloaded to the datapath directory.

imports.openstreetmap.import[0].filename

The OSM importer will look for a file with a name matching this value in the configured datapath directory when importing data.

If downloading from a remote URL, the filename must match the file name at the end of the sourceURL, as in the example configuration.
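
This matching rule can be checked before running an import. The sketch below assumes the rule means that filename equals the last path segment of sourceURL, as in the example configuration. Both filenameFromSourceURL and findMismatchedDownloads are hypothetical helpers, not part of the importer.

```javascript
const path = require('path');

// Extract the file name from the end of a download URL.
function filenameFromSourceURL(sourceURL) {
  return path.posix.basename(new URL(sourceURL).pathname);
}

// Return downloaded file names that no import entry refers to.
function findMismatchedDownloads(osm) {
  const importNames = new Set((osm.import || []).map((entry) => entry.filename));
  return (osm.download || [])
    .map((entry) => filenameFromSourceURL(entry.sourceURL))
    .filter((name) => !importNames.has(name));
}

const osm = {
  download: [{ sourceURL: 'https://s3.amazonaws.com/metro-extracts.nextzen.org/portland_oregon.osm.pbf' }],
  import: [{ filename: 'portland_oregon.osm.pbf' }],
};

console.log(findMismatchedDownloads(osm)); // prints []
```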

imports.openstreetmap.leveldbpath

This is the directory where temporary files will be stored in order to denormalize OSM ways and relations. In the case of a planet import it is best to have at least 100GB free.

Defaults to tmp.

imports.openstreetmap.importVenues

By default, the OSM importer imports both venue records and addresses. If set to false, only address records will be imported.

imports.openstreetmap.removeDisusedVenues

If set to boolean true, venues with popularity below 0 (as determined by OSM tags) will be discarded. In practice, this affects records with tags such as disused, amenity:disused and abandoned.

By default, or if set to any other value besides true, these records will be imported.
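
The strict check on true is easy to get wrong, so here is a minimal sketch of the rule as described. It is not the importer's own code: shouldDiscardVenue is a hypothetical helper, and the popularity argument stands in for the value the importer derives from OSM tags.

```javascript
// Hypothetical helper: decide whether a venue is dropped under removeDisusedVenues.
function shouldDiscardVenue(settings, popularity) {
  // Only the boolean true enables the filter; "true", 1, or undefined do not.
  return settings.removeDisusedVenues === true && popularity < 0;
}

console.log(shouldDiscardVenue({ removeDisusedVenues: true }, -1));   // prints true
console.log(shouldDiscardVenue({ removeDisusedVenues: 'true' }, -1)); // prints false
console.log(shouldDiscardVenue({}, -1));                              // prints false
console.log(shouldDiscardVenue({ removeDisusedVenues: true }, 0));    // prints false
```

In a JSON config file this means writing true without quotes.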

Administrative Hierarchy Lookup

OSM records often do not contain information about which city, state (or other region, such as a province), or country they belong to. Pelias can compute these values from Who's on First data. For more info on how admin lookup works, see the documentation for pelias/wof-admin-lookup. By default, adminLookup is enabled. To disable it, set imports.adminLookup.enabled to false in the Pelias config.

Note: Admin lookup requires loading around 5GB of data into memory.
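
To show where the imports.adminLookup.enabled setting sits in the config structure, the following sketch builds a copy of a config object with admin lookup turned off and prints it as JSON. withAdminLookupDisabled is a hypothetical helper for illustration; in practice you would edit pelias.json directly.

```javascript
// Hypothetical helper: return a copy of the config with admin lookup disabled.
function withAdminLookupDisabled(config) {
  const imports = config.imports || {};
  return {
    ...config,
    imports: {
      ...imports,
      // Keep any other adminLookup settings, override only "enabled".
      adminLookup: { ...(imports.adminLookup || {}), enabled: false },
    },
  };
}

const base = { imports: { openstreetmap: { datapath: '/mnt/pelias/openstreetmap' } } };
console.log(JSON.stringify(withAdminLookupDisabled(base), null, 2));
```

The printed object shows adminLookup as a sibling of openstreetmap under imports, not nested inside it.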

Running an import

This will start the import process. It may take a few minutes to load administrative data and begin processing the OSM PBF file. After that, you should see regular progress updates in the terminal.

$ npm start

How long does it take?

If all goes well, you should see between 6,000 and 7,000 records imported per second on a modern machine. A full planet install will import about 80 million records, whereas most city extracts will import at most a few thousand.

These counts are of records containing valid location names to search on. Data that is not directly searchable by the end user, such as fire hydrants and lamp posts, is not imported.
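
As a rough worked example, the figures above imply a planet import of a little over three hours of pure record throughput. The sketch below only divides record count by rate, so it ignores the time to load administrative data and any slowdown on real hardware. estimateImportHours is a hypothetical helper.

```javascript
// Hypothetical helper: hours needed to import recordCount at a steady rate.
function estimateImportHours(recordCount, recordsPerSecond) {
  return recordCount / recordsPerSecond / 3600;
}

const PLANET_RECORDS = 80e6; // "about 80 million records"
const slow = estimateImportHours(PLANET_RECORDS, 6000);
const fast = estimateImportHours(PLANET_RECORDS, 7000);

console.log(`${fast.toFixed(1)} to ${slow.toFixed(1)} hours`);
// prints "3.2 to 3.7 hours"
```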

If you are looking to run a planet-wide cluster like the one we provide for geocode.earth, please see our documentation on full planet builds.

Issues

If you have any issues getting set up or the documentation is missing something, please open an issue here: https://github.com/pelias/openstreetmap/issues

Contributing

Please fork and pull request against upstream master on a feature branch.

Pretty please, provide unit tests and script fixtures in the test directory.

Code Linting

A .jshintrc file is provided which contains a linting config. Most text editors will understand this config and give you inline hints on code style and readability.

These settings are strictly enforced when you do a git commit. You can execute git commit at any time to run the linter against your code.

Running Unit Tests

$ npm test

Running End-to-End Tests

These tests run the entire pipeline against a small PBF extract to assert that the individual units work as expected when wired together.

$ npm run end-to-end

Code Coverage

$ npm run coverage

Continuous Integration

CI tests every change against all supported Node.js versions.
