  • Stars: 433
  • Rank: 100,464 (top 2%)
  • Language: Scala
  • License: Other
  • Created almost 13 years ago
  • Updated almost 6 years ago

Repository Details

MOVED - The project is still under development but this page is deprecated.

NOTICE - This project has moved.

It is now part of Foursquare's open source monorepo Fsq.io and all future work will be published there.

The project very much lives on, but this GitHub repo is deprecated.

Twofishes

A coarse, splitting geocoder and reverse geocoder in Scala. Prebuilt indexes and binaries are available at twofishes.net; discussion happens on Google Groups.

What is a Geocoder?

A geocoder is a piece of software that translates strings into coordinates, for example "New York, NY" to "40.74, -74.0". This is an implementation of a coarse geocoder (city level, meaning it can't understand street addresses) that also supports splitting: it breaks off the non-geocoded part of the query and returns it in the final response.

Overview

This geocoder was designed around the geonames data, which is relatively small and easy to parse quickly in a single thread without much post-processing. Geonames is a collection of simple text files that represent political features across the world. The geonames data has the nice property that all features are listed with stable identifiers for their parents, the bigger political features that contain them (Rego Park -> Queens County -> New York State -> United States). In one pass, we can build a database where each entry is a feature with a list of names for indexing, names for display, and a list of parents.
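A minimal sketch of the resulting database entries, assuming simplified input rows of (geonameid, name, parent_id). The real geonames files are tab-separated with many more columns, and the `Feature` record, the `build_index` helper and the ids below are hypothetical. The input is read once, and each direct parent link is then expanded in memory into the full parent list.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical index entry: names for matching, a display name, and parent ids."""
    geonameid: int
    display_name: str
    index_names: list[str]
    parent_ids: list[int] = field(default_factory=list)


def build_index(rows):
    """Build {geonameid: Feature} from (geonameid, name, parent_id) rows.

    parent_id is the id of the immediately enclosing feature, or None at the top.
    """
    features = {}
    direct_parent = {}
    for geonameid, name, parent_id in rows:  # single pass over the input rows
        features[geonameid] = Feature(geonameid, name, [name.lower()])
        direct_parent[geonameid] = parent_id
    # Expand each direct parent into the full chain, nearest ancestor first.
    for geonameid, feature in features.items():
        parent = direct_parent[geonameid]
        while parent is not None:
            feature.parent_ids.append(parent)
            parent = direct_parent.get(parent)
    return features


# Made-up ids for Rego Park -> Queens County -> New York State -> United States.
rows = [
    (1, "United States", None),
    (2, "New York State", 1),
    (3, "Queens County", 2),
    (4, "Rego Park", 3),
]
index = build_index(rows)
print(index[4].parent_ids)  # [3, 2, 1]
```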

The Data

Geonames is great, but not perfect. Southeast Asia doesn't have the most comprehensive coverage. Geonames doesn't have bounding boxes, so we add some of those from http://code.flickr.com/blog/2011/01/08/flickr-shapefiles-public-dataset-2-0/ where possible.

Geonames is licensed under CC-BY (http://www.geonames.org/). They take a fairly liberal interpretation of this and just ask for attribution on an about page if you make use of the data. Flickr shapefiles are public domain.

Reverse Geocoding and Polygons

To enable reverse geocoding in twofishes, you need to add polygon data to the inputs. Geonames does not distribute polygons, and the twofishes distribution does not contain shapefiles. Shapefiles must be in the EPSG:4326 projection. The shputils/shape-gn-matchr.py script writes a copy of your shapefile with an extra property that holds the geonameid of the matching feature.

I will add automated scripts for this soon. For now, if you have shapefiles that map to existing geonames features that you want to put into twofishes, run them through that script as in these examples:

US place (locality) data, from ftp://ftp2.census.gov/geo/tiger/TIGER2010/PLACE/2010/
    ~/shputils/shape-gn-matchr.py --shp_name_keys=NAME10 tl_2010_35_place10.shp gn-tl_2010_35_place10.shp

US county data, from ftp://ftp2.census.gov/geo/tiger/TIGER2010/COUNTY/2010/
    ../shputils/shape-gn-matchr.py --dbname=gis --shp_name_keys=NAME10 --allowed_gn_classes='' --allowed_gn_codes=ADM2 --fallback_allowed_gn_classes='' --fallback_allowed_gn_codes='' tl_2010_us_county10.shp gn-us-adm2.shp

MX locality data, from http://blog.diegovalle.net/2013/02/download-shapefiles-of-mexico.html
    ogr2ogr -t_srs EPSG:4326 mx-4326.shp MUNICIPIOS.shp
    ./shputils/shape-gn-matchr.py --dbname=gis --shp_name_keys=NOM_MUN mx-4326.shp gn-mx-localities.shp
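Once polygons carry a geonameid, reverse geocoding a point comes down to finding which polygons contain it. The sketch below uses the classic even-odd ray-casting test on plain (lon, lat) rings in EPSG:4326. It only illustrates the idea and is not how twofishes indexes polygons internally. `point_in_ring`, `reverse_geocode` and the toy rectangles are hypothetical, and holes and antimeridian crossings are ignored.

```python
def point_in_ring(lon, lat, ring):
    """Even-odd ray casting: count edge crossings of a ray heading east from (lon, lat)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge meets that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def reverse_geocode(lon, lat, polygons):
    """Return the geonameids of every (geonameid, ring) polygon containing the point."""
    return [gid for gid, ring in polygons if point_in_ring(lon, lat, ring)]


# Made-up rectangles: an outer "county" (100) containing an inner "locality" (200).
polygons = [
    (100, [(-74.0, 40.5), (-73.7, 40.5), (-73.7, 40.8), (-74.0, 40.8)]),
    (200, [(-73.9, 40.7), (-73.8, 40.7), (-73.8, 40.75), (-73.9, 40.75)]),
]
print(reverse_geocode(-73.85, 40.72, polygons))  # [100, 200]
```

A linear scan like this is fine for a handful of shapes. A real server would put a spatial index in front of the containment test.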

Requirements

  • Java (jre and jdk)
  • Mongo
  • curl
  • unzip

First time setup

  • git clone https://github.com/foursquare/twofishes.git
  • cd twofishes
  • To download a single country: ./download-country.sh [ISO 3166 country code] (for example US, GB, etc.)
  • To download the whole world: ./download-world.sh

Data import

  • mongod --dbpath /local/directory/for/output/
  • To import specific countries: ./parse.py -c US /output/dir (you can pass a comma-separated list of countries, e.g. US,GB,RU)
  • To import the whole world: ./parse.py -w /output/dir

Serving

  • ./serve.py -p 8080 /output/dir, where /output/dir contains a subdirectory named after the date of the most recent build, for example 2013-02-25-01-08-23.803740. Point the server at this subdirectory, or at the folder called latest, which the build process creates in the twofishes directory as a symlink to the most recent dated subdirectory.
  • The server should respond to finagle-thrift on the specified port (8080 by default) and to HTTP requests on the next port up: http://localhost:8081/?query=rego+park+ny http://localhost:8081/static/geocoder.html#rego+park
  • use the --host flag to specify a bind address (defaults to 0.0.0.0)
  • to enable hotfixes and allow refreshing, use the --hotfix_basepath and --enable_private_endpoints params as detailed under Hotfixes below

NOTE: mongod is not required for serving, only index building.

A better option is to run "./sbt server/assembly" and use the resulting server/target/server-assembly-VERSION.jar. Serve it with java -jar JARFILE --hfile_basepath /directory.
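A client can talk to the HTTP interface, which listens on the port one above the thrift port. This sketch builds query URLs with the Python standard library. The `geocode_url` and `geocode` helpers are hypothetical. The response is treated as JSON, which is an assumption you should check against your server.

```python
import json
import urllib.parse
import urllib.request


def geocode_url(query, host="localhost", thrift_port=8080):
    """Build the HTTP query URL; HTTP is served on the port one above the thrift port."""
    params = urllib.parse.urlencode({"query": query})  # spaces become '+'
    return f"http://{host}:{thrift_port + 1}/?{params}"


def geocode(query, **kwargs):
    """Fetch and decode a response, assuming (not confirmed here) that the body is JSON."""
    with urllib.request.urlopen(geocode_url(query, **kwargs), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(geocode_url("rego park ny"))  # http://localhost:8081/?query=rego+park+ny
```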

Hotfixes

Hotfixes are expressed as fine-grained edits on top of features in the index. Features can be quickly added, removed or modified on a live server without requiring a full index rebuild and redeploy. Most fields on a GeocodeServingFeature and fields on its nested structs can be edited via a GeocodeServingFeatureEdit object.

To enable hotfix support, the server can be pointed to a hotfix directory at startup via the --hotfix_basepath param. Any .json files found in this directory will be deserialized from JSON to Thrift.
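The startup step of reading every .json file in the hotfix directory can be pictured with the sketch below. It only parses JSON into Python dicts, whereas the server deserializes into Thrift GeocodeServingFeatureEdit objects. The `load_hotfixes` helper and the `{"example": true}` payload are hypothetical. The source does not say whether the server sorts files or recurses into subdirectories, so the sketch reads only the top level, in sorted order.

```python
import json
import tempfile
from pathlib import Path


def load_hotfixes(basepath):
    """Parse every *.json file directly inside basepath, in sorted filename order."""
    edits = []
    for path in sorted(Path(basepath).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            edits.append((path.name, json.load(f)))
    return edits


# Demo with a throwaway directory: only the .json file is picked up.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "edits.json").write_text('{"example": true}', encoding="utf-8")
    Path(tmp, "notes.txt").write_text("ignored", encoding="utf-8")
    loaded = load_hotfixes(tmp)
print(loaded)  # [('edits.json', {'example': True})]
```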

There is only basic tooling to build these JSON hotfix files at present. In JsonHotfixFileBuilder.scala, use GeocodeServingFeatureEdit.newBuilder to build up individual hotfixes in code. Then run build-hotfix-file.py specifying an output file. I will provide a better way shortly.

The server can reload hotfixes on-demand via the /refreshStore endpoint. There is no authentication on this endpoint (or any other private endpoints), so it is disabled by default. Use the --enable_private_endpoints param to enable at your own risk, only if your servers are not publicly accessible. When enabled, calling this endpoint on an individual server will cause it to re-scan the hotfix_basepath directory. Use the helper script refresh-store.py.
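Because /refreshStore only affects the server that receives the call, a fleet has to be refreshed host by host. This sketch loops over a hypothetical host list. It is not refresh-store.py, and it assumes the endpoint sits on each server's HTTP port (8081 by default) and accepts a plain GET. Check both assumptions against the real script before relying on this.

```python
import urllib.request


def refresh_urls(hosts, http_port=8081):
    """One /refreshStore URL per server; each server must be refreshed individually."""
    return [f"http://{host}:{http_port}/refreshStore" for host in hosts]


def refresh_all(hosts, http_port=8081, fetch=urllib.request.urlopen):
    """Call /refreshStore on every host and return {url: exception or None}."""
    results = {}
    for url in refresh_urls(hosts, http_port):
        try:
            with fetch(url, timeout=10) as resp:
                resp.read()
            results[url] = None
        except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
            results[url] = exc
    return results
```

Collecting per-host errors instead of stopping at the first failure leaves the remaining servers refreshed and tells you exactly which ones to retry.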

Troubleshooting

If you see a Java OutOfMemory error at startup, you may need to raise the limit on the number of memory-mapped areas.

On Linux (as root): sysctl -w vm.max_map_count=131072

Talking to the Server

Technical Details

I use mongo to save state during the index-building phase. For instance, this lets us parse the alternateNames file, which adds name+lang pairs to features defined in a separate file, or add the Flickr bounding boxes. A final pass goes over the database, dereferences ids, and outputs some Hadoop mapfiles and hfiles. These two hfiles are all that is required for serving the data.

If we were doing heavier processing on the incoming data, a mapreduce that spits out hfiles might make more sense.
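The role of the intermediate store during index building can be illustrated with a dict standing in for mongo. Features come from one file, name+lang pairs from the alternateNames file, and bounding boxes from another source, all keyed by geonameid. The record layout, the `merge_sources` helper, and the ids and coordinates below are hypothetical simplifications, not the real file formats.

```python
def merge_sources(features, alternate_names, bboxes):
    """Attach (name, lang) pairs and bounding boxes to features keyed by geonameid.

    features: {geonameid: {"name": str}}
    alternate_names: iterable of (geonameid, lang, name) rows
    bboxes: {geonameid: (min_lon, min_lat, max_lon, max_lat)}
    """
    store = {gid: {"name": f["name"], "names": [], "bbox": None} for gid, f in features.items()}
    for gid, lang, name in alternate_names:
        if gid in store:  # names pointing at unknown features are skipped
            store[gid]["names"].append((name, lang))
    for gid, bbox in bboxes.items():
        if gid in store:
            store[gid]["bbox"] = bbox
    return store


# Made-up feature, names and bounding box.
store = merge_sources(
    {1: {"name": "New York City"}},
    [(1, "en", "NYC"), (1, "es", "Nueva York"), (99, "en", "Orphan")],
    {1: (-74.26, 40.49, -73.70, 40.92)},
)
print(store[1]["names"])  # [('NYC', 'en'), ('Nueva York', 'es')]
```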

When we parse a query, we do a rough recursive descent parse, starting from the left. When the geocoder is used to split geographic queries like "pizza new york", we expect the "what" to be on the left. All of the features found in a parse must be parents of the smallest feature.

The geocoder may return multiple valid parses; however, it only returns the longest possible ones. For "Springfield, US" we return multiple features that match the query (there are dozens of Springfields in the US). It will not return a parse of "Springfield" near "US", with only US geocoded, if it can find a longer parse, but it will return multiple valid interpretations of the longest parse.
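Here is a toy version of the splitting parse. It tries split points so that the non-geographic "what" sits on the left. It then recursively covers the rest of the query, left to right, with feature names, where every later feature must be a parent of the first (smallest) one. Only the longest geocoded tail is kept, together with all of its interpretations. The gazetteer, the ids, and the `parse` / `parse_where` functions are hypothetical. This brute-force search sketches the behavior described above, not twofishes' actual scoring.

```python
# Hypothetical gazetteer: lowercase name -> candidate (feature id, ancestor ids).
# Made-up ids: 1 = US, 2 and 3 = two states, 4 = NY, 21 = Queens.
GAZETTEER = {
    "us": [(1, set())],
    "springfield": [(10, {2, 1}), (11, {3, 1})],
    "ny": [(4, {1})],
    "queens": [(21, {4, 1})],
    "rego park": [(20, {21, 4, 1})],
}


def parse_where(tokens):
    """Return every reading of all tokens as a chain of feature ids, smallest first."""
    chains = []

    def descend(start, chain):
        if start == len(tokens):
            chains.append([fid for fid, _ in chain])
            return
        # Try every phrase that begins at `start`, left to right.
        for end in range(start + 1, len(tokens) + 1):
            phrase = " ".join(tokens[start:end])
            for fid, ancestors in GAZETTEER.get(phrase, []):
                # Later features must be parents of the first (smallest) one.
                if not chain or fid in chain[0][1]:
                    descend(end, chain + [(fid, ancestors)])

    descend(0, [])
    return chains


def parse(query):
    """Split a query into (what, chains), keeping only the longest geocoded tail."""
    tokens = query.lower().replace(",", " ").split()
    # split = 0 treats the whole query as geographic; each step moves one token into "what".
    for split in range(len(tokens)):
        chains = parse_where(tokens[split:])
        if chains:  # all interpretations of the longest parse
            return " ".join(tokens[:split]), chains
    return " ".join(tokens), []


print(parse("Springfield, US"))     # ('', [[10, 1], [11, 1]])
print(parse("pizza rego park ny"))  # ('pizza', [[20, 4]])
```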

Performance

Twofishes can handle 100s of queries a second at < 5ms/query on average.

Point reverse geocoding is absurdly performant -- 1000s of queries a second at < 1ms/query.

Future

I'd like to integrate more data from OSM, and possibly do an entire build solely from OSM. I'd also like to get supplemental data from the Foursquare database where possible. If I were feeling more US-centric, I'd parse the TIGER/Line data for US polygons, but I expect those to mostly be in OSM.

Also US-centric are the Zillow neighborhood polygons, which are CC BY-SA. I might add an "attribution" field to the response for certain data sources. I'm not looking forward to writing a conflater that handles precedence for overlapping features from different data sets.

Contributors

Many thanks for assistance:

  • Jorge Ortiz

Unrelated

These are the two fishes I grilled the night I started coding the original python implementation https://twitter.com/#!/whizziwig/statuses/154431957630066688

More Repositories

1

rogue

MOVED - The project is still under development but this page is deprecated.
Scala
489
star
2

FSNetworking

foursquare iOS networking library
Objective-C
384
star
3

fsqio

A monorepo that holds all of Foursquare's opensource projects
Scala
254
star
4

quattroshapes

Makefile
231
star
5

FSQCollectionViewAlignedLayout

FSQCollectionViewAlignedLayout is a generic collection view layout designed to be very flexible and configurable. Its goal is to save its users from having to write their own custom layout classes every time UICollectionViewFlowLayout is not appropriate for their view.
Objective-C
176
star
6

fongo

faked out in-memory mongo for java
Java
150
star
7

foursquare-android-oauth

Foursquare native authentication makes it easier for your app's users to connect with Foursquare. Unlike web-based OAuth, native authentication re-uses the Foursquare app's user credentials, saving users the hassle of re-logging in to Foursquare within your app.
Java
134
star
8

foursquare-palmpre

A webOS app (Mojo Framework)
JavaScript
105
star
9

foursquare-ios-oauth

Foursquare native authentication makes it easier for your app's users to connect with Foursquare. Unlike web-based OAuth, native authentication re-uses the Foursquare app's user credentials, saving users the hassle of re-logging in to Foursquare within your app.
Objective-C
104
star
10

slashem

A rogue-like DSL for querying SOLR
Scala
103
star
11

FSQLocationBroker

A centralized location manager for your app.
Objective-C
94
star
12

oozie-web

A more pretty, more usable web dashboard for Apache Oozie, written in Scala.
JavaScript
74
star
13

foursquare-fhttp

MOVED - The project is still under development but this page is deprecated.
Scala
44
star
14

FSQCellManifest

A UITableView and UICollectionView delegate and datasource that provides a simpler unified interface for describing your sections and cells.
Objective-C
43
star
15

quiver

An HFile-backed Key-Value Server
Go
42
star
16

hackathon

foursquare hackathons™
40
star
17

spindle

MOVED - The project is still under development but this page is deprecated.
Scala
39
star
18

mongo-hdfs-export

Scala
31
star
19

foursquare-app-framework

Framework for building Connected Apps
Python
31
star
20

react-foursquare

Foursquare Library for React
JavaScript
25
star
21

es-scorer-plugin

Plugin to do our scoring in ES
Scala
24
star
22

sites-to-markdown

convert google sites html to markdown
Java
23
star
23

FSQRoutes

URL routing framework for iOS
Objective-C
21
star
24

fsq-studio-sdk-examples

Foursquare Studio is a platform to visualize, unify, enrich, and analyze spatial data on a planetary scale.
Jupyter Notebook
19
star
25

qgis-plugin

Foursquare Studio plugin for QGIS
Python
19
star
26

datasource-plugin-clouderamanager

Cloudera Manager datasource for Grafana 3.x
JavaScript
19
star
27

twitter-util-async

scala-async support for twitter util library
Scala
15
star
28

Place-API-Postman-Collection

Postman collection that contains almost all the sample Foursquare Places API calls.
14
star
29

foursquair

An Adobe AIR desktop client for foursquare
ActionScript
14
star
30

placepicker-sdk-sample

An SDK to help developers add a place picker to their app and also quickly access the Foursquare place that their user is at.
Java
11
star
31

wait

wait gem: executes a block until there's a result
Ruby
10
star
32

hoursparser.js

dumb but useful hours extractor from free-text entry
JavaScript
9
star
33

h3-presto

Presto bindings for H3, a hierarchical hexagonal geospatial indexing system
Java
8
star
34

gitshed

git versioning of large binary files outside the repo.
Python
8
star
35

shapefile-geo

Java
8
star
36

pilgrim-sdk-react-native

React native wrapper for the Pilgrim SDK
Java
7
star
37

fsgo

Reusable libraries for building Go services
Go
7
star
38

FSQMessageForwarder

An Obj-C message forwarder class, for when you don't have access to the source of the sending object.
Objective-C
6
star
39

merchant-app

JavaScript
5
star
40

source_code_analysis

Utilities to analyze, lint and rewrite source code in various languages.
Python
5
star
41

gohfile

5
star
42

exceptionator

MOVED - The project is still under development but this page is deprecated.
JavaScript
5
star
43

foursquare-places

framework agnostic wrapper for foursquare's APIs
JavaScript
5
star
44

android-map-utils

A collection of 3rd party map utility classes
4
star
45

cc-shapefiles

Scala
3
star
46

pilgrim-unity-package

Unity package which enables easy integration with Pilgrim SDK
Objective-C
3
star
47

foursquareapi-csharp

C#
3
star
48

foursquare.github.io

Foursquare open source portal
HTML
2
star
49

movementsdk-ios-spm

Movement SDK for iOS - Swift Package Manager
Swift
2
star
50

foursquare-places-api-samples

Developer Examples for using Foursquare products
HTML
2
star
51

simple-macros

MOVED - The project is still under development but this page is deprecated.
Scala
2
star
52

FSQComponents

Objective-C
2
star
53

RNPilgrimSample

Pilgrim sample app using React Native
Java
2
star
54

json-traverser

Scala
1
star
55

hackmidwest

This repo contains everything developers need to get started at Hack Midwest!
1
star
56

MovementSdk-CocoaPods-Beta

Private CocoaPods Spec repo for the Movement SDK
Ruby
1
star
57

finagle-dual

Support thrift and HTTP on same port with Finagle
Scala
1
star
58

movement-sdk-react-native

React native wrapper for the Movement SDK
Objective-C
1
star
59

pilgrimsdk-adobe-extension

The pilgrim adobe extension
Kotlin
1
star
60

public-model-resources

Jupyter Notebook
1
star
61

mobbing-interview-python

Used by the Security & Quality Team for interviews
1
star
62

RNMovementSample

Movement SDK sample app using React Native
Java
1
star
63

Pilgrim-CocoaPods-Beta

Public cocoapods spec repo for Pilgrim SDK beta builds
Ruby
1
star
64

pilgrim-ios-spm

Pilgrim SDK for iOS - Swift Package Manager
Swift
1
star
65

alertmon

Foursquare's homegrown production alerting platform
Python
1
star
66

commons-old

Temporary duplicate of foursquare/commons (a fork of twitter/commons) while we restructure things.
Java
1
star