• Stars
    star
    195
  • Rank 199,374 (Top 4 %)
  • Language
    Rust
  • Created over 5 years ago
  • Updated over 5 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Analyzing the Green Line Extension with OpenStreetMap

Analyzing the Green Line Extension with OpenStreetMap

This presentation chronicles my attempts to use Rust to understand how the Green Line Extension will affect commutes in Somerville.

Transit times after

This map shows relative transit times to downtown.

About the GLX

The Green Line Extension will add seven new stations to the north end of the MBTA green line. It will cost about $3 billion and is slated for completion about two years from now.

Planned Green Line stations

Map Courtesy Wikipedia user Pi.1415926535, CC-BY-SA 3.0

I want to look at how much the Green Line will change transit times from Somerville to downtown Boston.

πŸƒ Yak Shaving πŸƒ

I'd like to analyze transit times... so I clearly need to write an OpenStreetMap file ingester and a GPU-based map renderer!?

  • Ingester
  • Renderer
  • Glue

OpenStreetMap

OpenStreetMap is like Wikipedia for geographical data. Besides literally streets, it contains over 5 billion buildings, parks, train tracks, trees, benches, and much more. This data set powers many maps that you see around the Internet.

Getting the data

Geofabrik GmbH provides downloads of various regions, down to U.S. states. I chose to download the Massachusetts data in .pbf format, The data is is encoded with Google's protocol buffers, a binary encoding format.

OSM technical

I make use of two OSM concepts:

  • A node is basically a point in space with a lat/long
  • A way is a list of nodes, e.g. road, building, really anything that's not a point

Interpreting the data

Nodes

Data:

  • ID
  • latitude + longitude
  • Key-value tags, e.g. amenity: vending_machine, vending: public_transport_tickets

Ways

Data:

  • ID
  • List of nodes
  • Key-value tags, e.g. highway: tertiary, name: Pearl Street
  • Attribution

OSM PBF format

This is a protocol buffer format for efficiently storing OSM data.

It uses a few neat tricks:

  • Splitting each file into blocks to allow parallel processing
  • Storing data in columnar formats
  • Referring to nodes and strings by ID (string interning)
  • Delta-encoding sequences of ints (saves space)

Splitting into blocks

The file is organized in triplets:

- #1 block header size in bytes
- #1 block header
- #1 block
- #2 block header size in bytes
- #2 block header
- #2 block
...
message BlobHeader {
  required string type = 1;     <- There are multiple types of blocks
  ...
  required int32 datasize = 3;  <- The size of the upcoming block
}

Columnar data in protobuf

A new trick for me! Each array is the same length:

message DenseNodes {
   repeated sint64 id = 1 [packed = true];

   optional DenseInfo denseinfo = 5;

   repeated sint64 lat = 8 [packed = true];
   repeated sint64 lon = 9 [packed = true];

   // Alternates key, value, key, value, ...
   repeated int32 keys_vals = 10 [packed = true]; 
}

Since proto3, scalar numeric fields are packed by default.

Delta-encoding IDs

Original data: 100, 101, 103, 104

Delta-encoded: 100, 1, 2, 1

Why is this good? With protobuf varints, the space required to encode an integer is proportional to the log of its size. Since OSM contains over 5 billion nodes, we're probably encoding most ints in 1 byte instead of 5 bytes, saving 80% space!

Ingesting the data

I made my own ingester for OSM PBF data! Not sure if that was a good idea.

Challenges:

  • The OSM PBF format is a little chaotic, having evolved over time
  • Keeping up with the performance tricks
  • I needed many layers of mapping, filtering and flat mapping

Yeah, it probably wasn't a good idea. Although I had trouble using the osmpbf-reader crate, I'd like to try the osmpbf crate.

Optimizing ingestion

Because the file is split into blocks, I was able to decode each block individually, resulting in a huge speedup.

The actual analysis

My model of a typical commute is:

  1. Walk to the station
  2. Wait for the train
  3. Train goes downtown

In this model, each resident chooses the station that will result in the lowest total transit time.

Possible extensions:

  • Other destinations. For example, Harvard, MIT, and Kendall are huge employment centers.
  • Consider other modes of transport
  • Analyze a larger geographic region

Walking Times

"As the crow flies" with a constant penalty to account for the fact that people don't fly.

I expect this to introduce some error, especially near difficult-to-pass objects like I-93 and McGrath Highway.

Possible extension: pathfinding with actual sidewalk data

Transit time from station to downtown

I have defined "downtown" as, "Park Street or Downtown Crossing, whichever is closer."

Existing lines: use schedules, test calibration by hand

GLX: measure distance between stations, comparing to existing D branch of the Green Line. I'm assuming that the trains will move at the same average speed, 19 MPH, because the density of stations is not too far off (thanks JBR!).

After doing this, I found a copy of the 2009 Draft Environmental Impact Report which thankfully confirmed the sanity of these estimates.

Estimated Transit Times

Measured in minutes to Lechmere.

  • Lechmere: 0
  • Union: 3 (DEIR: 4.5)
  • E. Somerville: 3 (no DEIR)
  • Gilman: 5 (DEIR: 5)
  • Magoun: 6 (DEIR: 7)
  • Ball: 9 (DEIR: 8.5)
  • College Ave. 10 (DEIR: 10.25)

Wait times

I also estimated the typical wait time between trains at each station.

On the low end, Lechmere will have an expected wait only of 2 minutes since it's on both Green Line branches. On the high end, the orange line is more like 9 minutes.

Possible extension: incorporate subway on-time performance

Finding the best station for a given location

Currently: simply try each station for the given location and use the best one. This isn't always the closest station: for example, if you're halfway between Porter and Magoun, traveling through Porter will be a bit faster.

Possible extension: use heuristics to consider fewer stations

Rendering a map

Goals: produce useful and informative maps of the Somerville area.

Two tricky problems here:

  1. What do I want to draw?
  2. How do I draw it?

What do I want to draw?

The map should:

  1. show the new, old, and/or difference in travel time, in addition to perhaps tracks, stations and other physical features relevant to transit
  2. orient me
  3. look good

Drawing with Rust

Rust's 2D graphics landscape is evolving rapidly!

A Guide to Rust Graphics Libraries, As of May 2019 (author of ggez, so this is from the perspective of games)

What graphics API? wgpu-rs because I am a library hypebeast.

wgpu-rs

Back ends: Vulkan, Metal, and DirectX

The successor to the deprecated gfx crate

Strongly inspired by Vulkan, but with a compatibility layer to work with Metal and DirectX. Higher-level than gfx-hal

Selecting colors for graphics

Selecting a good color palette is important for clarity and accessibility.

I'm using the palette crate for color space math.

Starting with CIE L*a*b* color space, designed to be perceptually uniform.

To specify colors by hand, using CIE L*C*hΒ°, a cylindrical version of that space.

Use color palettes friendly to color blind people, i.e. don't contrast red vs. green.

I was highly inspired by Color Schemes from Programming Design Systems, a free digital book by Rune Madsen.

Misc. Graphics

I used lyon to tesselate my graphics (i.e. to turn everything into triangles)

I wrote simple GLSL shaders and compiled them to Vulkan with shaderc.

Results

Transit times before

Colors:

  • Orange: 5–23 minutes
  • Red: 12–30 minutes
  • Pink: 20-38 minutes
  • Blue: 28-45 minutes

Let's use an uncertainty of 5 minutes: think about headway, train delays, and modeling inaccuracies

There is an underserved corridor in central and northern parts of Somerville.

Transit times after

Now that corridor is better-served. πŸ‘

Transit times delta

How much better-served? In this map, the data ranges from 0 to 20 minutes. The highest deltas can be seen in Union Square, Gilman Square, and southeast of Magoun Square.

Best stations

This map shows the best MBTA line for each location. This shows that a large chunk of Somerville will be "on the green line."

End

Recap

I used OpenStreetMap and wgpu-rs to analyze MBTA's upcoming Green Line Extension with Rust.

Data Pipeline

  • 1a: Deserialize OSM data with protobuf and rayon
  • 1b: Parse station data with csv
  • 2a: Translate from lat/long with geo
  • 3a: Generate shapes
  • 3b: Generate colors with palette
  • 4: Tesselate shapes with lyon
  • 5: Render map with wgpu

Acknowledgements

  • Wikipedia, as usual
  • Google Maps is a nice way to double-check results
  • OpenStreetMap

Notes

TODO before presentation

  • Be able to render all of Somerville
  • "Complete" presentation
  • Add a legend or numerical labels

TODO after release

  • Add title and legend
  • Draw background, not houses
  • Draw borders on un-drawn area
  • Use correct depth... relations?
  • Fix horizontal-vertical squashing
  • Render in whole screen

Map/design inspiration

More Repositories

1

global-data-in-rust

This guide explains how you can use "global data" in Rust
Rust
198
star
2

contiguous-data-in-rust

An opinionated guide that tries to help you choose the best way to store contiguous data in Rust
Rust
118
star
3

hmmm

Hidden Markov Models in Rust
Rust
66
star
4

rustdoc-katex-demo

This crate demonstrates an approach to including KaTeX in Rust docs
Rust
27
star
5

bitcoin-script-explorer

A tool to help programmers learn Bitcoin Script.
CSS
27
star
6

bfgs

A pure Rust implementation of the BFGS optimization algorithm
Rust
18
star
7

ndarray-csv

Easily read homogeneous CSV data into a 2D ndarray
Rust
17
star
8

future-by-example

Examples of common patterns using Future
Rust
15
star
9

bitcoin-nanopayment

A probabilistic Bitcoin nanopayments library in Node using the Bitcoin-Qt RPC API
JavaScript
14
star
10

exandria

A decentralized file sharing system that includes search
JavaScript
6
star
11

burnie

An SPV Bitcoin burn verification library built on webcoin
JavaScript
6
star
12

iced-plot

Rust
5
star
13

spin_on

Rust
5
star
14

tokio-fmt-encoder

A Tokio Encoder for Debug and Display
Rust
4
star
15

ssaas

Screen Saver as a Service
3
star
16

exploring-computation-graphs-in-rust

Rust
3
star
17

protozor

A P2P system for replicating streaming signed append-only logs
JavaScript
2
star
18

frunk-column

A prototype of columnar data storage using frunk HLists
Rust
2
star
19

burn-stream

A world-writable append-only log using burned bitcoins
JavaScript
2
star
20

swarmhub

swarmhub makes a webrtc-swarm act like a signalhub.
JavaScript
2
star
21

decentralized-cat-names

A decentralized set of cat names
JavaScript
2
star
22

pystan-cache

A PyStan wrapper that caches compiled models
Python
2
star
23

tokio-stdin

Read from stdin as a Tokio stream
Rust
1
star
24

fake-words

Python
1
star
25

rss_catalog

A collection of RSS feeds of articles organized in a machine-readable way.
Python
1
star
26

burn-name

Simple decentralized username registration by burning bitcoins
JavaScript
1
star
27

compass

Rust
1
star
28

webcoin-checkpoint-maker

Make checkpoints for your webcoin projects
JavaScript
1
star
29

error-test-sinks

Sink implementations for testing that simulate different types of errors
Rust
1
star
30

baby-bayes

Rust
1
star
31

wasm-math-experiment

Rust
1
star
32

cache-live-stream

Cache append-only live streams using levelup
JavaScript
1
star
33

burn-name-writer

Register usernames with BurnName
JavaScript
1
star
34

logistic-regression-in-rust

A toy implementation of logistic regression in Rust
Rust
1
star
35

tokio-stdout

Write to stdout as a Tokio sink
Rust
1
star