Synopsis
This repository contains code and Jupyter notebooks with machine learning algorithms for working with GPS trajectories. It will be used during the Machine Learning hackathon of IotTechDay2017.
The dataset used is the popular GeoLife GPS Trajectories dataset.
We have already processed this dataset so that each trajectory, which originally contains only latitude, longitude, and timestamp, is enriched with velocity, acceleration, and modality information.
This processed data can be downloaded from Google Drive (3.7 GB). It is also available in zipped format (0.9 GB).
For the classification and clustering parts, only the metadata files are necessary. These contain aggregated data per trajectory (such as average velocity and average acceleration). The metadata files are much smaller and can be downloaded from Google Drive (1.5 MB zipped) and Dropbox (3.6 MB unzipped).
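To make the enrichment and aggregation steps concrete, here is a minimal sketch of how velocity, acceleration, and per-trajectory averages could be derived from raw (latitude, longitude, timestamp) points. It is not the code that produced the processed files. The function names (`haversine_m`, `enrich`, `summarize`), the haversine distance, the choice of units, and the zero acceleration on the first segment are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def enrich(points):
    """Derive per-segment velocity (m/s) and acceleration (m/s^2).

    `points` is a time-ordered list of (lat, lon, datetime) tuples.
    (Hypothetical helper; the repository's own processing may differ.)
    """
    segments = []
    prev_velocity = None
    for (lat1, lon1, t1), (lat2, lon2, t2) in zip(points, points[1:]):
        dt = (t2 - t1).total_seconds()
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        velocity = haversine_m(lat1, lon1, lat2, lon2) / dt
        # Assumption: the first segment has no predecessor, so acceleration is 0.
        acceleration = 0.0 if prev_velocity is None else (velocity - prev_velocity) / dt
        segments.append({"timestamp": t2, "velocity": velocity, "acceleration": acceleration})
        prev_velocity = velocity
    return segments


def summarize(segments):
    """Aggregate one trajectory into metadata-style averages."""
    if not segments:
        return None
    n = len(segments)
    return {
        "n_segments": n,
        "avg_velocity": sum(s["velocity"] for s in segments) / n,
        "avg_acceleration": sum(s["acceleration"] for s in segments) / n,
    }


# Synthetic points for illustration only (not taken from GeoLife):
# one point every 10 seconds, moving east along the equator.
start = datetime(2017, 1, 1, 12, 0, 0)
demo_points = [(0.0, 0.001 * i, start + timedelta(seconds=10 * i)) for i in range(4)]
demo_summary = summarize(enrich(demo_points))
```

Working per segment keeps the sketch simple. Real GPS data is noisy, so smoothing or outlier filtering before aggregation would likely be needed.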
Main Contributors:
Tasks
- How can we load GPS trajectories in a way that makes them easier to work with later on?
- Supervised machine learning: build a classifier that automatically detects the transportation mode of a trajectory (walking, bicycle, car, etc.).
- Unsupervised learning: cluster the GPS trajectories using auto-encoders and recurrent neural networks.
- Geospatial analysis of the GPS trajectories: analyze and visualize the routes taken (does the popularity of a route affect the traffic? What are the points of interest, e.g., restaurants, stores, hotels?).
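As a possible starting point for the first task, the sketch below parses one raw trajectory file into time-ordered (latitude, longitude, datetime) tuples. It assumes the raw GeoLife `.plt` layout: six header lines, followed by comma-separated rows with latitude and longitude in the first two fields and the date and time strings in the sixth and seventh. The names `parse_plt` and `load_plt` are hypothetical and are not part of this repository.

```python
from datetime import datetime
from itertools import islice

HEADER_LINES = 6  # assumption: raw .plt files begin with six header lines


def parse_plt(lines):
    """Parse an iterable of .plt lines into (lat, lon, datetime) tuples."""
    points = []
    for raw in islice(lines, HEADER_LINES, None):
        fields = raw.strip().split(",")
        if len(fields) < 7:
            continue  # skip blank or malformed rows
        lat, lon = float(fields[0]), float(fields[1])
        # Fields 6 and 7 are assumed to hold the date and time as strings.
        timestamp = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M:%S")
        points.append((lat, lon, timestamp))
    points.sort(key=lambda p: p[2])  # guarantee chronological order
    return points


def load_plt(path):
    """Read one trajectory file from disk (hypothetical convenience wrapper)."""
    with open(path, encoding="utf-8") as handle:
        return parse_plt(handle)
```

Separating `parse_plt` from `load_plt` lets the parsing logic be tested on in-memory lines without touching the file system.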
Notebooks
We have provided some notebooks, which should give you a flying start, but feel free to do everything your own way.
Possibly relevant datasets
- Bus routes and stops of Beijing: contains 1,543 bus routes and 42,161 stops
- Beijing check-in records from Sina: contains 868 m check-ins for all 143,576 venues