  • Stars: 114
  • Rank 308,031 (Top 7 %)
  • Language: JavaScript
  • Created over 11 years ago
  • Updated almost 11 years ago

Repository Details

Statistical models and webapp for predicting when bikeshare stations will be empty or full.

Predictive bikeshare rebalancing

Statistical models and app for predicting when bikeshare stations will be empty or full in Washington DC and someday Chicago. The app is live at bikeshare.dssg.io.

This project is a part of the 2013 Data Science for Social Good fellowship, in partnership with Divvy and the Chicago Department of Transportation.

For a quick and gentle overview of the project, check out our blog post.

The problem: bikeshare rebalancing

The City of Chicago just launched Divvy, a new bike sharing system designed to connect people to transit, and to make short one-way trips across town easy. Bike sharing is citywide bike rental - you can take a bike out at a station on one street corner and drop it off at another.

Bike sharing systems share a central flaw: because of commuting patterns, bikes tend to pile up downtown in the morning and on the outskirts in the afternoon. This imbalance can make using bikeshare difficult, because people can't take out bikes from empty stations, or finish their rides at full stations.

To prevent this problem, bikeshare operators drive trucks around to reallocate bikes from full stations to empty ones. In bikeshare parlance, this is called rebalancing.

Right now, they do this by looking at the current number of bikes at each station - not how many will be there in an hour or two.

We're working with the City of Chicago's Department of Transportation to make bikeshare rebalancing more proactive: by analyzing weather and bikeshare station trends, we can predict how many bikes are likely to be at each Divvy station in the future.

However, since there's not much bike sharing data for Chicago yet, we're first developing predictive models for Capital Bikeshare, Washington DC's bike sharing system.

Read more about bikeshare rebalancing on our blog

The solution: Poisson regression

To predict the number of bikes at bike share stations in DC, we're using Poisson regression, a statistical technique useful for modeling counts.

Specifically, we take the current time of day, day of week, month, and weather as inputs into our model, and try to predict the number of bike arrivals and departures we expect to see at a given bike share station over the next 60 minutes. We subtract departures from arrivals to find the net change in bikes over the hour, and add this change to the current number of bikes to get our predicted bikes at the station in 60 minutes.
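The arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not the project's code: the function names, feature names, and coefficient values are hypothetical, and clamping the result to the station's capacity is an assumption of this sketch rather than something the project states.

```python
import math


def expected_count(coefficients, features):
    """Mean of a Poisson regression with a log link: exp(sum of coef * feature)."""
    linear = sum(coefficients[name] * value for name, value in features.items())
    return math.exp(linear)


def predict_bikes(current_bikes, total_docks, arrivals, departures):
    """Add the expected net change over the hour to the current bike count."""
    net_change = arrivals - departures
    predicted = current_bikes + net_change
    # Assumption of this sketch: a station cannot hold fewer than 0 bikes
    # or more bikes than it has docks, so the estimate is clamped.
    return min(max(predicted, 0.0), float(total_docks))


# Hypothetical coefficients, made up for illustration only.
ARRIVAL_COEFS = {"intercept": 1.0, "is_weekday": 0.3, "temperature_c": 0.02}
DEPARTURE_COEFS = {"intercept": 1.2, "is_weekday": 0.4, "temperature_c": 0.01}

features = {"intercept": 1.0, "is_weekday": 1.0, "temperature_c": 20.0}
arrivals = expected_count(ARRIVAL_COEFS, features)
departures = expected_count(DEPARTURE_COEFS, features)
print(predict_bikes(6, 15, arrivals, departures))
```

Fitting separate arrival and departure models keeps each target a non-negative count, which is what Poisson regression expects; the net change itself can be negative.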

We do this for every station in DC's bikeshare system, and display the resulting predictions in a human-friendly web app.

Read more about our statistical model in the wiki

The data: real-time bikeshare station availability and weather

Alta bikeshare - the company that runs the bikeshare systems in Boston, Washington DC, New York, Chicago, and others - publishes real-time bike availability data for these cities through an API.

Every minute or two, the API reports the number of bikes and docks available at each bikeshare station in the city's system:

{
	"id": 17,
	"stationName": "Wood St & Division St",
	"location": "1802 W. Divison St",
	"availableBikes": 6,
	"availableDocks": 9,
	"totalDocks": 15,
	"latitude": 41.90332,
	"longitude": -87.67273,
	"statusValue": "In Service"
}
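As a rough illustration of how a record like this might be consumed, the Python sketch below parses a trimmed copy of the station record and derives a few values relevant to rebalancing. The `summarize_station` helper and its output keys are hypothetical names chosen for this example, and treating `totalDocks - availableBikes - availableDocks` as "unaccounted" docks is an assumption, not documented API behavior.

```python
import json

# A trimmed copy of the station record shown above.
RAW = """
{
    "id": 17,
    "stationName": "Wood St & Division St",
    "availableBikes": 6,
    "availableDocks": 9,
    "totalDocks": 15,
    "statusValue": "In Service"
}
"""


def summarize_station(record):
    """Derive empty/full flags and a fill ratio from one station record."""
    bikes = record["availableBikes"]
    docks = record["availableDocks"]
    total = record["totalDocks"]
    return {
        "name": record["stationName"],
        "is_empty": bikes == 0,   # nobody can take a bike out
        "is_full": docks == 0,    # nobody can return a bike
        "fill_ratio": bikes / total if total else 0.0,
        # Assumption: docks that are neither free nor holding a bike.
        "unaccounted_docks": total - bikes - docks,
    }


station = json.loads(RAW)
summary = summarize_station(station)
print(summary)
```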

We're using historical bike availability data for DC - courtesy of urban researcher Oliver O'Brien - and historical weather data from Forecast.io to fit our Poisson model.

To make predictions, we get real-time bike availability and weather data from Alta's DC API and Forecast.io, and plug these inputs into our model.

Read more about how we're getting data in the wiki

Project layout

There are three components to the project:

A database storing historical bikeshare and weather data

Thanks to Oliver O'Brien, we've got historical data on the number of bikes and docks available at every bikeshare station in DC and Boston since their systems launched. We're storing this data in a PostgreSQL database, and updating it constantly by hitting Alta's real-time bikeshare APIs. Read our wiki for more detail on these data sources.

Scripts to build the database, load historical data into it, and add real-time data to it are in the data and scrapers folders. The database updates every minute using a cron job that you need to schedule on your own machine.

A model that uses this data to predict future number of bikes

The Poisson model lives in model. There's also a binomial logistic model we implemented in there. Exploratory data analysis that informed the model choice lives in analysis.

There are scripts in model/poisson that crunch the historical data in the database to estimate the model's parameters, and others that use the model to predict by consuming these parameters, fetching real-time model inputs from the database, and spitting out predictions. We also have model validation scripts that measure our model's predictive accuracy.
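One simple way to measure predictive accuracy is to compare predicted bike counts against the counts later observed at the same stations. The Python sketch below uses mean absolute error as the metric; that choice is an assumption made for illustration, since the README does not say which metric the validation scripts use, and the numbers are made up.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute gap between predicted and observed bike counts."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    total_gap = sum(abs(p - o) for p, o in zip(predicted, observed))
    return total_gap / len(predicted)


# Made-up example values: predictions for three stations one hour ahead,
# and the bike counts actually seen an hour later.
predicted = [5.4, 0.0, 12.1]
observed = [6, 1, 12]
print(mean_absolute_error(predicted, observed))
```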

A simple webapp that displays the model's predictions

The webapp is currently live at bikeshare.dssg.io.

The app, which uses flask and bootstrap, lives in web. We use MapBox.js for mapping. Simply run python app.py to deploy the application on localhost.

To install the needed Python dependencies, clone the project and run pip install -r requirements.txt

Installation

To get the project running locally, first clone the repo:

git clone https://github.com/dssg/bikeshare
cd bikeshare/

Database Configuration

You will need a working PostgreSQL 9.x series install. Once you have that, run data/create_db.sql to create all the appropriate tables.

Scraper Configuration

We use several scrapers to populate the data in the database. Inside scrapers there are more detailed install instructions and example crontabs. You will need a forecast.io API key. Historical data will be made available shortly.

Webapp Installation

To run the Flask web app, you'll need to create a new Python virtual environment, install the needed Python modules using pip, and run the Flask server:

cd bikeshare
virtualenv ./
. bin/activate
pip install -r requirements.txt
python web/app.py

To deploy the webapp in a production environment, use Gunicorn & nginx web servers.

Team

Contributing to the project

To get involved, please check the issue tracker.

To get in touch, email the team at [email protected].

License

Copyright (C) 2013 Data Science for Social Good Fellowship at the University of Chicago

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
