Repository Details

This repository contains the code to generate predictions of critical violations at food establishments in Chicago. It also contains the results of an evaluation of the effectiveness of those predictions.

Food Inspections Evaluation

This is our model for predicting which food establishments are at most risk for the types of violations most likely to spread food-borne illness. Chicago Department of Public Health staff use these predictions to prioritize inspections. During a two-month pilot period, we found that using these predictions meant that inspectors found critical violations much faster.

You can help improve the health of our city by improving this model. This repository contains a training and test set, along with the data used in the current model.

Feel free to clone, fork, send pull requests, and file bugs. Please note that we will need you to agree to our Contributor License Agreement (CLA) before we can use any pull requests.

Original Analysis and Reports

In an effort to reduce the public’s exposure to foodborne illness, the City of Chicago partnered with Allstate’s Quantitative Research & Analytics department to develop a predictive model to help prioritize the city's food inspection staff. This GitHub project is a complete working evaluation of the model, including the data that was used in the model, the code that was used to produce the statistical results, the evaluation of the validity of the results, and documentation of our methodology.

The model evaluation calculates individualized risk scores for more than ten thousand Chicagoland food establishments using publicly available data, most of which is updated nightly on Chicago’s data portal. The sole exception is information about the inspectors.

The evaluation compares two months of Chicago’s Department of Public Health inspections to an alternative data-driven approach based on the model. The two-month evaluation period is a completely out-of-sample evaluation based on a model created using test and training data sets from prior time periods.

The reports may be reproduced by compiling the knitr documents present in ./REPORTS.
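As a rough sketch of that compilation step, the helper below lists knitr source files in a reports directory and knits each one. The function names `find_reports` and `knit_reports` are hypothetical, and the `.Rmd`/`.Rnw` extensions are an assumption about how the reports are stored; only `knitr::knit` is an actual knitr function.

```r
# List knitr source files in a reports directory.
# Assumption: reports use the .Rmd or .Rnw extension; adjust the pattern if not.
find_reports <- function(report_dir = "REPORTS") {
  sort(list.files(report_dir, pattern = "\\.(Rmd|Rnw)$",
                  full.names = TRUE, ignore.case = TRUE))
}

# Knit each report in turn; knitr must be installed for this step.
knit_reports <- function(report_dir = "REPORTS") {
  if (!requireNamespace("knitr", quietly = TRUE)) {
    stop("Package 'knitr' is required to compile the reports.")
  }
  for (f in find_reports(report_dir)) {
    knitr::knit(input = f)
  }
}
```

Calling `knit_reports()` from the repository root would then process every report found; output is written to the current working directory.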

REQUIREMENTS

All of the code in this project uses the open source statistical application, R. We advise that you use R version >= 3.1 for best results.

Ubuntu users may need to install libssl-dev, libcurl4-gnutls-dev, and libxml2-dev. This can be accomplished by typing the following command at the command line: sudo apt-get install libssl-dev libcurl4-gnutls-dev libxml2-dev

The code makes extensive use of the data.table package. If you are not familiar with the package, you might want to consult the data.table FAQ available on CRAN (http://cran.r-project.org/web/packages/data.table/vignettes/datatable-faq.pdf).

FILE LAYOUT

The following directory structure is used:

DIRECTORY            DESCRIPTION
.                    Project files such as README and LICENSE
./CODE/              Sequential scripts used to develop the model
./CODE/functions/    General function definitions, which could be used in any script
./DATA/              Data files created by scripts in ./CODE/, or static
./REPORTS/           Reports and other output

We have included all of the steps used to develop the model, evaluate the results, and document the results in the above directory structure.

The scripts located in the ./CODE/ folder are organized sequentially, meaning that the numeric prefix indicates the order in which the script was / should be run in order to reproduce our results.
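To illustrate the naming convention, here is a minimal sketch that returns the scripts in run order by sorting on the numeric prefix. The function name `ordered_scripts` is hypothetical, and the pattern assumes names of the form `<digits>_<name>.R`, as in `30_glmnet_model.R`.

```r
# Return R scripts in run order, sorted by their leading numeric prefix.
# Assumption: script names look like "<digits>_<name>.R".
ordered_scripts <- function(code_dir = "CODE") {
  files <- list.files(code_dir, pattern = "^[0-9]+_.*\\.R$")
  prefix <- as.integer(sub("_.*$", "", files))
  # Sort numerically on the prefix; break ties alphabetically.
  files[order(prefix, files)]
}
```

Sorting on the parsed integer rather than the raw string keeps the order correct even if prefixes have different widths.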

Although we include all the necessary steps to download and transform the data used in the model, we also have stored a snapshot of the data in the repository. So, to run the model as it stands, it is only necessary to download the repository, install the dependencies, and step through the code in CODE/30_glmnet_model.R. If you do not already have them, the dependencies can be installed using the startup script CODE/00_Startup.R.
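If you would like to see which dependencies are missing before running the startup script, a check along the following lines may help. This is only a sketch: `missing_packages` is a hypothetical helper, and the package list is a partial guess based on packages named in this README; CODE/00_Startup.R remains the authoritative source.

```r
# Return the subset of package names that are not installed locally.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

# Partial, assumed dependency list for illustration only.
deps <- c("data.table", "glmnet", "knitr")
to_install <- missing_packages(deps)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```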

DATA

Data used to develop the model is stored in the ./DATA directory. Most of it comes from Chicago’s Open Data Portal. The following datasets were used to build the analysis-ready dataset.

Business Licenses
Food Inspections 
Crime
Garbage Cart Complaints
Sanitation Complaints
Weather
Sanitarian Information

The data sources are joined to create a tabular dataset that paints a statistical picture of a ‘business license’, the primary modelling unit / unit of observation in this project.

The data sources are joined (in a SQL-like manner) on appropriate composite keys. These keys include Inspection ID, Business License, and Geography expressed as a Latitude / Longitude combination, among others.
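A composite-key join of this kind can be sketched as follows. The project itself relies on data.table; this example uses base R's `merge` instead so that it runs without extra packages, and every table, column name, and value is made up for illustration.

```r
# Toy tables; all column names and values are made up for illustration.
inspections <- data.frame(
  inspection_id = c(1, 2, 3),
  license       = c(100, 100, 200),
  latitude      = c(41.88, 41.88, 41.90),
  longitude     = c(-87.63, -87.63, -87.65)
)
complaints <- data.frame(
  latitude   = c(41.88, 41.90),
  longitude  = c(-87.63, -87.65),
  complaints = c(4, 1)
)

# Left join on the composite key (latitude, longitude), like SQL's LEFT JOIN.
joined <- merge(inspections, complaints,
                by = c("latitude", "longitude"), all.x = TRUE)
```

Setting `all.x = TRUE` keeps every inspection row even when no complaint matches, which mirrors a SQL left join. Exact matching on coordinates is fragile with real data, so treat this as an illustration of the join mechanics rather than of the project's geographic matching.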

Acknowledgements

This research was conducted by the City of Chicago with support from the Civic Consulting Alliance and Allstate Insurance. The City would especially like to thank Stephen Collins, Gavin Smart, Ben Albright, and David Crippin for their efforts in developing the predictive model. We also appreciate the help of Kelsey Burr, Christian Hines, and Kiran Pookote in coordinating this research project. We owe a special thanks to our volunteers from Allstate, who put in a tremendous effort to develop the predictive model, and to Allstate for allowing their team to volunteer for projects to change their city. This project was partially funded by an award from the Bloomberg Philanthropies' Mayors Challenge.
