Data Science for Social Good (@dssg)

Top repositories

1

hitchhikers-guide

The Hitchhiker's Guide to Data Science for Social Good
Jupyter Notebook
1,000
star
2

aequitas

Bias Auditing & Fair ML Toolkit
Python
683
star
3

triage

General Purpose Risk Modeling and Prediction Toolkit for Policy and Social Good Problems
Jupyter Notebook
186
star
4

bikeshare

Statistical models and webapp for predicting when bikeshare stations will be empty or full.
JavaScript
114
star
5

mlforpublicpolicylab

Repo for ML for Public Policy Lab course at CMU
Jupyter Notebook
104
star
6

MLforPublicPolicy

Class resources for CAPP 30254 (Machine Learning for Public Policy)
Jupyter Notebook
104
star
7

data-science-101

Methods, tools, tips, and tricks for anyone interested in getting started doing data science for the social good.
Jupyter Notebook
95
star
8

energywise

An energy analytics tool to make commercial buildings more energy efficient
Python
77
star
9

fairness_tutorial

Hands-on tutorial on ML Fairness
Jupyter Notebook
69
star
10

student-early-warning

Using machine learning to predict high school dropouts
R
67
star
11

MLinPractice

Repository for ML in Practice Course at CMU (10-718)
Jupyter Notebook
56
star
12

tweedr

A machine learning API to analyze tweets during disasters.
JavaScript
54
star
13

police-eis

DSaPP police early intervention system: using machine learning to predict adverse incidents
Python
50
star
14

wikienergy

Git repo for Wiki Energy project
Jupyter Notebook
46
star
15

411-on-311

Exploratory analysis and predictive models of how Chicago's neighborhoods interact with the City's 311 service requests.
Python
44
star
16

pgdedupe

A simple command line interface to the datamade/dedupe library.
Jupyter Notebook
42
star
17

policy_diffusion

Tracing policy ideas from think tanks and lobbyists through state legislative bills
Python
42
star
18

givinggraph

An API tool to help understand the relationships between non-profits, for-profits, and the causes they support.
Python
28
star
19

ushine-learning

An API that uses machine learning to help the Ushahidi nonprofit do smarter crisis crowdsourcing.
Python
24
star
20

repo-scraper

Search for potential passwords/data leaks in a folder or git repo
Python
23
star
21

cta-sim

Big data simulation of Chicago's public transportation to improve transit planning and reduce bus crowding
CSS
23
star
22

census-communities-usa

Mapping and analyzing local business data from the Census Bureau.
Python
21
star
23

usal_echo_public

Automates view classification of the apical 4-chamber, apical 2-chamber, and parasternal long-axis views; segments the apical 4-chamber and apical 2-chamber views; and calculates the ejection fraction of the heart to classify it as normal, abnormal, or gray zone.
Python
21
star
24

syracuse_public

Python
21
star
25

jakarta_smart_city_traffic_safety_public

Identifying traffic-safety issues in CCTV footage
Jupyter Notebook
20
star
26

data-challenges

A repository of real-world data challenges faced by organizations used for project-based learning
19
star
27

match.edu

Predictive models to identify high-achieving high school students who are likely to undermatch: attend 2-year rather than 4-year colleges, or not go to college at all.
R
18
star
28

dsapp-reading-group

Proceedings of the Center for Data Scientists arguing (about) Public Papers
17
star
29

cta-otp

OpenTripPlanner tool and transit mobility maps for Chicago
Java
17
star
30

diogenes

Searching for an honest classifier
Python
17
star
31

ohio

Python I/O extras
Python
17
star
32

dirtyduck

A Guided Tour of Triage
CSS
16
star
33

eights

Data Science template with focus on prewritten workflows
Python
14
star
34

Random_Forest_Imputer

Automatic missing value imputation using random forests
Python
14
star
35

growth-curves

Statistical models of children's growth curves that predict which kids are at risk of obesity.
Python
14
star
36

argcmdr

Thin argparse wrapper for quick, clear and easy declaration of hierarchical console command interfaces
Python
13
star
37

learning

What fellows are learning about data problems and tools
Python
13
star
38

data-portal-treemap

Chicago Data Portal (data.cityofchicago.org) tree map
Python
13
star
39

DSaPP_RA_Project

This repository includes an exercise for aspiring DSaPP volunteers and research assistants to complete
12
star
40

dssg-training-workshop-2015

Main site for DSSG Training 2015
HTML
11
star
41

acs2pgsql

Download American Community Survey data and put it into a Postgres database
Shell
11
star
42

air_pollution_estimation

Jupyter Notebook
10
star
43

UPSG

A set of tools and conventions to help data scientists share code
Python
10
star
44

streetlights-crime

Statistical models to find whether Chicago street light outages are associated with increased crime
R
10
star
45

predicting_student_enrollment_public

Statistical models and analysis of student enrollment in Chicago Public Schools
R
9
star
46

dssg-manual

This repository contains the Eric & Wendy Schmidt Data Science for Social Good Fellowship Manual
9
star
47

cincinnati

DSaPP project with the City of Cincinnati, building upon the DSSG15 project
Python
8
star
48

memphis-public

Public repository for the DSSG Memphis project
R
8
star
49

peeps-chili

Ethics with a side of chili. And peeps.
Jupyter Notebook
8
star
50

land-bank

Analytics tool to help the Cook County Land Bank acquire vacant and abandoned properties strategically.
JavaScript
8
star
51

johnson-county-ddj-public

Python
7
star
52

matching-tool

Integrating HMIS and criminal-justice data
Python
7
star
53

dssg2017-text_analysis

Text Analysis Tutorial for DSSG 2017 Conference
Jupyter Notebook
7
star
54

randomize_your_data

Randomize the order of each column to help check for leakage
Jupyter Notebook
7
star
55

timechop

Generate time splits for temporal validation
Python
5
star
56

weather2pgsql

Download NOAA weather for a user-specified US state
Shell
5
star
57

tyra

Prediction model evaluation
JavaScript
4
star
58

barefoot-winnie-public

Recommending responses to law-related queries. Built during DSSG in collaboration with Barefoot Law
Python
4
star
59

hylas

Webapp for visualizing ML'd data
JavaScript
4
star
60

innovation-ecosystems

Understanding city innovation hotspots using the Census CitySDK
CSS
4
star
61

healthleads-public

The public repo for the 2014 DSSG Health Leads project
Python
4
star
62

project_template

A template for a sample DSSG project.
Shell
4
star
63

solveforgood-wri

Jupyter Notebook
4
star
64

babies-public

This is the publicly available version of the babies repo, containing code used during our project with the Illinois Department of Human Services to predict and reduce adverse births in Illinois.
Python
4
star
65

stupid-csv-tricks

Code for doing slightly atypical things with CSVs
Python
4
star
66

catwalk

Training, testing, and evaluating machine learning classifier models
Python
3
star
67

lorax

Speaks for the trees by providing individual feature importances from random forests.
Python
3
star
68

hiv-retention-public

Jupyter Notebook
3
star
69

homelessness-public

Python
3
star
70

marketplace

Code for the Solve for Good platform run by the DSSG Foundation
Python
3
star
71

rws_accident_prediction_public

rws_accident_prediction
Jupyter Notebook
3
star
72

cincinnati2015-public

Predicting blight in Cincinnati
Python
3
star
73

mexico-public

Public facing Mexico repository
R
3
star
74

audition

Choosing the best classifier models
Python
3
star
75

dssg-public-hmda

R
3
star
76

install-cli

Bash library for guided installation & bootstrapping
Shell
3
star
77

sklearn_tutorial

Short tutorial on some pipeline issues
Jupyter Notebook
3
star
78

machine_learning_legislation

Automatically identify earmarks in congressional spending bills
OpenEdge ABL
3
star
79

acdhs_housing_public

Python
3
star
80

nfp

Impact evaluation of the Nurse-Family Partnership nonprofit
R
3
star
81

MS2Postgres

A tool to move data from SQL Server to PostgreSQL in an environment with limited hard drive space.
Python
3
star
82

EDF

Analysis of energy efficiency loan data for the Environmental Defense Fund.
3
star
83

check-for-secrets

Discovering Secrets analysts Possibly Pushed
Shell
2
star
84

el-salvador-mined-public

Reducing Early School Dropout Rates in El Salvador
Jupyter Notebook
2
star
85

after-hours

R
2
star
86

education-highschool-public

DSSG 2015 project focused on using data science methods to help partner public school districts improve their respective high school graduation rates and outcomes.
HTML
2
star
87

passenv

Shell command like env to run a program in an environment modified by values read from standard input
Python
2
star
88

chile-dt-public

Improving Workplace Safety through Proactive Inspections in Chile
R
2
star
89

cincinnati_ems_public

Jupyter Notebook
2
star
90

tuscany-tourism-public

Data-Driven Planning for Sustainable Tourism in Tuscany
HTML
2
star
91

architect

Plan, design and build train and test matrices
Python
2
star
92

panopticon

The command center at the DSSG office. http://en.wikipedia.org/wiki/Panopticon
Ruby
2
star
93

pakistan_ihhn_public

Public Repository for the DSSG 2022 Pakistan IHHN Project
Python
2
star
94

data-sci-fellows

The sexy landing page that will make everyone want to apply for the fellowship.
JavaScript
2
star
95

donors-choose

Jupyter Notebook
2
star
96

obscuritext

Transform text to be unreadable but still somewhat useful
Jupyter Notebook
2
star
97

dojo_mh_public

Public Repository for DSSG 2022 Douglas and Johnson Counties, KS Mental Health Project
Jupyter Notebook
2
star
98

baltimore_roofs_public

Public Repository for DSSG 2022 Baltimore Roofs Project
Python
2
star
99

sanergy-public

Python
2
star
100

signalled-timeout

Timeout library for generic interruption of main thread by an exception after a configurable duration.
Python
2
star