covid-mobility

Code to generate results in "Mobility network models of COVID-19 explain inequities and inform reopening" (2020) by Serina Y Chang*, Emma Pierson*, Pang Wei Koh*, Jaline Gerardin, Beth Redbird, David Grusky, and Jure Leskovec.

Regenerating results

  1. Setting up the environment. Our code runs in a conda environment, with all analysis performed on a Linux Ubuntu system. You can set up this environment by running conda env create --prefix YOUR_PATH_HERE --file safegraph_env_v3.yml. Once you have set up the environment, activate it prior to running any code with source YOUR_PATH_HERE/bin/activate.

  2. Downloading datasets. We specify paths to datasets in covid_constants_and_util.py. Note that specific filenames are still referenced in other files, following their naming constructs in the downloaded data (e.g., os.path.join(PATH_TO_SDM_V1, dt.strftime('%Y/%m/%d/%Y-%m-%d-social-distancing.csv.gz')) in helper_methods_for_aggregate_data_analysis.py). You may need to modify these paths and/or filenames if your file structure or naming constructs are different.

    • Our estimated hourly mobility networks (IPFP output) are available through the SafeGraph COVID-19 Data Consortium. The raw data that we used to construct these networks are also available; as described in the Methods section of our paper, we use v1 of the Weekly Patterns data from March 1 - May 2, 2020; Monthly Patterns data from January 2019 - February 2020; and Social Distancing Metrics from March 1 - May 2, 2020.

    • We use case and death count data from The New York Times, available here. While The New York Times updates the data regularly, results in our paper are generated using case and death counts through May 9, 2020.

    • Census data comes from the American Community Survey. Census block group shapefiles, with linked data from the 5-year 2013-2017 ACS, are available here. (We note that, as described in the Methods, we use the 1-year 2018 estimates for the current population of each census block group.) The mapping from counties to MSAs is available here.

    • We use Google mobility data as a validation of SafeGraph data quality, available here.
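As an illustration of the dated-path convention mentioned above, the snippet referenced from helper_methods_for_aggregate_data_analysis.py expands a date into a nested directory plus a dated filename. The root path below is a placeholder standing in for your local PATH_TO_SDM_V1:

```python
import datetime
import os

# Placeholder for your local root of the Social Distancing Metrics v1
# download (corresponds to PATH_TO_SDM_V1 in covid_constants_and_util.py).
PATH_TO_SDM_V1 = "/data/safegraph/social_distancing_v1"

dt = datetime.date(2020, 3, 1)
path = os.path.join(
    PATH_TO_SDM_V1,
    dt.strftime('%Y/%m/%d/%Y-%m-%d-social-distancing.csv.gz'),
)
print(path)
# /data/safegraph/social_distancing_v1/2020/03/01/2020-03-01-social-distancing.csv.gz
```

If your download uses a different directory layout, this is the pattern you would adjust.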

  3. Data processing. We process SafeGraph Patterns data and combine it with Census data using process_safegraph_data.ipynb. This notebook takes a while to run, and the files are large; we suggest running it in a screen session or similar on a cluster using a command like jupyter nbconvert --execute --ExecutePreprocessor.timeout=-1 --to notebook process_safegraph_data.ipynb. As a general note, our code is currently designed to run on our servers (e.g., it makes reference to our specific filepaths). We wanted to provide a copy of the codebase to reviewers as quickly as possible, for maximum transparency, but will further clean up the code in the coming weeks so that it is easier for others to use.

  4. Running models.

    • Models are run using model_experiments.py. The experimental pipeline has several steps which must be run in a particular order. Running all the models described in the paper is computationally expensive. Specifically, most experiments in the paper were performed using a server with 288 threads and 12 TB RAM; saving the models required several terabytes of disk space. We highlight steps that are particularly computationally expensive.
    • a. Generate the hourly visit matrices by running IPFP. Run python model_experiments.py run_many_models_in_parallel just_save_ipf_output. This will start one job for each MSA which generates the hourly visit matrices through the iterative proportional fitting procedure (IPFP).
    • b. Determine plausible ranges for model parameters over which to conduct grid search. Run python model_experiments.py run_many_models_in_parallel calibrate_r0. This will start several hundred jobs.
    • c. Conduct grid search to find the models that best fit case counts. Run python model_experiments.py run_many_models_in_parallel normal_grid_search. This is a computationally expensive step which will fit thousands of models; even starting all the models may take several hours.
    • The remaining experiments rely on grid search having completed, since they use the best-fit model parameters. However, once grid search has finished, they can be run in any order. Be sure to change the variable min_timestring_to_load_best_fit_models_from_grid_search so that it matches the timestring of the first grid search experiment. All experiments use the same call signature as above: python model_experiments.py run_many_models_in_parallel EXPERIMENT_NAME. The specific experiments are:
      • test_interventions: This tests the effects of reopening each POI subcategory. This is computationally expensive because it runs one model for each category, MSA, and best-fit model parameter setting; in total, this is several thousand models.
      • test_retrospective_counterfactuals: This simulates the impacts of various counterfactuals of past mobility reduction on infection outcomes. This is moderately expensive computationally (several hundred jobs), because it runs one model for each counterfactual setting, MSA, and best-fit model parameter setting.
      • test_max_capacity_clipping: This tests the effects of partial reopening by "clipping" each POI's visits to a fraction of its maximum capacity (or occupancy). This will start around 1000 jobs, running one model for each level of clipping, MSA, and best-fit model parameter setting.
      • test_uniform_proportion_of_full_reopening: This tests the effects of partial reopening by uniformly reducing visits to each POI from their activity levels in early March. This will also start around 1000 jobs, running one model for each level of reopening, MSA, and best-fit model parameter setting.
      • rerun_best_models_and_save_cases_per_poi: This reruns the best-fit models for each MSA and saves the expected number of infections that occurred at each POI on each day. We do not save infections per POI by default, because this takes up too much space and slows down the simulation process. This is the least computationally expensive of the experiments, just running each best-fit model parameter setting once.
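The core operation in step 4a, iterative proportional fitting, alternately rescales a CBG-by-POI matrix so that its row sums match per-CBG visit counts and its column sums match per-POI visit counts. A minimal numpy sketch, not the project's implementation (function and variable names are illustrative):

```python
import numpy as np

def ipf(seed, row_targets, col_targets, n_iter=100, tol=1e-8):
    """Scale a positive seed matrix so its row/column sums match the targets.

    row_targets and col_targets must sum to the same total (here, the
    total number of visits in the hour being estimated).
    """
    m = seed.astype(float).copy()
    for _ in range(n_iter):
        m *= (row_targets / m.sum(axis=1))[:, None]   # match row marginals
        m *= (col_targets / m.sum(axis=0))[None, :]   # match column marginals
        if np.allclose(m.sum(axis=1), row_targets, atol=tol):
            break
    return m
```

In the paper's setting, the seed would be an aggregate estimate of visit proportions (e.g., from monthly data) and the targets the hourly marginals, run once per hour per MSA.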
  5. Analyzing models and generating results for paper. Once models have been run, figures and results in the paper can be reproduced by running make_figures.ipynb and supplementary_analyses.ipynb. See below for details.
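Conceptually, the grid search in step 4c fits the model at every point of a parameter grid and keeps the setting that best matches observed case counts. The actual parameters and loss live in model_experiments.py; the names below are illustrative:

```python
import itertools
import numpy as np

def grid_search(simulate, observed_cases, param_grid):
    """Exhaustive search: return the setting minimizing RMSE vs. observed cases.

    param_grid maps parameter names to lists of candidate values;
    simulate(**setting) returns a predicted case-count series.
    """
    best_setting, best_rmse = None, np.inf
    for values in itertools.product(*param_grid.values()):
        setting = dict(zip(param_grid.keys(), values))
        predicted = simulate(**setting)                       # one model run
        rmse = float(np.sqrt(np.mean((predicted - observed_cases) ** 2)))
        if rmse < best_rmse:
            best_setting, best_rmse = setting, rmse
    return best_setting, best_rmse
```

In the paper, each simulate call is a full epidemic simulation for one MSA, which is why this step launches thousands of jobs.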

Files

covid_constants_and_util.py: Constants and general utility methods.

disease_model.py: Implements the disease model on the mobility network.
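As context for this file: the paper couples a metapopulation SEIR process to the mobility network. A highly simplified sketch of one discrete-time SEIR update, with parameter names and rates that are illustrative rather than taken from disease_model.py:

```python
import numpy as np

def seir_step(S, E, I, R, infection_rate, sigma=1/3, gamma=1/5):
    """One discrete-time update per CBG (arrays of compartment counts).

    infection_rate[i] is the per-capita infection probability for
    susceptibles in CBG i this step; in the paper it is driven by
    overlapping POI visits from the hourly mobility network.
    """
    new_E = S * infection_rate   # S -> E (exposure)
    new_I = sigma * E            # E -> I (end of latent period)
    new_R = gamma * I            # I -> R (removal)
    return S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R
```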

helper_methods_for_aggregate_data_analysis.py: Various helper methods used in data processing and throughout the analysis.

make_figures.ipynb: Once the models have been run, reproduces the main figures (Figures 1-3) and all of the Extended Data and SI figures and tables that are directly related to the main figures (e.g., results for all metro areas, in the case that the main figure only highlights one metro area).

make_network_map.ipynb: Constructs the POI-CBG spatial maps in Figure 1a.

model_experiments.py: Runs models for the experiments described in the paper.

process_safegraph_data.ipynb: Processes the raw SafeGraph data.

safegraph_env_v3.yml: Used to set up the conda environment.

supplementary_analyses.ipynb: Once the models have been run, reproduces the remaining Extended Data and SI figures and tables, including sensitivity analyses and checks for parameter identifiability.

test_google_correlation.ipynb: Tests the correlation between Google and SafeGraph mobility data.
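The check in this notebook boils down to correlating aligned mobility time series; a minimal sketch with synthetic data (the real inputs are Google and SafeGraph series per region):

```python
import numpy as np

def mobility_correlation(series_a, series_b):
    """Pearson correlation between two aligned mobility time series."""
    return float(np.corrcoef(series_a, series_b)[0, 1])

# Synthetic example: two series that move together (affinely related),
# so the correlation should be very close to 1.0.
google = np.array([1.0, 0.8, 0.5, 0.4, 0.6])
safegraph = 0.9 * google + 0.05
print(mobility_correlation(google, safegraph))
```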
