  • Language: Python
  • License: MIT License

Counterfactual Regression

cfrnet

Counterfactual regression (CFR) by learning balanced representations, as developed by Johansson, Shalit & Sontag (2016) and Shalit, Johansson & Sontag (2017). cfrnet is implemented in Python using TensorFlow 0.12.0-rc1 and NumPy 1.11.3. The code has not been tested with TensorFlow 1.0.

Code

The core component of cfrnet, the TensorFlow graph, is contained in cfr/cfr_net.py. Training is performed by cfr_net_train.py. The file cfr_param_search.py takes a configuration file as input and lets the user randomly sample from the supplied parameters (provided that a parameter has multiple values given in a list). See configs/example_ihdp.txt for an example.

A typical experiment uses cfr_param_search.py and evaluate.py as sub-routines. cfr_param_search is best used to randomly search the parameter space. In the output directory, it creates a log of which configurations have been used so far, so that the same experiment is not repeated. evaluate.py goes through the predictions produced by the model and evaluates the error.
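The log of used configurations described above can be sketched as follows. This is a minimal illustration of the idea, not the code in cfr_param_search.py: the log file name `used_configs.txt`, the JSON serialization, and the helper names are all assumptions made for this example.

```python
import json
import os
import tempfile


def config_key(cfg):
    """Serialize a configuration dict to a canonical string (sorted keys)."""
    return json.dumps(cfg, sort_keys=True)


def load_used(log_path):
    """Return the set of configuration keys already recorded in the log."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path) as f:
        return {line.strip() for line in f if line.strip()}


def register(cfg, log_path):
    """Record cfg and return True, or return False if it was already logged."""
    key = config_key(cfg)
    if key in load_used(log_path):
        return False
    with open(log_path, "a") as f:
        f.write(key + "\n")
    return True


# Hypothetical usage in a temporary output directory.
outdir = tempfile.mkdtemp()
log_path = os.path.join(outdir, "used_configs.txt")  # hypothetical file name
first = register({"p_alpha": 0.1, "n_in": 2}, log_path)
# Same settings in a different key order: recognized as a repeat.
second = register({"n_in": 2, "p_alpha": 0.1}, log_path)
```

Sorting the keys before serializing means two dicts with the same settings map to the same log entry, so a repeated configuration is skipped regardless of the order in which its parameters were sampled.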

cfr_param_search

The script cfr_param_search.py runs a random hyperparameter search given a configuration file.

Usage:

python cfr_param_search.py <config_file> <num_runs>

The config_file argument should contain the path to a text file where each line is a key-value pair for a CFR parameter.

The num_runs argument should be an integer indicating how many parameter settings to sample. To use all possible configurations, set it arbitrarily high: the script terminates once every configuration has been used. If the number of possible settings is vast, a smaller value for num_runs may be appropriate.

Example:

python cfr_param_search.py configs/example_ihdp.txt 10

Example configuration file (from configs/example_ihdp.txt):

p_alpha=[0, 1e-1]
p_lambda=[1e-3]
n_in=[2]
n_out=[2]
dropout_in=1.0
...

Note that some of the lines have square brackets to indicate lists. If a parameter list contains more than a single element, cfr_param_search will sample uniformly from these values. In this way, random parameter search can be performed.
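The sampling behaviour described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the parser used by cfr_param_search.py: it assumes each value is a Python literal (parsed with `ast.literal_eval`), and the function names `parse_config`, `sample_config` and `count_configs` are hypothetical.

```python
import ast
import random

# The lines shown in the example configuration above (the elided part is omitted).
EXAMPLE = """\
p_alpha=[0, 1e-1]
p_lambda=[1e-3]
n_in=[2]
n_out=[2]
dropout_in=1.0
"""


def parse_config(text):
    """Parse key=value lines into a dict; values are read as Python literals."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, raw = line.split("=", 1)
        try:
            value = ast.literal_eval(raw.strip())
        except (ValueError, SyntaxError):
            value = raw.strip()  # assumption: keep unparsable values as strings
        cfg[key.strip()] = value
    return cfg


def sample_config(cfg, rng=random):
    """Pick one value uniformly from every list-valued parameter."""
    return {k: rng.choice(v) if isinstance(v, list) else v for k, v in cfg.items()}


def count_configs(cfg):
    """Number of distinct settings: the product of the list lengths."""
    n = 1
    for v in cfg.values():
        if isinstance(v, list):
            n *= len(v)
    return n


cfg = parse_config(EXAMPLE)
sampled = sample_config(cfg, random.Random(0))
total = count_configs(cfg)
```

For the lines shown, only p_alpha has more than one candidate value, so `total` is 2. Any num_runs of 2 or more would therefore exhaust the search space for these parameters.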

evaluate

The script evaluate.py performs an evaluation of a trained model based on the predictions made for the training and test sets.

Usage:

python evaluate.py <config_file> [overwrite] [filters]

The parameter config_file should be the same as the one used in cfr_param_search. (Note: evaluate only uses the settings for dataform, data_test, datadir and outdir; the rest can be changed without affecting the evaluation.)

If the overwrite parameter is set to "1", the script re-computes all error estimates. If it is set to "0" it re-uses stored values, but re-prints and re-plots all results.
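The recompute-or-reuse behaviour of the overwrite flag follows a common caching pattern, sketched below. This is not the code in evaluate.py: the JSON storage format, the file name `errors.json`, the helper `load_or_compute`, and the placeholder error value are all assumptions made for illustration.

```python
import json
import os
import tempfile


def load_or_compute(path, compute, overwrite):
    """Recompute and store the result if overwrite is set or nothing is stored;
    otherwise return the stored value."""
    if overwrite or not os.path.exists(path):
        result = compute()
        with open(path, "w") as f:
            json.dump(result, f)
        return result
    with open(path) as f:
        return json.load(f)


calls = []


def compute_errors():
    """Stand-in for an expensive evaluation; the value is a placeholder."""
    calls.append(1)
    return {"error": 0.5}


path = os.path.join(tempfile.mkdtemp(), "errors.json")  # hypothetical file name
a = load_or_compute(path, compute_errors, overwrite=False)  # nothing stored yet: computes
b = load_or_compute(path, compute_errors, overwrite=False)  # reuses the stored value
c = load_or_compute(path, compute_errors, overwrite=True)   # recomputes
```

Here `compute_errors` runs twice across the three calls: once because nothing was stored, and once because overwrite was set.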

The argument filters accepts a string in the form of a Python dict containing the parameter values the user wishes to filter on. This produces plots and text summaries only for results whose configurations match the filter.

Example:

python evaluate.py configs/example_ihdp.txt 0 "{p_alpha: 0}"
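The filter semantics can be illustrated with a small sketch. It assumes the filter has already been turned into a Python dict (how evaluate.py parses the string is not shown here), and the `matches` helper and the example configurations are hypothetical.

```python
def matches(cfg, filters):
    """True if cfg has every filtered parameter set to the requested value."""
    return all(k in cfg and cfg[k] == v for k, v in filters.items())


# Hypothetical configurations, as might be recorded for two runs.
configs = [
    {"p_alpha": 0, "p_lambda": 1e-3},
    {"p_alpha": 0.1, "p_lambda": 1e-3},
]

selected = [c for c in configs if matches(c, {"p_alpha": 0})]
```

Only the first configuration is kept. Parameters that are not mentioned in the filter, such as p_lambda here, do not affect the match.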

Examples

A simple experiment example is contained in example_ihdp.sh. This file runs the model on (a subset of) the IHDP data with parameters supplied by configs/example_ihdp.txt. The data for this example can be downloaded from http://www.fredjo.com/files/ihdp_npci_1-100.train.npz (train) and http://www.fredjo.com/files/ihdp_npci_1-100.test.npz (test). For the full data (of 1000 replications) used in the ICML 2017 paper, please visit https://www.fredjo.com/.

FAQ

  • Q: What are the hyperparameters used on IHDP in the ICML 2017 paper? A: The parameters were those given in example_ihdp.txt, but with p_alpha = 0.3.
  • Q: I don't get the same IHDP results as in the paper when I try to replicate them with the IHDP example from GitHub. A: The ICML 2017 results were computed over the full set of 1000 replications. The GitHub IHDP example uses only 100 replications, as it is meant to serve as a quick demo. Please find the 1000 replications at https://www.fredjo.com/.

References

Uri Shalit, Fredrik D. Johansson & David Sontag. Estimating individual treatment effect: generalization bounds and algorithms. 34th International Conference on Machine Learning (ICML), August 2017.

Fredrik D. Johansson, Uri Shalit & David Sontag. Learning Representations for Counterfactual Inference. 33rd International Conference on Machine Learning (ICML), June 2016.
