  • Language: Lua
  • License: MIT License


deepDiagnosis

A torch package for learning diagnosis models from temporal patient data.

For more details please check:

  1. http://arxiv.org/abs/1608.00647

Narges Razavian, Jake Marcus, David Sontag, "Multi-task Prediction of Disease Onsets from Longitudinal Lab Tests", Machine Learning and Healthcare, 2016.

  2. http://arxiv.org/abs/1511.07938

Narges Razavian, David Sontag, "Temporal Convolutional Neural Networks for Diagnosis from Lab Tests", ICLR 2016 Workshop track.

# Installation

The package has the following dependencies:

Python: NumPy, cPickle

Lua: Torch, nn, cunn, cutorch, gnuplot, optim, and rnn
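Before running anything, it can help to confirm that the dependencies are visible. The sketch below is a hypothetical helper (`check_dependencies` is not part of the package). It assumes Python 3, where the standard `pickle` module takes the place of Python 2's `cPickle`, and it only checks that Torch's `th` launcher is on `PATH` rather than loading each Lua package.

```python
import importlib.util
import shutil

# Python modules from the list above; under Python 3, pickle replaces cPickle.
PYTHON_MODULES = ["numpy", "pickle"]


def check_dependencies():
    """Return a dict mapping each dependency to True if it appears to be available."""
    status = {}
    for name in PYTHON_MODULES:
        # find_spec returns None (without raising) when a top-level module is missing.
        status[name] = importlib.util.find_spec(name) is not None
    # The Lua packages are loaded through Torch's `th` launcher, so we only
    # check that the launcher itself can be found.
    status["th"] = shutil.which("th") is not None
    return status


if __name__ == "__main__":
    for dep, ok in check_dependencies().items():
        print(f"{dep}: {'found' if ok else 'MISSING'}")
```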

# Usage

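Run the following steps in order. Within steps 1-3, and again within steps 4-7, the commands for the different train/test/valid tasks are independent of each other, so you can run them in parallel if you like.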

There are sample input files (./sample_python_data) that you can use to test the package first.

1) python create_torch_tensors.py --x sample_python_data/xtrain.pkl --y sample_python_data/ytrain.pkl --task 'train' --outdir ./sampledata/

2) python create_torch_tensors.py --x sample_python_data/xtest.pkl --y sample_python_data/ytest.pkl --task 'test' --outdir ./sampledata/

3) python create_torch_tensors.py --x sample_python_data/xvalid.pkl --y sample_python_data/yvalid.pkl --task 'valid' --outdir ./sampledata/


4) th create_batches.lua --task=train --input_dir=./sampledata --batch_output_dir=./sampleBatchDir 

5) th create_batches.lua --task=valid --input_dir=./sampledata --batch_output_dir=./sampleBatchDir 

6) th create_batches.lua --task=scoretrain --input_dir=./sampledata --batch_output_dir=./sampleBatchDir 

7) th create_batches.lua --task=test --input_dir=./sampledata --batch_output_dir=./sampleBatchDir


8) th train_and_validate.lua --task=train --input_batch_dir=./sampleBatchDir --save_models_dir=./sample_models/

Once the model is trained, run the following to get final evaluations on the test set. Replace "lstm2016_05_29_10_11_01" with the model directory created in step 8; training directory names include a timestamp.

9) th train_and_validate.lua --task=test --validation_dir=./sample_models/lstm2016_05_29_10_11_01/
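The nine steps can be scripted. The following is a minimal sketch of a driver, not part of the package: the helper names are hypothetical, and it assumes Python 3, that it is run from the repository root, and that `python` on `PATH` is the interpreter the package expects. Following the note above, it runs the commands within steps 1-3 and within steps 4-7 concurrently. It also assumes step 8 names its run directory as the model name followed by a `YYYY_MM_DD_hh_mm_ss` timestamp, as in the example "lstm2016_05_29_10_11_01", and picks the newest such directory for step 9.

```python
import os
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

DATA, TENSORS, BATCHES, MODELS = (
    "sample_python_data", "./sampledata", "./sampleBatchDir", "./sample_models")

# Assumption: run directories look like "lstm2016_05_29_10_11_01".
RUN_DIR = re.compile(r"^[A-Za-z_]+?(?P<stamp>\d{4}(?:_\d{2}){5})$")


def tensor_commands():
    """Steps 1-3: one create_torch_tensors.py call per split."""
    return [["python", "create_torch_tensors.py",
             "--x", f"{DATA}/x{task}.pkl", "--y", f"{DATA}/y{task}.pkl",
             "--task", task, "--outdir", TENSORS + "/"]
            for task in ("train", "test", "valid")]


def batch_commands():
    """Steps 4-7: one create_batches.lua call per task."""
    return [["th", "create_batches.lua", f"--task={task}",
             f"--input_dir={TENSORS}", f"--batch_output_dir={BATCHES}"]
            for task in ("train", "valid", "scoretrain", "test")]


def run_all(commands):
    """Run mutually independent commands concurrently; fail if any of them fails."""
    with ThreadPoolExecutor() as pool:
        codes = list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
    if any(codes):
        raise RuntimeError(f"commands exited with codes {codes}")


def latest_run_dir(save_models_dir):
    """Return the newest timestamped run directory under save_models_dir, or None."""
    best = None
    for name in os.listdir(save_models_dir):
        path = os.path.join(save_models_dir, name)
        match = RUN_DIR.match(name)
        if match and os.path.isdir(path):
            stamp = datetime.strptime(match.group("stamp"), "%Y_%m_%d_%H_%M_%S")
            if best is None or stamp > best[0]:
                best = (stamp, path)
    return None if best is None else best[1]


def main():
    run_all(tensor_commands())  # steps 1-3
    run_all(batch_commands())   # steps 4-7 read the tensors written by steps 1-3
    subprocess.run(["th", "train_and_validate.lua", "--task=train",
                    f"--input_batch_dir={BATCHES}",
                    f"--save_models_dir={MODELS}/"], check=True)  # step 8
    run = latest_run_dir(MODELS)  # the directory step 8 just created
    if run is None:
        raise SystemExit(f"no timestamped run directory found in {MODELS}")
    subprocess.run(["th", "train_and_validate.lua", "--task=test",
                    f"--validation_dir={run}/"], check=True)  # step 9
```

Call `main()` to run the whole pipeline on the sample data. A thread pool is enough here because each thread only waits on its own subprocess.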

Read the following for details on how to define your cohort and task.

# Input

Input should be in one of the formats described below.

[Figure: an imaginary input and output for a single person in the two input settings]

Read below for the details.

Format 1) Python NumPy arrays (cPickle is also supported) with the following sizes:

xtrain, xvalid, xtest: |labs| x |people| x |cohort time| for creating the input batches

ytrain, yvalid, ytest: |diseases| x |people| x |cohort time| for creating the output batches and inclusion/exclusion for each batch member

Format 2) Python NumPy arrays (cPickle is also supported) with the following sizes:

xtrain, xvalid, xtest: |labs| x |people| x |cohort time| for creating the input batches

ytrain, yvalid, ytest: |diseases| x |people| for the outputs, which have no time dimension.

(In format 2 you can also provide per-disease exclusions for the input. If you need that version, let me know and I'll update that part.)
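A quick way to tell which of the two public formats a pair of arrays follows is to compare their shapes. The sketch below is a hypothetical helper, not part of the package; it assumes, from the shared |people| and |cohort time| labels above, that x and y must agree on the people axis (and, in format 1, on the time axis).

```python
def detect_format(x_shape, y_shape):
    """Return 1 or 2: which input format a pair of array shapes matches.

    x_shape must be (labs, people, cohort_time). y_shape must be
    (diseases, people, cohort_time) for format 1 or (diseases, people) for
    format 2. Raises ValueError when the shapes match neither format.
    """
    if len(x_shape) != 3:
        raise ValueError(f"x must be labs x people x time, got {x_shape}")
    _, people, time_steps = x_shape
    if len(y_shape) == 3:
        # Format 1: outcomes share the people and time axes with the labs.
        if tuple(y_shape[1:]) != (people, time_steps):
            raise ValueError(f"format 1 y {y_shape} does not align with x {x_shape}")
        return 1
    if len(y_shape) == 2:
        # Format 2: outcomes have no time axis, so only the people axis must agree.
        if y_shape[1] != people:
            raise ValueError(f"format 2 y {y_shape} does not align with x {x_shape}")
        return 2
    raise ValueError(f"y must have 2 or 3 dimensions, got {y_shape}")
```

With NumPy arrays, pass their shapes directly, e.g. `detect_format(xtrain.shape, ytrain.shape)`.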

Format 3) advanced shelve databases, for our internal use.

Please refer to https://github.com/clinicalml/ckd_progression for details.

# Prediction Models

The following models are currently supported. Details of the architectures are given in the papers cited below.

  1. Logistic regression (--model=max_logit)

  2. Feedforward network (--model=mlp)

  3. Temporal convolutional neural network over a backward window (--model=convnet)

  4. Convolutional neural network over the input and time dimensions (--model=convnet_mix)

  5. Multi-resolution temporal convolutional neural network (--model=multiresconvnet)

  6. LSTM network over the backward window (--model=lstmlast). A variant, --model=lstmall, is also available, but we found that training with lstmlast gives better results.

  7. Ensemble of multiple models (to be added soon)
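Several of the models above (convnet, lstmlast) operate over a backward window of a patient's lab history. As a generic illustration only, not code from this package, the sketch below slices one person's labs x time matrix into the windows ending at each time step. The function name and window length are hypothetical, and skipping the early steps that lack a full window (rather than padding them) is an assumption.

```python
def backward_windows(series, window):
    """Yield (t, lab_slices) where lab_slices holds the `window` steps ending at t.

    series: one person's labs x time matrix as a list of per-lab lists.
    Only time steps with a full window of history are yielded.
    """
    n_steps = len(series[0])
    for t in range(window - 1, n_steps):
        # For every lab, keep the values at times t - window + 1 .. t inclusive.
        yield t, [row[t - window + 1 : t + 1] for row in series]
```

For example, with two labs observed over four steps and a window of 2, the first window ends at t = 1 and the last at t = 3.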

# Synthetic Input for Testing the Package

You can use the following to create synthetic NumPy arrays to test the package:

python create_synthetic_data.py --outdir ./sample_python_data --N 6000 --D 15 --T 48 --O 20

This creates three datasets (train, test, valid) in the ./sample_python_data directory, with dimensions 5 x 2000 x 48 for each input x (xtrain, xtest, xvalid) and 20 x 2000 x 48 for each outcome set y. This synthetic data corresponds to input format 1 above. Follow steps 1-9 in the Usage section above to test the package with this data, and feel free to try other synthetic datasets.

# Citation

@article{razavian2016temporal,
  title={Multi-task Prediction of Disease Onsets from Longitudinal Lab Tests},
  author={Razavian, Narges and Marcus, Jake and Sontag, David},
  journal={1st Conference on Machine Learning and Health Care (MLHC)},
  year={2016}
}

@article{razavian2015temporal,
  title={Temporal Convolutional Neural Networks for Diagnosis from Lab Tests},
  author={Razavian, Narges and Sontag, David},
  journal={arXiv preprint arXiv:1511.07938},
  year={2015}
}

# Bug Reports, Questions, and Contact

For any questions, please contact Narges Razavian ([email protected] or https://github.com/narges-rzv/).
