  • Stars: 233
  • Rank 172,230 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created over 8 years ago
  • Updated about 4 years ago

Repository Details

Code for AMIA CRI 2016 paper "Learning Low-Dimensional Representations of Medical Concepts"

embeddings

This repository contains the code accompanying the paper:

Y. Choi, Y. Chiu, D. Sontag. Learning Low-Dimensional Representations of Medical Concepts. To appear in Proceedings of the AMIA Summit on Clinical Research Informatics (CRI), 2016.

The base directory contains three files: the two best 300-dimensional embeddings learned in the paper, and the embeddings from previous work that we compared against:

  • claims_codes_hs_300.txt.gz: Embeddings of ICD-9 diagnosis and procedure codes, NDC medication codes, and LOINC laboratory codes, derived from a large claims dataset from 2005 to 2013 for roughly 4 million people.
  • stanford_cuis_svd_300.txt.gz: Embeddings of UMLS concept unique identifiers (CUIs), derived from 20 million clinical notes spanning 19 years of data from Stanford Hospital and Clinics, using a data set released in a paper by Finlayson, LePendu & Shah.
  • DeVine_etal_200.txt.gz: Embeddings of UMLS CUIs learned by De Vine et al. CIKM '14, derived from 348,566 medical journal abstracts (courtesy of the authors).
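As a rough illustration of how one of these files might be read, the sketch below parses a gzip-compressed text file into a dictionary of vectors. The layout is an assumption and is not documented here: one concept per line, a token followed by whitespace-separated floats, with an optional word2vec-style header line of two integers. The function names `parse_embeddings` and `load_embeddings` are hypothetical.

```python
import gzip


def parse_embeddings(lines):
    """Parse 'token v1 v2 ... vd' lines into a dict of token -> list of floats.

    Assumed layout (not confirmed by the README): whitespace-separated fields,
    optionally preceded by a word2vec-style header line '<count> <dim>'.
    """
    vectors = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Skip a leading header made of exactly two integers.
        if not vectors and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_embeddings(path):
    """Read a .txt.gz embedding file without decompressing it on disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_embeddings(handle)
```

For example, `load_embeddings("claims_codes_hs_300.txt.gz")` would return one vector per code if the file matches the assumed layout; inspect the first few lines of the file before relying on it.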

In the eval directory there are three files of interest:

  • eval/Embedding_Evaluation.ipynb, an IPython notebook that reproduces the main results of the paper. If you come up with your own embeddings, you can use this benchmark to quantitatively compare them to our embeddings.
  • eval/visualize_claims_embeddings.py, a Python program that lets you look at nearest neighbors for the claims_codes_hs_300.txt embeddings (after decompressing the file using gunzip).
  • eval/visualize_stanford_embeddings.py, same as above but for the stanford_cuis_svd_300.txt embeddings.
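The idea of looking up nearest neighbors in an embedding space can be sketched as follows. This is not the code in the visualize scripts; it is a minimal illustration that assumes cosine similarity as the closeness measure and uses made-up two-dimensional toy vectors with hypothetical code names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(vectors, query, k=5):
    """Return the k entries most similar to `query` as (name, similarity) pairs."""
    target = vectors[query]
    scored = [
        (cosine(target, vec), name)
        for name, vec in vectors.items()
        if name != query  # do not return the query itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:k]]


# Toy vectors for illustration only; real embeddings have 200 or 300 dimensions.
toy = {"code_a": [1.0, 0.0], "code_b": [0.9, 0.1], "code_c": [0.0, 1.0]}
print(nearest_neighbors(toy, "code_a", k=2))
```

A linear scan like this is fine for exploring a handful of queries; for repeated lookups over the full vocabulary, a vectorized or indexed approach would be a more practical choice.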

Note that you may need to decompress (using gunzip) files in the eval directory before you can run some of the programs. Additionally, to run the IPython notebook, you need to place the file MRCONSO.RRF from the UMLS Metathesaurus in the eval directory (we do not distribute this file).
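For readers unfamiliar with MRCONSO.RRF, the sketch below shows one way to map CUIs to readable names from it. It relies on the standard UMLS layout, in which each row is pipe-delimited with the CUI in the first field, the language in the second, and the concept string in the fifteenth. How the notebook itself uses the file is not stated here, so treat this as an assumption; the function name `cui_names` is hypothetical.

```python
def cui_names(lines, language="ENG"):
    """Map each CUI to the first name found for it in the given language.

    Expects MRCONSO.RRF rows: pipe-delimited, CUI at index 0, language (LAT)
    at index 1, string (STR) at index 14. The first matching row wins, which
    is not necessarily the preferred term for that concept.
    """
    names = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 15:
            continue  # skip malformed or truncated rows
        cui, lat, text = fields[0], fields[1], fields[14]
        if lat == language and cui not in names:
            names[cui] = text
    return names
```

A typical call would be `cui_names(open("eval/MRCONSO.RRF", encoding="utf-8"))`, assuming the file has been placed in the eval directory as described above.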
