Repository Details

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021

Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images, ICCV 2021. [HTML]
Richard J Chen, Ming Y Lu, Wei-Hung Weng, Tiffany Y Chen, Drew FK Williamson, Trevor Manz, Maha Shady, Faisal Mahmood
@inproceedings{chen2021multimodal,
  title={Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images},
  author={Chen, Richard J and Lu, Ming Y and Weng, Wei-Hung and Chen, Tiffany Y and Williamson, Drew FK and Manz, Trevor and Shady, Maha and Mahmood, Faisal},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={4015--4025},
  year={2021}
}

Summary: We develop a method for performing early fusion between histology and genomics by: 1) formulating both WSIs and genomic inputs as embedding-like structures, and 2) using a co-attention mechanism that learns pairwise interactions between instance-level histology patches and genomic embeddings. In addition, we make connections between MIL and Set Transformers, and adapt Transformer attention to WSIs for learning long-range dependencies for survival outcome prediction.
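
As a rough illustration of the co-attention idea in the summary, the sketch below computes scaled dot-product attention in plain Python. It is not the MCAT implementation: it assumes the genomic embeddings act as queries and the patch features act as keys and values, it omits all learned projection layers, and the function names (`softmax`, `co_attention`) and toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def co_attention(genomic, patches):
    """Attend from genomic embeddings (queries) to patch features (keys/values).

    genomic: N x d list of lists, patches: M x d list of lists.
    Returns (attended, weights), where attended is N x d and weights is N x M.
    Assumption: no learned projections, only raw dot products.
    """
    d = len(genomic[0])
    scale = math.sqrt(d)
    attended, weights = [], []
    for g in genomic:
        # Pairwise interaction score between one genomic embedding and every patch.
        scores = [sum(gi * pi for gi, pi in zip(g, p)) / scale for p in patches]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of patch features for this genomic embedding.
        attended.append(
            [sum(w[j] * patches[j][k] for j in range(len(patches))) for k in range(d)]
        )
    return attended, weights


# Toy inputs: 2 genomic embeddings and 3 patches, each of dimension 2.
genomic = [[1.0, 0.0], [0.0, 1.0]]
patches = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
attended, weights = co_attention(genomic, patches)
```

Each row of `weights` sums to 1 and indicates how strongly one genomic embedding attends to each patch, so the output has one patch-derived vector per genomic embedding regardless of how many patches the slide contains.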

Updates:

  • 11/12/2021: Several users have raised concerns about the low c-Index for GBMLGG in SNN (Genomic Only). Because the gene signatures were built from the MSigDB gene families, IDH1 mutation status (a key biomarker for distinguishing GBM from LGG) was not included.
  • 06/18/2021: Updated data preprocessing section for reproducibility.
  • 06/17/2021: Uploaded predicted risk scores on the validation folds for each model, along with the evaluation script that computes the c-Index and Integrated AUC (I-AUC) validation metrics, found in the following Jupyter Notebook. Model checkpoints for MCAT are uploaded in the results directory.
  • 06/17/2021: Uploaded a notebook detailing the MCAT network architecture, with sample input, in the following Jupyter Notebook, in which we print the shape of the tensors at each stage of MCAT.
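
For readers unfamiliar with the c-Index mentioned in the updates above, here is a minimal sketch of Harrell's concordance index in plain Python. It is not the repository's evaluation script: the function name `concordance_index` and the toy values are hypothetical, and pairs with tied survival times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when subject i has the shorter time and an
    observed event (events[i] == 1). The pair is concordant when subject i
    also has the higher predicted risk; tied risks count as half.
    Assumption: pairs with equal times are ignored.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: survival times, event indicators (0 = censored), predicted risks.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)  # 4 of 5 comparable pairs are concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of patients by risk.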

Installation Guide for Linux (using anaconda)

Pre-requisites:

  • Linux (Tested on Ubuntu 18.04)
  • NVIDIA GPU (Tested on NVIDIA GeForce RTX 2080 Ti x 16) with CUDA 11.0 and cuDNN 7.5
  • Python (3.7.7), h5py (2.10.0), matplotlib (3.1.1), numpy (1.18.1), opencv-python (4.1.1), openslide-python (1.1.1), openslide (3.4.1), pandas (1.1.3), pillow (7.0.0), PyTorch (1.6.0), scikit-learn (0.22.1), scipy (1.4.1), tensorflow (1.13.1), tensorboardx (1.9), torchvision (0.7.0), captum (0.2.0), shap (0.35.0)

Downloading TCGA Data

To download diagnostic WSIs (formatted as .svs files), molecular feature data and other clinical metadata, please refer to the NIH Genomic Data Commons Data Portal and the cBioPortal. WSIs for each cancer type can be downloaded using the GDC Data Transfer Tool.

Processing Whole Slide Images

To process WSIs, the tissue regions in each biopsy slide are first segmented using Otsu's segmentation on a downsampled WSI read with OpenSlide. Non-overlapping 256 x 256 patches are then extracted from the segmented tissue regions at the desired magnification. Next, a pretrained truncated ResNet50 encodes each raw image patch into a 1024-dim feature vector, and the feature vectors for each WSI are saved as a .pt file. These .pt files serve as the input to the network. The following folder structure is assumed for the extracted feature vectors:

DATA_ROOT_DIR/
    └──TCGA_BLCA/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
    └──TCGA_BRCA/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
    └──TCGA_GBMLGG/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
    └──TCGA_LUAD/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
    └──TCGA_UCEC/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
    ...

DATA_ROOT_DIR is the base directory of all datasets / cancer types (e.g., a directory on your SSD). Within DATA_ROOT_DIR, each folder contains the .pt files for that dataset / cancer type.
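
To sanity-check a directory against the layout above before training, something like the following sketch may help. It only uses the standard library; the helper name `list_feature_files` is hypothetical and is not part of this repository, and the demo builds a throwaway directory rather than touching real data.

```python
import tempfile
from pathlib import Path


def list_feature_files(data_root):
    """Map each dataset folder under data_root to its sorted .pt file names."""
    root = Path(data_root)
    index = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Only files whose names end in ".pt" are counted as feature files.
        index[folder.name] = sorted(f.name for f in folder.glob("*.pt"))
    return index


# Demo on a temporary directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    for cancer in ("TCGA_BLCA", "TCGA_LUAD"):
        (Path(tmp) / cancer).mkdir()
        (Path(tmp) / cancer / "slide_1.pt").touch()
    # A file with a different extension is ignored by the "*.pt" pattern.
    (Path(tmp) / "TCGA_LUAD" / "notes.txt").touch()
    demo_index = list_feature_files(tmp)
```

An empty list for a folder would indicate that feature extraction has not been run, or that the files were saved under a different extension.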

Molecular Features and Genomic Signatures

Processed molecular profile features containing mutation status, copy number variation, and RNA-Seq abundance can be downloaded from the cBioPortal, and we include them as CSV files in the following directory. To order gene features into gene embeddings, we used the following categorization of gene families (categorized via common features such as homology or biochemical activity) from MSigDB. Gene sets for homeodomain proteins and translocated cancer genes were not used due to overlap with transcription factors and oncogenes, respectively. The curation of "genomic signatures" can be modified to produce genomic embeddings that reflect unique biological functions.

Training-Validation Splits

To evaluate the algorithm's performance, we randomly partitioned each dataset using 5-fold cross-validation. Splits for each cancer type are found in the splits/5foldcv folder, with one splits_{k}.csv file per fold for k = 1 to 5. In each splits_{k}.csv, the first column corresponds to the TCGA Case IDs used for training, and the second column corresponds to the TCGA Case IDs used for validation. You can also define your own splits, but the files must follow this format. The dataset loader for these train-val splits is defined in the get_split_from_df function of the Generic_WSI_Survival_Dataset class (which inherits from the PyTorch Dataset class).
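
The split format described above can be read with the standard library alone. The sketch below is an assumption-laden illustration, not the repository's loader: the function name `read_split` is hypothetical, the file is assumed to start with a header row, the validation column is assumed to be padded with empty cells, and the case IDs in the demo are made-up placeholders.

```python
import csv
import io


def read_split(handle, has_header=True):
    """Return (train_ids, val_ids) from a two-column split file.

    Column 0 holds training case IDs and column 1 holds validation case IDs.
    Empty cells are skipped, since the two columns usually differ in length.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # assumption: the first row is a header
    train_ids, val_ids = [], []
    for row in reader:
        if len(row) > 0 and row[0].strip():
            train_ids.append(row[0].strip())
        if len(row) > 1 and row[1].strip():
            val_ids.append(row[1].strip())
    return train_ids, val_ids


# Demo with an in-memory file; the IDs are placeholders, not real TCGA cases.
demo = io.StringIO(
    "train,val\n"
    "TCGA-AA-0001,TCGA-AA-0004\n"
    "TCGA-AA-0002,\n"
    "TCGA-AA-0003,\n"
)
train_ids, val_ids = read_split(demo)
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object, and check that no case ID appears in both lists before using a custom split.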

Running Experiments

Experiments with the SNN, AMIL, and MMF networks defined in this repository can be run using the following generic command line:

CUDA_VISIBLE_DEVICES=<DEVICE ID> python main.py --which_splits <SPLIT FOLDER PATH> --split_dir <SPLITS FOR CANCER TYPE> --mode <WHICH MODALITY> --model_type <WHICH MODEL>

Commands for all experiments / models can be found in the Commands.md file.

Issues

  • Please open new threads or report issues directly (for urgent blockers) to [email protected].
  • Immediate response to minor issues may not be available.

License & Usage

If you find our work useful in your research, please consider citing our paper:

@inproceedings{chen2021multimodal,
  title={Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images},
  author={Chen, Richard J and Lu, Ming Y and Weng, Wei-Hung and Chen, Tiffany Y and Williamson, Drew FK and Manz, Trevor and Shady, Maha and Mahmood, Faisal},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={4015--4025},
  year={2021}
}

© Mahmood Lab - This code is made available under the GPLv3 License and is available for non-commercial academic purposes.
