

Fusing Histology and Genomics via Deep Learning - IEEE TMI

Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Diagnosis and Prognosis

Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis, IEEE Transactions on Medical Imaging, 2020. [HTML] [arXiv] [Talk]
Richard J Chen, Ming Y Lu, Jingwen Wang, Drew FK Williamson, Scott J Rodig, Neal I Lindeman, Faisal Mahmood
@article{chen2020pathomic,
  title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},
  author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},
  journal={IEEE Transactions on Medical Imaging},
  year={2020},
  publisher={IEEE}
}

Summary: We propose a simple and scalable method for integrating histology images and -omic data using attention gating and tensor fusion. Histopathology images can be processed using CNNs or GCNs for parameter efficiency, or a combination of the two. The setup is adaptable for integrating multiple -omic modalities with histopathology and can be used to improve diagnostic, prognostic, and therapeutic response determinations.
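
For intuition, below is a minimal PyTorch sketch of the two ingredients named above: each unimodal embedding is attention-gated against the joint representation, a constant 1 is appended to each gated vector, and their Kronecker (outer) product forms the fused feature. The module name, dimensions, and post-fusion head are illustrative assumptions, not the exact definitions used in fusion.py.

import torch
import torch.nn as nn

class GatedTensorFusion(nn.Module):
    """Illustrative sketch: attention-gate two unimodal embeddings, then fuse
    them with a Kronecker (outer) product, as described in the summary."""
    def __init__(self, dim_h=32, dim_g=32, dim_out=64):
        super().__init__()
        # attention gates: learn how much of each modality to pass through
        self.gate_h = nn.Sequential(nn.Linear(dim_h + dim_g, dim_h), nn.Sigmoid())
        self.gate_g = nn.Sequential(nn.Linear(dim_h + dim_g, dim_g), nn.Sigmoid())
        # post-fusion encoder maps the (dim_h + 1) * (dim_g + 1) tensor to dim_out
        self.post = nn.Sequential(nn.Linear((dim_h + 1) * (dim_g + 1), dim_out), nn.ReLU())

    def forward(self, h, g):
        joint = torch.cat([h, g], dim=1)
        h = self.gate_h(joint) * h          # gated histology embedding
        g = self.gate_g(joint) * g          # gated genomic embedding
        ones = torch.ones(h.size(0), 1, device=h.device)
        h1 = torch.cat([h, ones], dim=1)    # append 1 so unimodal terms survive the product
        g1 = torch.cat([g, ones], dim=1)
        fused = torch.bmm(h1.unsqueeze(2), g1.unsqueeze(1)).flatten(start_dim=1)
        return self.post(fused)

# toy usage with random unimodal embeddings
h = torch.randn(4, 32)   # e.g. CNN/GCN histology features
g = torch.randn(4, 32)   # e.g. genomic features
print(GatedTensorFusion()(h, g).shape)   # torch.Size([4, 64])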

Community / Follow-Up Work :)

GitHub Repositories / Projects ★ ★ ★ ★ ★

Updates

  • 05/26/2021: Updated the Google Drive with all models and processed data for TCGA-GBMLGG and TCGA-KIRC, which can be found using the following link. The data made available for TCGA-GBMLGG are the same ROIs used by Mobadersany et al.

Setup

Prerequisites

  • Linux (Tested on Ubuntu 18.04)
  • NVIDIA GPU (Tested on NVIDIA GeForce RTX 2080 Ti GPUs on local workstations, and NVIDIA V100s using Google Cloud)
  • CUDA + cuDNN (Tested on CUDA 10.1 and cuDNN 7.5. CPU mode and CUDA without cuDNN may work with minimal modification, but are untested.) A quick check of this setup is sketched after this list.
  • torch>=1.1.0
  • torch_geometric=1.3.0
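
A short check like the following can confirm that the versions and GPU support listed above are in place. This is a minimal sketch; torch_geometric is only needed if you plan to train the GCN models.

# Minimal environment check for the prerequisites listed above.
import torch

print("torch:", torch.__version__)             # expect >= 1.1.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)     # expect 10.1 for the tested setup
try:
    import torch_geometric
    print("torch_geometric:", torch_geometric.__version__)  # expect 1.3.0
except ImportError:
    print("torch_geometric not installed (needed for the GCN models)")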

Code Base Structure

The code base structure is explained below:

  • train_cv.py: Cross-validation script for training unimodal and multimodal networks. This script will save evaluation metrics and predictions on the train + test split for each epoch on every split in checkpoints.
  • test_cv.py: Script for testing unimodal and multimodal networks on only the test split.
  • train_test.py: Contains the definitions for "train" and "test".
  • networks.py: Contains PyTorch model definitions for all unimodal and multimodal networks.
  • fusion.py: Contains PyTorch model definitions for fusion.
  • data_loaders.py: Contains the PyTorch DatasetLoader definition for loading multimodal data.
  • options.py: Contains all the options for the argparser.
  • make_splits.py: Script for generating a pickle file that saves + aligns the path for multimodal data for cross-validation.
  • run_cox_baselines.py: Script for running Cox baselines.
  • utils.py: Contains definitions for collating, survival loss functions, data preprocessing, evaluation, figure plotting, etc. (a common choice of survival loss is sketched after this list).
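
As a concrete example of the survival machinery mentioned for utils.py, one standard choice of survival loss is the negative Cox partial log-likelihood. The sketch below is a minimal, naive implementation for illustration and is not necessarily identical to the loss defined in utils.py.

import torch

def neg_cox_partial_log_likelihood(risk, time, event):
    """Illustrative negative Cox partial log-likelihood (naive form).

    risk:  (N,) predicted risk scores (higher = worse prognosis)
    time:  (N,) observed survival or censoring times
    event: (N,) 1.0 if the event (death) was observed, 0.0 if censored
    """
    # risk_set[i, j] = 1 if sample j is still at risk at time[i] (time[j] >= time[i])
    risk_set = (time.unsqueeze(0) >= time.unsqueeze(1)).float()
    log_risk_set_sum = torch.log((risk_set * torch.exp(risk).unsqueeze(0)).sum(dim=1))
    # only uncensored samples contribute terms to the partial likelihood
    return -((risk - log_risk_set_sum) * event).sum() / event.sum().clamp(min=1)

# toy usage
risk = torch.randn(8)
time = torch.rand(8)
event = torch.randint(0, 2, (8,)).float()
print(neg_cox_partial_log_likelihood(risk, time, event))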

The directory structure for your multimodal dataset should look similar to the following:

./
├── data
│   └── PROJECT
│       ├── INPUT A (e.g. Image)
│       │   ├── image_001.png
│       │   ├── image_002.png
│       │   └── ...
│       ├── INPUT B (e.g. Graph)
│       │   ├── image_001.pkl
│       │   ├── image_002.pkl
│       │   └── ...
│       └── INPUT C (e.g. Genomic)
│           └── genomic_data.csv
└── checkpoints
    └── PROJECT
        ├── TASK X (e.g. Survival Analysis)
        │   ├── path
        │   │   └── ...
        │   └── ...
        └── TASK Y (e.g. Grade Classification)
            ├── path
            │   └── ...
            └── ...

Depending on which modalities you are interested in combining, you must: (1) write your own function for aligning multimodal data in make_splits.py, (2) create your DatasetLoader in data_loaders.py, and (3) modify options.py for your data and task; a minimal sketch of step (2) is given below. Models will be saved to the checkpoints directory, with each model for each task saved in its own directory. At the moment, the only supervised learning tasks implemented are survival outcome prediction and grade classification.
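
To make step (2) concrete, a paired multimodal Dataset might take the general shape sketched below. The class name, the underscored folder names (INPUT_A, INPUT_B, INPUT_C), the 224-pixel crop, the torchvision transforms, and the return signature are all illustrative assumptions; adapt them to whatever train_cv.py and the collate functions in utils.py actually expect.

import os
import pickle

import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PairedMultimodalDataset(Dataset):
    """Illustrative sketch of step (2): for each case ID, pair an image (INPUT A),
    a graph (INPUT B), a genomic row (INPUT C), and the task label."""

    def __init__(self, root, case_ids, labels, train=True):
        self.root = root
        self.case_ids = case_ids            # case IDs belonging to this CV split
        self.labels = labels                # dict: case ID -> label (e.g. grade, or (time, event))
        self.omics = pd.read_csv(os.path.join(root, "INPUT_C", "genomic_data.csv"), index_col=0)
        # modality-specific preprocessing (random crop / flip / jitter on training images)
        self.tf = transforms.Compose(
            [transforms.RandomCrop(224), transforms.RandomHorizontalFlip(),
             transforms.ColorJitter(0.1, 0.1), transforms.ToTensor()]
            if train else
            [transforms.CenterCrop(224), transforms.ToTensor()])

    def __len__(self):
        return len(self.case_ids)

    def __getitem__(self, i):
        case = self.case_ids[i]
        img = self.tf(Image.open(os.path.join(self.root, "INPUT_A", f"{case}.png")).convert("RGB"))
        with open(os.path.join(self.root, "INPUT_B", f"{case}.pkl"), "rb") as f:
            graph = pickle.load(f)          # e.g. a torch_geometric Data object
        omic = torch.tensor(self.omics.loc[case].values, dtype=torch.float32)
        return img, graph, omic, self.labels[case]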

Training and Evaluation

Here are example commands for training unimodal + multimodal networks.

Survival Model for Input A

The example shown below trains a survival model for mode A and saves the model checkpoints + predictions at the end of each split. In this example, we would create a folder called "CNN_A" in "./checkpoints/example/" for all the models in cross-validation. It assumes that "A" is defined as a mode in data_loaders.py for handling modality-specific data-preprocessing steps (random crop + flip + jittering for images), and that there is a network defined for input A in networks.py. "surv" is already defined as a task for training networks for survival analysis in options.py, networks.py, train_test.py, and train_cv.py.

python train_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0

To obtain test predictions on only the test splits in your cross-validation, you can replace "train_cv" with "test_cv".

python test_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0
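
Survival performance on the test splits is typically summarized with the concordance index (C-index). The sketch below is a minimal, illustrative implementation of Harrell's C-index and may differ from the evaluation code in utils.py.

import numpy as np

def concordance_index(risk, time, event):
    """Illustrative C-index: fraction of comparable pairs in which the higher
    predicted risk corresponds to the shorter observed survival time."""
    num, den = 0.0, 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # pair (i, j) is comparable if i had an observed event before time[j]
            if event[i] == 1 and time[i] < time[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den if den > 0 else float("nan")

# toy usage on random predictions
rng = np.random.default_rng(0)
print(concordance_index(rng.normal(size=20), rng.random(20), rng.integers(0, 2, 20)))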

Grade Classification Model for Input A + B

The example shown below trains a grade classification model fusing modes A and B. Similar to the previous example, we would create a folder called "Fusion_AB" in "./checkpoints/example/" for all the models in cross-validation. It assumes that "AB" is defined as a mode in data_loaders.py for handling multiple inputs A and B at the same time. "grad" is already defined as a task for training networks for grade classification in options.py, networks.py, train_test.py, and train_cv.py.

python train_cv.py --exp_name grad --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task grad --mode AB --model_name Fusion_AB --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0

Reproducibility

To reproduce the results in our paper, and for exact data preprocessing, implementation, and experimental details, please follow the instructions here: ./data/TCGA_GBMLGG/. Processed data and trained models can be downloaded here.

Issues

  • Please open new issue threads, or (for urgent blockers) report directly to [email protected].
  • Immediate response to minor issues may not be available.

Licenses, Usages, and Acknowledgements

  • This project is licensed under the GNU GPLv3 License - see the LICENSE.md file for details. A provisional patent on this work has been filed by the Brigham and Women's Hospital.
  • This code is inspired by SALMON and SCNN. Code base structure was inspired by pytorch-CycleGAN-and-pix2pix.
  • Subsidized computing resources for this project were provided by Nvidia and Google Cloud.
  • If you find our work useful in your research, please consider citing our paper:
@article{chen2020pathomic,
  title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},
  author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},
  journal={IEEE Transactions on Medical Imaging},
  year={2020},
  publisher={IEEE}
}

© Mahmood Lab - This code is made available under the GPLv3 License and is available for non-commercial academic purposes.
