• Stars
    star
    125
  • Rank 286,335 (Top 6 %)
  • Language
    Jupyter Notebook
  • Created over 1 year ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

KD

This repo covers the implementation of the following paper:

"Data Tells the Truth: A Knowledge Distillation Method for Genomic Survival Analysis by Handling Censoring"

Project home page: https://datatellstruth.github.io/.

Requeirements

Python 3.7.11

Pytorch 1.13.1

numpy 1.21.2

scikit-learn 1.0

tqdm 4.62.4

python setup.py install

Data preparation

All data used in our work are publicly available. You can download the TCGA data from :https://www.synapse.org/#!Synapse:syn4976369.2. Here, we provide processed files in data/raw_data. Data files required for running can be obtained through processing. For instance, you can process data/raw_data/Oncogene_5.txt to obtain the {cancer_type}_gene5.csv. We provide samples in data/sample_data. The folder stucture as follow:

```
data
  |β€”β€”β€”β€”gene
  |       β””β€”β€”β€”β€”β€”β€”ACC.csv
  |       β””β€”β€”β€”β€”β€”β€”ACC_Clean.csv
  |       β””β€”β€”β€”β€”β€”β€”PAAD.csv
  |       β””β€”β€”β€”β€”β€”β€”PAAD_Clean.csv
  |       β””β€”β€”β€”β€”β€”β€”...
  |β€”β€”β€”β€”location
  |       β””β€”β€”β€”β€”β€”β€”ACC_gene5.csv
  |       β””β€”β€”β€”β€”β€”β€”HNSC_gene5.csv
  |       β””β€”β€”β€”β€”β€”β€”PAAD_gene5.csv
  |       β””β€”β€”β€”β€”β€”β€”LUAD_gene5.csv
  |       β””β€”β€”β€”β€”β€”β€”...
  ```

Usage

Path to the data used for running should save in args.csv_folder as csv files, you can use the other data formats to match the dataset in your machine or re-write the Cancer and Cancer_fold class in the dataset.py.

We have encapsulated the methods into a Python library, which includes two trainers: KD_Trainer and KDKA_Trainer. Additionally, there is an evaluation function provided for assessment. Below, we provide an example demonstrating how to use these methods:

from KD.experiment import KD_Trainer, KDKA_Trainer
# For more details about parameters, please refer to the github repository.

#train = KDKA_Trainer(cancer_type='ACC', folder='data/cancer_gene/', resume='', csv_folder='data/', round=25, batch_size=10, output_folder='pth', lr=0.1, dim=5796, seed=42)
train = KD_Trainer(cancer_type='ACC', folder='data/cancer_gene/', resume='', csv_folder='data/', round=25, batch_size=10, output_folder='pth', lr=0.1, dim=5796, seed=42)

We have also provided a runnable notebook file for ease of learning and utilization.