GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations
This repository hosts the official implementation of GEARS, a method that predicts the transcriptional response to both single- and multi-gene perturbations using single-cell RNA-sequencing data from perturbational screens.
Installation
Install PyG, and then run pip install cell-gears.
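A quick way to confirm that both installation steps succeeded is to check that the two packages can be imported. The sketch below is not part of GEARS; `missing_packages` is a hypothetical helper, and it assumes that PyG is importable as `torch_geometric` and that cell-gears is importable as `gears` (the name used in the API example in this README).

```python
import importlib.util


def missing_packages(modules=("torch_geometric", "gears")):
    """Return the module names from `modules` that cannot be found.

    Hypothetical helper for illustration; it only looks the modules up
    and does not import them.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("PyG and GEARS are both importable.")
```

If `torch_geometric` is reported as missing, install PyG first and then reinstall cell-gears.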
[New] Updates in v0.1.1
- Fixed training breakpoint bug from v0.1.0
- Preprocessed dataloader now available for Replogle 2022 RPE1 and K562 essential datasets
- Added custom split, fixed no-test split
Core API Interface
Using the API, you can (1) reproduce the results in our paper and (2) train GEARS on your perturbation dataset using a few lines of code.
from gears import PertData, GEARS
# get data
pert_data = PertData('./data')
# load dataset in paper: norman, adamson, dixit.
pert_data.load(data_name = 'norman')
# specify data split
pert_data.prepare_split(split = 'simulation', seed = 1)
# get dataloader with batch size
pert_data.get_dataloader(batch_size = 32, test_batch_size = 128)
# set up and train a model
gears_model = GEARS(pert_data, device = 'cuda:8')
gears_model.model_initialize(hidden_size = 64)
gears_model.train(epochs = 20)
# save/load model
gears_model.save_model('gears')
gears_model.load_pretrained('gears')
# predict
gears_model.predict([['CBL', 'CNN1'], ['FEV']])
gears_model.GI_predict(['CBL', 'CNN1'], GI_genes_file=None)
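The example above passes device = 'cuda:8', which refers to the ninth GPU on the machine and will fail on hosts with fewer GPUs. A minimal sketch of choosing a device string defensively is shown below. `pick_device` is a hypothetical helper, not part of GEARS, and it assumes that GEARS accepts any device string that PyTorch accepts, including 'cpu'.

```python
def pick_device(preferred_index=0):
    """Return 'cuda:<preferred_index>' if that GPU exists, else 'cpu'.

    Hypothetical helper for illustration. Falls back to 'cpu' when
    PyTorch is not installed or no matching GPU is visible.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available() and 0 <= preferred_index < torch.cuda.device_count():
        return f"cuda:{preferred_index}"
    return "cpu"


if __name__ == "__main__":
    # The result could then be passed as GEARS(pert_data, device=pick_device()).
    print(pick_device())
```

Training on 'cpu' is likely to be much slower than on a GPU, so the fallback is mainly useful for smoke tests and small datasets.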
To use your own dataset, create a scanpy adata object with a gene_name column in adata.var, and two columns, condition and cell_type, in adata.obs. Then run:
pert_data.new_data_process(dataset_name = 'XXX', adata = adata)
# to load the processed data
pert_data.load(data_path = './data/XXX')
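Before calling new_data_process, it can help to confirm that the adata object carries the columns listed above. The following is a small sketch of such a check; `missing_columns` and the two `REQUIRED_*` constants are hypothetical names introduced here, not part of the GEARS API. The helper only inspects column names, so it works with any iterable of strings.

```python
# Column names required by the custom-dataset instructions in this README.
REQUIRED_OBS = ("condition", "cell_type")
REQUIRED_VAR = ("gene_name",)


def missing_columns(obs_columns, var_columns):
    """Return the required column names that are absent, keyed by table.

    Hypothetical helper for illustration; pass adata.obs.columns and
    adata.var.columns when working with a real AnnData object.
    """
    obs_columns = set(obs_columns)
    var_columns = set(var_columns)
    return {
        "obs": [c for c in REQUIRED_OBS if c not in obs_columns],
        "var": [c for c in REQUIRED_VAR if c not in var_columns],
    }


# Example with plain lists standing in for adata.obs.columns / adata.var.columns.
problems = missing_columns(["condition", "batch"], ["gene_name"])
print(problems)  # {'obs': ['cell_type'], 'var': []}
```

An empty list for both keys means the column requirements stated above are met; the check says nothing about the values stored in those columns.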
Demos
Name | Description |
---|---|
Dataset Tutorial | Tutorial on how to use the dataset loader and read customized data |
Model Tutorial | Tutorial on how to train GEARS |
Plot top 20 DE genes | Tutorial on how to plot the top 20 DE genes |
Uncertainty | Tutorial on how to train an uncertainty-aware GEARS model |
Colab
Name | Description |
---|---|
Using Trained Model | Use a model trained on Norman et al. 2019 to make predictions (Needs Colab Pro) |
Cite Us
@article{roohani2023predicting,
title={Predicting transcriptional outcomes of novel multigene perturbations with {GEARS}},
author={Roohani, Yusuf and Huang, Kexin and Leskovec, Jure},
journal={Nature Biotechnology},
year={2023},
publisher={Nature Publishing Group US New York}
}
Paper: Link
Code for reproducing figures: Link