sc-pert - Machine learning for perturbational single-cell omics
This repository provides a community-maintained summary of models and datasets. It was initially curated for the accompanying publication (Cell Systems, 2021).
External annotations
Various resources can support the evaluation of single-cell perturbation models. The five tasks discussed in the publication can be supported by the following publicly available annotations:
- GDSC provides a collection of cell viability measurements for many compounds and cell lines. We provide a code snippet to conveniently load GDSC-provided z-score compound response rankings per cell line.
- Additional viability data can be obtained from DepMap's PRISM dataset.
- Therapeutics Data Commons provides access to a number of compound databases as part of its cheminformatics tasks. (In the same vein, OpenProblems provides a framework for single-cell analysis tasks that could also support perturbation modeling, in a longer-term format than the DREAM challenges.)
- PubChem contains a comprehensive record of compounds ranging from experimental entities to non-proprietary small molecules. It is queryable via PubChemPy.
- DrugBank provides annotations for a relatively small number of small molecules in a standardized format.
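The sketch below is not the repository's GDSC snippet. It is a minimal, hedged illustration of what "compound response rankings per cell line" means: group a GDSC-style table by cell line and sort compounds by z-score. The column names `CELL_LINE_NAME`, `DRUG_NAME` and `Z_SCORE` are assumptions modeled on GDSC's fitted dose-response exports, the convention that a more negative z-score means stronger sensitivity is also an assumption, and the toy rows are invented.

```python
import csv
import io
from collections import defaultdict

# Assumed GDSC-style column names; verify against the file you download.
CELL_COL, DRUG_COL, Z_COL = "CELL_LINE_NAME", "DRUG_NAME", "Z_SCORE"


def rank_compounds_per_cell_line(csv_text: str) -> dict[str, list[tuple[str, float]]]:
    """Return, per cell line, (compound, z-score) pairs sorted by ascending z-score.

    Assumes a more negative z-score indicates stronger sensitivity, so the
    first entry for each cell line is its most effective compound.
    """
    by_cell_line: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_cell_line[row[CELL_COL]].append((row[DRUG_COL], float(row[Z_COL])))
    # Sort each cell line's compounds independently by z-score.
    return {cell: sorted(pairs, key=lambda p: p[1]) for cell, pairs in by_cell_line.items()}


# Invented toy rows, only to show the expected shape of the input.
toy_csv = """CELL_LINE_NAME,DRUG_NAME,Z_SCORE
A549,DrugA,0.8
A549,DrugB,-1.5
A549,DrugC,0.1
MCF7,DrugA,-0.4
MCF7,DrugB,1.2
"""

rankings = rank_compounds_per_cell_line(toy_csv)
print(rankings["A549"])  # [('DrugB', -1.5), ('DrugC', 0.1), ('DrugA', 0.8)]
```

For a real download, pass the file's text (for example `open(path).read()`) instead of `toy_csv`, after checking that its header matches the assumed column names.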
Current modeling approaches
We maintain a list of perturbation-related tools at scrna-tools. Please consider contributing updates and tags for tools there.
The table in the article is based on this spreadsheet, which covers a subset of perturbation models in more detail.
Datasets
Below is a curated table of perturbation datasets, based on Svensson et al. (2020).
We also offer some datasets in a curated `.h5ad` format via the download links in the table below. Files labeled "raw h5ad" have not been filtered, normalized, or standardized. Files labeled "processed h5ad" have an accompanying processing notebook and were preprocessed in a consistent way. These datasets have the following standardized fields in `.obs`:
- `perturbation_name`: human-readable compound names (International Nonproprietary Names where possible) for small molecules, and gene names for genetic perturbations.
- `perturbation_type`: either `small molecule` or `genetic`.
- `perturbation_value`: a continuous covariate, such as the dose concentration or the number of hours since treatment.
- `perturbation_unit`: the unit of `perturbation_value`, such as `'ug'` or `'hrs'`.
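To make the standardized fields concrete, here is a minimal sketch that checks per-cell records for the four `.obs` fields and counts cells per condition. It uses plain dictionaries as stand-ins for rows of `adata.obs`; with `anndata` installed, `anndata.read_h5ad(path).obs` returns a pandas DataFrame, and `obs.groupby([...]).size()` would do the same counting. The helper names (`validate_record`, `count_conditions`) and the toy records, including the compound and gene names, are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping

# The four standardized .obs fields described above.
REQUIRED_FIELDS = ("perturbation_name", "perturbation_type",
                   "perturbation_value", "perturbation_unit")
ALLOWED_TYPES = {"small molecule", "genetic"}


def validate_record(record: Mapping[str, object]) -> None:
    """Raise ValueError if a per-cell record lacks a field or has an unknown type."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["perturbation_type"] not in ALLOWED_TYPES:
        raise ValueError(f"unexpected perturbation_type: {record['perturbation_type']!r}")


def count_conditions(records: Iterable[Mapping[str, object]]) -> Counter:
    """Count cells per (name, value, unit) condition, validating each record first."""
    counts: Counter = Counter()
    for record in records:
        validate_record(record)
        key = (record["perturbation_name"], record["perturbation_value"],
               record["perturbation_unit"])
        counts[key] += 1
    return counts


# Hypothetical toy records standing in for rows of adata.obs.
cells = [
    {"perturbation_name": "dexamethasone", "perturbation_type": "small molecule",
     "perturbation_value": 0.1, "perturbation_unit": "ug"},
    {"perturbation_name": "dexamethasone", "perturbation_type": "small molecule",
     "perturbation_value": 0.1, "perturbation_unit": "ug"},
    {"perturbation_name": "dexamethasone", "perturbation_type": "small molecule",
     "perturbation_value": 1.0, "perturbation_unit": "ug"},
    {"perturbation_name": "TP53", "perturbation_type": "genetic",
     "perturbation_value": 24.0, "perturbation_unit": "hrs"},
]

conditions = count_conditions(cells)
print(conditions[("dexamethasone", 0.1, "ug")])  # 2
```

Keying on the (name, value, unit) triple keeps doses or timepoints of the same perturbation apart, which is the grouping most perturbation models condition on.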
Shorthand | Title | .h5ad availability | Treatment | # perturbations | # cell types | # doses | # timepoints | Reported cells total | Organism | Tissue | Technique | Data location | Panel size | Measurement | Cell source | Disease | Contrasts | Developmental stage | Number of reported cell types or clusters | Cell clustering | Pseudotime | RNA velocity | PCA | tSNE | H5AD location | Isolation | BC --> Cell ID OR BC --> Cluster ID | Number of individuals |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Jaitin et al. Science | Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types |  | CRISPR | 8-22 | 1 | - | 1 | 4,468 | Mouse | Spleen | MARS-seq | GSE54006 |  | RNA-seq | CD11c+ enriched splenocytes |  |  |  | 9 | Yes | No |  | No | No |  | Sorting (FACS) |  |  |
Dixit et al. Cell | Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens | [raw h5ad] [processed h5ad] [processing nb] | CRISPR | 10, 24 | 1 | - | 1-2 | 200,000 | Human, Mouse | Culture | Perturb-seq | GSE90063 |  | RNA-seq | BMDCs, K562 |  |  |  |  |  |  |  |  | No |  | Nanodroplet dilution |  |  |
Datlinger et al. NMeth | Pooled CRISPR screening with single-cell transcriptome readout |  | CRISPR | 29 | 1 | - | 1 | 5,905 | Human, Mouse | Culture | CROP-seq | GSE92872 |  | RNA-seq | HEK293T, 3T3, Jurkat |  |  |  |  |  |  |  |  | No |  |  |  |  |
Hill et al. NMethods | On the design of CRISPR-based single-cell molecular screens |  | CRISPR | 32 | 1 | 2 | 1 | 5,879 | Human | Culture | CROP-seq | GSE108699 |  | RNA-seq | MCF10A cells |  |  |  |  |  |  |  |  |  | https://github.com/shendurelab/single-cell-ko-screens#result-files |  |  |  |
Ursu et al. bioRxiv | Massively parallel phenotyping of variant impact in cancer with Perturb-seq reveals a shift in the spectrum of cell states induced by somatic mutations |  | CRISPR | 200 | 1 | - | 1 | 162,314 | Human | Lung | Perturb-seq |  |  | RNA-seq |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
Jin et al. Science | In vivo Perturb-Seq reveals neuronal and glial abnormalities associated with autism risk genes |  | CRISPR | 35 | - | - | 1 | 46,770 | Mouse | Brain | Perturb-seq |  |  | RNA-seq |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
Frangieh et al. NGenetics | Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion | [raw h5ad] [processed h5ad] [processing nb] | CRISPR | 248 | 1 | - | 1 | 218,331 | Human | Culture | Perturb-CITE-seq | SCP1064 |  | RNA-seq |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
Papalexi et al. NGenetics | Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens |  | CRISPR | 111 (sgRNA) | 1 | 2 | - | 28,295 | Human | Culture | CITE-seq & ECCITE-seq | GSE153056 |  | RNA-seq |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
Datlinger et al. NMethods | Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing |  | CRISPR KO + antibody | 96 | 1 | 1 | 1 |  | Human, Mouse |  | scifi-RNA-seq |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
Alda-Catalinas et al. CSystems | A Single-Cell Transcriptomics CRISPR-Activation Screen Identifies Epigenetic Regulators of the Zygotic Genome Activation Program |  | CRISPRa | 230 | 1 | - | - | 203,894 | Mouse | Culture | Chromium |  |  | RNA-seq | mESCs |  |  |  |  |  |  |  |  |  |  |  |  |  |
Norman et al. (2019) |  | [raw h5ad] [processed h5ad] [curation nb] [processing nb] | CRISPRa | 287 | 1 | - | 1 |  |  |  | Perturb-seq |  |  | RNA-seq | induction of gene pair targets + single gene controls in K562 cells after screening 112 genes (2x gRNA per) and their combinations |  |  |  |  |  |  |  |  |  |  |  |  |  |
Adamson et al. Cell | A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response |  | CRISPRi | 9-93 (sgRNA) | 1 | - | 1 | 86,000 | Human | Culture | Perturb-seq | GSE90546 |  | RNA-seq | K562 |  |  |  |  |  |  |  |  | Yes |  |  |  |  |
Gasperini et al. Cell | A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens |  | CRISPRi | 1119, 5779 | 1 | - | 1 | 207,324 | Human | Culture | CROP-seq | GSE120861 |  | RNA-seq | K562 cells |  | CRISPR screen |  |  |  |  |  |  |  |  |  |  |  |
Jost et al. NBT | Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs |  | CRISPRi | 25 | 2 | - | 1 | 19,587 | Human | Culture | Perturb-seq | GSE132080 |  | RNA-seq | K562 cells |  | 25-gene screen |  |  |  |  |  |  |  |  |  |  |  |
Schraivogel et al. NMethods | Targeted Perturb-seq enables genome-scale genetic screens in single cells | [processing nb] | CRISPRi | 1778 (enhancers) | 1 | - | 1 | 231,667 | Human, Mouse | Bone marrow, Culture | TAP-seq | GSE135497 | 1,000 | RNA-seq |  |  |  |  |  |  |  |  |  | Yes |  |  |  |  |
Leng et al. bioRxiv | CRISPRi screens in human astrocytes elucidate regulators of distinct inflammatory reactive states |  | CRISPRi | 30 | 1 | 2 | - |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
Replogle et al. (2020) |  |  | genetic targets |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
Replogle et al. (2021) |  |  | genetic targets | >10000 | 2 | - | - |  |  |  | Perturb-seq |  |  | RNA-seq |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
Shin et al. SAdvances | Multiplexed single-cell RNA-seq via transient barcoding for simultaneous expression profiling of various drug perturbations |  | small molecules | 45 | 2 | 1 | 1 | 3,091 | Mouse, Human | Culture | Drop-seq | PRJNA493658 |  | RNA-seq | HEK293T, NIH3T3, A375, SW480, K562 |  | 45 perturbations |  |  |  |  |  |  |  |  |  |  |  |
Srivatsan et al. Science | Massively multiplex chemical transcriptomics at single-cell resolution | [raw h5ad] [curation nb] [curation nb] [processing nb] | small molecules | 188 | 3 | 4 | 2 | 650,000 | Human | Culture | sci-Plex | GSE139944 |  | RNA-seq | Cancer cell lines A549, K562, and MCF7 |  | 5,000 drug conditions |  | 3 | Yes | Yes | No | Yes | No |  |  |  |  |
Zhao et al. bioRxiv | Deconvolution of Cell Type-Specific Drug Responses in Human Tumor Tissue with Single-Cell RNA-seq |  | small molecules | 2, 6 | 6, 1 | - | - | 48,404 | Human | Brain, Tumor | SCRB-seq (microwell) | GSE148842 |  | RNA-seq |  |  |  |  |  |  |  |  |  |  |  |  |  | 6 |
McFarland et al. NCommunications | Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action | [curation nb] [processing nb] | small molecules | 1-13 | 24-99 | 1 | 1-5 |  | Human | Culture | MIX-seq |  |  | RNA-seq |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
Chen et al. (2020) |  |  | small molecules | 300 | 1 | 1 | 1 |  |  |  | CyTOF |  |  | protein | breast cancer cells undergoing TGF-β-induced EMT |  |  |  |  |  |  |  |  |  |  |  |  |  |