  • Stars: 141
  • Rank: 259,971 (Top 6 %)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created almost 4 years ago
  • Updated about 2 years ago

Models and datasets for perturbational single-cell omics

sc-pert - Machine learning for perturbational single-cell omics

This repository provides a community-maintained summary of models and datasets. It was initially curated for the article "Machine learning for perturbational single-cell omics" (Cell Systems, 2021).

External annotations

Several resources support the evaluation of single-cell perturbation models. The publication discusses five tasks, which can be supported by the following publicly available annotations:

  • GDSC provides a collection of cell viability measurements for many compounds and cell lines. We provide a code snippet to conveniently load GDSC-provided z-score compound response rankings per cell line.
  • Additional viability data can be obtained from DepMap's PRISM dataset.
  • Therapeutics Data Commons provides access to a number of compound databases as part of its cheminformatics tasks. (In the same vein, OpenProblems provides a framework for single-cell analysis tasks, which can also support perturbation modeling tasks in a longer-term format than was previously seen in the DREAM challenges.)
  • PubChem contains a comprehensive record of compounds ranging from experimental entities to non-proprietary small molecules. It is queryable via PubChemPy.
  • DrugBank provides annotations for a relatively small number of small molecules in a standardized format.
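
As a rough illustration of the per-cell-line compound rankings mentioned for GDSC above, the following sketch ranks compounds by z-score within each cell line. It is not the repository's snippet: the column names (`cell_line`, `compound`, `z_score`), the helper `rank_compounds_per_cell_line`, and the toy values are all assumptions made for illustration, and the ascending sort assumes that a lower z-score indicates a stronger response.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for a GDSC-style export; the column names and values are
# made up for illustration and are NOT real GDSC measurements.
TOY_CSV = """cell_line,compound,z_score
LINE_A,drug_1,-1.5
LINE_A,drug_2,0.3
LINE_A,drug_3,-0.2
LINE_B,drug_1,0.8
LINE_B,drug_2,-2.1
"""


def rank_compounds_per_cell_line(handle):
    """Return {cell_line: [(compound, z_score), ...]} sorted by ascending z-score."""
    per_line = defaultdict(list)
    for row in csv.DictReader(handle):
        per_line[row["cell_line"]].append((row["compound"], float(row["z_score"])))
    # Assumption: a lower z-score means a stronger response, so sort ascending.
    return {
        line: sorted(pairs, key=lambda pair: pair[1])
        for line, pairs in per_line.items()
    }


rankings = rank_compounds_per_cell_line(io.StringIO(TOY_CSV))
for line, pairs in rankings.items():
    print(line, [compound for compound, _ in pairs])
```

With a real export, the same function could be given an open file handle instead of the in-memory string, provided the columns are renamed to match.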

Current modeling approaches

We maintain a list of perturbation-related tools at scrna-tools. Please consider further updating and tagging tools there.

For the basis of the table in the article, see this spreadsheet of a subset of perturbation models which includes more details.

Datasets

The table below lists perturbation datasets, curated on the basis of Svensson et al. (2020).

We also offer some datasets in a curated .h5ad format via the download links in the table below. A raw h5ad link denotes a version of the dataset that has not been filtered, normalized, or standardized.

Datasets marked processed h5ad have an accompanying processing notebook and have all been preprocessed in the same way. They share the following standardized fields in .obs:

  • perturbation_name -- Human-readable compound names (International non-proprietary naming where possible) for small molecules and gene names for genetic perturbations.
  • perturbation_type -- small molecule or genetic
  • perturbation_value -- A continuous covariate quantity, such as the dosage concentration or the number of hours since treatment.
  • perturbation_unit -- Describes perturbation_value, such as 'ug' or 'hrs'.
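
The standardized fields listed above can be checked mechanically. The sketch below validates per-cell records against the four field names and the two perturbation types given in the list. The helper `check_obs_rows`, the representation of records as plain dictionaries, and the toy values are assumptions for illustration, not part of the repository.

```python
REQUIRED_FIELDS = (
    "perturbation_name",
    "perturbation_type",
    "perturbation_value",
    "perturbation_unit",
)
ALLOWED_TYPES = {"small molecule", "genetic"}


def check_obs_rows(rows):
    """Return a list of problems found in per-cell .obs records (dicts)."""
    problems = []
    for index, row in enumerate(rows):
        missing = [field for field in REQUIRED_FIELDS if field not in row]
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
            continue
        if row["perturbation_type"] not in ALLOWED_TYPES:
            problems.append(f"row {index}: unexpected perturbation_type")
        try:
            # perturbation_value is described as a continuous quantity.
            float(row["perturbation_value"])
        except (TypeError, ValueError):
            problems.append(f"row {index}: perturbation_value is not numeric")
    return problems


# Toy records with made-up values, for illustration only.
toy_rows = [
    {"perturbation_name": "drug_1", "perturbation_type": "small molecule",
     "perturbation_value": 10.0, "perturbation_unit": "ug"},
    {"perturbation_name": "GENE_A", "perturbation_type": "genetic",
     "perturbation_value": 24, "perturbation_unit": "hrs"},
    {"perturbation_name": "drug_2", "perturbation_type": "chemical",
     "perturbation_value": "high", "perturbation_unit": "ug"},
]

problems = check_obs_rows(toy_rows)
for problem in problems:
    print(problem)
```

For an AnnData object loaded from one of the processed files, the records could be obtained with `adata.obs.to_dict("records")`, since `.obs` is a pandas DataFrame.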

| Shorthand | Title | .h5ad availability | Treatment | # perturbations | # cell types | # doses | # timepoints | Reported cells total | Organism | Tissue | Technique | Data location | Panel size | Measurement | Cell source | Disease | Contrasts | Developmental stage | Number of reported cell types or clusters | Cell clustering | Pseudotime | RNA Velocity | PCA | tSNE | H5AD location | Isolation | BC --> Cell ID OR BC --> Cluster ID | Number individuals |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Jaitin et al. Science | Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types |  | CRISPR | 8-22 | 1 | - | 1 | 4,468 | Mouse | Spleen | MARS-seq | GSE54006 | nan | RNA-seq | CD11c+ enriched splenocytes | nan | nan | nan | 9 | Yes | No | nan | No | No | nan | Sorting (FACS) | nan | nan |
| Dixit et al. Cell | Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens | [raw h5ad] [processed h5ad] [processing nb] | CRISPR | 10,24 | 1 | - | 1-2 | 200,000 | Human, Mouse | Culture | Perturb-seq | GSE90063 | nan | RNA-seq | BMDCs, K562 | nan | nan | nan | nan | nan | nan | nan | nan | No | nan | Nanodroplet dilution | nan | nan |
| Datlinger et al. NMeth | Pooled CRISPR screening with single-cell transcriptome readout |  | CRISPR | 29 | 1 | - | 1 | 5,905 | Human, Mouse | Culture | CROP-seq | GSE92872 | nan | RNA-seq | HEK293T, 3T3, Jurkat | nan | nan | nan | nan | nan | nan | nan | nan | No | nan | nan | nan | nan |
| Hill et al. NMethods | On the design of CRISPR-based single-cell molecular screens |  | CRISPR | 32 | 1 | 2 | 1 | 5,879 | Human | Culture | CROP-seq | GSE108699 | nan | RNA-seq | MCF10a cells | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | https://github.com/shendurelab/single-cell-ko-screens#result-files | nan |
| Ursu et al. bioRxiv | Massively parallel phenotyping of variant impact in cancer with Perturb-seq reveals a shift in the spectrum of cell states induced by somatic mutations |  | CRISPR | 200 | 1 | - | 1 | 162,314 | Human | Lung | Perturb-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Jin et al. Science | In vivo Perturb-Seq reveals neuronal and glial abnormalities associated with autism risk genes |  | CRISPR | 35 | - | - | 1 | 46,770 | Mouse | Brain | Perturb-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Frangieh et al. NGenetics | Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion | [raw h5ad] [processed h5ad] [processing nb] | CRISPR | 248 | 1 | - | 1 | 218,331 | Human | Culture | Perturb-CITE-seq | SCP1064 | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Papalexi et al. NGenetics | Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens |  | CRISPR | 111 (sgRNA) | 1 | 2 | - | 28,295 | Human | Culture | CITE-seq & ECCITE-seq | GSE153056 | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Datlinger et al. NMethods | Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing |  | CRISPR KO + antibody | 96 | 1 | 1 | 1 | nan | Human, Mouse | nan | scifi-RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Alda-Catalinas et al. CSystems | A Single-Cell Transcriptomics CRISPR-Activation Screen Identifies Epigenetic Regulators of the Zygotic Genome Activation Program |  | CRISPRa | 230 | 1 | - | - | 203,894 | Mouse | Culture | Chromium | nan | nan | RNA-seq | mESCs | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Norman et al. (2019) | nan | [raw h5ad] [processed h5ad] [curation nb] [processing nb] | CRISPRa | 287 | 1 | - | 1 | nan | nan | nan | Perturb-seq | nan | nan | RNA-seq | induction of gene pair targets+single gene controls in K562 cells after screening 112 genes (2x gRNA per) and their combinations | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Adamson et al. Cell | A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response |  | CRISPRi | 9-93 (sgRNA) | 1 | - | 1 | 86,000 | Human | Culture | Perturb-seq | GSE90546 | nan | RNA-seq | K562 | nan | nan | nan | nan | nan | nan | nan | nan | Yes | nan | nan | nan | nan |
| Gasperini et al. Cell | A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens |  | CRISPRi | 1119, 5779 | 1 | - | 1 | 207,324 | Human | Culture | CROP-seq | GSE120861 | nan | RNA-seq | K562 Cells | nan | CRISPR Screen | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Jost et al. NBT | Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs |  | CRISPRi | 25 | 2 | - | 1 | 19,587 | Human | Culture | Perturb-seq | GSE132080 | nan | RNA-seq | K562 cells | nan | 25 gene screen | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Schraivogel et al. NMethods | Targeted Perturb-seq enables genome-scale genetic screens in single cells | [processing nb] | CRISPRi | 1778 (enhancers) | 1 | - | 1 | 231,667 | Human, Mouse | Bone marrow, Culture | TAP-seq | GSE135497 | 1,000 | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | Yes | nan | nan | nan | nan |
| Leng et al. bioRxiv | CRISPRi screens in human astrocytes elucidate regulators of distinct inflammatory reactive states |  | CRISPRi | 30 | 1 | 2 | - | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Replogle et al. (2020) | nan |  | genetic targets | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Replogle et al. (2021) | nan |  | genetic targets | >10000 | 2 | - | - | nan | nan | nan | Perturb-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Shin et al. SAdvances | Multiplexed single-cell RNA-seq via transient barcoding for simultaneous expression profiling of various drug perturbations |  | small molecules | 45 | 2 | 1 | 1 | 3,091 | Mouse, Human | Culture | Drop-seq | PRJNA493658 | nan | RNA-seq | HEK293T, NIIH3T3, A375, SW480, K562 | nan | 45 perturbations | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Srivatsan et al. Science | Massively multiplex chemical transcriptomics at single-cell resolution | [raw h5ad] [curation nb] [curation nb] [processing nb] | small molecules | 188 | 3 | 4 | 2 | 650,000 | Human | Culture | sci-Plex | GSE139944 | nan | RNA-seq | Cancer cell lines A549, K562, and MCF7 | nan | 5,000 drug conditions | nan | 3 | Yes | Yes | No | Yes | No | nan | nan | nan | nan |
| Zhao et al. bioRxiv | Deconvolution of Cell Type-Specific Drug Responses in Human Tumor Tissue with Single-Cell RNA-seq |  | small molecules | 2,6 | 6,1 | - | - | 48,404 | Human | Brain, Tumor | SCRB-seq (microwell) | GSE148842 | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 6 |
| McFarland et al. NCommunications | Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action | [curation nb] [processing nb] | small molecules | 1-13 | 24-99 | 1 | 1-5 | nan | Human | Culture | MIX-seq | nan | nan | RNA-seq | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| Chen et al. (2020) | nan |  | small molecules | 300 | 1 | 1 | 1 | nan | nan | nan | CyTOF | nan | nan | protein | breast cancer cells undergoing TGF-β-induced EMT | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |

More Repositories

  1. single-cell-tutorial (Jupyter Notebook, 1,284 stars): Single cell current best practices tutorial case study for the paper: Luecken and Theis, "Current best practices in single-cell RNA-seq analysis: a tutorial"
  2. single-cell-best-practices (Jupyter Notebook, 746 stars): https://www.sc-best-practices.org
  3. cellrank (Python, 342 stars): CellRank: dynamics from multi-view single-cell data
  4. scvelo (Python, 335 stars): RNA Velocity generalized through dynamical modeling
  5. scarches (Jupyter Notebook, 333 stars): Reference mapping for single-cell genomics
  6. scib (Python, 298 stars): Benchmarking analysis of data integration tools
  7. scgen (Python, 259 stars): Single cell perturbation prediction
  8. dca (Python, 224 stars): Deep count autoencoder for denoising scRNA-seq data
  9. ehrapy (Python, 201 stars): Electronic Health Record Analysis with Python.
  10. diffxpy (Python, 192 stars): Differential expression analysis for single-cell RNA-seq data.
  11. paga (Jupyter Notebook, 159 stars): Mapping out the coarse-grained connectivity structures of complex manifolds.
  12. kBET (HTML, 148 stars): An R package to test for batch effects in high-dimensional single-cell RNA sequencing data.
  13. scCODA (Jupyter Notebook, 145 stars): A Bayesian model for compositional single-cell data analysis
  14. sfaira (Python, 134 stars): data and model repository for single-cell data
  15. anndata2ri (Python, 124 stars): Convert between AnnData and SingleCellExperiment
  16. moscot (Python, 112 stars): Multi-omic single-cell optimal transport tools
  17. ncem (Python, 102 stars): Learning cell communication from spatial graphs of cells
  18. chemCPA (Jupyter Notebook, 96 stars): Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
  19. zellkonverter (R, 88 stars): Conversion between scRNA-seq objects
  20. cpa (Python, 84 stars): The Compositional Perturbation Autoencoder (CPA) is a deep generative framework to learn effects of perturbations at the single-cell level. CPA performs OOD predictions of unseen combinations of drugs, learns interpretable embeddings, estimates dose-response curves, and provides uncertainty estimates.
  21. scib-pipeline (Python, 65 stars): Snakemake pipeline that works with the scIB package to benchmark data integration methods.
  22. destiny (R, 62 stars): R package for single cell and other data analysis using diffusion maps
  23. nicheformer (Jupyter Notebook, 55 stars): Repository for Nicheformer: a foundation model for single-cell and spatial omics
  24. trVAE (Python, 54 stars): Conditional out-of-distribution prediction
  25. scib-reproducibility (Jupyter Notebook, 53 stars): Additional code and analysis from the single-cell integration benchmarking project
  26. AutoGeneS (Jupyter Notebook, 50 stars)
  27. spatial_scog_workshop_2022 (Jupyter Notebook, 49 stars): Tutorials for the SCOG Virtual Workshop ‘Spatial transcriptomics data analysis in Python’ - May 23-24, 2022
  28. pseudodynamics (Jupyter Notebook, 40 stars): Dynamic models for single-cell RNA-seq time series.
  29. scTab (Jupyter Notebook, 38 stars)
  30. tcellmatch (Python, 34 stars)
  31. scArches-reproducibility (Jupyter Notebook, 33 stars): Reproducing result from the paper
  32. graphcompass (Jupyter Notebook, 30 stars): GraphCompass: Graph Comparison Tools for Differential Analyses in Spatial Systems
  33. deepflow (Python, 28 stars): This code contains the neural network implementation from the nature communication manuscript NCOMMS-16-25447A.
  34. mubind (Python, 27 stars): Learning motif contributions to cell transitions using sequence features and graphs.
  35. batchglm (Python, 27 stars): Fit generalized linear models in python.
  36. graph_abstraction (Jupyter Notebook, 26 stars): Generate cellular maps of differentiation manifolds with complex topologies.
  37. DeepRT (Jupyter Notebook, 25 stars)
  38. hadge (Nextflow, 24 stars): Comprehensive pipeline for donor demultiplexing in single cell
  39. Covid_meta_analysis (Jupyter Notebook, 24 stars): Analysis notebooks for the Covid-19 meta analysis that accompanies the Nature Medicine publication "Single-cell meta-analysis of SARS-CoV-2 entry genes across tissues and demographics"
  40. spapros (Python, 23 stars): Python package for Probe set selection for targeted spatial transcriptomics.
  41. scvelo_notebooks (Jupyter Notebook, 23 stars)
  42. interactive_plotting (Jupyter Notebook, 21 stars)
  43. scgen-reproducibility (Jupyter Notebook, 18 stars)
  44. multimil (Python, 18 stars): Multimodal weakly supervised learning to identify disease-specific changes in single-cell atlases
  45. geome (Python, 16 stars)
  46. campa (Jupyter Notebook, 14 stars): Conditional Autoencoders for Multiplexed Pixel Analysis
  47. multicpa (Python, 13 stars)
  48. scPoli_reproduce (Jupyter Notebook, 13 stars): Reproducibility notebooks for scPoli
  49. cellrank_reproducibility (Jupyter Notebook, 13 stars): CellRank's reproducibility repository.
  50. scanpy-in-R (HTML, 12 stars): A guide to using the Python scRNA-seq analysis package Scanpy from R
  51. scanpydoc (Python, 12 stars): Collection of Sphinx extensions similar to (but more flexible than) numpydoc
  52. MetaMap (HTML, 12 stars): The code and analyses accompanying the manuscript “MetaMap: An atlas of metatranscriptomic reads in human disease-related RNA-seq data”.
  53. DeepCollisionalCrossSection (Jupyter Notebook, 11 stars)
  54. scAnalysisTutorial (Jupyter Notebook, 10 stars)
  55. multigrate (Python, 10 stars): Multigrate: multiomic data integration for single-cell genomics
  56. cross_system_integration (Jupyter Notebook, 10 stars)
  57. GWAS-scRNAseq-Integration (R, 10 stars): A Shiny tool to define the cell-type of action by integrating single cell expression data with GWAS
  58. superexacttestpy (Jupyter Notebook, 9 stars): Python implementation of the SuperExactTest package
  59. enrichment_analysis_celltype (R, 9 stars): Cell type enrichment analysis using gene signatures and cluster markers
  60. ncem_tutorials (Jupyter Notebook, 9 stars)
  61. IMPA (Jupyter Notebook, 9 stars)
  62. diffxpy_tutorials (Jupyter Notebook, 9 stars): Tutorials for diffxpy.
  63. moslin (Jupyter Notebook, 9 stars): Code, data and analysis for moslin.
  64. trvaep (Jupyter Notebook, 9 stars)
  65. expiMap_reproducibility (Jupyter Notebook, 9 stars)
  66. ncem_benchmarks (Jupyter Notebook, 8 stars)
  67. greatpy (Jupyter Notebook, 8 stars): GREAT algorithm in Python
  68. PathReg (Jupyter Notebook, 8 stars): Sparsity-enforcing regularizer
  69. squidpy_reproducibility (Jupyter Notebook, 8 stars)
  70. sc-best-practices-ce (8 stars): The best-practices workflow for single-cell RNA-seq analysis as determined by the community.
  71. tissue_tensorflow (Python, 8 stars)
  72. 2020_Mayr (Jupyter Notebook, 7 stars): This repo contains the analysis code describing the findings of Mayr_et_al
  73. ehrapy-tutorials (Jupyter Notebook, 7 stars): Tutorials for ehrapy
  74. cpa-reproducibility (Jupyter Notebook, 7 stars): Notebooks for CPA figures
  75. scachepy (Jupyter Notebook, 7 stars): Caching extension for Scanpy
  76. 2019_Strunz (Jupyter Notebook, 7 stars): Reproducibility repo accompanying Strunz et al. "Alveolar regeneration through a Krt8+ transitional stem cell state that persists in human lung fibrosis". Nat Commun. 2020.
  77. scCODA_reproducibility (Jupyter Notebook, 7 stars)
  78. 2018_Angelidis (R, 6 stars): Reproducibility repo accompanying Angelidis et al. "An atlas of the aging lung mapped by single cell transcriptomics and deep tissue proteomics"
  79. gastrulation_analysis (Jupyter Notebook, 6 stars)
  80. trVAE_reproducibility (Jupyter Notebook, 6 stars)
  81. cellrank_notebooks (Jupyter Notebook, 6 stars): Tutorials and examples for CellRank.
  82. intercode (Jupyter Notebook, 6 stars)
  83. spapros-pipeline (Nextflow, 6 stars)
  84. jump-cpg0016-segmentation (Jupyter Notebook, 6 stars): Snakemake pipeline used to segment the cpg0016 dataset of the JUMP-Cell Painting Consortium
  85. flowVI (5 stars): flowVI: Flow Cytometry Variational Inference
  86. sfaira_tutorials (Jupyter Notebook, 5 stars)
  87. theislab.github.io (JavaScript, 5 stars): theislab repository overview
  88. scatac_poisson_reproducibility (Jupyter Notebook, 5 stars)
  89. disent (Jupyter Notebook, 5 stars): Out-of-distribution prediction with disentangled representations for single-cell RNA sequencing data
  90. ehrapy-datasets (Jupyter Notebook, 5 stars): A collection of scripts to generate AnnData objects of EHR datasets for ehrapy
  91. neural_organoid_atlas (Jupyter Notebook, 5 stars): Reproducibility repository for the Human Neural Organoid Atlas publication
  92. moscot_notebooks (Jupyter Notebook, 5 stars): Analysis notebooks using the moscot package
  93. scanpy-demo-czbiohub (HTML, 5 stars): single-cell scanpy teaching
  94. kbranches (R, 5 stars): Finding branching events and tips in single cell differentiation trajectories
  95. InterpretableAutoencoders (Jupyter Notebook, 5 stars)
  96. archmap (JavaScript, 4 stars)
  97. inVAE (Jupyter Notebook, 4 stars): Invariant Representation learning
  98. cellrank_reproducibility_preprint (Jupyter Notebook, 4 stars): Code to reproduce results from the CellRank preprint
  99. extended-single-cell-best-practices-container (Dockerfile, 4 stars): Hosting the container for the extended single-cell best-practices book
  100. LODE (Jupyter Notebook, 4 stars): repository for all LODE projects