  • Stars: 103
  • Rank: 333,046 (Top 7%)
  • Language: R
  • License: MIT License
  • Created over 6 years ago
  • Updated 3 months ago

Repository Details

Infer cell types in scRNA-seq data using bulk RNA-seq or gene sets

clustifyr

clustifyr classifies cells and clusters in single-cell RNA sequencing experiments using reference bulk RNA-seq data sets, sorted microarray expression data, single-cell gene signatures, or lists of marker genes.

Installation

Install the Bioconductor version with:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("clustifyr")

Install the development version with:

BiocManager::install("rnabioco/clustifyr")

Example usage

In this example we use the following built-in input data:

  • an expression matrix of single cell RNA-seq data (pbmc_matrix_small)
  • a metadata data.frame (pbmc_meta), with cluster information stored ("classified")
  • a vector of variable genes (pbmc_vargenes)
  • a matrix of mean normalized scRNA-seq UMI counts by cell type (cbmc_ref)

We then calculate correlation coefficients and plot them on a pre-calculated projection (stored in pbmc_meta).

library(clustifyr)

# calculate correlation
res <- clustify(
  input = pbmc_matrix_small,
  metadata = pbmc_meta$classified,
  ref_mat = cbmc_ref,
  query_genes = pbmc_vargenes
)

# print assignments
cor_to_call(res)
#> # A tibble: 9 × 3
#> # Groups:   cluster [9]
#>   cluster      type           r
#>   <chr>        <chr>      <dbl>
#> 1 B            B          0.909
#> 2 CD14+ Mono   CD14+ Mono 0.915
#> 3 FCGR3A+ Mono CD16+ Mono 0.929
#> 4 Memory CD4 T CD4 T      0.861
#> 5 Naive CD4 T  CD4 T      0.889
#> 6 DC           DC         0.849
#> 7 Platelet     Mk         0.732
#> 8 CD8 T        NK         0.826
#> 9 NK           NK         0.894

# plot assignments on a projection
plot_best_call(
  cor_mat = res,
  metadata = pbmc_meta,
  cluster_col = "classified"
)
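
As a rough illustration of the idea behind the call table above, the following base R sketch correlates per-cluster average expression with each reference column and keeps the best match per cluster. It is not clustifyr's implementation: the toy matrices `query_avg` and `ref` are made up, and the choice of Spearman rank correlation is an assumption made for this sketch.

```r
# Toy per-cluster average expression (genes x clusters); values are invented.
query_avg <- matrix(
  c(5, 0, 1, 9,
    0, 6, 8, 1),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("cluster_a", "cluster_b"))
)

# Toy reference (genes x cell types); values are invented.
ref <- matrix(
  c(4, 1, 0, 7,
    1, 5, 9, 0),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("B", "NK"))
)

# Restrict both matrices to the genes they share.
shared <- intersect(rownames(query_avg), rownames(ref))

# Correlation matrix: one row per cluster, one column per reference cell type.
# Spearman is an assumption for this sketch.
cor_mat <- cor(query_avg[shared, ], ref[shared, ], method = "spearman")

# For each cluster, keep the reference type with the highest coefficient.
calls <- data.frame(
  cluster = rownames(cor_mat),
  type = colnames(cor_mat)[max.col(cor_mat, ties.method = "first")],
  r = apply(cor_mat, 1, max),
  stringsAsFactors = FALSE
)
calls
```

In this toy case each cluster has one clearly better match, so the top-scoring column is an unambiguous call. On real data, closely related references can give near-identical coefficients, as the CD8 T row in the table above suggests.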

clustify() can also take a clustered SingleCellExperiment or Seurat object (both v2 and v3) and assign identities.

# for SingleCellExperiment
clustify(
  input = sce_small,          # an SCE object
  ref_mat = cbmc_ref,         # matrix of RNA-seq expression data for each cell type
  cluster_col = "cell_type1", # name of column in colData containing cell clusters
  obj_out = TRUE              # output SCE object with cell type inserted as "type" column
) 
#> class: SingleCellExperiment 
#> dim: 200 200 
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(200): SGIP1 AZIN2 ... TAF12 SNHG3
#> rowData names(10): feature_symbol is_feature_control ... total_counts
#>   log10_total_counts
#> colnames(200): AZ_A1 AZ_A10 ... HP1502401_E18 HP1502401_E19
#> colData names(35): cell_quality cell_type1 ... type r
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

library(Seurat)
# for Seurat3/4
clustify(
  input = s_small3,
  cluster_col = "RNA_snn_res.1",
  ref_mat = cbmc_ref,
  seurat_out = TRUE
)
#> An object of class Seurat 
#> 230 features across 80 samples within 1 assay 
#> Active assay: RNA (230 features, 20 variable features)
#>  2 dimensional reductions calculated: pca, tsne

# New output option: return the calls directly as a vector (in the order of the metadata),
# which can then be inserted into metadata data frames and other workflows
clustify(
  input = s_small3,
  cluster_col = "RNA_snn_res.1",
  ref_mat = cbmc_ref,
  vec_out = TRUE
)
#>  [1] "Mk"         "Mk"         "Mk"         "Mk"         "Mk"        
#>  [6] "Mk"         "Mk"         "Mk"         "Mk"         "Mk"        
#> [11] "B"          "B"          "B"          "B"          "B"         
#> [16] "B"          "B"          "B"          "B"          "B"         
#> [21] "CD16+ Mono" "CD16+ Mono" "CD16+ Mono" "CD16+ Mono" "CD16+ Mono"
#> [26] "CD16+ Mono" "CD16+ Mono" "CD16+ Mono" "CD16+ Mono" "CD16+ Mono"
#> [31] "Mk"         "B"          "Mk"         "Mk"         "Mk"        
#> [36] "Mk"         "Mk"         "Mk"         "Mk"         "Mk"        
#> [41] "Mk"         "B"          "Mk"         "Mk"         "B"         
#> [46] "B"          "Mk"         "Mk"         "Mk"         "Mk"        
#> [51] "CD16+ Mono" "CD16+ Mono" "B"          "CD16+ Mono" "CD16+ Mono"
#> [56] "CD16+ Mono" "CD16+ Mono" "CD16+ Mono" "CD16+ Mono" "Mk"        
#> [61] "B"          "CD16+ Mono" "B"          "CD16+ Mono" "B"         
#> [66] "CD16+ Mono" "CD16+ Mono" "CD16+ Mono" "CD16+ Mono" "B"         
#> [71] "Mk"         "Mk"         "Mk"         "Mk"         "Mk"        
#> [76] "Mk"         "Mk"         "Mk"         "Mk"         "CD16+ Mono"
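
A per-cell vector like the one above can be attached to a metadata data frame as a new column. The sketch below uses hypothetical stand-ins: `meta` for a per-cell metadata data frame and `calls` for the returned vector. It assumes the vector is in the same row order as the metadata.

```r
# Hypothetical per-cell metadata; in practice this would come from the object.
meta <- data.frame(
  cell = paste0("cell", 1:6),
  cluster = c("0", "0", "1", "1", "2", "2"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the character vector returned with vec_out = TRUE.
calls <- c("Mk", "Mk", "B", "B", "CD16+ Mono", "CD16+ Mono")

# Guard against a length mismatch before assigning by position.
stopifnot(length(calls) == nrow(meta))
meta$type <- calls

# Cross-tabulate clusters against assigned types as a quick sanity check.
table(meta$cluster, meta$type)
```

Because the assignment is positional, it is only safe when the vector and the metadata rows refer to the same cells in the same order.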

New reference matrices can also be made directly from SingleCellExperiment and Seurat objects. Other scRNA-seq experiment object types are supported as well.

# make reference from SingleCellExperiment objects
sce_ref <- object_ref(
  input = sce_small,               # SCE object
  cluster_col = "cell_type1"       # name of column in colData containing cell identities
)
#> The following clusters have less than 10 cells for this analysis: co-expression, ductal, endothelial, epsilon, MHC class II, PSC. Classification is likely inaccurate.

# make reference from Seurat objects
s_ref <- seurat_ref(
  seurat_object = s_small3,
  cluster_col = "RNA_snn_res.1"
)

head(s_ref)
#>                 0        1        2
#> MS4A1    0.000000 1.126047 5.151065
#> CD79B    2.469341 2.920407 5.031316
#> CD79A    0.000000 2.535151 5.375681
#> HLA-DRA  3.640368 6.008446 7.055386
#> TCL1A    0.000000 1.495867 4.963367
#> HLA-DQB1 1.603068 3.836290 5.137422
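
Conceptually, a reference of this shape has one column per cluster and one row per gene, summarizing expression across the cells in each cluster. The base R sketch below shows only the averaging step on made-up data. The names `expr`, `clusters`, and `toy_ref` are hypothetical, and the package functions above may apply additional processing that is not shown here.

```r
# Toy expression matrix (genes x cells); values are invented.
expr <- matrix(
  c(1, 3, 10, 12,
    0, 2,  4,  6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("MS4A1", "CD79A"), paste0("cell", 1:4))
)

# Cluster label for each cell, in column order.
clusters <- c("0", "0", "1", "1")

# Average each gene across the cells of each cluster.
toy_ref <- sapply(
  split(seq_along(clusters), clusters),
  function(idx) rowMeans(expr[, idx, drop = FALSE])
)
toy_ref
```

The `drop = FALSE` keeps single-cell clusters as one-column matrices so that `rowMeans()` still works on them.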

clustify_lists() assigns identities to a matrix, SingleCellExperiment object, or Seurat object based on marker gene lists.

clustify_lists(
  input = pbmc_matrix_small,
  metadata = pbmc_meta,
  cluster_col = "classified",
  marker = pbmc_markers,
  marker_inmatrix = FALSE
)
#>                      0        1        2         3         4        5        6
#> Naive CD4 T  1.5639055 20.19469 31.77095  8.664074 23.844992 19.06931 19.06931
#> Memory CD4 T 1.5639055 20.19469 31.77095 10.568007 23.844992 17.97875 19.06931
#> CD14+ Mono   0.9575077 14.70716 76.21353 17.899569 11.687739 49.86699 16.83210
#> B            0.6564777 12.70976 31.77095 26.422929 13.536295 20.19469 13.53630
#> CD8 T        1.0785353 17.97875 31.82210 12.584823 31.822099 22.71234 40.45383
#> FCGR3A+ Mono 0.6564777 13.63321 72.43684 17.899569  9.726346 56.48245 14.61025
#> NK           0.6564777 14.61025 31.82210  7.757206 31.822099 22.71234 45.05072
#> DC           0.6564777 15.80598 63.34978 19.069308 13.758144 40.56298 17.97875
#> Platelet     0.5428889 13.34769 59.94938 14.215244 15.158755 46.92861 19.49246
#>                      7          8
#> Naive CD4 T   6.165348  0.6055118
#> Memory CD4 T  6.165348  0.9575077
#> CD14+ Mono   25.181595  1.0785353
#> B            17.899569  0.1401901
#> CD8 T         7.882145  0.3309153
#> FCGR3A+ Mono 21.409177  0.3309153
#> NK            5.358651  0.3309153
#> DC           45.101877  0.1401901
#> Platelet     19.492465 59.9493793

clustify_lists(
  input = s_small3,
  marker = pbmc_markers,
  marker_inmatrix = FALSE,
  cluster_col = "RNA_snn_res.1",
  seurat_out = TRUE
)
#> An object of class Seurat 
#> 230 features across 80 samples within 1 assay 
#> Active assay: RNA (230 features, 20 variable features)
#>  2 dimensional reductions calculated: pca, tsne
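
One common way to score the overlap between a cluster's top genes and a marker list is a hypergeometric enrichment test. The base R sketch below illustrates that general technique on made-up gene sets. Whether it corresponds to the scores printed above is an assumption, and the names `universe`, `cluster_top`, and `markers` are hypothetical.

```r
# Hypothetical gene universe and gene sets; identifiers are invented.
universe    <- paste0("g", 1:100)  # all genes considered
cluster_top <- paste0("g", 1:20)   # top expressed genes in one cluster
markers     <- paste0("g", 11:30)  # marker list for one cell type

# Number of marker genes found among the cluster's top genes.
overlap <- length(intersect(cluster_top, markers))

# P(X >= overlap) when drawing length(cluster_top) genes from the universe,
# of which length(markers) are markers.
p <- phyper(
  overlap - 1,
  m = length(markers),
  n = length(universe) - length(markers),
  k = length(cluster_top),
  lower.tail = FALSE
)

# Larger scores indicate stronger enrichment.
score <- -log10(p)
score
```

Repeating this for every cluster and every marker list would give a cluster-by-cell-type score matrix, from which the highest-scoring type per cluster can be taken as the call.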

Additional resources

  • Script for benchmarking, compatible with scRNAseq_Benchmark

  • Additional reference data (including Tabula Muris, ImmGen, etc.) are available in a supplemental package, clustifyrdatahub. Also see list for individual downloads.

  • See the FAQ for more details.
