Code and annotations for the Tabula Muris single-cell transcriptomic dataset.

tabula-muris

The Tabula muris data was generated by the Chan Zuckerberg Biohub. For a detailed description of the project, please refer to our publication, "Transcriptomic characterization of 20 organs and tissues from mouse at single cell resolution creates a Tabula Muris".

The Tabula muris project is a compendium of single-cell transcriptomic data from the mouse, containing nearly 100,000 cells from 20 organs and tissues. The data allow for direct and controlled comparison of gene expression in cell types shared between tissues, such as immune cells from distinct anatomical locations. The resource also enables contrasting two distinct technical approaches:

  • microfluidic droplet-based 3'-end counting, which provides a survey of thousands of cells per organ at relatively low coverage.
  • FACS-based full length transcript analysis, which provides higher sensitivity and coverage.

This rich collection of annotated cells will be a useful resource for:

  • Defining gene expression in previously poorly-characterized cell populations.
  • Validating findings in future targeted single-cell studies.
  • Developing methods for integrating datasets (e.g. between the FACS and droplet experiments), characterizing batch effects, and quantifying the variation of gene expression in many cell types between organs and animals.

Since late 2017, Tabula muris data have been made available to all users free of charge. AWS has made the data freely available on Amazon S3 so that anyone can download the resource to perform analysis and advance medical discovery without needing to worry about the cost of storing Tabula muris data or the time required to download it.

Learn more about how Tabula muris data is used in the project vignettes repo.

Installation - Python

To install the Python dependencies, create a tabula-muris-env environment by using the environment.yml file provided:

conda env create -f environment.yml

Activate the environment and register it as a kernel for Jupyter notebooks with:

source activate tabula-muris-env
python -m ipykernel install --user --name tabula-muris-env --display-name "Python 3.6 (tabula-muris-env)"

Installation - R

Packages:

install.packages(c("here", "Seurat", "useful", "ontologyIndex", "tidyverse"))

Getting started

From "raw" gene-cell counts tables

If you want to start from the raw gene-cell counts tables, then first download the data from figshare. You can download manually from the links (FACS and Droplet) or run a script we've prepared:

bash 00_data_ingest/download_data.sh

This will download two zip files, droplet_raw_data.zip and facs_raw_data.zip, and unzip them into the folder structure described below. You will then have two folders in 00_data_ingest (the location is important: everything here depends on the folder structure).
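Because everything depends on the folder structure, it can help to check it before running any analysis. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the `EXPECTED` table are hypothetical, and the entries are copied from the FACS and Droplet listings in this README.

```python
from pathlib import Path

# Entries expected under 00_data_ingest after the download script has run.
# Hypothetical table, copied from the folder listings in this README.
EXPECTED = {
    "00_facs_raw_data": ["FACS", "annotations_FACS.csv", "metadata_FACS.csv"],
    "01_droplet_raw_data": [
        "droplet",
        "annotations_droplet.csv",
        "metadata_droplet.csv",
    ],
}


def missing_entries(ingest_dir):
    """Return the expected paths (relative to ingest_dir) that do not exist."""
    root = Path(ingest_dir)
    missing = []
    for folder, entries in EXPECTED.items():
        for entry in entries:
            path = root / folder / entry
            if not path.exists():
                missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Run from the repository root; prints nothing when the layout is complete.
    for path in missing_entries("00_data_ingest"):
        print("missing:", path)
```

The check only looks at top-level entries, so it will not notice an individual tissue file that failed to unzip.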

FACS

The FACS folder should look like this:

00_facs_raw_data
├── FACS
│   ├── Aorta-counts.csv
│   ├── Bladder-counts.csv
│   ├── Brain_Myeloid-counts.csv
│   ├── Brain_Non-Myeloid-counts.csv
│   ├── Diaphragm-counts.csv
│   ├── Fat-counts.csv
│   ├── Heart-counts.csv
│   ├── Kidney-counts.csv
│   ├── Large_Intestine-counts.csv
│   ├── Limb_Muscle-counts.csv
│   ├── Liver-counts.csv
│   ├── Lung-counts.csv
│   ├── Mammary_Gland-counts.csv
│   ├── Marrow-counts.csv
│   ├── Pancreas-counts.csv
│   ├── Skin-counts.csv
│   ├── Spleen-counts.csv
│   ├── Thymus-counts.csv
│   ├── Tongue-counts.csv
│   └── Trachea-counts.csv
├── FACS.zip
├── annotations_FACS.csv
└── metadata_FACS.csv
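As a quick sanity check on one of the `*-counts.csv` files, the sketch below sums the counts per cell using only the Python standard library. It assumes a layout this README does not state: one row per gene, one column per cell, gene names in the first column, and integer counts. The function name `total_counts_per_cell` is hypothetical.

```python
import csv


def total_counts_per_cell(counts_csv):
    """Sum counts over all genes for each cell (column) of a counts table.

    Assumes rows are genes, columns are cells, and the first column holds
    the gene name (an assumption about the file layout).
    """
    with open(counts_csv, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        cells = header[1:]  # the first header field labels the gene column
        totals = [0] * len(cells)
        for row in reader:
            for i, value in enumerate(row[1:]):
                totals[i] += int(value)
    return dict(zip(cells, totals))
```

For example, `total_counts_per_cell("00_data_ingest/00_facs_raw_data/FACS/Aorta-counts.csv")` would return a dictionary keyed by the cell identifiers in the header row, provided the layout assumption holds.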

Droplet

Now your droplet folders should look like this:

01_droplet_raw_data
├── annotations_droplet.csv
├── droplet
│   ├── Bladder-10X_P4_3
│   ├── Bladder-10X_P4_4
│   ├── Bladder-10X_P7_7
│   ├── Heart_and_Aorta-10X_P7_4
│   ├── Kidney-10X_P4_5
│   ├── Kidney-10X_P4_6
│   ├── Kidney-10X_P7_5
│   ├── Limb_Muscle-10X_P7_14
│   ├── Limb_Muscle-10X_P7_15
│   ├── Liver-10X_P4_2
│   ├── Liver-10X_P7_0
│   ├── Liver-10X_P7_1
│   ├── Lung-10X_P7_8
│   ├── Lung-10X_P7_9
│   ├── Lung-10X_P8_12
│   ├── Lung-10X_P8_13
│   ├── Mammary_Gland-10X_P7_12
│   ├── Mammary_Gland-10X_P7_13
│   ├── Marrow-10X_P7_2
│   ├── Marrow-10X_P7_3
│   ├── Spleen-10X_P4_7
│   ├── Spleen-10X_P7_6
│   ├── Thymus-10X_P7_11
│   ├── Tongue-10X_P4_0
│   ├── Tongue-10X_P4_1
│   ├── Tongue-10X_P7_10
│   ├── Trachea-10X_P8_14
│   └── Trachea-10X_P8_15
├── droplet.zip
└── metadata_droplet.csv
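Each droplet folder name combines a tissue with a 10X run label, so the runs for a tissue can be grouped by splitting on the `-10X_` separator. This is a small sketch inferred only from the naming pattern in the listing above; the function names and the term "channel" for the `10X_P…` part are illustrative assumptions.

```python
def parse_channel_folder(name):
    """Split a name such as 'Bladder-10X_P4_3' into (tissue, channel)."""
    tissue, separator, suffix = name.partition("-10X_")
    if not separator or not tissue or not suffix:
        raise ValueError(f"not a droplet folder name: {name!r}")
    return tissue, "10X_" + suffix


def channels_by_tissue(folder_names):
    """Group channel labels by tissue, preserving the input order."""
    groups = {}
    for name in folder_names:
        tissue, channel = parse_channel_folder(name)
        groups.setdefault(tissue, []).append(channel)
    return groups
```

Passing the names returned by `os.listdir("00_data_ingest/01_droplet_raw_data/droplet")` to `channels_by_tissue` would give one list of channels per tissue.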

All of the *-10X_* folders contain barcodes.tsv, genes.tsv, and matrix.mtx files, as output by cellranger from 10x Genomics. For example:

01_droplet_raw_data/droplet/Bladder-10X_P4_3
├── barcodes.tsv
├── genes.tsv
└── matrix.mtx
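The three files together describe a sparse gene-by-cell matrix: matrix.mtx is in Matrix Market coordinate format (1-based `row column value` triplets), with rows indexing genes.tsv and columns indexing barcodes.tsv. The following is a minimal standard-library sketch of reading them. The function name `read_10x_folder` is hypothetical, the first column of genes.tsv is taken as the gene identifier (an assumption), and counts are assumed to be integers. In practice a library reader such as Seurat's `Read10X` is the more usual choice.

```python
from pathlib import Path


def read_10x_folder(folder):
    """Read barcodes.tsv, genes.tsv and matrix.mtx from one channel folder.

    Returns (genes, barcodes, counts), where counts maps 0-based
    (gene_index, cell_index) pairs to the stored count.
    """
    folder = Path(folder)
    barcodes = [
        line.strip()
        for line in (folder / "barcodes.tsv").read_text().splitlines()
        if line.strip()
    ]
    # Assumption: the first tab-separated field of genes.tsv identifies the gene.
    genes = [
        line.split("\t")[0]
        for line in (folder / "genes.tsv").read_text().splitlines()
        if line.strip()
    ]
    counts = {}
    with open(folder / "matrix.mtx") as handle:
        # Skip the '%%MatrixMarket' banner and any other '%' comment lines.
        lines = (line for line in handle if not line.startswith("%"))
        n_genes, n_cells, _n_entries = map(int, next(lines).split())
        if n_genes != len(genes) or n_cells != len(barcodes):
            raise ValueError("matrix dimensions do not match genes/barcodes")
        for line in lines:
            if not line.strip():
                continue
            row, col, value = line.split()
            # Matrix Market indices are 1-based; convert to 0-based.
            counts[(int(row) - 1, int(col) - 1)] = int(value)
    return genes, barcodes, counts
```

A dictionary of triplets keeps the sketch dependency-free; for real analyses a sparse matrix type is a better fit for data of this size.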

Folder Organization

  • FACS = SmartSeq2 on FACS-sorted plates
  • Microfluidic = 10x droplet-based unique molecular identifier (UMI)-barcoded transcripts and cells

tabula_muris/
    00_data_ingest/               # How the data was processed from gene-cell tables
        README.md
        download_robj.Rmd         # Download R objects for figures using this script
        02_tissue_analysis_rmd/                  # *Generate* R objects for figures yourself
            Aorta_facs.Rmd
            Brain-Non-microglia_facs.Rmd
            Brain-Microglia_facs.Rmd
            Bladder_facs.Rmd
            Bladder_droplet.Rmd
            Colon_facs.Rmd
            Heart_facs.Rmd
            Heart_droplet.Rmd
            ... more files ...
        03_tissue_annotation_csv/
            Aorta_facs_annotation.csv
            Brain-Non-microglia_facs_annotation.csv
            Brain-Microglia_facs_annotation.csv
            Bladder_facs_annotation.csv
            Bladder_droplet_annotation.csv
            Colon_facs_annotation.csv
            Heart_facs_annotation.csv
            Heart_droplet_annotation.csv
            ... more files ...
        04_tissue_robj_generated/
        10_tissue_robj_downloaded/
        11_global_robj/
        12_extract_number_of_genes_cells/
        13_ngenes_ncells_facs/
        14_ngenes_ncells_droplet/
        15_color_palette/
        16_genes_for_tissue_tsne/
        20_dissociation_genes/
        All_Droplet_Notebook.Rmd
        All_FACS_Notebook.Rmd
        Droplet_Notebook.Rmd
        FACS_Notebook.Rmd
        README.md
        cell_order_FACS.txt
        cell_order_droplets.txt
        download_data.sh
    01_figure1/                   # Overview + #cell barplots + #gene/#reads horizonplots
        README.md
        figure1{b-g}.ipynb
    02_figure2/                   # FACS TSNE plots + annotation barplots
        README.md
        figure2a.Rmd
        figure2b.Rmd
        figure2c.ipynb
    03_figure3/                   # All-cell clustering heatmap with dendrogram
        figure3.Rmd
    04_figure4/                   # Analysis of all T cells sorted by FACS
        figure4{a-d}.Rmd
    05_figure5/                   # Transcription factor expression analysis
        figure5.Rmd
    11_supplementary_figure1/     # Histograms of number of genes detected across tissues
    12_supplementary_figure2/     # FACS vs Microfluidics - # cells expressing a gene
    13_supplementary_figure3/     # FACS vs Microfluidics - # genes detected per cell
    14_supplementary_figure4/     # FACS vs Microfluidics - dynamic range
    15_supplementary_figure5/     # Microfluidics TSNE plots + annotation barplots
    16_supplementary_figure6/     # Analysis of dissociation-induced genes
    17_supplementary_figure7/     # Transcription factor enrichment in cell types

How to cite this dataset

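If you find the Tabula muris data useful for your research, please cite our publication, "Transcriptomic characterization of 20 organs and tissues from mouse at single cell resolution creates a Tabula Muris".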

Contact

If you have questions about the data, you can create an Issue at the project repo on GitHub.

License

There are no restrictions on the use of data received from the Chan Zuckerberg Biohub, unless expressly identified prior to or at the time of receipt.
