  • Language
    Jupyter Notebook
  • License
    Apache License 2.0

Microscopy Image Cytometry Toolkit

Cytokit

Cytokit is a collection of tools for quantifying and analyzing properties of individual cells in large fluorescent microscopy datasets, with a focus on those generated from multiplexed staining protocols. This includes a GPU-accelerated image processing pipeline (via TensorFlow), CLI tools for batch processing of experimental replicates (often requiring conditional configuration, as things tend to go wrong when capturing hundreds of thousands of microscope images over a period of hours or days), and visualization UIs (either Cytokit Explorer or CellProfiler Analyst).

Cytokit runs in a Python 3 environment but also comes (via Docker) with CellProfiler (Python 2) and Ilastik installations.

For more information, see: Cytokit: A single-cell analysis toolkit for high dimensional fluorescent microscopy imaging

Quick Start

Installing and configuring Cytokit currently involves little more than installing nvidia-docker and building or downloading the Cytokit container image, but this inherently limits GPU-accelerated use to Linux operating systems. Additional limitations include:

  • There is currently no CPU-only docker image
  • Generating and running pipelines requires working knowledge of JupyterLab and a little tolerance for yaml/json files as well as command lines
  • Only tiff files are supported as a raw input image format
  • Deconvolution requires manual configuration of microscope attributes like filter wavelengths, immersion media, and numerical aperture (though support to infer much of this based on the imaging platform may be added in the future)
  • 3-dimensional images are supported, but cell segmentation and related outputs are currently 2-dimensional
  • General system requirements include at least 24 GB of RAM and 8 GB of GPU memory (per GPU)

Once nvidia-docker is installed, the container can be launched and used as follows:

nvidia-docker pull eczech/cytokit:latest

# Set LOCAL_IMAGE_DATA_DIR variable to a host directory for data sharing
# and persistent storage between container runs
export LOCAL_IMAGE_DATA_DIR=/tmp 

# Run the container with an attached volume to contain raw images and results  
nvidia-docker run --rm -ti -p 8888:8888 -p 8787:8787 -p 8050:8050 \
-v $LOCAL_IMAGE_DATA_DIR:/lab/data \
eczech/cytokit

This will launch JupyterLab on port 8888. After navigating to localhost:8888 and entering the access token printed on the command line following nvidia-docker run, you can then run an example notebook like cellular_marker_profiling_example, which can be found at /lab/repos/cytokit/python/notebooks/examples in the JupyterLab file navigator.

Using a Specific Release

To use a release-specific container, add the release tag to the image name in the instructions above. The example below launches the 0.1.1 container:

nvidia-docker pull eczech/cytokit:0.1.1
export LOCAL_IMAGE_DATA_DIR=/tmp   
nvidia-docker run --rm -ti -p 8888:8888 -p 8787:8787 -p 8050:8050 \
-v $LOCAL_IMAGE_DATA_DIR:/lab/data \
eczech/cytokit:0.1.1

Example

One of the goals of Cytokit is to make it as easy as possible to reproduce complicated workflows on big image datasets. To that end, the majority of the logic that drives how Cytokit functions is determined by json/yaml configurations.
Starting from template configurations like this sample Test Experiment or, more realistically, this CODEX BALBc1 configuration, pipelines are meant to work as bash scripts that execute small variants of these parameterizations for evaluation against one another. Here is a bash script demonstrating how this often works:

EXPERIMENT_DATA_DIR=/lab/data/201801-codex-lung

for REPLICATE in "201801-codex-lung-01" "201801-codex-lung-02"; do
    DATA_DIR=$EXPERIMENT_DATA_DIR/$REPLICATE
    
    # This command will generate 3 processing variants to run:
    # v01 - Cell object determined as fixed radius from nuclei
    # v02 - Cell object determined by membrane stain
    # v03 - 5x5 grid subset with deconvolution applied and before/after channels extracted
    cytokit config editor --base-config-path=template_config.yaml --output-dir=$DATA_DIR/output \
      set processor.cytometry.segmentation_params.nucleus_dilation 10 \
    save_variant v01/config reset \
      set processor.cytometry.membrane_channel_name CD45 \
    save_variant v02/config reset \
      set acquisition.region_height 5 \
      set acquisition.region_width 5 \
      set processor.args.run_deconvolution True \
      add operator '{extract: {name:deconvolution, channels:[raw_DAPI,proc_DAPI]}}' \
    save_variant v03/config exit 
    
    # Run everything for each variant of this experiment
    for VARIANT in v01 v02 v03; do
        OUTPUT_DIR=$DATA_DIR/output/$VARIANT
        CONFIG_DIR=$OUTPUT_DIR/config
        cytokit processor run_all --config-path=$CONFIG_DIR --data-dir=$OUTPUT_DIR --output-dir=$OUTPUT_DIR
        cytokit operator run_all  --config-path=$CONFIG_DIR --data-dir=$OUTPUT_DIR 
        cytokit analysis run_all  --config-path=$CONFIG_DIR --data-dir=$OUTPUT_DIR 
    done
done

The above, when executed, would produce several things:

  1. 5D tiles with processed image data (which can be reused without having to restart from raw data)
  2. 5D tile extracts corresponding to user-defined slices (e.g. raw vs processed DAPI images above) as well as montages of these tiles (e.g. stitchings of 16 2048x2048 images on a 4x4 grid into single 8192x8192 images)
  3. CSV/FCS files with single-cell data
  4. Final yaml configuration files representing how each variant was defined
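
The montage dimensions quoted in item 2 follow directly from the grid and tile sizes. The following is a minimal sketch of that arithmetic, assuming tiles are placed edge to edge with no overlap or padding. The function names are hypothetical and are not part of the Cytokit API.

```python
def montage_shape(grid_rows, grid_cols, tile_height, tile_width):
    """Return (height, width) in pixels of a montage of equally sized tiles.

    Assumes tiles abut exactly (no overlap and no padding).
    """
    return grid_rows * tile_height, grid_cols * tile_width


def tile_origin(row, col, tile_height, tile_width):
    """Return the (y, x) pixel offset of the tile at 0-based grid position (row, col)."""
    return row * tile_height, col * tile_width


# 16 tiles of 2048x2048 pixels on a 4x4 grid, as in item 2 above
print(montage_shape(4, 4, 2048, 2048))  # (8192, 8192)
print(tile_origin(3, 3, 2048, 2048))    # (6144, 6144)
```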

For example, an ad-hoc extraction like this (which could also be defined in the configuration files):

cytokit operator extract --name='primary_markers' --z='best' \
  --channels=['proc_dapi','proc_cd3','proc_cd4','proc_cd8','cyto_cell_boundary','cyto_nucleus_boundary']

Would produce 5D hyperstack images that could be loaded into ImageJ and blended together:

Human T Cells stained for DAPI (gray), CD3 (blue), CD4 (red), CD8 (green) and with nucleus outline (light green), cell outline (light red)
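
The bash script in this section implies one output directory per replicate and variant, each with its own config subdirectory. As a rough illustration only, the sketch below enumerates those paths in Python. `variant_dirs` is a hypothetical helper, not a Cytokit function; the replicate and variant names are the ones used in the script.

```python
from pathlib import PurePosixPath


def variant_dirs(experiment_dir, replicates, variants):
    """Yield (replicate, variant, output_dir, config_dir) for each combination.

    Mirrors the layout used by the bash script:
    <experiment_dir>/<replicate>/output/<variant>[/config]
    """
    root = PurePosixPath(experiment_dir)
    for replicate in replicates:
        for variant in variants:
            output_dir = root / replicate / "output" / variant
            yield replicate, variant, output_dir, output_dir / "config"


layout = list(variant_dirs(
    "/lab/data/201801-codex-lung",
    ["201801-codex-lung-01", "201801-codex-lung-02"],
    ["v01", "v02", "v03"],
))
for replicate, variant, output_dir, config_dir in layout:
    print(config_dir)
```

Listing the directories this way can help confirm that every variant has a saved configuration before the processor, operator, and analysis commands are run against it.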

Cytokit Explorer UI

After processing an experiment, the Explorer UI application can be run within the same docker container for fast visualization of the relationship between spatial features of cells and fluorescent signal intensities:

See the Cytokit Explorer docs for more details.

CellProfiler Analyst

In addition to Cytokit Explorer, exports can also be generated using CellProfiler (CP) directly. This makes it possible to amend a configuration with a line like this to generate both CP spreadsheets and a SQLite DB compatible with CellProfiler Analyst (see pub/config/codex-spleen/experiment.yaml):

analysis:
  - cellprofiler_quantification: 
    - export_csv: true
    - export_db: true
    - export_db_objects_separately: true
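
Once a SQLite export exists, its contents can be inspected with Python's standard `sqlite3` module before opening it in CellProfiler Analyst. This is a minimal sketch: the table names depend on the CP export and are not assumed here, and the in-memory database with an `example_objects` table is a hypothetical stand-in for the exported file.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an open SQLite database, sorted."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical stand-in for an exported database; for a real export,
# pass the path of the database file to sqlite3.connect instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_objects (id INTEGER PRIMARY KEY, area REAL)")
tables = list_tables(conn)
print(tables)
conn.close()
```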

These screenshots from CellProfiler Analyst 2.2.1 show a reconstruction of plots used in the CODEX publication based on data generated by dynamic construction and execution of a CP 3.1.8 pipeline (see pub/analysis/codex-spleen/pipeline_execution.sh):

CellProfiler Integration

CellProfiler is not easy to use programmatically in the way it is used here. There is no official Python API, and direct use of the internals has to be worked out largely from tests and other source code. For any interested power users, here are some parts of this project that may be useful resources:

  • Installation: The Dockerfile shows how to bootstrap a minimal Python 2.7 environment compatible with CellProfiler 3.1.8
  • Configuration: The cpcli.py script demonstrates how to build a CP pipeline programmatically (in this case segmented objects are provided to the pipeline that only does quantification and export)
  • Analysis: When data is exported from CP in a docker container, the paths in csv files or inserted into a database will all be relative to the container. One simple solution to this problem is to create a local /lab/data folder with copies of the information from the container that you would like to analyze.
    A little more information on this can be found at pub/analysis/codex-spleen/README.md.
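
An alternative to copying data into a local /lab/data folder is to rewrite the container paths before analysis. The sketch below assumes the exported paths are absolute and begin with /lab/data (the mount point used in the docker commands in Quick Start). `remap_container_path` and the local root shown are hypothetical.

```python
from pathlib import PurePosixPath


def remap_container_path(path, local_root, container_root="/lab/data"):
    """Rewrite a path under container_root so that it points under local_root.

    Paths outside container_root are returned unchanged.
    """
    p = PurePosixPath(path)
    try:
        relative = p.relative_to(container_root)
    except ValueError:
        # Not under the container mount point, so leave it alone
        return str(p)
    return str(PurePosixPath(local_root) / relative)


# Hypothetical example: the host directory mounted at /lab/data was /tmp
print(remap_container_path("/lab/data/exp/output/v01/img.tif", "/tmp"))
# /tmp/exp/output/v01/img.tif
```

Applying such a function to the path columns of an exported csv file avoids duplicating large image files, at the cost of an extra preprocessing step.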

Custom Segmentation

While the purpose of this pipeline is to perform image preprocessing and segmentation, the semantics of that segmentation often change. Depending on the experimental context, the provided cell nucleus segmentation may not be adequate. If a different segmentation methodology is required, custom logic can be added to the pipeline as in the mc38-spheroid example, where a custom segmentation implementation is used to identify spheroids rather than cells.

Messaging Caveats

Errors in processor logs that can safely be ignored:

  • tornado.iostream.StreamClosedError: Stream is closed: These often follow the completion of successful pipeline runs. These can hopefully be eliminated in the future with a dask upgrade but for now they can simply be ignored.
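
When scanning processor logs for real failures, the benign message above can be filtered out first. This is a minimal sketch that assumes the logs are read as plain text lines. `IGNORABLE_PATTERNS` and `significant_lines` are hypothetical names, and the sample lines are placeholders rather than real Cytokit output. It drops only lines containing the message itself, not any surrounding traceback lines.

```python
IGNORABLE_PATTERNS = (
    "tornado.iostream.StreamClosedError: Stream is closed",
)


def significant_lines(lines, ignorable=IGNORABLE_PATTERNS):
    """Return the log lines that do not contain any known-benign message."""
    return [
        line for line in lines
        if not any(pattern in line for pattern in ignorable)
    ]


# Placeholder lines for illustration only (not real Cytokit log output)
sample = [
    "first unrelated line",
    "tornado.iostream.StreamClosedError: Stream is closed",
    "second unrelated line",
]
print(significant_lines(sample))
```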

CODEX Backport

As a small piece of standalone functionality, instructions can be found here for how to run deconvolution on CODEX samples: Standalone Deconvolution Instructions
