  • Stars: 425
  • Rank: 102,094 (Top 3 %)
  • Language: Python
  • License: GNU General Publi...
  • Created: over 6 years ago
  • Updated: 8 months ago

Repository Details

pySCENIC is a lightning-fast Python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering), which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.

pySCENIC

The pioneering work was done in R and results were published in Nature Methods [1]. A new and comprehensive description of this Python implementation of the SCENIC pipeline is available in Nature Protocols [4].

pySCENIC can be run on a single desktop machine but also scales to multi-core clusters, so that datasets of thousands of cells can be analyzed quickly. The latter is achieved via the dask framework for distributed computing [2].

Full documentation for pySCENIC is available on Read the Docs.


pySCENIC is part of the SCENIC Suite of tools! See the main SCENIC website for additional information and a full list of tools available.


News and releases

0.12.1 | 2022-11-21

  • Add support for running arboreto_with_multiprocessing.py with spawn instead of fork as the multiprocessing.Pool start method.
  • Use ravel instead of flatten to avoid an unnecessary memory copy in aucell.
  • Update the Docker image file and add a separate Dockerfile for pySCENIC with scanpy.
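The ravel/flatten change in 0.12.1 can be illustrated with a small NumPy sketch. The array below is hypothetical toy data, not taken from pySCENIC; the point is only that `ravel` can return a view of a contiguous array, while `flatten` always allocates a new one.

```python
import numpy as np

# Hypothetical toy array standing in for a ranking matrix (assumption, not pySCENIC data).
rankings = np.arange(12).reshape(3, 4)

flat_view = rankings.ravel()    # a view when the data is contiguous, so no copy is made
flat_copy = rankings.flatten()  # always allocates a new array

print(np.shares_memory(rankings, flat_view))  # True
print(np.shares_memory(rankings, flat_copy))  # False
```

For a non-contiguous input, `ravel` falls back to copying, so the memory saving applies only when a view is possible.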

0.12.0 | 2022-08-16

  • Only databases in Feather v2 format are supported now (ctxcore >= 0.2), which allows the use of recent versions of pyarrow (>=8.0.0) instead of very old ones (<0.17). Databases in the new format can be downloaded from https://resources.aertslab.org/cistarget/databases/ and end with *.genes_vs_motifs.rankings.feather or *.genes_vs_tracks.rankings.feather.
  • Support clustered motif databases.
  • Use custom multiprocessing instead of dask, by default.
  • The Docker image uses Python 3.10 and contains only the pySCENIC dependencies needed for CLI usage.
  • Remove unneeded scripts and notebooks for unused/deprecated database formats.
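As a rough aid for the database format change in 0.12.0, the sketch below checks whether a file name ends with one of the two suffixes named in that release note. The helper name and the example file names are hypothetical. The check looks at the name only and does not open the file or verify its Feather version.

```python
from pathlib import Path

# Suffixes named in the 0.12.0 release note.
SUPPORTED_SUFFIXES = (
    ".genes_vs_motifs.rankings.feather",
    ".genes_vs_tracks.rankings.feather",
)


def has_new_ranking_db_name(path):
    """Return True if the file name ends with one of the suffixes above.

    Hypothetical helper: this inspects the name only, not the file contents.
    """
    return Path(path).name.endswith(SUPPORTED_SUFFIXES)


# Hypothetical file names, for illustration only.
print(has_new_ranking_db_name("db/example.genes_vs_motifs.rankings.feather"))  # True
print(has_new_ranking_db_name("db/example_old_format.feather"))                # False
```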

0.11.2 | 2021-05-07

  • Split some core cisTarget functions out into a separate repository, ctxcore. This is now a required package for pySCENIC.

0.11.1 | 2021-02-11

  • Fix bug in motif url construction (#275)
  • Fix for export2loom with sparse dataframe (#278)
  • Fix sklearn t-SNE import (#285)
  • Updates to Docker image (expose port 8787 for Dask dashboard)

0.11.0 | 2021-02-10

Major features:

  • Updated arboreto release (GRN inference step) includes:
    • Support for sparse matrices (using the --sparse flag in pyscenic grn, or passing a sparse matrix to grnboost2/genie3).
    • Fixes to avoid dask metadata mismatch error
  • Updated cisTarget:
    • Fix for metadata mismatch in ctx prune2df step
    • Support for databases in Apache Parquet format
    • Faster loading from feather databases
    • Bugfix: loading genes from a database (previously missing the last gene name in the database)
  • Support for AnnData input and output
  • Package updates:
    • Upgrade to newer pandas version
    • Upgrade to newer numba version
    • Upgrade to newer versions of dask, distributed
  • Input checks and more descriptive error messages.
    • Check that regulons loaded are not empty.
  • Bugfixes:
    • In the regulons output from the cisTarget step, the gene weights were incorrectly assigned to their respective target genes (PR #254).
    • Motif url construction fixed when running ctx without pruning
    • Compression of intermediate files in the CLI steps
    • Handle loom files with non-standard gene/cell attribute names
    • Reformat the genesig gmt input/output
    • Fix AUCell output to loom with non-standard loom attributes

0.10.4 | 2020-11-24

  • Included new CLI option to add correlation information to the GRN adjacencies file. This can be called with pyscenic add_cor.
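To show what adding correlation information to an adjacencies table might look like, here is a minimal sketch that does not use pySCENIC itself. The data, the tuple layout (TF, target, importance) and the helper name `annotate_with_correlation` are assumptions made for illustration; the actual `pyscenic add_cor` output format may differ.

```python
import numpy as np

# Hypothetical toy data: rows are cells, columns are genes.
genes = ["TF1", "geneA", "geneB"]
expression = np.array([
    [1.0, 2.0, 9.0],
    [2.0, 4.0, 7.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 1.0],
])

# Hypothetical adjacencies as (TF, target, importance) tuples.
adjacencies = [("TF1", "geneA", 12.5), ("TF1", "geneB", 8.1)]


def annotate_with_correlation(adjacencies, expression, genes):
    """Append the Pearson correlation of TF and target expression to each row."""
    column = {gene: i for i, gene in enumerate(genes)}
    corr = np.corrcoef(expression, rowvar=False)  # gene-by-gene correlation matrix
    return [
        (tf, target, importance, float(corr[column[tf], column[target]]))
        for tf, target, importance in adjacencies
    ]


annotated = annotate_with_correlation(adjacencies, expression, genes)
for row in annotated:
    print(row)
```

In this toy data geneA rises with TF1 and geneB falls, so the first correlation is positive and the second negative.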

See also the extended Release Notes.

Overview

The pipeline has three steps:

  1. First, transcription factors (TFs) and their target genes, which together define a regulon, are derived using inference methods that rely solely on correlations between the expression of genes across cells. The arboreto package is used for this step.
  2. These regulons are refined by pruning targets that lack enrichment for a corresponding motif of the TF, effectively separating direct from indirect targets based on the presence of cis-regulatory footprints.
  3. Finally, the original cells are differentiated and clustered based on the activity of these discovered regulons.
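The idea behind step 3 can be sketched with a simplified, rank-based activity score. This is a toy illustration under stated assumptions, not the pySCENIC AUCell implementation: the function name, the `top_fraction` parameter, the normalization and the data are all hypothetical.

```python
import numpy as np


def regulon_activity(expression, gene_names, regulon_genes, top_fraction=0.05):
    """Score each cell by the area under the recovery curve of a gene set.

    expression: cells x genes array. Only the top `top_fraction` of each
    cell's ranked genes is considered. Scores are scaled to lie in [0, 1].
    """
    n_genes = expression.shape[1]
    cutoff = max(1, int(round(top_fraction * n_genes)))
    in_regulon = np.isin(gene_names, regulon_genes)
    # Order genes in each cell from highest to lowest expression.
    order = np.argsort(-expression, axis=1, kind="stable")
    hits = in_regulon[order][:, :cutoff]   # True where a top-ranked gene is in the regulon
    recovery = np.cumsum(hits, axis=1)     # recovery curve per cell
    return recovery.sum(axis=1) / (cutoff * in_regulon.sum())


# Hypothetical toy data: two cells, six genes.
gene_names = np.array(["g1", "g2", "g3", "g4", "g5", "g6"])
expression = np.array([
    [9.0, 8.0, 7.0, 1.0, 0.0, 2.0],  # regulon genes g1 and g2 are highly expressed
    [0.0, 1.0, 2.0, 9.0, 8.0, 7.0],  # regulon genes are lowly expressed
])
scores = regulon_activity(expression, gene_names, ["g1", "g2"], top_fraction=0.5)
print(scores)
```

The first cell scores higher than the second because its regulon genes sit at the top of its expression ranking. Cells can then be clustered on a matrix of such scores across regulons.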

The largest speed improvement comes from the arboreto package in step 1, which provides GRNBoost2, an alternative to GENIE3 [3]. arboreto can be controlled from within pySCENIC.

All the functionality of the original R implementation is available and in addition:

  1. You can leverage multi-core and multi-node clusters using dask and its distributed scheduler.
  2. We implemented a version of the recovery of input genes that takes into account weights associated with these genes.
  3. Regulons (the regulatory networks that connect a TF with its target genes) whose targets are repressed are now also derived and used for cell enrichment analysis.
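Item 2 above, a recovery of input genes that takes weights into account, can be sketched as follows. The weighting scheme, the normalization and the function name are assumptions chosen for illustration and are not confirmed to match the pySCENIC implementation.

```python
import numpy as np


def weighted_recovery_auc(ranking, gene_weights, cutoff):
    """Area under a weighted recovery curve, scaled to [0, 1].

    ranking: gene names ordered from best to worst rank.
    gene_weights: dict mapping each input gene to a non-negative weight.
    Each recovered gene raises the curve by its weight instead of by 1.
    """
    top = ranking[:cutoff]
    contributions = np.array([gene_weights.get(gene, 0.0) for gene in top])
    curve = np.cumsum(contributions)
    total_weight = sum(gene_weights.values())
    return float(curve.sum() / (cutoff * total_weight))


# Hypothetical ranking and weights.
ranking = ["a", "b", "c", "d"]
auc_equal = weighted_recovery_auc(ranking, {"a": 1.0, "d": 1.0}, cutoff=4)
auc_top_heavy = weighted_recovery_auc(ranking, {"a": 3.0, "d": 1.0}, cutoff=4)
auc_bottom_heavy = weighted_recovery_auc(ranking, {"a": 1.0, "d": 3.0}, cutoff=4)
print(auc_equal, auc_top_heavy, auc_bottom_heavy)  # 0.625 0.8125 0.4375
```

With equal weights this reduces to an ordinary recovery curve. Putting more weight on the well-ranked gene raises the score, and putting more weight on the poorly ranked gene lowers it.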

Additional resources

For more information, please visit LCB, the main SCENIC website, or SCENIC (R version). There is a tutorial to create new cisTarget databases. The CLI to pySCENIC has also been streamlined into a pipeline that can be run with a single command, using the Nextflow workflow manager. There are two Nextflow implementations available:

  • SCENICprotocol: A Nextflow DSL1 implementation of pySCENIC alongside a basic "best practices" expression analysis. Includes details on pySCENIC installation, usage, and downstream analysis, along with detailed tutorials.
  • VSNPipelines: A Nextflow DSL2 implementation of pySCENIC with a comprehensive and customizable pipeline for expression analysis. Includes additional pySCENIC features (multi-runs, integrated motif- and track-based regulon pruning, loom file generation).

Acknowledgments

We are grateful to all providers of TF-annotated position weight matrices, in particular Martha Bulyk (UNIPROBE), Wyeth Wasserman and Albin Sandelin (JASPAR), BioBase (TRANSFAC), Scot Wolfe and Michael Brodsky (FlyFactorSurvey) and Timothy Hughes (cisBP).

References

[1] Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat Meth 14, 1083–1086 (2017). doi:10.1038/nmeth.4463
[2] Rocklin, M. Dask: parallel computation with blocked algorithms and task scheduling. conference.scipy.org
[3] Huynh-Thu, V. A. et al. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, (2010). doi:10.1371/journal.pone.0012776
[4] Van de Sande, B., Flerin, C., et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc. June 2020:1-30. doi:10.1038/s41596-020-0336-2

More Repositories

  1. SCENIC (HTML, 406 stars): SCENIC is an R package to infer Gene Regulatory Networks and cell types from single-cell RNA-seq data.
  2. scenicplus (Jupyter Notebook, 179 stars): SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
  3. SCENICprotocol (Python, 139 stars): A scalable SCENIC workflow for single-cell gene regulatory network analysis
  4. cisTopic (HTML, 135 stars): cisTopic: Probabilistic modelling of cis-regulatory topics from single cell epigenomics data
  5. AUCell (R, 116 stars): AUCell: score single cells with gene regulatory networks
  6. SCope (Python, 69 stars): Fast visualization tool for large-scale and high dimensional single-cell data
  7. pycisTopic (Jupyter Notebook, 56 stars): pycisTopic is a Python module to simultaneously identify cell states and cis-regulatory topics from single cell epigenomics data.
  8. arboreto (Jupyter Notebook, 50 stars): A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.
  9. create_cisTarget_databases (Python, 39 stars): Create cisTarget databases
  10. SCopeLoomR (R, 38 stars): R package (compatible with SCope) to create generic .loom files and extend them with other data e.g.: SCENIC regulons, Seurat clusters and markers, ...
  11. RcisTarget (HTML, 31 stars): RcisTarget: Transcription factor binding motif enrichment
  12. GENIE3 (C, 26 stars): GENIE3 (GEne Network Inference with Ensemble of trees) R-package
  13. CREsted (Python, 26 stars)
  14. ScoMAP (R, 25 stars): ScoMAP is an R package to spatially integrate single-cell omics data into virtual cells and infer enhancer-to-gene relationships.
  15. single_cell_toolkit (Shell, 25 stars): Tools for correcting single cell barcodes for various scATAC-seq techniques.
  16. PUMATAC (Nextflow, 21 stars): Pipeline for Universal Mapping of ATAC-seq
  17. nextcloud_share_url_downloader (Shell, 21 stars): Download files from and list content of NextCloud (password protected) share directly from the command line without needing a web browser.
  18. popscle_helper_tools (Shell, 19 stars): Helper tools for popscle
  19. GRNBoost (Scala, 17 stars): Scalable inference of gene regulatory networks using Apache Spark and XGBoost
  20. install_aspera_connect (Shell, 16 stars): Install latest version of Aspera Connect and show example how to use it for downloading sequencing data.
  21. pycistarget (Python, 13 stars): pycistarget is a python module to perform motif enrichment analysis in sets of regions with different tools and identify high confidence TF cistromes.
  22. scATAC-seq_benchmark (Jupyter Notebook, 13 stars)
  23. scenicplus_analyses (Jupyter Notebook, 7 stars): SCENIC+ analyses
  24. singlecellRNA_melanoma_paper (R, 7 stars)
  25. LoomXpy (Python, 6 stars): Python package (compatible with SCope) to create .loom files and extend them with other data e.g.: SCENIC regulons
  26. Nova-ST (Jupyter Notebook, 6 stars): A repository containing the analysis scripts for Nova-ST data
  27. DeepMEL (Jupyter Notebook, 6 stars)
  28. scenic-nf (Nextflow, 6 stars): DEPRECATED | pySCENIC pipeline implemented in Nextflow using containers
  29. DeepBrain (Jupyter Notebook, 5 stars): DeepBrain: a collection of vertebrate sequence-based enhancer models aimed at understanding brain cell type enhancer code across and within species
  30. PUMATAC_tutorial (Jupyter Notebook, 5 stars)
  31. Bravo_et_al_EyeAntennalDisc (HTML, 5 stars): Code for reproducing the figures presented in the manuscript 'Identification of genomic enhancers through spatial integration of single-cell transcriptomics and epigenomics' (Bravo et al., 2020)
  32. MendelCraft (Java, 5 stars): The MendelCraft mod, introducing Mendelian genetics to the chickens of Minecraft!
  33. scatac_fragment_tools (Python, 5 stars): Tools for working with scATAC-seq fragment files
  34. Bravo_et_al_Liver (Jupyter Notebook, 5 stars): Bravo_et_al_Liver
  35. iterative_peak_filtering (Shell, 4 stars): Iterative peak filtering.
  36. iRegulon (Java, 4 stars): A regulon consists of a transcription factor (TF) and its direct transcriptional targets, which contain common TF binding sites in their cis-regulatory control elements. The iRegulon plugin allows you to identify regulons using motif and track discovery in an existing network or in a set of co-regulated genes.
  37. ctxcore (Python, 3 stars): Core functions for pycisTarget and the SCENIC tool suite
  38. primescore (Python, 3 stars): Calculation of regulatory impact score of a mutation.
  39. hydrop_data_analysis (Jupyter Notebook, 3 stars)
  40. scforest (TeX, 3 stars): scforest: a visual overview of single cell technology
  41. webhdf5 (Python, 2 stars): HDF5 library in WASM
  42. ATAC-seq-analysis (Python, 2 stars): Some scripts for ATAC-seq data analysis
  43. mucistarget (Python, 2 stars): Predict cis-regulatory mutations in gene regulatory networks
  44. SpatialNF (Python, 2 stars): Spatial transcriptomics NextFlow pipelines
  45. fly_brain (R, 2 stars): Decoding gene regulation in the fly brain
  46. SCopeLoomPy (Python, 2 stars): A Python notebook to create .loom files and extend them with other data e.g.: SCENIC regulons, Seurat clusters and markers, compatible with SCope
  47. AS_variant_pipeline (Jupyter Notebook, 1 star): Allele specific variant pipeline
  48. regulatory_regions_delineation (Python, 1 star): Create regulatory regions delineation
  49. DGRP2_dm3_to_dm6 (1 star): Create BCF/VCF files with all DGRP2 mutations for Drosophila melanogaster (dm6) from dm3 VCF files.
  50. Melanoma_MPRA_paper (Jupyter Notebook, 1 star): Melanoma MPRA paper
  51. EnhancerAI (Python, 1 star)