  • Language
    Rust
  • License
    BSD 3-Clause "New...
  • Created about 4 years ago
  • Updated 2 months ago


Repository Details

🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.

alevin-fry

alevin-fry is a suite of tools for the rapid, accurate, and memory-frugal processing of single-cell and single-nucleus sequencing data. It consumes RAD files generated by salmon alevin (or by piscem) and performs common operations such as generating permit lists and estimating the number of distinct molecules from each gene within each cell. The focus in alevin-fry is on safety, accuracy, and efficiency (in terms of both time and memory usage).
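To make the two operations named above concrete, here is a deliberately naive Rust sketch on in-memory toy records. It is an illustration under stated assumptions, not alevin-fry's API or algorithm: `Record`, `permit_list`, and `count_molecules` are hypothetical names, the permit list simply keeps the top-N barcodes by read count, and every distinct UMI within a (cell, gene) pair is counted as one molecule. A realistic method also has to cope with sequencing errors in barcodes and UMIs and with reads compatible with more than one gene, none of which is modeled here.

```rust
use std::collections::{HashMap, HashSet};

/// A simplified, hypothetical mapped-read record. Real RAD records carry
/// more information; this struct exists only for illustration.
struct Record<'a> {
    cell: &'a str, // cell barcode
    umi: &'a str,  // unique molecular identifier
    gene: &'a str, // gene the read was assigned to
}

/// Hypothetical naive permit list: keep the `n` barcodes with the most reads.
/// Ties are broken by barcode so the result is deterministic.
fn permit_list<'a>(records: &[Record<'a>], n: usize) -> HashSet<&'a str> {
    let mut reads: HashMap<&'a str, usize> = HashMap::new();
    for r in records {
        *reads.entry(r.cell).or_insert(0) += 1;
    }
    let mut ranked: Vec<(&'a str, usize)> = reads.into_iter().collect();
    // Sort by read count (descending), then by barcode (ascending).
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    ranked.into_iter().take(n).map(|(barcode, _)| barcode).collect()
}

/// Hypothetical naive molecule count: for each permitted (cell, gene) pair,
/// count distinct UMIs. Reads sharing cell, gene and UMI collapse to one molecule.
fn count_molecules<'a>(
    records: &[Record<'a>],
    permitted: &HashSet<&'a str>,
) -> HashMap<(&'a str, &'a str), usize> {
    let mut umis: HashMap<(&'a str, &'a str), HashSet<&'a str>> = HashMap::new();
    for r in records.iter().filter(|r| permitted.contains(r.cell)) {
        umis.entry((r.cell, r.gene)).or_default().insert(r.umi);
    }
    umis.into_iter().map(|(key, set)| (key, set.len())).collect()
}

fn main() {
    let records = [
        Record { cell: "AAAC", umi: "TTGC", gene: "GeneA" },
        Record { cell: "AAAC", umi: "TTGC", gene: "GeneA" }, // second read of the same molecule
        Record { cell: "AAAC", umi: "GGCA", gene: "GeneA" },
        Record { cell: "AAAC", umi: "ACGT", gene: "GeneB" },
        Record { cell: "CCGT", umi: "TTGC", gene: "GeneA" },
        Record { cell: "CCGT", umi: "CATG", gene: "GeneB" },
        Record { cell: "GGTA", umi: "AAAA", gene: "GeneA" }, // low-count barcode, filtered out
    ];
    let permitted = permit_list(&records, 2);
    let counts = count_molecules(&records, &permitted);

    // Print in a stable order (HashMap iteration order is unspecified).
    let mut rows: Vec<_> = counts.iter().collect();
    rows.sort();
    for ((cell, gene), n) in rows {
        println!("{cell}\t{gene}\t{n}");
    }
}
```

Separating the two steps mirrors the description above: barcodes are first filtered to a set of likely cells, and only reads from those cells contribute to the per-gene molecule counts.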

The paper describing alevin-fry, "Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data", is published in Nature Methods, and a pre-print is available on bioRxiv.

Getting started with alevin-fry and dedicated documentation

While this README contains some useful information and pointers for getting started, alevin-fry has its own dedicated documentation site, hosted on ReadTheDocs.

More information

  • Quickstart guide using the simpleaf wrapper

  • Relationship to alevin: Alevin-fry has been designed as the successor to alevin. It subsumes the core features of alevin, while also providing important new capabilities and considerably improving the performance profile. We anticipate that new method development and feature additions will take place primarily within the alevin-fry codebase. Thus, we encourage users of alevin to migrate to alevin-fry when feasible. That being said, alevin is still actively maintained and supported, so if you are using it and are not ready to migrate, you can continue to ask questions and post issues in the salmon repository.

FAQs

Are you curious about processing details like whether to use a sparse or dense index? Do you have a question that isn't necessarily a bug report or feature request, and that isn't readily answered by the documentation or tutorials? Then please feel free to ask over in the Q&A.

Sister repositories

  • The reduced alignment data (RAD) files processed by alevin-fry are generated by either piscem or salmon. The latest versions of both are available on GitHub and via bioconda.

  • The simpleaf repository contains a dedicated wrapper / workflow runner for processing data with alevin-fry that vastly simplifies both the creation of extended references and the subsequent quantification of samples. If you find that simpleaf is missing a feature that you'd like to have, please consider submitting a feature request in the simpleaf repository.

  • The pyroe repository provides tools to help easily construct an enhanced (spliced + intronic or spliced + unspliced) transcriptome from a reference genome and GTF file.

  • The fishpond package, maintained by @mikelove and his lab, contains the recommended functions for reading alevin-fry output (particularly USA-mode output, i.e. separate unspliced, spliced, and ambiguous counts) into the R ecosystem as a SingleCellExperiment object.

  • The alevinqc package, maintained by @csoneson, provides tools and functions for performing quality control and assessment downstream of alevin-fry.
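The pyroe entry above mentions building a spliced + intronic transcriptome. One core geometric step in such a construction, finding the intronic gaps between a transcript's exons, can be sketched as follows. This is a minimal illustration under assumptions: coordinates are taken to be 1-based and inclusive (as in GTF), the input is the exon list of a single transcript, and `introns_from_exons` is a hypothetical helper, not part of pyroe. The actual tool may further pad or merge these intervals; that is not modeled here.

```rust
/// Hypothetical helper: derive the intron intervals of one transcript from
/// its exon intervals. Coordinates are 1-based and inclusive, as in GTF.
fn introns_from_exons(exons: &[(u64, u64)]) -> Vec<(u64, u64)> {
    // GTF does not guarantee exon order, so sort by start coordinate first.
    let mut sorted = exons.to_vec();
    sorted.sort_unstable();
    sorted
        .windows(2)
        .filter_map(|pair| {
            let (_, prev_end) = pair[0];
            let (next_start, _) = pair[1];
            // The intron covers the bases strictly between consecutive exons.
            if next_start > prev_end + 1 {
                Some((prev_end + 1, next_start - 1))
            } else {
                None // abutting or overlapping exons leave no intronic gap
            }
        })
        .collect()
}

fn main() {
    // Exons listed out of order on purpose.
    let exons = [(300, 400), (100, 200), (500, 650)];
    let introns = introns_from_exons(&exons);
    println!("{:?}", introns); // prints [(201, 299), (401, 499)]
}
```

A single-exon transcript yields no introns, since `windows(2)` produces no pairs for a slice of length one.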

Installing from bioconda

Alevin-fry is available through bioconda for both x86 Linux and macOS platforms. On Apple silicon, you can either build it (easily) from source (see below) or run alevin-fry under the Rosetta 2 emulation layer.

With bioconda in the appropriate place in your channel list, you should simply be able to install via:

$ conda install -c bioconda alevin-fry

Installing from crates.io

Alevin-fry can also be installed from crates.io using cargo. This can be done with the following command:

$ cargo install alevin-fry

Building from source

If you want to use features or fixes that may only be available in the latest develop branch (or want to build for a different architecture), then you have to build from source. Luckily, cargo makes that easy; see below.

Alevin-fry is built and tested with the latest (major & minor) stable version of Rust. While it will likely compile fine with slightly older versions of Rust, this is not a guarantee and is not a support priority. Unlike with C++, Rust has a frequent and stable release cadence, is designed to be installed and updated from user space, and is easy to keep up to date with rustup. Thanks to cargo, building should be as easy as:

$ cargo build --release

Subsequent commands assume that the alevin-fry executable is in your PATH. For the current shell session, this can be done (in bash-like shells) using:

$ export PATH=`pwd`/target/release/:$PATH

Citing alevin-fry

If you use alevin-fry in your work, please cite:

He, D., Zakeri, M., Sarkar, H., Soneson, C., Srivastava, A., and Patro, R. Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data. Nat Methods 19, 316–322 (2022). https://doi.org/10.1038/s41592-022-01408-3

BibTeX:

@Article{He2022,
author={He, Dongze and Zakeri, Mohsen and Sarkar, Hirak and Soneson, Charlotte and Srivastava, Avi and Patro, Rob},
title={Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data},
journal={Nature Methods},
year={2022},
month={Mar},
day={01},
volume={19},
number={3},
pages={316-322},
issn={1548-7105},
doi={10.1038/s41592-022-01408-3},
url={https://doi.org/10.1038/s41592-022-01408-3}
}
