  • Stars: 765
  • Rank 59,372 (Top 2 %)
  • Language: C++
  • License: GNU General Publi...
  • Created over 9 years ago
  • Updated 6 months ago

Repository Details

🐟 🍣 🍱 Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment

Try out the new alevin-fry framework for single-cell analysis; tutorials can be found here!

Help guide the development of Salmon: take our survey.

What is Salmon?

Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data. Salmon achieves its accuracy and speed via a number of different innovations, including the use of selective alignment (accurate but fast-to-compute proxies for traditional read alignments) and massively parallel stochastic collapsed variational inference. The result is a versatile tool that fits nicely into many different pipelines. For example, you can choose to make use of our selective-alignment algorithm by providing Salmon with raw sequencing reads, or, if it is more convenient, you can provide Salmon with regular alignments (e.g. an unsorted BAM file with alignments to the transcriptome produced with your favorite aligner), and it will use the same wicked-fast, state-of-the-art inference algorithm to estimate transcript-level abundances for your experiment.
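The two input modes described above can be sketched as a small helper that only assembles a `salmon quant` command line; it does not run anything. This is a minimal illustration, not part of Salmon itself: the function name `salmon_quant_command` and its parameters are hypothetical, and the defaults (automatic library type `A`, 8 threads) are assumptions. Check `salmon quant --help` for the options your version supports.

```python
from typing import List, Optional


def salmon_quant_command(
    output_dir: str,
    index_dir: Optional[str] = None,
    reads_1: Optional[str] = None,
    reads_2: Optional[str] = None,
    transcripts_fasta: Optional[str] = None,
    alignments_bam: Optional[str] = None,
    lib_type: str = "A",   # assumption: let salmon infer the library type
    threads: int = 8,      # assumption: arbitrary default
) -> List[str]:
    """Build (but do not run) a `salmon quant` argument list.

    Hypothetical helper for illustration only.
    """
    # The library type is placed before any read files on the command line.
    cmd = ["salmon", "quant", "-l", lib_type, "-p", str(threads), "-o", output_dir]

    if alignments_bam is not None:
        # Alignment-based mode: existing alignments to the transcriptome,
        # plus the transcript FASTA they were aligned against.
        if transcripts_fasta is None:
            raise ValueError("alignment-based mode needs transcripts_fasta")
        cmd += ["-t", transcripts_fasta, "-a", alignments_bam]
    else:
        # Mapping-based mode: raw reads against a salmon index.
        if index_dir is None or reads_1 is None:
            raise ValueError("mapping-based mode needs index_dir and reads_1")
        cmd += ["-i", index_dir]
        if reads_2 is not None:
            cmd += ["-1", reads_1, "-2", reads_2]   # paired-end reads
        else:
            cmd += ["-r", reads_1]                  # single-end reads
    return cmd
```

If salmon is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`; the same inference step runs in either mode.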

Give salmon a try! You can find the latest binary releases here.

The current version number of the master branch of Salmon can be found here

Documentation

The documentation for Salmon is available on ReadTheDocs; check it out here.

Salmon is, and will continue to be, freely and actively supported on a best-effort basis. If you need industrial-grade technical support, please consider the options at oceangenomics.com/contact.

Decoy sequences in transcriptomes

tl;dr: fast is good but fast and accurate is better! Alignment and mapping methodology influence transcript abundance estimation, and accounting for fragments of unexpected origin can improve transcript quantification. To this end, salmon provides the ability to index both the transcriptome and decoy sequence that can be considered during mapping and quantification. The decoy sequence accounts for reads that might otherwise be (spuriously) attributed to some annotated transcript. This tutorial provides a step-by-step guide on how to efficiently index the reference transcriptome and genome to produce a decoy-aware index. Specifically, there are three possible ways in which the salmon index can be created:

  • cDNA-only index: salmon_index - https://combine-lab.github.io/salmon/getting_started/. This method will result in the smallest index and require the least resources to build, but will be the most prone to possible spurious alignments.

  • SA mashmap index: salmon_partial_sa_index - (regions of genome that have high sequence similarity to the transcriptome) - Details can be found in this README and using this script. While running mashmap can require considerable resources, the resulting decoy files are fairly small. This will result in an index bigger than the cDNA-only index, but still much smaller than the full genome index below. It will confer many, though not all, of the benefits of using the entire genome as a decoy sequence.

  • SAF genome index: salmon_sa_index - (the full genome is used as decoy) - The tutorial for creating such an index can be found here. This will result in the largest index, but likely does the best job in avoiding spurious alignments to annotated transcripts.
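As a rough illustration of the full-genome decoy case in the list above, the sketch below prepares the two inputs a decoy-aware index needs: a list of decoy sequence names taken from the genome FASTA headers, and a combined FASTA with the transcripts first and the genome (decoys) last. It then assembles, but does not run, a `salmon index` command. The function names and the file names `decoys.txt` and `gentrome.fa` are illustrative assumptions, and the sketch assumes uncompressed FASTA input; the linked tutorial remains the reference for the actual procedure.

```python
from pathlib import Path
from typing import List, Tuple


def decoy_names(genome_fasta: Path) -> List[str]:
    """Return each sequence name in a FASTA file (header text up to the first whitespace)."""
    names = []
    with open(genome_fasta) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:                    # skip empty headers
                    names.append(fields[0])
    return names


def prepare_decoy_inputs(transcripts: Path, genome: Path, out_dir: Path) -> Tuple[Path, Path]:
    """Write decoys.txt and gentrome.fa (hypothetical file names) into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    decoys_txt = out_dir / "decoys.txt"
    gentrome = out_dir / "gentrome.fa"

    # One decoy name per line.
    decoys_txt.write_text("".join(name + "\n" for name in decoy_names(genome)))

    # Transcripts first, genome (decoy) sequences last.
    with open(gentrome, "w") as out:
        for src in (transcripts, genome):
            with open(src) as inp:
                for line in inp:
                    # Guard against a missing trailing newline between files.
                    out.write(line if line.endswith("\n") else line + "\n")
    return gentrome, decoys_txt


def salmon_index_command(gentrome: Path, decoys_txt: Path, index_dir: Path,
                         threads: int = 8, kmer: int = 31) -> List[str]:
    """Build (but do not run) a decoy-aware `salmon index` argument list."""
    return ["salmon", "index",
            "-t", str(gentrome),      # transcriptome + decoy sequences
            "-d", str(decoys_txt),    # names of the decoy sequences
            "-i", str(index_dir),     # output index directory
            "-k", str(kmer),
            "-p", str(threads)]
```

For a cDNA-only index, the same command without `-d` and with the transcriptome FASTA passed to `-t` would apply. The thread count and k-mer size shown are placeholder defaults, not recommendations.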

Facing problems with indexing? Check whether anyone else has already had this problem in the issues section, or fill out the index generation request form.

NOTE:

If you are generating an index to be used for single-cell or single-nucleus quantification with alevin-fry, then we recommend you consider building a spliced+intron (splici) reference. This serves much of the purpose of a decoy-aware index when quantifying with alevin-fry, while also providing the capability to attribute splicing status to mapped fragments. More details about the splici reference and the Unspliced/Spliced/Ambiguous quantification mode it enables can be found here.

Chat live about Salmon

You can chat with the Salmon developers and other users via Gitter (Note: Gitter is much less frequently monitored than GitHub, so if you have an important problem or question, please consider opening an issue here on GitHub)!

Join the chat at https://gitter.im/COMBINE-lab/salmon

More Repositories

1

alevin-fry

🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.
Rust
159
star
2

pufferfish

An efficient index for the colored, compacted, de Bruijn graph
C
107
star
3

RapMap

Rapid sensitive and accurate read mapping via quasi-mapping
C++
89
star
4

cuttlefish

Building the compacted de Bruijn graph efficiently from references or reads.
C++
79
star
5

oarfish

long read RNA-seq quantification
Rust
66
star
6

terminus

Rust
57
star
7

kmers

A bit-packed k-mer representation (and relevant utilities) for rust
Rust
47
star
8

simpleaf

A rust framework to make using alevin-fry even simpler
Rust
44
star
9

wasabi

Prepare Sailfish and Salmon output for downstream analysis
R
43
star
10

SalmonTools

Useful tools for working with Salmon output
C++
36
star
11

RapClust

Accurate, Lightweight Clustering of de novo Transcriptomes using Fragment Equivalence Classes
Python
30
star
12

piscem

Rust wrapper for the next generation (still currently in C++)
Rust
20
star
13

shoal

Improved multi-sample transcript abundance estimates using adaptive priors
C++
20
star
14

grangers

Rust
16
star
15

pufferfish2

Rust
15
star
16

EDS

💡 💾 💽 A simple, intuitive and Efficient single cell binary Data Storage format
Rust
15
star
17

pyroe

Python
15
star
18

grouper

Python
15
star
19

piscem-infer

Rust
14
star
20

mazu

A Rust library for building modular, fast and compact indexes over genomic data
Rust
13
star
21

rainbowfish

A succinct colored dBG representation
C++
12
star
22

quark

semi-reference-based short read compression
C++
11
star
23

minnow

C++
10
star
24

usefulaf

Useful scripts and tools related to alevin-fry
Rust
9
star
25

measuresmatter

A treatise on quantification and differential expression from RNA-seq data
8
star
26

combine-lab.github.io

HTML
8
star
27

quantaf

Nextflow
8
star
28

seine-rs

A (rust)🦀 library and suite of tools for manipulating and processing the output of salmon, alevin, and alevin-fry 🐟
Rust
7
star
29

piscem-cpp

A small sparse and fast reference index based on SShash and Tiling encoding
C
6
star
30

sc-census

R
6
star
31

GRASS

Graph-Regularized Annotation via Semi-Supervised learning
Python
6
star
32

TreeTerminus

C
5
star
33

COMBINE-lab.github.io-OLD

Lab website
HTML
5
star
34

matryoshka

Methods for the automated discovery of hierarchically-structured chromatin domains
C++
5
star
35

forseti

Python
5
star
36

pcalib

A small "lightweight" implementation of PCA in C++ using the Eigen library
C++
5
star
37

perplexity

Rust
5
star
38

alevin-paper-pipeline

A simple pipeline using CGAT framework for benchmarking and analysis.
Python
5
star
39

seq_geom_xform

A crate to convert "complex" sequence library geometries to "simple" geometries
Rust
4
star
40

efgdl-spec

Specification for the extended fragment geometry description language
4
star
41

seqproc

Rust
4
star
42

seq_geom_parser

Testing out rust parsing of single-cell library geometry specifications
Rust
4
star
43

roe

R
3
star
44

roers

Rust
3
star
45

protocol-estuary

Jsonnet
3
star
46

xz

A GitHub clone of the XZ repo (http://git.tukaani.org/xz.git), not from the original author
C
3
star
47

LabRules

A list of rules and hints (on various different subjects) for members of the COMBINE lab. Others may also find them useful.
3
star
48

txome-clustering

Meaningful and efficient clustering of de novo transcriptome assembly results (better name pending).
Python
3
star
49

cqf-rust

An attempt at rustification of the CQF (https://github.com/splatlab/cqf)
C
2
star
50

libgff

C++
2
star
51

splitp

split-seq preprocessing
Rust
2
star
52

quant-tx-diversity

Jupyter Notebook
2
star
53

alevin-fry-paper-scripts

Jupyter Notebook
2
star
54

scrna-ambiguity

Jupyter Notebook
2
star
55

alevin-tutorial

This is a support website for Alevin-tool (part of Salmon).
1
star
56

radicl-cpp

Internal library for basic reading and writing of .rad files in C++14
C++
1
star
57

FastaDigest

Get the "signature" of fasta files
Python
1
star
58

rsrs

Reference Signatures (rs) in Rust (rs)
Rust
1
star
59

libradicl

Rust
1
star
60

simplr

1
star
61

KallistoFormatDescription

Documenting the (binary) output format used by Kallisto
1
star
62

pufferfish_experiments

Jupyter Notebook
1
star
63

CSE549-txpCoverage

C++
1
star
64

QuantAnalysis

Jupyter Notebook
1
star
65

radtk

Various tools for working with RAD files
Rust
1
star
66

lr_quant_benchmarks

A replicable and modular benchmark for long-read RNA transcript quantification methods
Python
1
star