  • Language
    C++
  • License
    GNU General Publi...
  • Created over 10 years ago
  • Updated over 1 year ago

Repository Details

SortMeRNA: next-generation sequence filtering and alignment tool

sortmerna

SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple rRNA database file(s), and sorts apart aligned and rejected reads into two files. SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

SortMeRNA is also available through QIIME v1.9.1 and the nf-core RNA-Seq pipeline v3.9.

Getting Started

SortMeRNA 4 is C++17 compliant and mostly uses standard libraries. It uses CMake as the build system, and can be built and run on all major operating systems, including Linux, Windows, and macOS.

Using Conda package

First install conda, for example with the Miniconda installer:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Then, as per the Bioconda guidelines, add the following conda channels:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict


# list the SortMeRNA versions available on the configured channels
conda search sortmerna
  Loading channels: done
  # Name                       Version           Build  Channel
  sortmerna                        2.0               0  bioconda
  ...
  sortmerna                      4.3.4               0  bioconda
  ...
  sortmerna                      4.3.6               0  bioconda

# create a new environment and install SortMeRNA in it
conda create --name sortmerna_env
conda activate sortmerna_env
conda install sortmerna
which sortmerna
  /home/biocodz/miniconda3/envs/sortmerna_env/bin/sortmerna

# test the installation
sortmerna --version
  SortMeRNA version 4.3.6
  Build Date: Aug 16 2022
  sortmerna_build_git_sha:@db8c1983765f61986b46ee686734749eda235dcc@
  sortmerna_build_git_date:@2022/08/16 11:42:59@

# view help
sortmerna -h

Using GitHub release binaries on Linux

Visit the SortMeRNA GitHub Releases page.

The Linux distribution is a shell script with an embedded installation archive.

Run the following bash commands:

pushd ~

# get the distro
wget https://github.com/biocore/sortmerna/releases/download/v4.3.6/sortmerna-4.3.6-Linux.sh

# view the installer usage
bash sortmerna-4.3.6-Linux.sh --help
    Options: [defaults in brackets after descriptions]
      --help            print this message
      --version         print cmake installer version
      --prefix=dir      directory in which to install
      --include-subdir  include the sortmerna-4.3.6-Linux subdirectory
      --exclude-subdir  exclude the sortmerna-4.3.6-Linux subdirectory
      --skip-license    accept license

# run the installer
bash sortmerna-4.3.6-Linux.sh --skip-license
  sortmerna Installer Version: 4.3.6, Copyright (c) Clarity Genomics
  This is a self-extracting archive.
  The archive will be extracted to: $HOME/sortmerna
  
  Using target directory: /home/biocodz/sortmerna
  Extracting, please wait...
  
  Unpacking finished successfully

# check the installed binaries
ls -lrt /home/biocodz/sortmerna/bin/
  sortmerna

# set PATH
export PATH=$HOME/sortmerna/bin:$PATH

# test the installation
sortmerna --version
  SortMeRNA version 4.3.6
  Build Date: Jul 17 2021
  sortmerna_build_git_sha:@921fa40256760ea2d44c49b21eb326afda748d5e@
  sortmerna_build_git_date:@2022/08/16 10:59:31@

# view help
sortmerna -h
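
When the installation check has to run inside a script, the version number can be pulled out of the --version output shown above. The following is a minimal sketch, assuming the output keeps the "SortMeRNA version X.Y.Z" line from the sample; the function name sortmerna_version_from_output is hypothetical and not part of SortMeRNA.

```shell
# Hypothetical helper, not part of SortMeRNA.
# Reads 'sortmerna --version' output on stdin and prints only the version number.
# Assumes a line of the form "SortMeRNA version X.Y.Z", as in the sample output.
sortmerna_version_from_output() {
  sed -n 's/^[[:space:]]*SortMeRNA version \([0-9][0-9.]*\).*$/\1/p' | head -n 1
}

# Only query the real binary when it is on PATH
if command -v sortmerna >/dev/null 2>&1; then
  sortmerna --version | sortmerna_version_from_output
fi
```

A script can compare the printed value against the version it expects before starting a long run.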

Running

  • The only required options are --ref and --reads
  • Any option can be specified using a single dash, e.g. -ref and -reads
  • Both plain fasta/fastq and compressed fasta.gz/fastq.gz files are accepted
  • File extensions (.fastq, .fastq.gz, .fq, .fq.gz, .fasta, ...) are optional. The format and compression are recognized automatically
  • Relative paths are accepted
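
To illustrate what recognizing format and compression from file content, rather than from the extension, can look like, here is a minimal shell sketch. It is not SortMeRNA's implementation: the function name detect_reads_format is hypothetical, and the checks (gzip magic bytes 1f 8b, a leading '>' for fasta, a leading '@' for fastq) are assumptions based on the usual conventions of those formats.

```shell
# Hypothetical helper, not part of SortMeRNA: guess the format and compression
# of a reads file from its content instead of its extension.
detect_reads_format() {
  # gzip streams start with the magic bytes 1f 8b
  magic="$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')"
  if [ "$magic" = "1f8b" ]; then
    compression="gzip"
    first_char="$(gzip -dc "$1" 2>/dev/null | head -c 1)"
  else
    compression="plain"
    first_char="$(head -c 1 "$1")"
  fi
  # fasta records start with '>', fastq records start with '@'
  case "$first_char" in
    '>') format="fasta" ;;
    '@') format="fastq" ;;
    *)   format="unknown" ;;
  esac
  printf '%s %s\n' "$format" "$compression"
}

# Demo on two small temporary files without extensions (sequences are made up)
demo_dir="$(mktemp -d)"
printf '>read1\nACGT\n' > "$demo_dir/reads_a"
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$demo_dir/reads_b"
detect_reads_format "$demo_dir/reads_a"   # prints: fasta plain
detect_reads_format "$demo_dir/reads_b"   # prints: fastq gzip
rm -rf "$demo_dir"
```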

Example invocations:

# single reference and single reads file
sortmerna --ref REF_PATH --reads READS_PATH

# for multiple references use multiple '--ref'
sortmerna --ref REF_PATH_1 --ref REF_PATH_2 --ref REF_PATH_3 --reads READS_PATH

# for paired reads use '--reads' twice
sortmerna --ref REF_PATH_1 --ref REF_PATH_2 --ref REF_PATH_3 --reads READS_PATH_1 --reads READS_PATH_2

More examples can be found in test.jinja and run.py
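
When the reference and reads lists come from a script, the repeated '--ref' and '--reads' options can be assembled programmatically. The sketch below only prints the resulting command (a dry run) and uses nothing beyond the two options described above; the function name build_sortmerna_cmd and its '--' separator convention are hypothetical, not part of SortMeRNA.

```shell
# Hypothetical helper, not part of SortMeRNA: print a sortmerna command line.
# Usage: build_sortmerna_cmd REF [REF...] -- READS [READS...]
build_sortmerna_cmd() {
  cmd="sortmerna"
  flag="--ref"
  n_ref=0
  n_reads=0
  for arg in "$@"; do
    if [ "$arg" = "--" ]; then
      flag="--reads"   # everything after '--' is a reads file
      continue
    fi
    cmd="$cmd $flag $arg"
    if [ "$flag" = "--ref" ]; then
      n_ref=$((n_ref + 1))
    else
      n_reads=$((n_reads + 1))
    fi
  done
  # --ref and --reads are the only required options
  if [ "$n_ref" -lt 1 ] || [ "$n_reads" -lt 1 ]; then
    echo "usage: build_sortmerna_cmd REF [REF...] -- READS [READS...]" >&2
    return 1
  fi
  printf '%s\n' "$cmd"
}

build_sortmerna_cmd REF_PATH_1 REF_PATH_2 -- READS_PATH_1 READS_PATH_2
# prints: sortmerna --ref REF_PATH_1 --ref REF_PATH_2 --reads READS_PATH_1 --reads READS_PATH_2
```

The printed line is not quoted, so this sketch is only suitable for paths without spaces or shell metacharacters.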

Execution trace

Here is a sample execution trace.

IMPORTANT

  • A progressing execution trace, showing the number of reads processed so far, indicates a normally running program.
  • A non-progressing trace indicates a problem. Please kill the process instead of waiting (for example, for two days), and file an issue.
  • Please provide the execution trace when filing issues.

Sample execution statistics are provided to give an idea of the expected execution time.

Building from sources

Build instructions

User Manual

See the SortMeRNA Read the Docs project.

If you need a PDF, any modern browser can print the web pages to PDF.

Taxonomies

The archive data/rRNA_databases/silva_ids_acc_tax.tar.gz contains SILVA taxonomy strings (extracted from the XML file generated by ARB) for each of the reference sequences in the representative databases. Each file has three tab-separated columns: the reference sequence ID, the accession number, and the taxonomy.
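
The three-column layout makes the taxonomy files easy to query with standard tools once the archive has been unpacked (for example with tar -xzf). The sketch below looks up the taxonomy for one reference sequence ID. The function name tax_lookup is hypothetical and the demo rows are made up; the names of the files inside the archive are not stated here, so the demo uses a temporary file instead.

```shell
# Hypothetical helper, not part of SortMeRNA.
# Usage: tax_lookup TAXONOMY_FILE SEQUENCE_ID
# Prints column 3 (taxonomy) of the first row whose column 1 equals SEQUENCE_ID;
# returns non-zero if the ID is not found.
tax_lookup() {
  awk -F '\t' -v id="$2" '
    $1 == id { print $3; found = 1; exit }
    END { exit !found }
  ' "$1"
}

# Demo with made-up rows in the three-column layout (ID, accession, taxonomy)
tax_file="$(mktemp)"
printf 'id_1\tACC_1\tBacteria;Proteobacteria\n'  > "$tax_file"
printf 'id_2\tACC_2\tArchaea;Euryarchaeota\n'   >> "$tax_file"
tax_lookup "$tax_file" id_2   # prints: Archaea;Euryarchaeota
rm -f "$tax_file"
```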

Citation

If you use SortMeRNA, please cite: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.

Contributors

See AUTHORS for a list of contributors to this project.

Support

For questions and comments, feel free to file an issue, or start a discussion.
