
CoverM


CoverM aims to be a configurable, easy-to-use and fast DNA read coverage and relative abundance calculator focused on metagenomics applications.

CoverM calculates the coverage of genomes/MAGs (coverm genome) or of individual contigs (coverm contig). Because coverage is calculated by read mapping, the input can be either BAM files sorted by reference, or raw reads plus reference genomes in various formats.

Installation

Install through the bioconda package

CoverM and its dependencies can be installed through the bioconda conda channel. After conda and the bioconda channel have been set up, it can be installed with:

conda install coverm

Pre-compiled binary

Statically compiled CoverM binaries are available on the releases page. This installation method requires the non-Rust dependencies to be installed separately; see the Dependencies section.

Compiling from source

CoverM can also be installed from source, using the cargo build system after installing Rust.

cargo install coverm

Development version

To run an unreleased version of CoverM, after installing Rust and any additional dependencies listed below:

git clone https://github.com/wwood/CoverM
cd CoverM
cargo run -- genome ...etc...

To run tests:

cargo build
cargo test

Dependencies

When installing from source or for development, additional programs must also be installed to use the full suite of options.

These can be installed using the conda YAML environment definition:

conda env create -n coverm -f coverm.yml

Or, these can be installed manually:

  • samtools v1.9
  • tee, which is installed by default on most Linux operating systems.
  • man, which is installed by default on most Linux operating systems.

and some mapping software:

For dereplication:

Shell completion

Completion scripts can be generated for various shells (e.g. Bash). For example, to install the Bash completion script system-wide (this requires root privileges):

coverm shell-completion --output-file coverm --shell bash
mv coverm /etc/bash_completion.d/

It can also be installed into a user's home directory (root privileges not required):

coverm shell-completion --shell bash --output-file /dev/stdout >>~/.bash_completion

In both cases, the terminal will likely need to be restarted for the change to take effect. To test, type coverm gen and press the TAB key; the command should be completed.

Usage

CoverM operates in several modes. Detailed usage information, including examples, is available for each mode through the -h or --full-help flags:

  • genome - Calculate coverage of genomes
  • contig - Calculate coverage of contigs

There are several utility modes as well:

  • make - Generate BAM files through alignment
  • filter - Remove (or only keep) alignments with insufficient identity
  • cluster - Dereplicate and cluster genomes
  • shell-completion - Generate shell completion scripts

Calculation methods

The -m/--methods flag specifies which kind(s) of coverage are calculated.

To illustrate, imagine a set of 3 pairs of reads, where only 1 aligns to a single reference contig of length 1000bp:

read1_forward    ========>
read1_reverse                                  <====+====
contig    ...-----------------------------------------------------....
                 |        |         |         |         |
position        200      210       220       230       240

The different coverage measures would be:

Each method is listed with its value for this example, the formula used, and an explanation:

  • mean: 0.02235294, calculated as (10+9)/(1000-2*75). The two reads have 10 and 9 bases aligned exactly, averaged over 1000-2*75 bp (the contig length minus 75bp from each end).
  • relative_abundance: 33.3%, calculated as 0.02235294/0.02235294*(2/6). If the contig is considered a genome, its mean coverage is 0.02235294. The total mean coverage across all genomes is also 0.02235294, and 2 out of 6 reads (1 out of 3 pairs) map. This method is only available in genome mode.
  • trimmed_mean: 0, calculated as mean_coverage(mid-ranked-positions). After removing the 5% of bases with the highest coverage and the 5% with the lowest coverage, all remaining positions have coverage 0.
  • covered_fraction: 0.02, calculated as (10+10)/1000. 20 bases are covered by at least one read, out of 1000bp.
  • covered_bases: 20, calculated as 10+10. 20 bases are covered.
  • variance: 0.01961962, calculated as var({1;20},{0;980}), i.e. the sample variance of 20 positions with coverage 1 and 980 positions with coverage 0.
  • length: 1000. The contig's length is 1000bp.
  • count: 2. 2 reads are mapped.
  • reads_per_base: 0.002, calculated as 2/1000. 2 reads are mapped over 1000bp.
  • metabat: contigLen 1000, totalAvgDepth 0.02235294, bam depth 0.02235294, variance 0.01961962. Reproduces the output of MetaBAT's 'jgi_summarize_bam_contig_depths' tool exactly.
  • coverage_histogram: 20 bases with coverage 1, 980 bases with coverage 0. The number of positions at each coverage level is tallied.
  • rpkm: 1000000, calculated as 2 * 10^9 / 1000 / 2. This assumes no reads map to other contigs. See https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/ for an explanation of RPKM and TPM.
  • tpm: 1000000, calculated as rpkm/total_of_rpkm * 10^6. This assumes no reads map to other contigs; see rpkm above.
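To make the arithmetic of the worked example concrete, here is a minimal Rust sketch that recomputes several of the values listed above. It is an illustration under stated assumptions, not CoverM's implementation: all function names are hypothetical, the read positions are illustrative, and the exactly-aligned base count (19) and the 75bp end exclusion used for mean are taken from the example rather than derived from alignments.

```rust
// Sketch of the per-contig formulas from the worked example. Function names
// are hypothetical; this is not CoverM's own code.

/// mean: exactly-aligned bases divided by the contig length minus
/// `end_exclusion` bp at each end (75bp in the example).
fn mean(exactly_aligned_bases: u64, length: u64, end_exclusion: u64) -> f64 {
    exactly_aligned_bases as f64 / (length - 2 * end_exclusion) as f64
}

/// covered_bases: number of positions covered by at least one read.
fn covered_bases(coverage: &[u32]) -> usize {
    coverage.iter().filter(|&&depth| depth > 0).count()
}

/// covered_fraction: covered_bases divided by the contig length.
fn covered_fraction(coverage: &[u32]) -> f64 {
    covered_bases(coverage) as f64 / coverage.len() as f64
}

/// variance: sample variance (n - 1 denominator) of per-position coverage.
fn sample_variance(coverage: &[u32]) -> f64 {
    let n = coverage.len() as f64;
    let avg = coverage.iter().map(|&d| d as f64).sum::<f64>() / n;
    let sum_sq: f64 = coverage.iter().map(|&d| (d as f64 - avg).powi(2)).sum();
    sum_sq / (n - 1.0)
}

/// reads_per_base: mapped reads divided by contig length.
fn reads_per_base(reads: u64, length: u64) -> f64 {
    reads as f64 / length as f64
}

/// rpkm: reads * 10^9 / (contig length * total mapped reads).
fn rpkm(reads: u64, length: u64, total_mapped_reads: u64) -> f64 {
    reads as f64 * 1e9 / (length as f64 * total_mapped_reads as f64)
}

/// tpm: each rpkm divided by the sum of all rpkm values, times 10^6.
fn tpm(rpkms: &[f64]) -> Vec<f64> {
    let total: f64 = rpkms.iter().sum();
    rpkms.iter().map(|r| r / total * 1e6).collect()
}

/// relative_abundance (genome mode): each genome's share of the summed mean
/// coverage, scaled by the fraction of reads that mapped, as a percentage.
fn relative_abundance(genome_means: &[f64], mapped_reads: u64, total_reads: u64) -> Vec<f64> {
    let total_mean: f64 = genome_means.iter().sum();
    let mapped_fraction = mapped_reads as f64 / total_reads as f64;
    genome_means
        .iter()
        .map(|m| m / total_mean * mapped_fraction * 100.0)
        .collect()
}

fn main() {
    // Worked example: a 1000bp contig where two reads each cover 10
    // positions (illustrative offsets), with 6 reads sequenced in total.
    let mut coverage = vec![0u32; 1000];
    for depth in &mut coverage[200..210] {
        *depth = 1;
    }
    for depth in &mut coverage[230..240] {
        *depth = 1;
    }

    let contig_mean = mean(19, 1000, 75);
    println!("mean               {:.8}", contig_mean);
    println!("covered_bases      {}", covered_bases(&coverage));
    println!("covered_fraction   {}", covered_fraction(&coverage));
    println!("variance           {:.8}", sample_variance(&coverage));
    println!("reads_per_base     {}", reads_per_base(2, 1000));
    println!("rpkm               {}", rpkm(2, 1000, 2));
    println!("tpm                {:?}", tpm(&[rpkm(2, 1000, 2)]));
    println!("relative_abundance {:?}", relative_abundance(&[contig_mean], 2, 6));
}
```

Note that mean and variance use different denominators in this example: mean divides by the 850bp left after end exclusion, while the variance is taken over all 1000 positions.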

Calculation of genome-wise coverage (genome mode) is similar to that of contig-wise coverage (contig mode), except that results are reported per genome rather than per contig. For calculation methods that exclude base positions based on their coverage, all positions from all contigs in a genome are considered together. For instance, if a genome consists of a 2000bp contig with 1X coverage at every position and a 2,000,000bp contig with no reads mapped, then the trimmed_mean will be 0, because all positions in the 2000bp contig fall within the top 5% of positions sorted by coverage.
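The pooling behaviour described above can be sketched in Rust as follows. This is a minimal illustration under assumptions, not CoverM's code: the hypothetical genome_trimmed_mean function concatenates per-position coverage from every contig, sorts it, and drops floor(n * 0.05) positions from each end before averaging. CoverM's exact rounding and any end-of-contig exclusion are not modelled.

```rust
// Sketch of genome-mode trimmed_mean pooling. `genome_trimmed_mean` is a
// hypothetical name; CoverM's own rounding may differ.

/// Pools per-position coverage from all contigs of a genome, drops the lowest
/// and highest `trim_fraction` of positions, and averages the rest.
fn genome_trimmed_mean(contigs: &[Vec<u32>], trim_fraction: f64) -> f64 {
    assert!((0.0..0.5).contains(&trim_fraction), "trim_fraction must be in [0, 0.5)");
    // Genome mode considers all positions from all contigs together.
    let mut pooled: Vec<u32> = contigs.iter().flatten().copied().collect();
    pooled.sort_unstable();
    let n = pooled.len();
    // Assumed rounding: floor(n * trim_fraction) positions from each end.
    let trim = (n as f64 * trim_fraction).floor() as usize;
    let kept = &pooled[trim..n - trim];
    if kept.is_empty() {
        return 0.0;
    }
    kept.iter().map(|&depth| depth as f64).sum::<f64>() / kept.len() as f64
}

fn main() {
    // Example from the text: a 2000bp contig with every position at 1X, in a
    // genome that also has a 2,000,000bp contig with no reads mapped.
    let covered = vec![1u32; 2_000];
    let uncovered = vec![0u32; 2_000_000];

    let genome = vec![covered.clone(), uncovered];
    println!("genome trimmed_mean: {}", genome_trimmed_mean(&genome, 0.05));

    // The same 2000bp contig considered on its own.
    println!("contig trimmed_mean: {}", genome_trimmed_mean(&[covered], 0.05));
}
```

In this sketch the pooled genome yields a trimmed mean of 0, because the 2000 covered positions are among the roughly 100,100 highest-ranked positions that are trimmed, whereas the 2000bp contig taken alone keeps a trimmed mean of 1.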

License

CoverM is made available under GPL3+. See LICENSE.txt for details. Copyright Ben Woodcroft.

Developed by Ben Woodcroft at the Queensland University of Technology Centre for Microbiome Research.
