Language: Rust
License: MIT License


A *fast* tool for BAM/CRAM quality evaluation, intended for long reads

CRAMINO

A tool for quick quality assessment of CRAM and BAM files, intended for long-read sequencing.

Installation

For most users, the preferred option is to download a ready-to-use binary for your system from the releases and place it in a directory on your $PATH.
You may need to make it executable first: chmod +x cramino

Alternatively, install with conda:
conda install -c bioconda cramino

Rust developers can also build cramino with cargo:
cargo install cramino

Usage

cramino [OPTIONS] [INPUT]

Arguments:
  [INPUT]  cram or bam file to check [default: -]

Options:
  -t, --threads <THREADS>            Number of parallel decompression threads to use [default: 4]
      --reference <REFERENCE>        reference for decompressing cram
  -m, --min-read-len <MIN_READ_LEN>  Minimal length of read to be considered [default: 0]
      --hist                         If histograms have to be generated
      --checksum                     If a checksum has to be calculated
      --arrow <ARROW>                Write data to an arrow format file
      --karyotype                    Provide normalized number of reads per chromosome
      --phased                       Calculate metrics for phased reads
      --spliced                      Provide metrics for spliced data
      --ubam                         Provide metrics for unaligned reads
  -h, --help                         Print help
  -V, --version                      Print version

Example output

File name       example.cram
Number of reads 14108020
% from total reads  83.45
Yield [Gb]      139.91
N50     17447
Median length   6743.00
Mean length     9917
Median identity 94.27
Mean identity   92.53
Path    alignment/example.cram
Creation time   09/09/2022 10:53:36
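The length metrics in the example output (yield, N50, median and mean length) can be derived from the read lengths alone. The sketch below is not cramino's source; the `summarize` function and `LengthSummary` struct are hypothetical, the length filter only mimics the intent of --min-read-len, and it assumes the standard N50 definition and 1 Gb = 10^9 bases.

```rust
/// Length-based summary statistics (hypothetical struct, for illustration).
#[derive(Debug)]
struct LengthSummary {
    reads: usize,
    yield_gb: f64,
    n50: u64,
    median: f64,
    mean: f64,
}

/// Summarise read lengths, ignoring reads shorter than `min_len`
/// (assumed to mirror the intent of --min-read-len).
fn summarize(lengths: &[u64], min_len: u64) -> Option<LengthSummary> {
    // Keep reads that pass the length filter, sorted longest first for N50.
    let mut kept: Vec<u64> = lengths.iter().copied().filter(|&l| l >= min_len).collect();
    if kept.is_empty() {
        return None;
    }
    kept.sort_unstable_by(|a, b| b.cmp(a));

    let total: u64 = kept.iter().sum();

    // N50: length of the read at which the cumulative sum of the
    // longest-first lengths first reaches half of the total yield.
    let mut cumulative = 0;
    let mut n50 = 0;
    for &len in &kept {
        cumulative += len;
        if cumulative * 2 >= total {
            n50 = len;
            break;
        }
    }

    // Median: middle element, or the mean of the two middle elements.
    let n = kept.len();
    let median = if n % 2 == 1 {
        kept[n / 2] as f64
    } else {
        (kept[n / 2 - 1] + kept[n / 2]) as f64 / 2.0
    };

    Some(LengthSummary {
        reads: n,
        yield_gb: total as f64 / 1e9, // assumes 1 Gb = 10^9 bases
        n50,
        median,
        mean: total as f64 / n as f64,
    })
}

fn main() {
    let lengths = [1_000, 2_000, 3_000, 4_000, 10_000];
    println!("{:?}", summarize(&lengths, 0));
}
```

Returning `Option` keeps the empty case explicit: if every read is shorter than the minimum length there is no meaningful N50 or median to report.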

A 140 Gbase BAM file is processed in 12 minutes using less than 1 GB of memory. Note that the identity score above is the gap-compressed identity. The --ubam flag provides metrics for all reads in the file, whether or not they are aligned. The "% from total reads" field gives the percentage of reads used for this report, which depends on the --min-read-len and --ubam settings. Without either of those options, it is the percentage of reads that are mapped (primary or supplementary).
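The gap-compressed identity mentioned above counts each insertion or deletion as a single difference, whatever its length. The sketch below uses one common formulation, 1 - (mismatches + gap opens) / (aligned columns + gap opens), recovering mismatches from the NM tag (mismatches plus inserted and deleted bases). The `CigarOp` enum and `gap_compressed_identity` function are hypothetical stand-ins rather than cramino's code or a BAM library API, and we assume, without having confirmed it, that cramino's definition is equivalent. Multiply by 100 to get a percentage as in the report.

```rust
/// Minimal stand-in for CIGAR operations (hypothetical type; real code would
/// use the CIGAR representation provided by a BAM library).
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum CigarOp {
    Match(u32),    // M: alignment match or mismatch
    Equal(u32),    // =: sequence match
    Diff(u32),     // X: sequence mismatch
    Ins(u32),      // I: insertion relative to the reference
    Del(u32),      // D: deletion relative to the reference
    SoftClip(u32), // S: soft-clipped bases, not part of the alignment
}

/// Gap-compressed identity of one alignment; `nm` is the NM tag value.
/// Assumes the alignment has at least one aligned column.
fn gap_compressed_identity(cigar: &[CigarOp], nm: u32) -> f64 {
    let mut aligned = 0u64; // M/=/X columns (matches and mismatches)
    let mut gap_bases = 0u64; // total inserted + deleted bases
    let mut gap_opens = 0u64; // number of I/D operations
    for op in cigar {
        match *op {
            CigarOp::Match(n) | CigarOp::Equal(n) | CigarOp::Diff(n) => aligned += n as u64,
            CigarOp::Ins(n) | CigarOp::Del(n) => {
                gap_bases += n as u64;
                gap_opens += 1;
            }
            CigarOp::SoftClip(_) => {}
        }
    }
    // NM minus the gap bases leaves the mismatches (saturating guards a malformed NM).
    let mismatches = (nm as u64).saturating_sub(gap_bases);
    // Each gap then contributes exactly one difference.
    let differences = mismatches + gap_opens;
    1.0 - differences as f64 / (aligned + gap_opens) as f64
}

fn main() {
    // 100 aligned columns with 2 mismatches, a 5-base deletion and a 3-base insertion.
    let cigar = [
        CigarOp::SoftClip(10),
        CigarOp::Match(50),
        CigarOp::Del(5),
        CigarOp::Match(30),
        CigarOp::Ins(3),
        CigarOp::Match(20),
    ];
    let nm = 2 + 5 + 3;
    println!("{:.2}%", 100.0 * gap_compressed_identity(&cigar, nm));
}
```

Compressing gaps this way keeps a single long indel, common in long-read alignments, from dominating the identity the way it would under a per-base definition.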

Optional output

  • a checksum to check whether files were changed or corrupted (--checksum)
  • an arrow file for use with NanoPlot and NanoComp (--arrow <filename>)
  • a normalised number of reads per chromosome, e.g. to determine sex or detect aneuploidies (--karyotype)
  • information about the phase blocks (--phased)
  • information about the number of splice sites (--spliced)
  • histograms of read lengths and read identities, as shown below (--hist). With --phased, a histogram of phase block lengths is also produced. Please let me know if the histograms look inappropriately scaled for your data.
# Histogram for read lengths:
     0-2000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  2000-4000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4000-6000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  6000-8000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 8000-10000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
10000-12000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
12000-14000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
14000-16000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16000-18000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
18000-20000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
20000-22000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
22000-24000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
24000-26000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
26000-28000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
28000-30000 ∎∎∎∎∎∎∎∎∎∎∎∎
30000-32000 ∎∎∎∎∎∎∎∎∎
32000-34000 ∎∎∎∎∎∎
34000-36000 ∎∎∎∎
36000-38000 ∎∎
38000-40000 ∎
40000-42000 ∎
42000-44000 ∎
44000-46000 
46000-48000 
48000-50000 
50000-52000 
52000-54000 
54000-56000 
56000-58000 
58000-60000 
     60000+ 


# Histogram for Phred-scaled accuracies:
  Q0-1 
  Q1-2 
  Q2-3 
  Q3-4 
  Q4-5 
  Q5-6 ∎∎∎
  Q6-7 ∎∎∎∎∎∎∎∎∎∎∎∎
  Q7-8 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  Q8-9 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 Q9-10 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q10-11 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q11-12 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q12-13 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q13-14 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q14-15 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q15-16 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q16-17 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q17-18 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q18-19 ∎∎∎∎
Q19-20 ∎
Q20-21 
Q21-22 
Q22-23 
Q23-24 
Q24-25 
Q25-26 
Q26-27 
Q27-28 
Q28-29 
Q29-30 
Q30-31 
Q31-32 
Q32-33 
Q33-34 
Q34-35 
Q35-36 
Q36-37 
Q37-38 
Q38-39 
Q39-40 
  Q40+ 
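The histograms above bin read lengths in 2000-base steps up to a "60000+" overflow bin, and accuracies on the Phred scale, Q = -10 log10(1 - identity), in steps of 1 up to "Q40+". A minimal sketch of that binning and rendering follows; the function names, the cap used for perfect identity, and the linear bar scaling are assumptions for illustration, not taken from cramino.

```rust
/// Phred-scaled accuracy from a fractional identity (0..=1).
fn phred(identity: f64) -> f64 {
    // Cap the error rate (arbitrary choice here) so perfect reads stay finite.
    let error = (1.0 - identity).max(1e-6);
    -10.0 * error.log10()
}

/// Count values into `n_bins` fixed-width bins plus one overflow bin.
fn histogram(values: &[f64], bin_width: f64, n_bins: usize) -> Vec<usize> {
    let mut counts = vec![0; n_bins + 1]; // last slot is the overflow bin
    for &v in values {
        // Negative values saturate to bin 0 in the float-to-usize cast.
        let idx = ((v / bin_width).floor() as usize).min(n_bins);
        counts[idx] += 1;
    }
    counts
}

/// Labels such as "0-2000" ... "60000+" or "Q0-1" ... "Q40+".
fn bin_labels(prefix: &str, bin_width: u64, n_bins: usize) -> Vec<String> {
    let n = n_bins as u64;
    let mut labels: Vec<String> = (0..n)
        .map(|i| format!("{}{}-{}", prefix, i * bin_width, (i + 1) * bin_width))
        .collect();
    labels.push(format!("{}{}+", prefix, n * bin_width));
    labels
}

/// Right-align labels and draw bars of '∎', scaled linearly so the
/// fullest bin gets `width` symbols (assumed scaling).
fn render(counts: &[usize], labels: &[String], width: usize) -> String {
    let max = counts.iter().copied().max().unwrap_or(0).max(1);
    let pad = labels.iter().map(|l| l.len()).max().unwrap_or(0);
    let mut out = String::new();
    for (count, label) in counts.iter().zip(labels) {
        let bar = "∎".repeat(count * width / max);
        out.push_str(&format!("{:>pad$} {}\n", label, bar, pad = pad));
    }
    out
}

fn main() {
    let lengths = [1_500.0, 2_500.0, 2_700.0, 65_000.0];
    let counts = histogram(&lengths, 2_000.0, 30);
    print!("{}", render(&counts, &bin_labels("", 2_000, 30), 60));

    let accuracies: Vec<f64> = [0.9427, 0.9253].iter().map(|&i| phred(i)).collect();
    let q_counts = histogram(&accuracies, 1.0, 40);
    print!("{}", render(&q_counts, &bin_labels("Q", 1, 40), 60));
}
```

An explicit overflow bin keeps a few very long or very accurate reads from stretching the axis, at the cost of hiding how far those outliers reach.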

CITATION

If you use this tool, please consider citing our publication.
