CRAMINO
A tool for quick quality assessment of cram and bam files, intended for long-read sequencing data.
Installation
For most users, the preferred option is to download a ready-to-use binary for your system from the releases and place it in a directory on your $PATH.
You may have to change the file permissions to make it executable with chmod +x cramino.
Alternatively, install it with conda:
conda install -c bioconda cramino
Or for Rust developers, build cramino with cargo:
cargo install cramino
Usage
cramino [OPTIONS] <INPUT>
Arguments:
[INPUT] cram or bam file to check [default: -]
Options:
-t, --threads <THREADS> Number of parallel decompression threads to use [default: 4]
--reference <REFERENCE> reference for decompressing cram
-m, --min-read-len <MIN_READ_LEN> Minimal length of read to be considered [default: 0]
--hist If histograms have to be generated
--checksum If a checksum has to be calculated
--arrow <ARROW> Write data to an arrow format file
--karyotype Provide normalized number of reads per chromosome
--phased Calculate metrics for phased reads
--spliced Provide metrics for spliced data
--ubam Provide metrics for unaligned reads
-h, --help Print help
-V, --version Print version
Example output
File name example.cram
Number of reads 14108020
% from total reads 83.45
Yield [Gb] 139.91
N50 17447
Median length 6743.00
Mean length 9917
Median identity 94.27
Mean identity 92.53
Path alignment/example.cram
Creation time 09/09/2022 10:53:36
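The read-length fields above can be derived from a list of read lengths. The Python sketch below uses the usual definition of N50 (the length L such that reads of length L or longer contain at least half of the total yield). It is an illustration rather than cramino's own implementation, and the function name length_metrics is hypothetical. As a sanity check on the example output, 14,108,020 reads with a mean length of about 9,917 bases give roughly 139.9 Gb.

```python
from statistics import median


def length_metrics(lengths: list[int]) -> dict[str, float]:
    """Read-length summary similar to the fields in the example output."""
    total = sum(lengths)
    # N50: walk reads from longest to shortest until half of all bases is covered.
    covered = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            n50 = length
            break
    return {
        "number_of_reads": len(lengths),
        "yield_gb": total / 1e9,  # Gb = gigabases
        "n50": n50,
        "median_length": float(median(lengths)),
        "mean_length": total / len(lengths),
    }


print(length_metrics([2000, 3000, 4000, 5000, 6000]))
```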
A 140 Gbase bam file is processed in 12 minutes, using less than 1 Gbyte of memory. Note that the identity score above is defined as the gap-compressed identity. The --ubam flag will provide metrics for all reads in the file, regardless of whether they are aligned or not.
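Gap-compressed identity counts each insertion or deletion as a single difference, whatever its length, so long indels do not dominate the score. A minimal Python sketch, assuming the CIGAR string and the NM tag (edit distance) of an alignment are available and that mismatches can be recovered as NM minus the inserted and deleted bases. The function name is hypothetical, and cramino's exact computation may differ in details.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def gap_compressed_identity(cigar: str, nm: int) -> float:
    """Gap-compressed identity in percent from a CIGAR string and the NM tag."""
    aligned = ins_bases = del_bases = ins_events = del_events = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned += n
        elif op == "I":
            ins_bases += n
            ins_events += 1
        elif op == "D":
            del_bases += n
            del_events += 1
        # S, H, N and P do not contribute to the alignment columns counted here.
    gap_events = ins_events + del_events
    # NM counts every inserted/deleted base; subtracting them leaves mismatches.
    mismatches = nm - ins_bases - del_bases
    return 100 * (1 - (mismatches + gap_events) / (aligned + gap_events))


print(gap_compressed_identity("50M10D50M", 12))
```

A 10-base deletion plus two mismatches thus costs three differences, not twelve.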
The % from total reads output field contains the percentage of reads used for this report, which depends on the --min-read-len and --ubam settings. When neither option is set, it is the percentage of reads that are mapped, as either primary or supplementary alignments.
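The selection behind % from total reads can be sketched with SAM flag bits: 0x4 marks an unmapped read and 0x100 a secondary alignment, so excluding both keeps primary and supplementary (0x800) alignments. This Python sketch assumes each record is a hypothetical (flag, read_length) tuple, that the denominator is every record in the file, and that --ubam keeps all records; these are assumptions for illustration, not a description of cramino's internals.

```python
UNMAPPED = 0x4
SECONDARY = 0x100


def percent_used(records, min_read_len=0, ubam=False):
    """Percentage of (flag, read_length) records that pass the filters."""

    def used(flag, length):
        if length < min_read_len:
            return False
        if ubam:
            # Assumption: with --ubam every record counts, aligned or not.
            return True
        # Default: mapped reads, primary or supplementary (not secondary).
        return not (flag & (UNMAPPED | SECONDARY))

    kept = sum(used(flag, length) for flag, length in records)
    return 100 * kept / len(records)


records = [(0, 5000), (16, 800), (4, 3000), (256, 5000), (2048, 4000)]
print(percent_used(records), percent_used(records, min_read_len=1000))
```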
Optional output
- a checksum to check whether files were updated, changed or corrupted (--checksum)
- an arrow file for use within NanoPlot and NanoComp (--arrow <filename>)
- a normalised number of reads per chromosome, e.g. to determine the sex or aneuploidies (--karyotype)
- information about the phase blocks (--phased)
- information about the number of splice sites (--spliced)
- histograms of read lengths and read identities, as below (--hist). With --phased, a histogram of phase block lengths is also produced.
Please let me know if the histograms look inappropriately scaled for your data.
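One plausible way to obtain a normalised number of reads per chromosome is sketched below in Python. It is an assumption, not cramino's documented --karyotype method: reads per chromosome are divided by the chromosome length and then scaled by the median over chromosomes, so autosomes land near 1.0 and a chromosome present in a single copy lands near 0.5. The function name and the toy counts and lengths are hypothetical.

```python
from statistics import median


def normalised_read_counts(reads: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Reads per base for each chromosome, scaled by the median across chromosomes."""
    # Assumption: density per base, then divide by the median density.
    density = {chrom: reads[chrom] / lengths[chrom] for chrom in reads}
    mid = median(density.values())
    return {chrom: d / mid for chrom, d in density.items()}


# Toy numbers only: chrX at half the autosomal density suggests one X copy.
print(normalised_read_counts(
    {"chr1": 2000, "chr2": 1800, "chrX": 500},
    {"chr1": 200, "chr2": 180, "chrX": 100},
))
```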
# Histogram for read lengths:
0-2000 ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
2000-4000 βββββββββββββββββββββββββββββββββ
4000-6000 ββββββββββββββββββββββββββββββββ
6000-8000 βββββββββββββββββββββββββββββββ
8000-10000 βββββββββββββββββββββββββββ
10000-12000 βββββββββββββββββββββββββββ
12000-14000 ββββββββββββββββββββββββββββββ
14000-16000 βββββββββββββββββββββββββββββββββββ
16000-18000 ββββββββββββββββββββββββββββββββββββββ
18000-20000 βββββββββββββββββββββββββββββββββββββ
20000-22000 βββββββββββββββββββββββββββββββββ
22000-24000 βββββββββββββββββββββββββββ
24000-26000 βββββββββββββββββββββ
26000-28000 ββββββββββββββββ
28000-30000 ββββββββββββ
30000-32000 βββββββββ
32000-34000 ββββββ
34000-36000 ββββ
36000-38000 ββ
38000-40000 β
40000-42000 β
42000-44000 β
44000-46000
46000-48000
48000-50000
50000-52000
52000-54000
54000-56000
56000-58000
58000-60000
60000+
# Histogram for Phred-scaled accuracies:
Q0-1
Q1-2
Q2-3
Q3-4
Q4-5
Q5-6 βββ
Q6-7 ββββββββββββ
Q7-8 ββββββββββββββββββββββββ
Q8-9 βββββββββββββββββββββββββββ
Q9-10 βββββββββββββββββββββββ
Q10-11 βββββββββββββββββββββββββββββ
Q11-12 ββββββββββββββββββββββββββββββββββββββββββ
Q12-13 βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Q13-14 ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Q14-15 βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Q15-16 βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Q16-17 βββββββββββββββββββββββββββββββββββββββββββ
Q17-18 ββββββββββββββββ
Q18-19 ββββ
Q19-20 β
Q20-21
Q21-22
Q22-23
Q23-24
Q24-25
Q25-26
Q26-27
Q27-28
Q28-29
Q29-30
Q30-31
Q31-32
Q32-33
Q33-34
Q34-35
Q35-36
Q36-37
Q37-38
Q38-39
Q39-40
Q40+
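The histograms above bin read lengths in steps of 2000 up to a final 60000+ bin, and accuracies in steps of one Phred unit up to Q40+. A Python sketch of that binning and of a scaled text bar chart follows. It assumes a read's accuracy is its identity converted with Q = -10·log10(1 - identity); the bar scaling (fullest bin drawn at a fixed width) and the helper names length_bin, phred_bin and render are assumptions rather than cramino's actual code.

```python
import math
from collections import Counter


def length_bin(length: int, width: int = 2000, cap: int = 60000) -> str:
    """Bin label for a read length, following the 0-2000 ... 60000+ layout."""
    if length >= cap:
        return f"{cap}+"
    start = length // width * width
    return f"{start}-{start + width}"


def phred_bin(identity_pct: float, cap: int = 40) -> str:
    """Bin label for an identity given in percent, on the Phred scale."""
    error = 1 - identity_pct / 100
    # A perfect alignment has no finite Phred score; put it in the top bin.
    q = cap if error <= 0 else -10 * math.log10(error)
    if q >= cap:
        return f"Q{cap}+"
    start = int(q)
    return f"Q{start}-{start + 1}"


def render(counts: Counter, order: list[str], max_width: int = 75) -> str:
    """One line per bin; the fullest bin is drawn max_width characters wide."""
    peak = max(counts.values())
    lines = []
    for label in order:
        bar = "∎" * round(counts[label] / peak * max_width)
        lines.append(f"{label:>11} {bar}".rstrip())
    return "\n".join(lines)


# Usage: bin a few read lengths and print a small chart.
lengths = [1500, 1800, 2500, 7000, 65000]
counts = Counter(length_bin(n) for n in lengths)
order = [f"{s}-{s + 2000}" for s in range(0, 60000, 2000)] + ["60000+"]
print(render(counts, order, max_width=10))
```

For instance, a read with 94.27% identity has an error rate of 0.0573, which corresponds to about Q12.4 and falls in the Q12-13 bin.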
CITATION
If you use this tool, please consider citing our publication.