NanoFilt

Filtering and trimming of long read sequencing data.

Please be aware that NanoFilt will no longer receive any updates, as (most of) its functionality is included in chopper (which should be lots faster, too).

Filtering on quality and/or read length, and optional trimming after passing filters.
Reads from stdin, writes to stdout. Optionally reads directly from an uncompressed file specified on the command line.
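
Reading from stdin and writing to stdout means records can be handled one at a time, without loading the whole file into memory. The following is a minimal Python sketch of that streaming pattern, not NanoFilt's own code. The function names `read_fastq`, `write_fastq` and `stream_records` are hypothetical, and the sketch assumes every record occupies exactly four lines.

```python
import io
import sys
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream, assuming 4 lines per record.

    Stops at end of input or at the first empty line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def write_fastq(handle: TextIO, record: Record) -> None:
    """Write one record back out in 4-line FASTQ layout."""
    header, seq, qual = record
    handle.write(f"{header}\n{seq}\n+\n{qual}\n")


def stream_records(src: TextIO, dst: TextIO) -> int:
    """Copy records from src to dst one at a time; return how many were written."""
    count = 0
    for record in read_fastq(src):
        write_fastq(dst, record)
        count += 1
    return count
```

Calling `stream_records(sys.stdin, sys.stdout)` would pass records straight through a pipe. A filter or trim step would sit between the read and the write.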

Intended to be used:

  • directly after fastq extraction
  • prior to mapping
  • in a stream between extraction and mapping

See also my post about NanoFilt on my blog Gigabase or gigabyte.
Because the read quality calculated from the fastq can differ from the quality summarized by albacore, NanoFilt has accepted an optional --summary argument since v1.1.0. Passing the sequencing_summary.txt file from albacore with this argument makes NanoFilt filter on the quality scores from the summary, which is also faster.
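
For context on what a "calculated read quality" can look like: one common approach is to convert each Phred score to an error probability, average the probabilities, and convert the result back to a Phred score. The sketch below illustrates that approach under the assumption of Phred+33 encoded quality strings. It is not necessarily how NanoFilt or albacore computes the value, and the function name `mean_quality` is hypothetical.

```python
import math


def mean_quality(qual: str, offset: int = 33) -> float:
    """Average read quality computed via error probabilities.

    Assumes Phred+offset ASCII encoding (offset 33 is an assumption here).
    """
    if not qual:
        raise ValueError("quality string is empty")
    # Phred score Q corresponds to error probability 10 ** (-Q / 10)
    probs = [10 ** (-(ord(char) - offset) / 10) for char in qual]
    mean_prob = sum(probs) / len(probs)
    return -10 * math.log10(mean_prob)
```

With this method low-quality bases weigh more heavily than in a plain arithmetic mean of the scores: a read with one Q10 and one Q20 base averages to roughly Q12.6, not Q15.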

INSTALLATION AND UPGRADING:

pip install nanofilt
pip install nanofilt --upgrade

or

conda install -c bioconda nanofilt

NanoFilt is written for Python 3.

USAGE:

NanoFilt [-h] [-v] [--logfile LOGFILE] [-l LENGTH]
                [--maxlength MAXLENGTH] [-q QUALITY] [--minGC MINGC]
                [--maxGC MAXGC] [--headcrop HEADCROP] [--tailcrop TAILCROP]
                [-s SUMMARY] [--readtype {1D,2D,1D2}]
                [input]

Perform quality and/or length and/or GC filtering of (long read) fastq data. Reads on stdin.

General options:
  -h, --help            show the help and exit
  -v, --version         Print version and exit.
  --logfile LOGFILE     Specify the path and filename for the log file.
  input                 input, uncompressed fastq file (optional)

Options for filtering reads on.:
  -l, --length LENGTH   Filter on a minimum read length
  --maxlength MAXLENGTH Filter on a maximum read length
  -q, --quality QUALITY Filter on a minimum average read quality score
  --minGC MINGC         Sequences must have GC content >= to this. Float between 0.0 and 1.0. Ignored if
                        using summary file.
  --maxGC MAXGC         Sequences must have GC content <= to this. Float between 0.0 and 1.0. Ignored if
                        using summary file.

Options for trimming reads.:
  --headcrop HEADCROP   Trim n nucleotides from start of read
  --tailcrop TAILCROP   Trim n nucleotides from end of read

Input options.:
  -s, --summary SUMMARY Use albacore or guppy summary file for quality scores
  --readtype            Which read type to extract information about from summary. Options are 1D, 2D or 1D2
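
The options above can be read as a per-read decision: check length, average quality and GC content, then crop the reads that pass. The sketch below illustrates that order of operations in Python. It is an illustration only, not NanoFilt's implementation. The function names, parameter names and default values are assumptions, and the quality calculation assumes Phred+33 encoding.

```python
import math
from typing import Optional, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases, between 0.0 and 1.0."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Average quality via error probabilities (assumed method, Phred+33)."""
    probs = [10 ** (-(ord(char) - offset) / 10) for char in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_and_trim(
    seq: str,
    qual: str,
    min_length: int = 1,
    max_length: Optional[int] = None,
    min_quality: float = 0.0,
    min_gc: float = 0.0,
    max_gc: float = 1.0,
    headcrop: int = 0,
    tailcrop: int = 0,
) -> Optional[Tuple[str, str]]:
    """Return the trimmed (seq, qual), or None if the read is rejected."""
    # Filters are applied to the untrimmed read
    if len(seq) < min_length:
        return None
    if max_length is not None and len(seq) > max_length:
        return None
    if mean_quality(qual) < min_quality:
        return None
    if not min_gc <= gc_fraction(seq) <= max_gc:
        return None
    # Trimming happens only after the read has passed all filters
    end = max(len(seq) - tailcrop, 0)
    trimmed_seq, trimmed_qual = seq[headcrop:end], qual[headcrop:end]
    if not trimmed_seq:
        return None  # nothing left after cropping
    return trimmed_seq, trimmed_qual
```

For example, `filter_and_trim("ACGTACGT", "IIIIIIII", min_length=5, min_quality=10, headcrop=2, tailcrop=1)` keeps the read and returns the sequence with two bases removed from the start and one from the end.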

EXAMPLES

gunzip -c reads.fastq.gz | NanoFilt -q 10 -l 500 --headcrop 50 | minimap2 genome.fa - | samtools sort -O BAM -@24 -o alignment.bam -
gunzip -c reads.fastq.gz | NanoFilt -q 12 --headcrop 75 | gzip > trimmed-reads.fastq.gz
gunzip -c reads.fastq.gz | NanoFilt -q 10 | gzip > highQuality-reads.fastq.gz

I welcome all suggestions, bug reports, feature requests and contributions. Please leave an issue or open a pull request. I will usually respond within a day, or rarely within a few days.

CITATION

If you use this tool, please consider citing our publication.
