Introduction
fqtools is a software suite for fast processing of FASTQ files. Various file manipulations are supported; see below for a full list of the available subcommands and a brief description of each. Most subcommands take either a single file or a pair of files as input. If no input file is specified, fqtools will attempt to read data from stdin. In this case, it is advisable to specify the format of the data provided. Subcommands that generate FASTQ data write either a single file or a pair of files. If no -o argument is provided, single files are written to stdout.
Citation
If you use fqtools in published work, please cite my Bioinformatics paper:
- Droop, A. P. (2016). fqtools: An efficient software suite for modern FASTQ file manipulation. Bioinformatics (Oxford, England). [DOI:10.1093/bioinformatics/btw088]
Installation
fqtools must be built against both the zlib and htslib libraries:
- zlib is required for processing compressed (.gz) data. The code relies on several recent zlib file IO functions, so the zlib version must be >= 1.2.3.5.
- htslib is required for reading BAM files. If htslib is not installed, download and compile htslib, then alter the HTSDIR path in the fqtools Makefile to point to the htslib source directory.
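The zlib version requirement can be checked before building. The following is a minimal sketch, not part of fqtools: version_ge is a hypothetical helper, it assumes a sort that supports -V (as in GNU coreutils), and the installed version is hard-coded here where in practice it might come from something like pkg-config --modversion zlib.

```shell
# version_ge A B: succeed if version string A is at least version string B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum zlib version required by fqtools.
required="1.2.3.5"

# Assumption: a fixed value keeps the sketch self-contained. On a real
# system this could be read from `pkg-config --modversion zlib`.
installed="1.2.11"

if version_ge "$installed" "$required"; then
    echo "zlib $installed is new enough"
else
    echo "zlib $installed is too old (need >= $required)"
fi
```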
If zlib is already installed, fqtools can be built with commands similar to the following:
git clone https://github.com/alastair-droop/fqtools
cd fqtools/
git clone https://github.com/samtools/htslib
cd htslib/
autoheader
autoconf
./configure
make
make install
cd ..
make
You might need to run make install as sudo make install. The htslib library must be installed in a location where the built fqtools program can find it, because the fqtools executable is dynamically linked against htslib. If you cannot (or do not want to) install htslib, you must add the location of the libhts.so file to your LD_LIBRARY_PATH variable.
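As a sketch of the LD_LIBRARY_PATH approach: the directory below is a hypothetical location for a locally built htslib and should be replaced with wherever libhts.so actually lives on your system.

```shell
# Hypothetical location of a locally built htslib
# (the directory that contains libhts.so).
HTSLIB_DIR="$HOME/fqtools/htslib"

# Prepend it to LD_LIBRARY_PATH, keeping any entries that are already set.
export LD_LIBRARY_PATH="$HTSLIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "$LD_LIBRARY_PATH"
```

Adding the export line to your shell startup file makes the setting persist across sessions.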
Licence
fqtools is released under the GNU General Public License version 3.
Subcommands
The fqtools suite contains the following subcommands:
  view       View FASTQ files
  head       View the first reads in FASTQ files
  count      Count FASTQ file reads
  header     View FASTQ file header data
  sequence   View FASTQ file sequence data
  quality    View FASTQ file quality data
  header2    View FASTQ file secondary header data
  fasta      Convert FASTQ files to FASTA format
  basetab    Tabulate FASTQ base frequencies
  qualtab    Tabulate FASTQ quality character frequencies
  type       Attempt to guess the FASTQ quality encoding type
  validate   Validate FASTQ files
  find       Find FASTQ reads containing specific sequences
  trim       Trim reads in a FASTQ file
  qualmap    Translate quality values using a mapping file
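For orientation, a FASTQ record conventionally occupies four lines: header, sequence, secondary header, and quality string. The sketch below uses plain awk rather than fqtools to show which part of a record each of the count, header, sequence, header2, and quality subcommands is concerned with. It assumes strict four-line records; count_reads, nth_line, and the sample data are hypothetical, and the exact output format of the fqtools subcommands may differ.

```shell
# A tiny two-read FASTQ sample (hypothetical data).
sample='@read1
ACGT
+
IIII
@read2
GGCA
+read2
HHHH'

# Number of reads: one record per four lines.
count_reads() { awk 'END { print NR / 4 }'; }

# Print line N (1 to 4) of every record:
# 1 = header, 2 = sequence, 3 = secondary header, 4 = quality.
nth_line() { awk -v n="$1" 'NR % 4 == n % 4'; }

printf '%s\n' "$sample" | count_reads
printf '%s\n' "$sample" | nth_line 1   # header lines
printf '%s\n' "$sample" | nth_line 2   # sequence lines
printf '%s\n' "$sample" | nth_line 3   # secondary header lines
printf '%s\n' "$sample" | nth_line 4   # quality lines
```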
Each subcommand has its own set of arguments. The global arguments are:
  -h            Show this help message and exit.
  -v            Show the program version and exit.
  -d            Allow DNA sequence bases (ACGTN)
  -r            Allow RNA sequence bases (ACGUN)
  -a            Allow ambiguous sequence bases (RYKMSWBDHV)
  -m            Allow mask sequence base (X)
  -u            Allow uppercase sequence bases
  -l            Allow lowercase sequence bases
  -p CHR        Set the pair replacement character (default "%")
  -b BUFSIZE    Set the input buffer size
  -B BUFSIZE    Set the output buffer size
  -q QUALTYPE   Set the quality score encoding
  -f FORMAT     Set the input file format
  -F FORMAT     Set the output file format
  -i            Read interleaved input file pairs
  -I            Write interleaved output file pairs
CHR
This character will be replaced by the pair value when writing paired files.
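The effect of the pair replacement character can be pictured as a simple substitution in the output filename. This is only an illustration in plain shell: expand_pair and the filename template are hypothetical, and the pair values 1 and 2 are an assumption rather than something stated above.

```shell
# expand_pair TEMPLATE VALUE: replace every "%" in TEMPLATE with VALUE.
# "%" is the default pair replacement character.
expand_pair() {
    printf '%s\n' "$1" | sed "s/%/$2/g"
}

# Hypothetical output filename template.
template="sample_R%.fastq.gz"

# Assumption: the two files of a pair are numbered 1 and 2.
expand_pair "$template" 1
expand_pair "$template" 2
```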
BUFSIZE
Possible suffixes are [bkMG]. If no suffix is given, value is in bytes.
QUALTYPE
  u   Do not assume specific quality score encoding
  s   Interpret quality scores as Sanger encoded
  o   Interpret quality scores as Solexa encoded
  i   Interpret quality scores as Illumina encoded
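The encodings differ in how a quality character maps to a score. By general FASTQ convention (not taken from the fqtools source), Sanger encoding stores the Phred score plus 33 as an ASCII character, and the older Illumina encoding stores it plus 64; Solexa encoding also uses an offset of 64 but on a different score scale. A minimal sketch of the offset arithmetic, where phred_score is a hypothetical helper:

```shell
# phred_score CHAR OFFSET: print the score encoded by one quality character.
phred_score() {
    # printf '%d' "'X" prints the numeric character code of X.
    code=$(printf '%d' "'$1")
    echo $(( code - $2 ))
}

phred_score I 33   # "I" under Sanger encoding (offset 33)
phred_score h 64   # "h" under the older Illumina encoding (offset 64)
```

Both calls print 40: the same score is written as a different character under each encoding, which is why the encoding has to be known or guessed before quality values can be interpreted.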
FORMAT
  F   uncompressed FASTQ format (.fastq)
  f   compressed FASTQ format (.fastq.gz)
  b   unaligned BAM format (.bam)
  u   attempt to infer format from file extension (default .fastq.gz)
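Inference from the file extension can be sketched as a simple pattern match. This is an illustration of the idea, not the fqtools implementation: infer_format is a hypothetical helper, and the exact matching rules fqtools applies are an assumption.

```shell
# infer_format FILENAME: print the FORMAT code suggested by the extension.
infer_format() {
    case "$1" in
        *.fastq.gz) echo f ;;   # compressed FASTQ
        *.fastq)    echo F ;;   # uncompressed FASTQ
        *.bam)      echo b ;;   # unaligned BAM
        *)          echo f ;;   # fall back to the default, .fastq.gz
    esac
}

infer_format reads.fastq
infer_format reads.fastq.gz
infer_format reads.bam
```

When reading from stdin there is no filename to inspect, so setting the input format explicitly with -f is the safer choice.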