ngs-bits - Short-read sequencing tools
Obtaining ngs-bits
Binaries of ngs-bits are available via Bioconda:
- Binaries for Linux/macOS
Alternatively, ngs-bits can be built from source. Use git to clone the most recent release (the source code package on GitHub does not contain the required submodules):
> git clone --recursive https://github.com/imgag/ngs-bits.git
> cd ngs-bits
> git checkout 2023_09
> git submodule update --recursive --init
Depending on your operating system, building instructions vary slightly:
Support
Please report any issues or questions to the ngs-bits issue tracker.
Documentation
Have a look at the ECCB'2018 poster.
The documentation of individual tools is linked in the tools list below.
For some tools, the documentation page contains only the command-line help; for others, it contains more information.
License
ngs-bits is provided under the MIT license and is based on other open source software:
- htslib for HTS data format support (BAM, VCF, ...)
- SimpleCrypt for weak encryption
- QR-Code-generator for QR code generation
Tools list
ngs-bits contains many tools that are used for NGS-based diagnostics at our institute.
Some of the tools require the NGSD, a database that contains, for example, gene, transcript and exon data.
Installation instructions for the NGSD can be found here.
Main tools
- SeqPurge - A highly-sensitive adapter trimmer for paired-end short-read data.
- SampleSimilarity - Calculates pairwise sample similarity metrics from VCF/BAM files.
- SampleGender - Determines sample gender based on a BAM file.
- SampleAncestry - Estimates the ancestry of a sample based on variants.
- CnvHunter - CNV detection from targeted resequencing data using non-matched control samples.
- RohHunter - ROH detection based on a variant list annotated with AF values.
- UpdHunter - UPD detection from trio variant data.
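The README does not say which metrics SampleSimilarity reports. As an illustration only, and not the tool's implementation, one simple pairwise metric is the Pearson correlation of alternate-allele dosages (0, 1 or 2) between two samples. The sketch below assumes genotypes have already been parsed into a dictionary keyed by (chromosome, position, REF, ALT), and it treats a site missing from one sample as homozygous reference, which is only reasonable when both samples cover the same target regions. The function name `dosage_correlation` is hypothetical.

```python
from math import sqrt
from typing import Dict, Tuple

# Hypothetical input layout: (chromosome, 1-based position, REF, ALT) -> alt-allele count (0/1/2).
Genotypes = Dict[Tuple[str, int, str, str], int]


def dosage_correlation(a: Genotypes, b: Genotypes) -> float:
    """Pearson correlation of alt-allele dosages over the union of variant sites.

    Assumption: a site absent from one sample is treated as homozygous reference (dosage 0).
    """
    sites = sorted(set(a) | set(b))
    x = [a.get(site, 0) for site in sites]
    y = [b.get(site, 0) for site in sites]
    n = len(sites)
    if n < 2:
        raise ValueError("need at least two variant sites")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("dosages are constant in one sample")
    return cov / sqrt(var_x * var_y)
```

Identical samples give a value close to 1, while samples sharing few variants give a low or negative value.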
QC tools
The default output format of the quality control tools is qcML, an XML-based format for -omics quality control. qcML consists of an XML schema, which defines the overall structure of the format, and an ontology, which defines the QC metrics that can be used.
- ReadQC - Quality control tool for FASTQ files.
- MappingQC - Quality control tool for a BAM file.
- VariantQC - Quality control tool for a VCF file.
- SomaticQC - Quality control tool for tumor-normal pairs (paper and example output data).
- TrioMaternalContamination - Detects maternal contamination of a child using SNPs from parents.
- RnaQC - Calculates QC metrics for RNA samples.
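Because qcML is plain XML, QC values can be read back with a standard XML parser. The following is a minimal sketch, assuming that metrics are stored as `qualityParameter` elements carrying `name` and `value` attributes; check the element and attribute names against real output of the tools above. Parameters without a `value` attribute (for example, ones stored as tables or attachments) are skipped. The function name `read_quality_parameters` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_quality_parameters(qcml_text: str) -> Dict[str, str]:
    """Collect name -> value from qualityParameter elements, ignoring XML namespaces."""
    root = ET.fromstring(qcml_text)
    params: Dict[str, str] = {}
    for elem in root.iter():
        # ElementTree reports namespaced tags as "{uri}localName"; keep only the local name.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "qualityParameter" and "value" in elem.attrib:
            # Assumption: 'name' identifies the metric; fall back to the element ID.
            key = elem.attrib.get("name", elem.attrib.get("ID", ""))
            params[key] = elem.attrib["value"]
    return params
```

Keying by the ontology accession instead of the name would be more robust across tool versions, if the files carry one.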
BAM tools
- BamClipOverlap - (Soft-)Clips paired-end reads that overlap.
- BamDownsample - Downsamples a BAM file to the given percentage of reads.
- BamFilter - Filters a BAM file by multiple criteria.
- BamHighCoverage - Determines high-coverage regions in a BAM file.
- BamToFastq - Converts a BAM file to FASTQ files (paired-end only).
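When downsampling paired-end data, both mates of a pair should be kept or dropped together. One way to achieve this, shown here as an illustrative alternative and not necessarily what BamDownsample does, is to derive the keep/drop decision from a hash of the read name (QNAME), which both mates share. The function name `keep_read` and its `seed` parameter are hypothetical.

```python
import hashlib


def keep_read(read_name: str, percentage: float, seed: int = 0) -> bool:
    """Deterministically decide whether to keep a read and, implicitly, its mate.

    Both mates share the same read name, so they map to the same hash value.
    """
    if not 0.0 <= percentage <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    digest = hashlib.sha256(f"{seed}:{read_name}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < percentage / 100.0
```

Changing `seed` yields a different but equally sized subset, and the same seed always reproduces the same subset.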
BED tools
- BedAdd - Merges regions from several BED files.
- BedAnnotateFromBed - Annotates BED file regions with information from a second BED file.
- BedAnnotateGC - Annotates the regions in a BED file with GC content.
- BedAnnotateGenes - Annotates BED file regions with gene names (needs NGSD).
- BedChunk - Splits regions in a BED file to chunks of a desired size.
- BedCoverage - Annotates the regions in a BED file with the average coverage in one or several BAM files.
- BedExtend - Extends the regions in a BED file by n bases.
- BedGeneOverlap - Calculates how much of each overlapping gene is covered (needs NGSD).
- BedHighCoverage - Detects high-coverage regions from a BAM file.
- BedInfo - Prints summary information about a BED file.
- BedIntersect - Intersects two BED files.
- BedLiftOver - Lift-over of regions in a BED file to a different genome build.
- BedLowCoverage - Calculates regions of low coverage based on an input BED and BAM file.
- BedMerge - Merges overlapping regions in a BED file.
- BedReadCount - Annotates the regions in a BED file with the read count from a BAM file.
- BedShrink - Shrinks the regions in a BED file by n bases.
- BedSort - Sorts the regions in a BED file.
- BedSubtract - Subtracts one BED file from another BED file.
- BedToFasta - Converts a BED file to a FASTA file (based on the reference genome).
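Several of the BED tools above perform simple interval arithmetic. The sketch below illustrates what extending and merging regions mean on BED coordinates, which are 0-based and half-open. It is not the ngs-bits implementation: the choice to keep book-ended regions such as [0,10) and [10,20) separate is an assumption, and extended ends are not clamped to chromosome lengths because those are not known here. The function names are hypothetical.

```python
from typing import List, Tuple

# BED regions are 0-based and half-open: (chrom, start, end) covers [start, end).
Region = Tuple[str, int, int]


def extend_regions(regions: List[Region], n: int) -> List[Region]:
    """Extend each region by n bases on both sides, clamping the start at 0."""
    return [(chrom, max(0, start - n), end + n) for chrom, start, end in regions]


def merge_regions(regions: List[Region]) -> List[Region]:
    """Merge overlapping regions; book-ended regions stay separate (a design choice)."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Sorting first means each region only has to be compared with the last merged one, so merging runs in O(n log n).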
FASTQ tools
- FastqAddBarcode - Adds sequences from a separate FASTQ file as barcodes to read IDs.
- FastqConvert - Converts the quality scores from Illumina 1.5 offset to Sanger/Illumina 1.8 offset.
- FastqConcat - Concatenates several FASTQ files into one output FASTQ file.
- FastqDownsample - Downsamples paired-end FASTQ files.
- FastqExtract - Extracts reads from a FASTQ file according to an ID list.
- FastqExtractBarcode - Moves molecular barcodes of reads to a separate file.
- FastqExtractUMI - Moves unique molecular identifiers (UMIs) from the read sequence to the read ID.
- FastqFormat - Determines the quality score offset of a FASTQ file.
- FastqList - Lists read IDs and base counts.
- FastqMidParser - Counts the number of occurrences of each MID/index/barcode in a FASTQ file.
- FastqToFasta - Converts FASTQ to FASTA format.
- FastqTrim - Trims start/end bases from the reads in a FASTQ file.
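FastqConvert and FastqFormat deal with the two common quality encodings: Illumina 1.5 stores a Phred score Q as the character chr(Q + 64), while Sanger/Illumina 1.8 uses chr(Q + 33). Converting therefore shifts every character down by 31 without changing the scores. The offset guess below is a simple heuristic sketch, not the logic FastqFormat actually uses; its upper bound of 'K' (75) for Phred+33 data is an assumption. Function names are hypothetical.

```python
from typing import Iterable, Optional


def illumina15_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (the Phred scores stay the same)."""
    return "".join(chr(ord(c) - 31) for c in quality)


def guess_offset(qualities: Iterable[str]) -> Optional[int]:
    """Heuristically guess the quality offset (33 or 64); None if the data fits both."""
    codes = [ord(c) for q in qualities for c in q]
    if not codes:
        return None
    if min(codes) < 59:  # below ';' cannot occur in Phred+64 or Solexa+64 data
        return 33
    if max(codes) > 75:  # assumed upper bound for Phred+33 data is 'K' (Q42)
        return 64
    return None
```

For example, `illumina15_to_sanger("hhhB")` yields `"III#"` (Q40, Q40, Q40, Q2). Reads consisting only of mid-range characters such as `"IIII"` are ambiguous, so a real tool should inspect many reads before deciding.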
VCF tools (small variants)
- VcfAdd - Appends variants from a VCF file to another VCF file.
- VcfAnnotateConsequence - Adds transcript-specific consequence predictions to a VCF file (similar to Ensembl VEP).
- VcfAnnotateFromBed - Annotates the INFO column of a VCF with data from a BED file.
- VcfAnnotateFromBigWig - Annotates the INFO column of a VCF with data from a bigWig file.
- VcfAnnotateFromVcf - Annotates a VCF file with data from one or more source VCF files.
- VcfAnnotateHexplorer - Annotates a VCF with Hexplorer and HBond scores.
- VcfAnnotateMaxEntScan - Annotates a VCF file with MaxEntScan scores.
- VcfBreakMulti - Breaks multi-allelic variants into several lines, making sure that allele-specific INFO/SAMPLE fields are still valid.
- VcfCalculatePRS - Calculates the Polygenic Risk Score(s) for a sample.
- VcfCheck - Checks a VCF file for errors.
- VcfExtractSamples - Extracts one or several samples from a VCF file.
- VcfFilter - Filters a VCF based on the given criteria.
- VcfLeftNormalize - Normalizes all variants and shifts indels to the left in a VCF file.
- VcfSort - Sorts variant lists according to chromosomal position.
- VcfStreamSort - Sorts entries of a VCF file according to genomic position using a stream.
- VcfSubstract - Subtracts the variants in a VCF from a second VCF.
- VcfToBed - Converts a VCF file to a BED file.
- VcfToBedpe - Converts a VCF file containing structural variants to BEDPE format.
- VcfToTsv - Converts a VCF file to a tab-separated text file.
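Left-normalization (VcfLeftNormalize) matters because an indel inside a repeat can be written at several equivalent positions. The sketch below shows a commonly used truncate-and-extend approach for a single biallelic variant; it is an illustration, not necessarily the algorithm ngs-bits uses. It assumes the chromosome sequence is available as a Python string and does not handle variants that would need to be shifted past the chromosome start. The function name `left_normalize` is hypothetical.

```python
from typing import Tuple


def left_normalize(reference: str, pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Left-align and trim one biallelic variant.

    reference: chromosome sequence, where reference[0] is position 1
    pos: 1-based VCF position of the first REF base
    """
    if ref == alt:
        raise ValueError("REF and ALT must differ")
    if reference[pos - 1:pos - 1 + len(ref)] != ref:
        raise ValueError("REF does not match the reference sequence")
    changed = True
    while changed:
        changed = False
        # A shared last base carries no information: drop it from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An empty allele is re-anchored on the preceding reference base.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend past the chromosome start")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases, keeping at least one base in each allele.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

With the reference `GCACACAT`, the deletion `5 ACA>A` and the leftmost equivalent `1 GCA>G` both remove one `CA` unit, and the sketch maps the former to the latter.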
BEDPE tools (structural variants)
- BedpeAnnotateFromBed - Annotates a BEDPE file with information from a BED file.
- BedpeFilter - Filters a BEDPE file by region.
- BedpeGeneAnnotation - Annotates a BEDPE file with gene information from the NGSD (needs NGSD).
- BedpeSort - Sorts a BEDPE file according to chromosomal position.
- BedpeToBed - Converts a BEDPE file into a BED file.
- SvFilterAnnotations - Filters a structural variant list in BEDPE format based on variant annotations.
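A BEDPE line describes a structural variant by two intervals, one per breakpoint, in its first six columns (chrom1, start1, end1, chrom2, start2, end2). The sketch below turns one line into BED regions by reporting both breakpoint intervals. Whether BedpeToBed reports the breakpoints or the region spanned by the variant is not stated here, so treat this as an illustration under that assumption; the function name `bedpe_line_to_bed` is hypothetical.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]


def bedpe_line_to_bed(line: str) -> List[Region]:
    """Return the two breakpoint intervals of one tab-separated BEDPE line as BED regions."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("BEDPE lines need at least six columns")
    chrom1, start1, end1, chrom2, start2, end2 = fields[:6]
    regions = [(chrom1, int(start1), int(end1)), (chrom2, int(start2), int(end2))]
    # Identical breakpoint intervals are reported once; output is sorted by position.
    return sorted(set(regions))
```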
Gene handling tools
- GenePrioritization - Performs gene prioritization based on a list of known disease genes and a PPI graph (see also GraphStringDb).
- GraphStringDb - Creates a simple representation of the String-DB interaction graph.
- GenesToApproved - Replaces gene symbols by approved symbols using the HGNC database (needs NGSD).
- GenesToBed - Converts a text file with gene names to a BED file (needs NGSD).
- GenesToTranscripts - Converts a text file with gene names to transcript names (needs NGSD).
- NGSDExportGenes - Lists genes from NGSD (needs NGSD).
- TranscriptsToBed - Converts a text file with transcript names to a BED file (needs NGSD).
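The idea behind prioritizing genes with a PPI graph is "guilt by association": candidates that interact with known disease genes are ranked higher. The sketch below uses the simplest such score, the fraction of a gene's interaction partners that are known disease genes. The method actually used by GenePrioritization is not described here, so this is a hypothetical illustration; the graph is assumed to be an adjacency map from gene symbol to its interaction partners.

```python
from typing import Dict, List, Set, Tuple


def rank_by_disease_neighbours(
    graph: Dict[str, Set[str]], known: Set[str], candidates: List[str]
) -> List[Tuple[str, float]]:
    """Rank candidates by the fraction of their PPI partners that are known disease genes."""
    scores = []
    for gene in candidates:
        partners = graph.get(gene, set())
        score = len(partners & known) / len(partners) if partners else 0.0
        scores.append((gene, score))
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))
```

Network-propagation methods such as random walks generalize this by also rewarding indirect connections.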
Phenotype handling tools
- PhenotypesToGenes - Converts a phenotype list to a list of matching genes (needs NGSD).
- PhenotypeSubtree - Returns all sub-phenotypes of a given phenotype (needs NGSD).
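Collecting all sub-phenotypes of a term is a graph traversal over parent-to-child links of the phenotype ontology. A minimal breadth-first sketch, assuming the links have already been loaded into a dictionary (PhenotypeSubtree reads them from the NGSD instead). Since a term can have several parents, each term is visited only once. Whether the given term itself is part of the result is an assumption; here it is excluded. The term IDs in the usage line are placeholders.

```python
from collections import deque
from typing import Dict, List, Set


def sub_phenotypes(children: Dict[str, List[str]], root: str) -> Set[str]:
    """Return all descendants of root in a parent -> children map (root excluded)."""
    found: Set[str] = set()
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in found:  # terms reachable via several parents are visited once
                found.add(child)
                queue.append(child)
    return found
```

For example, `sub_phenotypes({"T1": ["T2", "T3"], "T2": ["T4"], "T3": ["T4"]}, "T1")` returns `{"T2", "T3", "T4"}`.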
Misc tools
- PERsim - Paired-end read simulator for Illumina reads.
- FastaInfo - Prints basic information about a FASTA file.
- HgvsToVcf - Transforms a TSV file with transcript ID and HGVS.c change into a VCF file (needs NGSD).
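To illustrate the kind of summary a FASTA info tool can compute, here is a minimal sketch that counts sequences, bases, N bases and G/C bases. Which statistics FastaInfo actually reports may differ; the function name and the output keys are hypothetical.

```python
from typing import Dict, Iterable


def fasta_info(lines: Iterable[str]) -> Dict[str, int]:
    """Count sequences, bases, N bases and G/C bases in FASTA-formatted lines."""
    info = {"sequences": 0, "bases": 0, "n_bases": 0, "gc_bases": 0}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # header line starts a new sequence
            info["sequences"] += 1
            continue
        seq = line.upper()  # soft-masked (lowercase) bases are counted as well
        info["bases"] += len(seq)
        info["n_bases"] += seq.count("N")
        info["gc_bases"] += seq.count("G") + seq.count("C")
    return info
```

Passing an open file object, for example `fasta_info(open("genome.fa"))`, processes the file line by line without loading it into memory.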
ChangeLog
Changes of master since last release:
- none so far
Changes in release 2023_09:
- new tools: VcfAnnotateMaxEntScan, SamplePath, NGSDImportGenlab, NGSDImportOncotree
- NGSDExportSamples: added '-add_dates' flag.
- VcfLeftNormalize: added '-right' flag for right-normalization.
- NGSD
  - Updated enums of 'sequencing_run' for NovaSeqX+ support.
  - Added 'year_of_birth', 'order_date' and 'sampling_date' to 'sample' table.
  - Added 'processing_modus' enum and 'batch number' varchar to 'processed_sample' table.
  - Added tables 'oncotree_term', 'oncotree_parent' and 'oncotree_obsolete'.
For older changes see releases.