Expansion Hunter: a tool for estimating repeat sizes
The human genome contains a number of regions that consist of repetitions of a short unit sequence (commonly a trimer). Such repeat regions can expand to a size much larger than the read length and thereby cause disease. Fragile X Syndrome, ALS, and Huntington's Disease are well-known examples.
Expansion Hunter aims to estimate the sizes of such repeats by performing a targeted search through a BAM/CRAM file for reads that span, flank, or are fully contained in each repeat.
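The three kinds of reads mentioned above can be illustrated with a minimal sketch. It is not Expansion Hunter's implementation: it assumes each read is already reduced to an aligned interval on the reference, and the names `ReadClass` and `classify_read` are hypothetical and used only for illustration.

```python
from enum import Enum


class ReadClass(Enum):
    """Hypothetical labels for a read's position relative to a repeat."""
    SPANNING = "spanning"      # covers the whole repeat plus both flanks
    FLANKING = "flanking"      # covers one flank and part of the repeat
    IN_REPEAT = "in_repeat"    # lies entirely inside the repeat
    OTHER = "other"            # does not overlap the repeat at all


def classify_read(read_start: int, read_end: int,
                  repeat_start: int, repeat_end: int) -> ReadClass:
    """Classify a read by its aligned interval relative to a repeat.

    All intervals are 0-based and half-open: [start, end).
    This is a coordinate-only simplification for illustration.
    """
    # No overlap with the repeat region.
    if read_end <= repeat_start or read_start >= repeat_end:
        return ReadClass.OTHER

    extends_left = read_start < repeat_start   # reaches into the left flank
    extends_right = read_end > repeat_end      # reaches into the right flank

    if extends_left and extends_right:
        return ReadClass.SPANNING
    if extends_left or extends_right:
        return ReadClass.FLANKING
    return ReadClass.IN_REPEAT


if __name__ == "__main__":
    repeat = (100, 130)  # assumed repeat coordinates, for the example only
    for read in [(90, 160), (90, 120), (105, 125), (10, 50)]:
        print(read, classify_read(*read, *repeat).value)
```

When a repeat has expanded beyond the read length, no read can span it, so under this simplified view only flanking and in-repeat reads remain informative for that repeat.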
Linux and macOS operating systems are currently supported.
License
Expansion Hunter is provided under the terms and conditions of the PolyForm Strict License 1.0.0. It relies on several third-party packages provided under other open source licenses; please see COPYRIGHT.txt for additional details.
Documentation
Installation instructions, a usage guide, and descriptions of the file formats are contained in the docs folder.
Companion tools and resources
- A genome-wide STR catalog containing polymorphic repeats with similar properties to known pathogenic and functional STRs
- REViewer, a tool for visualizing alignments of reads in regions containing tandem repeats
Method
The method is described in the following papers:
- Egor Dolzhenko, Joke van Vugt, Richard Shaw, Mitch Bekritsky, and others, Detection of long repeat expansions from PCR-free whole-genome sequence data, Genome Research 2017
- Egor Dolzhenko, Viraj Deshpande, Felix Schlesinger, Peter Krusche, Roman Petrovski, and others, ExpansionHunter: A sequence-graph based tool to analyze variation in short tandem repeat regions, Bioinformatics 2019