

🌳 Create a tree using Mash distances

mashtree


Create a tree using Mash distances.

For simple usage, see mashtree --help. This is an example command:

mashtree *.fastq.gz > tree.dnd

For confidence values, use mashtree_bootstrap.pl or mashtree_jackknife.pl. Run either one with --help for usage.

Two modes: fast or accurate

Input files: fastq files are interpreted as raw read files. Fasta, GenBank, and EMBL files are interpreted as genome assemblies. Compressed versions of any of these file types are also accepted. You can compress with gz, bz2, or zip.
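The input rules above can be illustrated with a small Python sketch. It is not mashtree's own code: the function name `classify_input` and the exact extension sets are assumptions made for illustration, and the real tool may recognize other spellings.

```python
from pathlib import Path

# Assumed extension sets; the real tool may recognize more or fewer spellings.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".zip"}
READ_SUFFIXES = {".fastq", ".fq"}
ASSEMBLY_SUFFIXES = {".fasta", ".fa", ".fna", ".gbk", ".gb", ".embl"}


def classify_input(filename: str) -> str:
    """Return 'reads', 'assembly', or 'unknown' for a possibly compressed file name."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Drop one trailing compression suffix, if present.
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]
    if not suffixes:
        return "unknown"
    ext = suffixes[-1]
    if ext in READ_SUFFIXES:
        return "reads"
    if ext in ASSEMBLY_SUFFIXES:
        return "assembly"
    return "unknown"


for name in ["sample.fastq.gz", "genome.fasta", "plasmid.gbk.bz2"]:
    print(name, classify_input(name))
```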

Output files: Newick (.dnd). If --outmatrix is supplied, a distance matrix is written too.

See the documentation on the algorithms for more information.

Faster

mashtree --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd

More accurate

You can get a more accurate tree with the minimum abundance finder. Simply give --mindepth 0. This step helps ignore rare kmers, which are more likely to be read errors.

mashtree --mindepth 0 --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd
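To see why a minimum depth helps, here is a minimal Python sketch of abundance filtering. It is an illustration only, not mashtree's implementation: it uses a fixed threshold rather than the automatic selection that --mindepth 0 triggers, it ignores reverse complements, and the names `kmers` and `filter_by_depth` are hypothetical.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def filter_by_depth(reads, k: int, min_depth: int):
    """Return the set of kmers seen at least min_depth times across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return {kmer for kmer, n in counts.items() if n >= min_depth}


# The third read carries a simulated sequencing error (A -> T).
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
kept = filter_by_depth(reads, k=4, min_depth=2)
print(sorted(kept))  # → ['ACGT', 'CGTA', 'GTAC']
```

The two kmers introduced by the error (CGTT and GTTC) appear only once, so they fall below the threshold and are dropped.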

Adding confidence values

Mashtree can add confidence values using jackknifing. For each jackknife tree, 50% of hashes are used. Confidence values are calculated from the jackknife trees using BioPerl. When using this method, you can pass flags to mashtree after a double dash (--), as in the example below.

Jackknifing was added in version 0.40.

mashtree_jackknife.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.jackknife.dnd
mashtree_jackknife.pl --help # additional usage help
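The idea of building each jackknife replicate from half of the hashes can be sketched in Python. This is a simplified illustration under stated assumptions, not mashtree's code: it uses a plain set Jaccard index instead of Mash's bottom-sketch estimate, it samples from the pooled hashes of both genomes (how mashtree selects hashes is not described here), and the function names are hypothetical. The distance formula is the standard Mash distance, D = -ln(2j / (1 + j)) / k.

```python
import math
import random


def mash_distance(hashes_a, hashes_b, k: int) -> float:
    """Mash distance derived from the Jaccard index j of two hash sets."""
    a, b = set(hashes_a), set(hashes_b)
    union = a | b
    if not union:
        return 1.0
    j = len(a & b) / len(union)
    if j == 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


def jackknife_distance(hashes_a, hashes_b, k: int, rng, fraction: float = 0.5) -> float:
    """Distance recomputed after keeping a random fraction of the pooled hashes."""
    pool = sorted(set(hashes_a) | set(hashes_b))
    keep = set(rng.sample(pool, int(len(pool) * fraction)))
    return mash_distance(set(hashes_a) & keep, set(hashes_b) & keep, k)


# Toy hash sets standing in for two sketches; k matches the default --kmerlength.
sketch_a = set(range(0, 1000))
sketch_b = set(range(100, 1100))
rng = random.Random(42)  # fixed seed so the replicates are reproducible
replicates = [jackknife_distance(sketch_a, sketch_b, 21, rng) for _ in range(5)]
print(mash_distance(sketch_a, sketch_b, 21), replicates)
```

Each replicate distance would feed one replicate tree, and the spread across replicates is what the confidence values summarize.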

Bootstrapping was added in version 0.55. This runs mashtree itself multiple times, each with a random seed.

mashtree_bootstrap.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.bootstrap.dnd
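Once replicate trees exist, a confidence value for a group of samples is the share of replicates in which that group appears. The Python sketch below shows that counting step in a generic form. It is not the BioPerl routine that mashtree uses: trees are written as nested tuples instead of Newick, they are treated as rooted for simplicity, and the names `clades` and `support` are hypothetical.

```python
def clades(tree):
    """Return (leaves, clade_set) for a tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def support(reference, replicates):
    """Percent of replicate trees that contain each clade of the reference tree."""
    _, ref_clades = clades(reference)
    rep_clades = [clades(rep)[1] for rep in replicates]
    return {
        clade: 100.0 * sum(clade in rc for rc in rep_clades) / len(replicates)
        for clade in ref_clades
    }


reference = (("A", "B"), ("C", "D"))
replicates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
]
values = support(reference, replicates)
print(values[frozenset({"A", "B"})])  # → 75.0
print(values[frozenset({"C", "D"})])  # → 50.0
```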

Usage

Usage: mashtree [options] *.fastq *.fasta *.gbk *.msh > tree.dnd
NOTE: fastq files are read as raw reads;
      fasta, gbk, and embl files are read as assemblies;
      Input files can be gzipped.
--tempdir            ''   If specified, this directory will not be
                          removed at the end of the script and can
                          be used to cache results for future
                          analyses.
                          If not specified, a dir will be made for you
                          and then deleted at the end of this script.
--numcpus            1    This script uses Perl threads.
--outmatrix          ''   If specified, will write a distance matrix
                          in tab-delimited format
--file-of-files           If specified, mashtree will try to read
                          filenames from each input file. The file of
                          files format is one filename per line. This
                          file of files cannot be compressed.
--outtree                 If specified, the tree will be written to
                          this file and not to stdout. Log messages
                          will still go to stderr.
--version                 Display the version and exit

TREE OPTIONS
--truncLength        250  How many characters to keep in a filename
--sort-order         ABC  For neighbor-joining, the sort order can
                          make a difference. Options include:
                          ABC (alphabetical), random, input-order

MASH SKETCH OPTIONS
--genomesize         5000000
--mindepth           5    If mindepth is zero, then it will be
                          chosen in a smart but slower method,
                          to discard lower-abundance kmers.
--kmerlength         21
--sketch-size        10000
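Two of the options above, --file-of-files and --sort-order, can be pictured with a short Python sketch. This is an illustration under assumptions, not mashtree's code: skipping blank lines and the `seed` parameter are choices made here, and the function names are hypothetical.

```python
import random


def read_file_of_files(path):
    """Read one filename per line from an uncompressed text file, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def order_inputs(filenames, sort_order="ABC", seed=None):
    """Order filenames as ABC (alphabetical), random, or input-order."""
    if sort_order == "ABC":
        return sorted(filenames)
    if sort_order == "random":
        shuffled = list(filenames)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    if sort_order == "input-order":
        return list(filenames)
    raise ValueError(f"unknown sort order: {sort_order}")


print(order_inputs(["b.fastq.gz", "a.fasta"], "ABC"))  # → ['a.fasta', 'b.fastq.gz']
```

Because neighbor-joining can be sensitive to input order, fixing the order (or the seed, when shuffling) is one way to keep repeated runs comparable.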

Installation

Please see INSTALL.md

Further documentation

For more information and help please see the docs folder

For more information on plugins, see the plugins folder (in development).

For more information on contributions, please see CONTRIBUTING.md.

References

Citation

JOSS

Katz, L. S., Griswold, T., Morrison, S., Caravas, J., Zhang, S., den Bakker, H. C., Deng, X., & Carleton, H. A. (2019). Mashtree: a rapid comparison of whole genome sequence files. Journal of Open Source Software, 4(44), 1762. https://doi.org/10.21105/joss.01762

Poster

Katz, L. S., Griswold, T., & Carleton, H. A. (2017, October 8-11). Generating WGS Trees with Mashtree. Poster presented at the American Society for Microbiology Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines, Washington, DC. Poster number 27.
