mashtree

Create a tree using Mash distances.

For simple usage, see mashtree --help. This is an example command:

mashtree *.fastq.gz > tree.dnd

For confidence values, use mashtree_bootstrap.pl or mashtree_jackknife.pl. Run either with --help for usage.

Two modes: fast or accurate

Input files: fastq files are interpreted as raw reads. Fasta, GenBank, and EMBL files are interpreted as genome assemblies. Any of these file types may also be compressed with gz, bz2, or zip.
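
The input rules above can be expressed as a small shell helper. This is only an illustrative sketch: `classify_input` is a hypothetical function, the extension list is taken from the usage text, and mashtree's own detection logic may differ.

```shell
# Hypothetical helper: report how a file would be treated, based on its name only.
# Assumption: the type is inferred from the extension after stripping one
# compression suffix (.gz, .bz2, or .zip).
classify_input() {
  name=$1
  case $name in
    *.gz)  name=${name%.gz}  ;;
    *.bz2) name=${name%.bz2} ;;
    *.zip) name=${name%.zip} ;;
  esac
  case $name in
    *.fastq)              echo "raw reads"   ;;
    *.fasta|*.gbk|*.embl) echo "assembly"    ;;
    *.msh)                echo "mash sketch" ;;
    *)                    echo "unknown"     ;;
  esac
}

classify_input sample1.fastq.gz   # prints: raw reads
classify_input genome.fasta.bz2   # prints: assembly
```

A check like this can be useful before a long run, to confirm that every input will be read the way you expect.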

Output files: a tree in Newick format (.dnd). If --outmatrix is supplied, a distance matrix is written as well.
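
As a rough sketch of what can be done with the distance matrix, the snippet below finds the closest pair of samples in a tab-delimited file. The layout is an assumption (sample names in the first row and first column, rows and columns in the same order), the values are made up, and `closest_pair` is a hypothetical helper rather than part of mashtree.

```shell
# Toy matrix standing in for a file written with --outmatrix.
# Assumption: first row and first column hold sample names; values are made up.
matrix=$(mktemp)
printf '.\tA\tB\tC\n'       >  "$matrix"
printf 'A\t0\t0.02\t0.10\n' >> "$matrix"
printf 'B\t0.02\t0\t0.07\n' >> "$matrix"
printf 'C\t0.10\t0.07\t0\n' >> "$matrix"

# Hypothetical helper: print the two distinct samples with the smallest distance.
closest_pair() {
  awk -F '\t' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }
    {
      # Row NR describes the sample in column NR, so fields after NR
      # form the upper triangle of the matrix.
      for (i = NR + 1; i <= NF; i++)
        if (best == "" || $i + 0 < best + 0) { best = $i; pair = $1 "\t" name[i] }
    }
    END { print pair "\t" best }
  ' "$1"
}

closest_pair "$matrix"
rm -f "$matrix"
```

Only the upper triangle is scanned, so the zero distances on the diagonal are never reported as the closest pair.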

See the documentation on the algorithms for more information.

Faster

mashtree --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd

More accurate

You can get a more accurate tree with the minimum abundance finder by passing --mindepth 0. This step is slower, but it discards low-abundance k-mers, which are more likely to be read errors.

mashtree --mindepth 0 --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd
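
To illustrate what an abundance threshold does, the sketch below counts k-mers in a toy list and keeps only those seen at least a given number of times. It is a conceptual illustration, not how Mash or mashtree implement the filter; `filter_kmers` and the sample k-mers are hypothetical.

```shell
# Hypothetical helper: read one k-mer per line on stdin and print those
# observed at least $1 times, followed by their counts.
filter_kmers() {
  sort | uniq -c | awk -v min="$1" '$1 >= min { print $2 "\t" $1 }'
}

# Toy input: ACGT appears 3 times, TTGA twice, GGCC once (a likely read error).
printf '%s\n' ACGT TTGA ACGT GGCC ACGT TTGA | filter_kmers 2
```

With a threshold of 2, the singleton GGCC is dropped while ACGT and TTGA are kept.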

Adding confidence values

Mashtree can add confidence values using jackknifing. For each jackknife tree, 50% of the hashes are used. Confidence values are calculated from the jackknife trees using BioPerl. With this method, you can pass flags through to mashtree after a double dash (--), as in the example below.

Added in version 0.40.

mashtree_jackknife.pl --reps 100 --numcpus 12 *.fastq.gz -- --mindepth 0 > mashtree.jackknife.dnd
mashtree_jackknife.pl --help # additional usage help
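
The resampling idea can be sketched in the shell: each replicate keeps a random half of the hashes. This is a conceptual illustration with made-up hash values, not the code that mashtree_jackknife.pl runs; `subsample_half` is a hypothetical helper.

```shell
# Hypothetical helper: print a random half of the lines read from stdin.
# $1 is a seed so that a replicate can be reproduced.
subsample_half() {
  awk -v seed="$1" '
    BEGIN { srand(seed) }
    { line[NR] = $0 }
    END {
      keep = int(NR / 2)
      # Partial Fisher-Yates shuffle: pick "keep" lines without replacement.
      for (i = 1; i <= keep; i++) {
        j = i + int(rand() * (NR - i + 1))
        tmp = line[i]; line[i] = line[j]; line[j] = tmp
        print line[i]
      }
    }'
}

# Ten toy "hashes"; each replicate keeps five of them.
for rep in 1 2 3; do
  seq 1 10 | subsample_half "$rep" | sort -n | tr '\n' ' '
  echo
done
```

Sampling without replacement is what distinguishes this from a classical bootstrap, which would draw with replacement.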

Bootstrapping was added in version 0.55. It runs mashtree itself multiple times, each time with a random seed.

mashtree_bootstrap.pl --reps 100 --numcpus 12 *.fastq.gz -- --mindepth 0 > mashtree.bootstrap.dnd

Usage

Usage: mashtree [options] *.fastq *.fasta *.gbk *.msh > tree.dnd
NOTE: fastq files are read as raw reads;
      fasta, gbk, and embl files are read as assemblies;
      Input files can be gzipped.
--tempdir            ''   If specified, this directory will not be
                          removed at the end of the script and can
                          be used to cache results for future
                          analyses.
                          If not specified, a dir will be made for you
                          and then deleted at the end of this script.
--numcpus            1    This script uses Perl threads.
--outmatrix          ''   If specified, will write a distance matrix
                          in tab-delimited format
--file-of-files           If specified, mashtree will try to read
                          filenames from each input file. The file of
                          files format is one filename per line. This
                          file of files cannot be compressed.
--outtree                 If specified, the tree will be written to
                          this file and not to stdout. Log messages
                          will still go to stderr.
--version                 Display the version and exit

TREE OPTIONS
--truncLength        250  How many characters to keep in a filename
--sort-order         ABC  For neighbor-joining, the sort order can
                          make a difference. Options include:
                          ABC (alphabetical), random, input-order

MASH SKETCH OPTIONS
--genomesize         5000000
--mindepth           5    If mindepth is zero, then it will be
                          chosen in a smart but slower method,
                          to discard lower-abundance kmers.
--kmerlength         21
--sketch-size        10000
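
As a sketch of how several of these options combine, the snippet below writes a file of files and, if mashtree is installed, runs it with a persistent temporary directory, a distance matrix, and an explicit output tree. The helper `make_fofn` and all file and directory names (`samples.fofn`, `mashtree_cache`, `distances.tsv`, `tree.dnd`, and the sample inputs) are placeholders chosen for this example.

```shell
# Hypothetical helper: write one filename per line, uncompressed,
# which is the file-of-files format described in the usage text.
make_fofn() {
  out=$1
  shift
  printf '%s\n' "$@" > "$out"
}

make_fofn samples.fofn sample1.fastq.gz sample2.fastq.gz genome.fasta

# Only run mashtree if it is on the PATH and the placeholder inputs exist.
if command -v mashtree >/dev/null 2>&1 && [ -e sample1.fastq.gz ]; then
  mashtree --file-of-files --numcpus 12 \
    --tempdir mashtree_cache \
    --outmatrix distances.tsv \
    --outtree tree.dnd \
    samples.fofn
fi
```

Because --tempdir is given, the directory is kept after the run and can be reused to cache results for later analyses.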

Installation

Please see INSTALL.md

Further documentation

For more information and help please see the docs folder

For more information on plugins (in development), see the plugins folder.

For more information on contributions, please see CONTRIBUTING.md.

References

Citation

JOSS

Katz, L. S., Griswold, T., Morrison, S., Caravas, J., Zhang, S., den Bakker, H. C., Deng, X., & Carleton, H. A. (2019). Mashtree: a rapid comparison of whole genome sequence files. Journal of Open Source Software, 4(44), 1762. https://doi.org/10.21105/joss.01762

Poster

Katz, L. S., Griswold, T., & Carleton, H. A. (2017, October 8-11). Generating WGS Trees with Mashtree. Poster presented at the American Society for Microbiology Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines, Washington, DC. Poster number 27.
