A comparison of different Oxford Nanopore basecallers

Performance of neural network basecalling tools for Oxford Nanopore sequencing

Ryan R. Wick¹, Louise M. Judd¹ and Kathryn E. Holt¹,²
1. Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria 3004, Australia
2. London School of Hygiene & Tropical Medicine, London WC1E 7HT, UK

This repository contains the scripts used in the preparation of our manuscript on basecalling performance:
Wick RR, Judd LM, Holt KE. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biology. 2019;20(1):129.

In August 2019, I put a small addendum to this paper on GitHub which looks at a more recent version of Guppy as well as some different polishing strategies:
github.com/rrwick/August-2019-consensus-accuracy-update

Previous versions of this repository contained the analysis results here in the README, but the current results are now in the manuscript, and this repo just holds the scripts associated with the analysis. These scripts assume you're running on Ubuntu 16.04. They may work on other OSs, but no guarantees!

If you're still interested in the older results, here is a link to the earlier version of this repo: Comparison of Oxford Nanopore basecalling tools.

Basecalling

Before you analyse a read set, you must generate the read set! The basecalling_scripts directory contains Bash scripts with the loops/commands I used to run the various basecallers. You'll need to edit the paths at the top of these scripts before running them.
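
The "paths at the top, then a loop" layout can be pictured with a small sketch. It is written in Python rather than Bash and is not taken from the repository: the paths, the basecaller labels and the output_dirs helper are all hypothetical placeholders, and it only prints a plan instead of running any basecaller.

```python
from pathlib import Path

# Paths to edit before running (hypothetical values, not from the repository).
FAST5_DIR = Path("/path/to/fast5")
OUTPUT_ROOT = Path("/path/to/basecalled")

# Placeholder labels, not real tool names or versions.
BASECALLERS = ["basecaller_a_v1", "basecaller_a_v2", "basecaller_b_v1"]


def output_dirs(basecallers, output_root):
    """Map each basecaller label to its own output directory."""
    return {name: output_root / name for name in basecallers}


for name, out_dir in output_dirs(BASECALLERS, OUTPUT_ROOT).items():
    # A real script would run the basecaller here; this sketch only prints the plan.
    print(f"{name}: {FAST5_DIR} -> {out_dir}")
```

Giving each basecaller its own output directory keeps the resulting read sets separate for the later analysis.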

Custom training of basecallers

The sloika_training_scripts directory contains the commands we used to train the custom-Kp and custom-Kp-big-net models using our fork of Sloika.

We used many different isolates in our training set, so the per-isolate_commands.sh script contains the commands which must be run separately for each of them.

After the preparatory work is done, the model can be trained with the commands in training_commands.sh.

Read set analysis

The analysis_scripts directory contains the scripts for processing and generating accuracy measurements from read sets. Before the analysis, the reads must be given consistent names, as different basecallers have different conventions for the fastq headers. The fix_read_names.py script will convert a read fastq into a format suitable for the next step.
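
As a rough illustration of what a renaming step like this might involve, the sketch below rewrites each FASTQ header to a bare read ID. It is not the repository's fix_read_names.py: the function names, the made-up example record and the assumption that every header contains a UUID-style read ID are mine, for illustration only.

```python
import re

# Assumption: each header contains a UUID-style read ID somewhere in it.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def normalise_header(header):
    """Return '@<read id>' for a FASTQ header line, or raise if no ID is found."""
    match = UUID_RE.search(header.lower())
    if match is None:
        raise ValueError(f"no read ID found in header: {header!r}")
    return "@" + match.group(0)


def normalise_fastq(lines):
    """Yield FASTQ lines with each record's header replaced by its bare read ID.

    Assumes four-line FASTQ records (header, sequence, '+', qualities).
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            yield normalise_header(line)
        elif i % 4 == 2:
            # Drop any repeated header text after the '+' separator.
            yield "+"
        else:
            yield line


# Made-up record; the header suffix is illustrative, not a real basecaller's format.
example = [
    "@0a1b2c3d-1111-2222-3333-444455556666_Basecall_1D_template some extra text",
    "ACGT",
    "+0a1b2c3d-1111-2222-3333-444455556666_Basecall_1D_template",
    "!!!!",
]
fixed = list(normalise_fastq(example))
print("\n".join(fixed))
```

Raising an error on a header with no recognisable ID, rather than passing it through, makes an unexpected naming convention visible before the downstream analysis runs.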

analysis.sh is the 'master script' that runs all analyses on a given read set: read-level accuracy, assembly, assembly-level accuracy, nanopolish and nanopolish-level accuracy. It calls the other scripts as it runs. You may want to edit some of the variables at the start of the script to change things like the output directories and the number of CPU threads. You can also comment out parts of the script if you only want to run some of the analyses.
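
The idea of a master script with adjustable variables and optional steps can be sketched as follows. This is a Python illustration rather than the Bash script itself: the step labels mirror the analyses listed above, but plan_analysis, its defaults and the example file name are hypothetical, and the sketch only plans the steps instead of running them.

```python
from pathlib import Path

# Step labels follow the analyses named in the text; the order is assumed.
STEPS = [
    "read_accuracy",
    "assembly",
    "assembly_accuracy",
    "nanopolish",
    "nanopolish_accuracy",
]


def plan_analysis(read_set, out_dir="results", threads=4, skip=()):
    """Return (step, output path, threads) tuples for the steps left enabled."""
    unknown = set(skip) - set(STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    # Use the file name up to the first dot as the read set's name.
    set_name = Path(read_set).name.split(".")[0]
    return [
        (step, Path(out_dir) / set_name / step, threads)
        for step in STEPS
        if step not in skip
    ]


# Hypothetical read set; skipping steps plays the role of commenting them out.
plan = plan_analysis(
    "reads/example_basecaller.fastq.gz",
    threads=8,
    skip=("nanopolish", "nanopolish_accuracy"),
)
for step, path, threads in plan:
    print(f"{step}: write to {path} using {threads} threads")
```

The sketch does not check dependencies between steps, so skipping a step that a later one relies on (for example the assembly before the assembly-level accuracy) would still need care.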

License

GNU General Public License, version 3

More Repositories

1. Bandage (C++, 582 stars): a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
2. Unicycler (C++, 559 stars): hybrid assembly pipeline for bacterial genomes
3. Porechop (C++, 337 stars): adapter trimmer for Oxford Nanopore reads
4. Trycycler (Python, 306 stars): A tool for generating consensus long-read assemblies for bacterial genomes
5. Filtlong (C++, 285 stars): quality filtering tool for long reads
6. Long-read-assembler-comparison (Python, 170 stars): Benchmarking of long-read assembly tools for bacterial whole genomes
7. Badread (Python, 168 stars): a long read simulator that can imitate many types of read problems
8. Polypolish (Rust, 143 stars): a short-read polishing tool for long-read assemblies
9. Deepbinner (Python, 124 stars): a signal-level demultiplexer for Oxford Nanopore reads
10. Perfect-bacterial-genome-tutorial (Python, 118 stars)
11. Metagenomics-Index-Correction (Python, 78 stars)
12. Bacsort (Python, 76 stars): a collection of scripts for organising bacterial genomes by species
13. Minipolish (Python, 72 stars): A tool for Racon polishing of miniasm assemblies
14. Assembly-Dereplicator (Python, 68 stars): A tool for removing redundant genomes from a set of assemblies
15. August-2019-consensus-accuracy-update (Python, 58 stars): A short analysis of Oxford Nanopore consensus accuracy for bacterial genome assemblies
16. Verticall (Python, 56 stars): Recombination-free trees
17. Rebaler (Python, 47 stars): reference-based long read assemblies of bacterial genomes
18. MinION-desktop (Python, 32 stars): Scripts and programs for the Holt Lab's MinION desktop
19. Bacterial-genome-assemblies-with-multiplex-MinION-sequencing (Shell, 32 stars)
20. Core-SNP-filter (Rust, 30 stars): a tool to filter sites in a FASTA-format whole-genome pseudo-alignment
21. Fast5-to-Fastq (Python, 26 stars): A simple tool for extracting reads from Oxford Nanopore fast5 files
22. Compare-annotations (Python, 20 stars): A script for comparing old vs new versions of genome annotations
23. LinesOfCodeCounter (Python, 18 stars): A Python script to count lines of code in a directory for specific file extension, excluding blank/comment lines
24. Catpac (Python, 12 stars): a Contig Alignment Tool for Pairwise Assembly Comparison
25. Small-plasmid-Nanopore (Python, 11 stars)
26. DASCRUBBER-wrapper (Python, 10 stars): Wrapper script for easier read scrubbing with DASCRUBBER
27. GFA-dead-end-counter (Rust, 9 stars): a tool for counting dead ends in GFA assembly graphs
28. SPAdes-Contig-Graph (Python, 9 stars): a tool for creating a FASTG contig graph from a SPAdes assembly
29. Langtons-Ant-Animator (C++, 8 stars): Program for creating Langton's Ant animations
30. Klebsiella-assembly-species (Python, 8 stars): a tool for assigning species to Klebsiella assemblies
31. Circular-Contig-Extractor (Python, 8 stars)
32. ONT-assembler-benchmark (Python, 5 stars)
33. KleborateModular (Python, 4 stars): A modular rewrite of Kleborate
34. SRST2-table-from-assemblies (Python, 3 stars): This is a tool for conducting a gene screen on assemblies, producing an SRST2-like output.
35. IDBA-to-GFA (Python, 3 stars)
36. Grovolve (C++, 3 stars): Demonstration of evolution by natural selection
37. Trycycler-paper (Python, 3 stars): Supplementary figures, tables and scripts for the Trycycler paper
38. Nanopore-barcode-binner (C++, 3 stars)
39. Nanopore-read-processor (Python, 2 stars): A script for sorting, assessing and converting Oxford Nanopore reads
40. Adapter-assembler (C++, 2 stars)
41. Bugraft (C++, 2 stars): Demonstration of speciation and descent from a common ancestor
42. Unicycler-assembly-tests (Shell, 1 star)
43. MLST-from-SRST2 (Python, 1 star): This tool uses a table of compiled results from SRST2 to create an MLST-like scheme.
44. SPAdes-completion-checker (Python, 1 star): Tool to assess SPAdes assembly graph paths using read depth
45. Irsat (Python, 1 star): Iterative Read Subset Assembly Tool
46. Polypolish-paper (Python, 1 star): Supplementary figures, tables and scripts for the Polypolish paper
47. rrwick.github.io (SCSS, 1 star)