Read Until client library for Nanopore Sequencing

Read Until

Adaptive sampling enables a large number of applications, traditionally associated with complex molecular biology methods, to be carried out by the sequencer itself. It enables the following:

Enrichment: Users can ask the system to enrich for strands that contain a target region of interest, a haplotype of choice, or an organism of interest against a complex background.

Depletion: Users can reject strands from an organism which is of no interest (e.g. host depletion). In the case of pathogen detection or microbiome applications in human health, this could be enabled as a "human filter", ensuring that this sensitive, confidential data is never committed to disk.

Balancing: Users can use adaptive sampling to balance their barcodes, ensuring they achieve target depths for each barcode and also even out coverage across a genome by rejecting strands representing regions of the genome already at their target depth in favour of regions that have lower coverage.

The read until API is provided "as is" as a research tool. Issue reporting has been disabled on the GitHub repository; users with questions should post them in the Nanopore community. Usage currently requires some advanced programming capability. Efforts are ongoing by the Oxford Nanopore team to release simpler versions of this tool, enabling more and more users to deploy it successfully.

Please add new feature requests to the feature request pinboard under the tag "Adaptive Sampling".

The Read Until API provides a mechanism for an application to connect to a MinKNOW server to obtain read data in real-time. The data can be analysed in the way most fit for purpose, and a return call can be made to the server to unblock the read in progress.

Read Until Example

Installation

The client requires MinKNOW for MinION 20.06 or later (MinKNOW-Core 4.04).

The package can be installed into a Python 3 virtual environment. For example, on Ubuntu:

python3 -m venv read_until_env
source read_until_env/bin/activate
pip install --upgrade pip
# Install from github:
pip install git+https://github.com/nanoporetech/read_until_api
# Or from a local clone
python setup.py install

Two demonstration programs are provided (and are installed into MinKNOW/ont-python/bin/):

  1. read_until_simple: this serves as a simple test, and the code (module read_until.simple) demonstrates use of basic functionality for developers.

  2. read_until_ident: this is a rather more fully featured example, using the API to identify reads via basecalling and alignment. Running it requires the optional dependencies scrappy and mappy. To use the scrappy basecaller efficiently it is important to set the BLAS library to be single threaded; this is ordinarily done with:

    export OPENBLAS_NUM_THREADS=1
    

    or similar.

Client Overview

The Python Read Until package provides a high-level interface to the requisite parts of MinKNOW's gRPC interface. Developers can focus on creating rich analyses, rather than the lower-level details of handling the data that MinKNOW provides. The purpose of the read until functionality is to selectively "unblock" sequencing channels, based on any conceivable analysis, to increase the time spent sequencing analytes of interest. MinKNOW can be requested to send a continuous stream of "read chunks" (of a configurable minimum size), which the client can analyse.

The main client code is located in the read_until.base.ReadUntilClient class, which can be imported as simply:

from read_until import ReadUntilClient

The interface to this class is thoroughly documented, with additional comments throughout for developers who wish to develop their own custom client from the gRPC stream. Developers are encouraged to read the code and inline documentation (an HTML version of which can be built using the docs make target).

The gRPC stream managed by the client is bidirectional: it carries both raw data "read chunks" to the client and "action responses" to MinKNOW. The client implements two queues. The first is the .action_queue and is fairly straightforward: requests to MinKNOW to unblock channels are temporarily stored here, bundled together and then dispatched.
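The bundling idea behind the action queue can be sketched in a few lines. This is a standalone illustration, not the library's implementation: ActionQueue, put_unblock and drain are names invented for this example.

```python
import queue

class ActionQueue:
    """Collects unblock requests so they can be dispatched in one batch."""

    def __init__(self):
        self._queue = queue.Queue()

    def put_unblock(self, channel, read_number):
        # Store a request; a dispatch loop bundles queued requests later.
        self._queue.put(("unblock", channel, read_number))

    def drain(self):
        # Collect everything currently queued into a single batch, ready
        # to be sent to MinKNOW in one message.
        batch = []
        while True:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                return batch

actions = ActionQueue()
actions.put_unblock(1, 101)
actions.put_unblock(2, 57)
batch = actions.drain()  # both requests leave the queue as one bundle
```

Batching amortises the per-message overhead of the gRPC stream when many channels need unblocking in the same analysis cycle.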

The second queue is more elaborate; it is implemented in read_until.base.ReadCache. The client stores read chunks here in preparation for analysis. The queue is additionally keyed on channel such that it only ever stores a single chunk from each sequencer channel, thereby protecting consumers of the client from reads which have already ended. A restriction of this approach is that consumers cannot combine data from multiple chunks of the same read. If this behaviour is required, a client can be constructed with an alternative implementation of a ReadCache (passed as a parameter on construction of the ReadUntilClient instance). However, since the effectiveness of a read until application depends crucially on the latency of analysis, it is recommended to design analyses which require as little data as possible and set the received chunk size accordingly.
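The keyed-on-channel behaviour described above can be illustrated with a minimal stand-in for the real ReadCache (the class below is an assumption-laden sketch, not the library's implementation):

```python
from collections import OrderedDict

class SimpleReadCache:
    """One entry per channel; newer chunks displace older ones, so a
    consumer never sees a chunk from a read that has already ended."""

    def __init__(self, size=512):
        self.size = size
        self._dict = OrderedDict()

    def __setitem__(self, channel, read_chunk):
        # A new chunk for a channel replaces whatever was stored before.
        if channel in self._dict:
            del self._dict[channel]
        self._dict[channel] = read_chunk
        # Evict the stalest entry if the cache is over capacity.
        if len(self._dict) > self.size:
            self._dict.popitem(last=False)

    def popitems(self):
        # Hand all current (channel, chunk) pairs to the consumer.
        items = list(self._dict.items())
        self._dict.clear()
        return items

cache = SimpleReadCache(size=4)
cache[1] = "chunk-a"
cache[1] = "chunk-b"   # replaces chunk-a: channel 1 keeps only the latest
cache[2] = "chunk-c"
```

A custom cache that instead concatenates successive chunks of the same read (matching on read number) is the kind of alternative implementation that could be passed to the ReadUntilClient constructor.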

For many developers the details of these queues may be unimportant, at least in getting started. Of more immediate importance are several methods of the ReadUntilClient class:

.run() instruct the class to start retrieving read chunks from MinKNOW.

.get_read_chunks() obtain the most recent data retrieved from MinKNOW.

.unblock_read() request that a read be ejected from a channel.

.stop_receiving_read() request that no more data for a read be sent to the client by MinKNOW. It is not guaranteed that further data will not be sent, and in the general case the client does not filter subsequent data from its consumers (although when the client is created with the one_chunk option, the client will provide additional filtering of the data received from MinKNOW).

Examples of use of the client are given in the codebase, but most simply can be reduced to:

from concurrent.futures import ThreadPoolExecutor
import numpy
from read_until import ReadUntilClient

def analysis(client, *args, **kwargs):
    while client.is_running:
        for channel, read in client.get_read_chunks():
            raw_data = numpy.frombuffer(read.raw_data, client.signal_dtype)
            # do something with raw data... and maybe call:
            #    client.stop_receiving_read(channel, read.number)
            #    client.unblock_read(channel, read.number)

read_until_client = ReadUntilClient()
read_until_client.run()
with ThreadPoolExecutor() as executor:
    executor.submit(analysis, read_until_client)

Extending the client

The ReadUntilClient class has been implemented to provide an abstraction which does not require an in-depth knowledge of the MinKNOW gRPC interface. To extend the client, however, some knowledge of the messages passed between MinKNOW and a client is required. Whilst the provided client shows how to construct and decode basic messages, the following (an extract from the Protocol Buffers definition files) serves as a more complete reference.

Messages sent from a client to MinKNOW

message GetLiveReadsRequest {
    enum RawDataType {
        // Don't change the previously specified setting for raw data
        // sent with live reads. Note: if sent when there is no previous
        // setting, NONE is assumed.
        KEEP_LAST = 0;
        // No raw data required for live reads
        NONE = 1;
        // Calibrated raw data should be sent to the user with each read
        CALIBRATED = 2;
        // Uncalibrated data should be sent to the user with each read
        UNCALIBRATED = 3;
    }

    message UnblockAction {
        // Duration of unblock in seconds.
        double duration = 1;
    }

    message StopFurtherData {}

    message Action {
        string action_id = 1;

        // Channel name to unblock
        uint32 channel = 2;

        // Identifier for the read to act on. If the read requested is no
        // longer in progress, the action fails.
        oneof read { string id = 3; uint32 number = 4; }

        oneof action {
            // Unblock a read and skip further data from this read.
            UnblockAction unblock = 5;

            // Skip further data from this read, doesn't affect the read
            // data.
            StopFurtherData stop_further_data = 6;
        }
    }

    message StreamSetup {
        // The first channel (inclusive) for which to return data. Note
        // that channel numbering starts at 1.
        uint32 first_channel = 1;

        // The last channel (inclusive) for which to return data.
        uint32 last_channel = 2;

        // Specify the type of raw data to retrieve
        RawDataType raw_data_type = 3;

        // Minimum chunk size read data is returned in.
        uint64 sample_minimum_chunk_size = 4;
    }

    message Actions { repeated Action actions = 2; }

    oneof request {
        // Read setup request, initialises channel numbers and type of data
        // returned. Must be specified in the first message sent to MinKNOW.
        // Once MinKNOW has the first setup message reads are sent to the
        // caller as requested. The user can then resend a setup message as
        // frequently as they need to in order to reconfigure live reads -
        // for example by changing if raw data is sent with reads or not.
        StreamSetup setup = 1;

        // Actions to take given data returned to the user - can only be
        // sent once the setup message above has been sent.
        Actions actions = 2;
    }
}
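The request sequence implied by this definition (one setup message first, then action messages) can be sketched in plain Python, with dicts standing in for the generated protobuf classes; the helper names below are inventions of this example:

```python
def setup_message(first_channel, last_channel,
                  raw_data_type="CALIBRATED",
                  sample_minimum_chunk_size=1000):
    # Must be the first message sent on the stream.
    return {"setup": {
        "first_channel": first_channel,   # channel numbering starts at 1
        "last_channel": last_channel,
        "raw_data_type": raw_data_type,
        "sample_minimum_chunk_size": sample_minimum_chunk_size,
    }}

def unblock_message(channel, read_number, duration=0.1):
    # May only be sent after the setup message. The oneof read field is
    # populated with the read number here; a read id string also works.
    return {"actions": {"actions": [{
        "action_id": "unblock_{}_{}".format(channel, read_number),
        "channel": channel,
        "number": read_number,
        "unblock": {"duration": duration},
    }]}}

# A minimal stream: configure all 512 channels, then unblock one read.
stream = [setup_message(1, 512), unblock_message(3, 42)]
```

In the real client these dicts would instead be instances of the generated GetLiveReadsRequest classes, written to the bidirectional gRPC stream.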

Messages received by a client from MinKNOW

message GetLiveReadsResponse {
    message ReadData {
        // The id of this read, this id is unique for every read ever
        // produced.
        string id = 1;

        // The MinKNOW assigned number of this read. Read numbers always
        // increment throughout the experiment, and are unique per channel,
        // however they are not necessarily contiguous.
        uint32 number = 2;
        
        // Absolute start point of this read
        uint64 start_sample = 3;
        
        // Absolute start point through the experiment of this chunk
        uint64 chunk_start_sample = 4;
        
        // Length of the chunk in samples
        uint64 chunk_length = 5;
        
        // All Classifications given to intermediate chunks by analysis
        repeated int32 chunk_classifications = 6;
        
        // Any raw data selected by the request. The type of the elements
        // will depend on whether calibrated data was chosen. The
        // get_data_types() RPC call should be used to determine the
        // precise format of the data, but in general terms, uncalibrated
        // data will be signed integers and calibrated data will be
        // floating-point numbers.
        bytes raw_data = 7;
        
        // The median of the read previous to this read. Intended to
        // allow querying of the approximate level of this read compared
        // to the last. For example, a user could try to verify this is
        // a strand by ensuring the median of the current read is lower
        // than the median_before level.
        float median_before = 8;
        
        // The median pA level of this read from all aggregated read
        // chunks so far.
        float median = 9;
    };
    
    message ActionResponse {
        string action_id = 1;
        enum Response { SUCCESS = 0; FAILED_READ_FINISHED = 1; }
        Response response = 2;
    }
    
    // The number of samples collected before the first sample included
    // in this response. This gives the position of the first data point
    // on each channel in the overall stream of data being acquired from
    // the device (since this period of data acquisition was started).
    uint64 samples_since_start = 1;
    
    // The number of seconds elapsed since data acquisition started.
    // This is the same as ``samples_since_start``, but expressed in
    // seconds.
    double seconds_since_start = 2;
    
    // In progress reads for the requested channels. Sparsely populated as
    // not all channels have new/incomplete reads.
    map<uint32, ReadData> channels = 4;
    
    // List of responses to requested actions, informing the caller of
    // results to requested unblocks or discards of data.
    repeated ActionResponse action_reponses = 5;
}
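As the comment on raw_data notes, the element type of the bytes depends on the requested RawDataType. A hedged sketch of the decoding step follows; the exact dtype should be taken from the client's signal_dtype (or the get_data_types() RPC), and the 16-bit integer / 32-bit float choices below are assumptions for illustration only:

```python
import numpy as np

def decode_raw_data(raw_data: bytes, calibrated: bool) -> np.ndarray:
    # Uncalibrated data assumed to be signed 16-bit integers; calibrated
    # data assumed to be 32-bit floats. Verify against get_data_types().
    dtype = np.float32 if calibrated else np.int16
    return np.frombuffer(raw_data, dtype=dtype)

# Synthetic example: three uncalibrated samples round-trip unchanged.
samples = np.array([10, -5, 32], dtype=np.int16)
decoded = decode_raw_data(samples.tobytes(), calibrated=False)
```

This is the same conversion the earlier analysis example performs with numpy and client.signal_dtype.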
