ARIBA

Antimicrobial Resistance Identification By Assembly

For instructions on using ARIBA, please see the ARIBA wiki page.

PLEASE NOTE: we currently do not have the resources to provide support for ARIBA; see the Feedback/Issues section.

Introduction

ARIBA is a tool that identifies antibiotic resistance genes by running local assemblies. It can also be used for MLST calling.

The input is a FASTA file of reference sequences (can be a mix of genes and noncoding sequences) and paired sequencing reads. ARIBA reports which of the reference sequences were found, plus detailed information on the quality of the assemblies and any variants between the sequencing reads and the reference sequences.

Quick Start

Get reference data, for instance from NCBI. See getref for a full list of available sources.

ariba getref ncbi out.ncbi

Prepare reference data for ARIBA:

ariba prepareref -f out.ncbi.fa -m out.ncbi.tsv out.ncbi.prepareref

Run local assemblies and call variants:

ariba run out.ncbi.prepareref reads1.fastq reads2.fastq out.run

Summarise data from several runs:

ariba summary out.summary out.run1/report.tsv out.run2/report.tsv out.run3/report.tsv
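The Quick Start steps can be chained for a batch of samples. Below is a minimal Python sketch that is not part of ARIBA. It makes three assumptions: `ariba` is on your $PATH, the sample names and read files (`sample1_1.fastq`, `sample1_2.fastq`, ...) are hypothetical, and each `ariba run` output directory contains a `report.tsv`. By default it only prints the commands (a dry run). Pass `dry_run=False` to execute them.

```python
import shlex
import subprocess


def build_commands(samples, source="ncbi", prefix="out"):
    """Return ariba command lines (as argument lists) for a batch of samples.

    `samples` maps a sample name to its (reads1, reads2) FASTQ file names.
    """
    ref = f"{prefix}.{source}"          # e.g. out.ncbi
    prepared = f"{ref}.prepareref"      # e.g. out.ncbi.prepareref
    commands = [
        ["ariba", "getref", source, ref],
        ["ariba", "prepareref", "-f", f"{ref}.fa", "-m", f"{ref}.tsv", prepared],
    ]
    reports = []
    for name, (reads1, reads2) in samples.items():
        run_dir = f"{prefix}.run.{name}"
        commands.append(["ariba", "run", prepared, reads1, reads2, run_dir])
        # Assumption: each run directory contains a report.tsv file.
        reports.append(f"{run_dir}/report.tsv")
    commands.append(["ariba", "summary", f"{prefix}.summary", *reports])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical sample names and read files.
    samples = {name: (f"{name}_1.fastq", f"{name}_2.fastq") for name in ("sample1", "sample2")}
    run_all(build_commands(samples))
```

The reference data is prepared once and the prepared directory is reused for every sample, so only `run` is repeated per sample.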

Please read the ARIBA wiki page for full usage instructions.

Tutorials

The Jupyter notebook tutorial

Installation

If you encounter an issue when installing ARIBA, please contact your local system administrator. If you encounter a bug, you can report it on the GitHub issues page.

Required dependencies

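Besides the external programs Bowtie2, CD-HIT and MUMmer (installed as bowtie2, cd-hit and mummer in the Ubuntu instructions below), ARIBA depends on several Python packages, all of which are available via pip. Installing ARIBA with pip3 installs them automatically if they are not already present: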

  • dendropy >= 4.2.0
  • matplotlib >= 3.1.0
  • pyfastaq >= 3.12.0
  • pysam >= 0.9.1
  • pymummer >= 0.10.1
  • biopython
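To check whether these Python packages are already present at the listed versions, a short script can compare installed versions against the minimums. This is a hypothetical helper that ARIBA does not ship. It uses only the standard library and compares numeric version components only, so pre-release suffixes are ignored. It also assumes the installed distribution names match those in the list.

```python
from importlib import metadata

# Minimum versions from the list above; biopython has no stated minimum.
REQUIREMENTS = {
    "dendropy": "4.2.0",
    "matplotlib": "3.1.0",
    "pyfastaq": "3.12.0",
    "pysam": "0.9.1",
    "pymummer": "0.10.1",
    "biopython": None,
}


def version_tuple(version):
    """Turn '3.12.0' into (3, 12, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # e.g. '1rc1': keep the 1, ignore the rest
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if installed >= minimum, padding the shorter version with zeros."""
    if minimum is None:
        return True
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(requirements=REQUIREMENTS):
    """Return {package: problem} for missing or too-old packages."""
    problems = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "missing"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    issues = check()
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("All listed Python dependencies are present.")
```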

Using pip3

Install ARIBA using pip:

pip3 install ariba

From Source

Download the latest release from this GitHub repository or clone it, then run the tests:

python3 setup.py test

Note for OS X: the tests require gawk, which must be installed separately, e.g. via Homebrew.

If the tests all pass, install:

python3 setup.py install

Alternatively, install directly from GitHub using:

pip3 install git+https://github.com/sanger-pathogens/ariba.git #--user

Docker

ARIBA can be run in a Docker container. First install Docker, then pull the latest ARIBA image:

docker pull ghcr.io/sanger-pathogens/ariba:latest

All Docker images are listed in the packages page.

To run ARIBA, use a command like the following (substituting your own directories). It assumes your files are stored in /home/ubuntu/data:

docker run --rm -it -v /home/ubuntu/data:/data ghcr.io/sanger-pathogens/ariba:latest ariba -h

When calling ARIBA via Docker (as above), you also need to prefix every file or directory name you pass with /data/ (e.g. /data/my_output_folder).
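Because only /home/ubuntu/data is mounted, ARIBA inside the container sees your files under /data. The following minimal Python sketch builds the `docker run` command and prefixes each file or directory name with /data/. The helper name `docker_ariba` is hypothetical. The sketch assumes the names are given relative to the mounted directory and uses the ghcr.io image.

```python
import shlex
from pathlib import PurePosixPath

HOST_DIR = "/home/ubuntu/data"   # your data directory (substitute your own)
CONTAINER_DIR = "/data"          # where it appears inside the container
IMAGE = "ghcr.io/sanger-pathogens/ariba:latest"


def docker_ariba(subcommand, *names, options=(), host_dir=HOST_DIR, image=IMAGE):
    """Build a `docker run` command for an ARIBA subcommand.

    `names` are file or directory names relative to host_dir; each one is
    rewritten with the /data/ prefix so ARIBA can find it in the container.
    `options` are passed through unchanged, before the names.
    """
    in_container = [str(PurePosixPath(CONTAINER_DIR) / name) for name in names]
    return ["docker", "run", "--rm", "-it", "-v", f"{host_dir}:{CONTAINER_DIR}",
            image, "ariba", subcommand, *options, *in_container]


if __name__ == "__main__":
    cmd = docker_ariba("run", "out.ncbi.prepareref", "reads1.fastq", "reads2.fastq", "my_output_folder")
    print(shlex.join(cmd))
```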

Singularity

ARIBA can be run in a Singularity container. First install Singularity. Releases include a Singularity image to download.

Alternatively, build your own Singularity image:

singularity build ariba.simg Singularity.def

Debian (ARIBA version may not be the latest)

ARIBA is available in the latest version of Debian and will, over time, filter through to Ubuntu and other Debian-based distributions. To install it (this requires root privileges):

sudo apt-get install ariba

Ubuntu

You can use apt-get (see above), or to ensure you get the latest version of ARIBA, the following commands can be used to install ARIBA and its dependencies. This was tested on a new instance of Ubuntu 16.04.

sudo apt-get update
sudo apt-get install -y python3-dev python3-pip python3-tk zlib1g-dev bowtie2 mummer cd-hit
export ARIBA_CDHIT=cdhit-est
sudo pip3 install ariba

Dependencies and environment variables

By default, ARIBA looks for its dependencies in your $PATH, using the default executable names in the table below. You can override this and point ARIBA to a specific program by setting the corresponding environment variable: if the variable is set, ARIBA uses it; otherwise it looks in your $PATH for the default name. This applies to the following dependencies:

Dependency     Default executable   Environment variable name
Bowtie2        bowtie2              $ARIBA_BOWTIE2
CD-HIT (est)   cd-hit-est           $ARIBA_CDHIT

For example, you could specify an exact version of the bowtie2 executable that you downloaded and compiled in your home directory (assuming BASH):

export ARIBA_BOWTIE2=$HOME/bowtie2-2.1.0/bowtie2

Note that ARIBA also runs bowtie2-build, for which it uses the bowtie2 executable with -build appended. So in this case it would try to use

$HOME/bowtie2-2.1.0/bowtie2-build
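The lookup order described above can be sketched in Python as follows. This is an illustration rather than ARIBA's own code, and the helper names are hypothetical. It assumes the value of the environment variable is itself resolved with `shutil.which`, so the value may be either a full path or a program name on your $PATH (as with `ARIBA_CDHIT=cdhit-est` in the Ubuntu instructions).

```python
import os
import shutil

# Default executable names and override variables, from the table above.
DEPENDENCIES = {
    "Bowtie2": ("bowtie2", "ARIBA_BOWTIE2"),
    "CD-HIT (est)": ("cd-hit-est", "ARIBA_CDHIT"),
}


def resolve_executable(default_name, env_var, environ=os.environ):
    """Return the full path of the program to use, or None if not found.

    The environment variable wins if it is set and non-empty; otherwise the
    default name is looked up on $PATH. shutil.which accepts either a bare
    program name or a path to an executable file.
    """
    name = environ.get(env_var) or default_name
    return shutil.which(name)


def bowtie2_build_for(bowtie2_executable):
    """bowtie2-build is found by appending '-build' to the bowtie2 executable."""
    return bowtie2_executable + "-build"


if __name__ == "__main__":
    for label, (default_name, env_var) in DEPENDENCIES.items():
        print(f"{label}: {resolve_executable(default_name, env_var)}")
    bowtie2 = resolve_executable("bowtie2", "ARIBA_BOWTIE2")
    if bowtie2:
        print("bowtie2-build:", bowtie2_build_for(bowtie2))
```

With the export shown above, `resolve_executable("bowtie2", "ARIBA_BOWTIE2")` returns $HOME/bowtie2-2.1.0/bowtie2 if that file is executable, and `bowtie2_build_for` turns it into $HOME/bowtie2-2.1.0/bowtie2-build.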

Temporary files

ARIBA can temporarily make a large number of files whilst running, which are put in a temporary directory made by ARIBA. The total size of these files is small, but there can be many of them. This can be a problem when running large numbers (100s or 1000s) of jobs simultaneously on the same file system. The parent directory of the temporary directory is determined in the following order of precedence:

  1. The value of the option --tmp_dir (if that option was used)
  2. The environment variable $ARIBA_TMPDIR (if it is set)
  3. The environment variable $TMPDIR (if it is set)
  4. If none of the above is set, the run's output directory is used.

Each temporary directory is unique to one run of ARIBA, and is automatically deleted at the end of the run (even if ARIBA was killed by the user or crashed). For example,

export ARIBA_TMPDIR=/tmp

will result in the creation of a new directory inside /tmp, which will have a name of the form

/tmp/ariba.tmp.abcdef

where the suffix abcdef is a random string of characters, chosen such that /tmp/ariba.tmp.abcdef does not already exist.

The exception to the above is if the option --noclean is used. This forces the temporary directory to be placed in the output directory, and temporary files are kept. It is intended for debugging.
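The precedence rules above can be sketched in Python using `tempfile.mkdtemp`, which creates a new directory with a random name that does not already exist. This is a hypothetical illustration rather than ARIBA's implementation. In particular, its `try`/`finally` cleanup covers normal exits and exceptions (including Ctrl-C), but not a process killed with SIGKILL.

```python
import contextlib
import os
import shutil
import tempfile


def tmp_parent(output_dir, tmp_dir_option=None, noclean=False, environ=os.environ):
    """Pick the parent of the temporary directory, in the documented order of precedence."""
    if noclean:
        # --noclean: temporary files go in the output directory and are kept.
        return output_dir
    # 1. --tmp_dir, 2. $ARIBA_TMPDIR, 3. $TMPDIR (empty values count as unset).
    for candidate in (tmp_dir_option, environ.get("ARIBA_TMPDIR"), environ.get("TMPDIR")):
        if candidate:
            return candidate
    # 4. Fall back to the run's output directory.
    return output_dir


@contextlib.contextmanager
def ariba_style_tmp_dir(output_dir, tmp_dir_option=None, noclean=False, environ=os.environ):
    """Create <parent>/ariba.tmp.<random> and delete it afterwards unless noclean is set."""
    parent = tmp_parent(output_dir, tmp_dir_option, noclean, environ)
    os.makedirs(parent, exist_ok=True)
    path = tempfile.mkdtemp(prefix="ariba.tmp.", dir=parent)
    try:
        yield path
    finally:
        # Runs on normal exit and on exceptions, including KeyboardInterrupt.
        if not noclean:
            shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    with ariba_style_tmp_dir("out.run", tmp_dir_option=tempfile.gettempdir()) as tmp:
        print("working in", tmp)
```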

Usage

usage: ariba <command> <options>

optional arguments:
  -h, --help      show this help message and exit

Available commands:

aln2meta      Converts multi-aln fasta and SNPs to metadata
expandflag    Expands flag column of report file
flag          Translate the meaning of a flag
getref        Download reference data
micplot       Make violin/dot plots using MIC data
prepareref    Prepare reference data for input to "run"
pubmlstget    Download species from PubMLST and make db
pubmlstspecies
	      Get list of available species from PubMLST
refquery      Get cluster or sequence info from prepareref output
run           Run the local assembly pipeline
summary       Summarise multiple reports made by "run"
test          Run small built-in test dataset
version       Get versions and exit

Please read the ARIBA wiki page for full usage instructions.

License

ARIBA is free software, licensed under GPLv3.

Feedback/Issues

We currently do not have the resources to provide support for ARIBA. However, the community may be able to help if you report usage issues on the GitHub issues page.

Citation

If you use this software please cite:

ARIBA: rapid antimicrobial resistance genotyping directly from sequencing reads. Hunt M, Mather AE, Sánchez-Busó L, Page AJ, Parkhill J, Keane JA, Harris SR. Microbial Genomics 2017. doi: 10.1099/mgen.0.000131
