• Stars
    star
    327
  • Rank 123,783 (Top 3 %)
  • Language
    Perl
  • License
    GNU General Publi...
  • Created almost 10 years ago
  • Updated 4 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

๐Ÿ”Ž ๐Ÿ’Š Mass screening of contigs for antimicrobial and virulence genes

Build Status License: GPL v2 Don't judge me

ABRicate

Mass screening of contigs for antimicrobial resistance or virulence genes. It comes bundled with multiple databases: NCBI, CARD, ARG-ANNOT, Resfinder, MEGARES, EcOH, PlasmidFinder, Ecoli_VF and VFDB.

Is this the right tool for me?

  1. It only supports contigs, not FASTQ reads
  2. It only detects acquired resistance genes, NOT point mutations
  3. It uses a DNA sequence database, not protein
  4. It needs BLAST+ >= 2.7 and any2fasta to be installed
  5. It's written in Perl ๐Ÿซ

If you are happy with the above, then please continue! Otherwise consider using Ariba, Resfinder, RGI, SRST2, AMRFinderPlus, etc.

Quick Start

% abricate 6159.fasta
Using database resfinder:  2130 sequences -  Mar 17, 2017
Processing: 6159.fna
Found 3 genes in 6159.fna
#FILE     SEQUENCE     START   END     STRAND GENE     COVERAGE     COVERAGE_MAP     GAPS  %COVERAGE  %IDENTITY  DATABASE  ACCESSION  PRODUCT        RESISTANCE
6159.fna  NC_017338.1  39177   41186   +      mecA_15  1-2010/2010  ===============  0/0   100.00     100.000    ncbi      AB505628   n/a	     FUSIDIC_ACID
6159.fna  NC_017338.1  727191  728356  -      norA_1   1-1166/1167  ===============  0/0   99.91      92.367     ncbi      M97169     n/a            FOSFOMYCIN
6159.fna  NC_017339.1  10150   10995   +      blaZ_32  1-846/846    ===============  0/0   100.00     100.000    ncbi      AP004832   betalactamase  BETA-LACTAM;PENICILLIN

Installation

Brew

If you are using the MacOS Homebrew or LinuxBrew packaging system:

brew install brewsci/bio/abricate
abricate --check
abricate --list

Bioconda

If you use Conda follow the instructions to add the Bioconda channel:

conda install -c conda-forge -c bioconda -c defaults abricate
abricate --check
abricate --list

Source

If you install from source, Abricate has the following package dependencies:

  • any2fasta for sequence file format conversion
  • BLAST+ >2.7.0 for blastn, makeblastdb, blastdbcmd
  • Perl modules: LWP::Simple, Bio::Perl, JSON, Path::Tiny
  • git, unzip, gzip for updating databases

Most of these are easy to install on an Ubuntu-based system:

sudo apt-get install bioperl ncbi-blast+ gzip unzip git \
  libjson-perl libtext-csv-perl libpath-tiny-perl liblwp-protocol-https-perl libwww-perl
git clone https://github.com/tseemann/abricate.git
./abricate/bin/abricate --check
./abricate/bin/abricate --setupdb
./abricate/bin/abricate ./abricate/test/assembly.fa

Input

Abricate takes any sequence file that any2fasta can convert to FASTA files (eg. Genbank, EMBL), and they can be optionally gzip or bzip2 compressed.

abricate assembly.fa 
abricate assembly.fa.gz
abricate assembly.gbk 
abricate assembly.gbk.bz2

It can take multiple files at once too:

abricate assembly.*
abricate /mnt/ncbi/bacteria/*.gbk.gz 

Or you can provide it a "file of file names" (a "FOFN"):

% cat test/fofn.txt

assembly.fa
assembly.fa.gz
assembly.gbk
assembly.gbk.bz2

% abricate --fofn test/fofn.txt

It does not accept raw FASTQ reads; please use Ariba or SRTS2 for that.

Output

Abricate produces a tap-separated output file with the following columns:

Column Example Description
FILE Ecoli.fna The filename this hit came from
SEQUENCE contig000324 The sequence in the filename
START 23423 Start coordinate in the sequence
END 24117 End coordinate
STRAND + Strand + or -
GENE tet(M) AMR gene name
COVERAGE 1-1920/1920 What proportion of the gene is in our sequence
COVERAGE_MAP =============== A visual represenation of the hit. ==aligned, .=unaligned, /=has_gaps
GAPS 1/4 Openings / gaps in subject and query - possible psuedogene?
%COVERAGE 100.00% Proportion of gene covered
%IDENTITY 99.95% Proportion of exact nucleotide matches
DATABASE ncbi The database this sequence comes from
ACCESSION NC_009632:49744-50476 The genomic source of the sequence
PRODUCT aminoglycoside O-phosphotransferase APH(3')-IIIa Gene product (if available)
RESISTANCE TETRACYCLINE;FUSIDIC_ACID putative antibiotic resistance phenotype, ;-separated

Caveats

  • Does not find mutational resistance, only acquired genes.
  • Gap reporting incomplete
  • Sometimes two heavily overlapping genes will be reported for the same locus
  • Possible coverage calculation issues

Databases

ABRicate comes with some pre-downloaded databases:

You can check what you have installed with the --list command. This lists the available databases in TSV (or CSV with --csv) and three columns:

% abricate --list

DATABASE       SEQUENCES  DBTYPE  DATE
argannot       1749       nucl    2019-Jul-28
card           2241       nucl    2019-Jul-28
ecoh           597        nucl    2019-Jul-28
ecoli_vf       2701       nucl    2019-Jul-28
megares        6635       nucl    2020-Feb-20
ncbi           4324       nucl    2019-Jul-28
plasmidfinder  263        nucl    2019-Jul-28
resfinder      2434       nucl    2019-Jul-28
vfdb           2597       nucl    2019-Jul-28

The default database is ncbi. You can choose a different database using the --db option:

% abricate --db vfdb --quiet 6159.fa

6159.fna  NC_017338.1  2724620  2726149  aur      1-1530/1530     ===============  0/0    100.00     99.346     vfdb      NP_647375	zinc metalloproteinase aureolysin
6159.fna  NC_017338.1  2766595  2767155  icaR     1-561/561       ===============  0/0    100.00     98.930     vfdb      NP_647402	N-acetylglucosaminyltransferase
6159.fna  NC_017338.1  2767319  2768557  icaA     1-1239/1239     ===============  0/0    100.00     99.677     vfdb      NP_647403	n/a
6159.fna  NC_017338.1  2768521  2768826  icaD     1-306/306       ===============  0/0    100.00     99.020     vfdb      NP_647404	n/a
6159.fna  NC_017338.1  2768823  2769695  icaB     1-873/873       ===============  0/0    100.00     99.542     vfdb      NP_647405	n/a
6159.fna  NC_017338.1  2769682  2770734  icaC     1-1053/1053     ===============  0/0    100.00     98.955     vfdb      NP_647406	n/a
6159.fna  NC_017338.1  2771040  2773085  lip      1-2046/2046     ===============  0/0    100.00     98.778     vfdb      NP_647407	triacylglycerol lipase precursor

Combining reports across samples

ABRicate can combine results into a simple matrix of gene presence/absence. An absent gene is denoted . and a present gene is represented by its '%COVERAGE`. This can be individual abricate reports, or a combined one.

# Run abricate on each .fa file
% abricate 1.fna > 1.tab
% abricate 2.fna > 2.tab

# Combine
% abricate --summary 1.tab 2.tab

#FILE     NUM_FOUND  aac(6')-aph(2'')_1  aadD_1  blaZ_32  blaZ_36  erm(A)_1  mecA_15  norA_1  spc_1  tet(M)_7
1.tab     8          100.00              100.00  .        100.00   100.00    100.00   99.91   100.00  100.00
2.tab     3          .                   .       100.00   .        .         100.00   99.91   .       .

Or if you ran everything in a single report, it will work too.

% abricate *.fna > results.tab
% abricate --summary results.tab > summary.tab

Updating the databases

# force download of latest version
% abricate-get_db --db ncbi --force

# re-use existing download and just regenerate the database
% abricate-get_db --db ncbi

Making your own database

Let's say you want to make your own database called tinyamr. All you need is a FASTA file of nucleotide sequences, say tinyamr.fa. Ideally the sequence IDs would have the format >DB~~~ID~~~ACC~~~RESISTANCES DESC where DB is tinyamr, ID is the gene name, ACC is an accession number of the sequence source, RESISTANCES is the phenotype(s) to report, and DESC can be any textual description.

% cd /path/to/abricate/db     # this is the --datadir default option
% mkdir tinyamr
% cp /path/to/tinyamr.fa sequences
% head -n 1 sequences
>tinyamr~~~GENE_ID~~~GENE_ACC~~RESISTANCES some description here
% abricate --setupdb
% # or just do this: makeblastdb -in sequences -title tinyamr -dbtype nucl -hash_index

% abricate --list
DATABASE  SEQUENCES  DBTYPE  DATE
tinyamr   173        nucl    2019-Aug-28

% abricate --db tinyamr screen_this.fasta

Etymology

The name "ABRicate" was chosen as the first 3 letters are a common acronym for "Anti-Biotic Resistance". It also has the form of an English verb, which suggests the tool actual taking "action" against the problem of antibiotic resistance. It is also relatively unique in Google, and is unlikely to receive an infamous JABBA Award.

Citation

If you publish the results of Abricate please cite both the software and the appropriate database you used with --db

Issues

Please report problems to the Issues Page.

License

GPLv2

Author

Torsten Seemann | @torstenseemann | blog

More Repositories

1

prokka

โšก โ™’ Rapid prokaryotic genome annotation
Perl
746
star
2

snippy

โœ‚๏ธ โšก Rapid haploid variant calling and core genome alignment
Perl
432
star
3

barrnap

๐Ÿ”ฌ โ™Œ Bacterial ribosomal RNA predictor
Perl
200
star
4

shovill

โšกโ™ ๏ธ Assemble bacterial isolate genomes from Illumina paired-end reads
Perl
199
star
5

mlst

๐Ÿ†” Scan contig files against PubMLST typing schemes
Shell
177
star
6

nullarbor

๐Ÿ’พ ๐Ÿ“ƒ "Reads to report" for public health and clinical microbiology
Perl
125
star
7

any2fasta

Convert various sequence formats to FASTA
Perl
115
star
8

snp-dists

Pairwise SNP distance matrix from a FASTA sequence alignment
C
110
star
9

VelvetOptimiser

๐Ÿ“ˆ Automatically optimise three of Velvet's assembly parameters.
Perl
47
star
10

samclip

Filter SAM file for soft and hard clipped alignments
Perl
44
star
11

phastaf

Identify phage regions in bacterial genomes for masking purposes
Perl
29
star
12

seeka

Get microbial sequence data easier and faster
Perl
28
star
13

homebrew-bioinformatics-linux

๐Ÿบ ๐Ÿง Homebrew formulae for bioinformatics software only available for Linux
Ruby
27
star
14

berokka

๐ŸŠ ๐Ÿ’ซ Trim, circularise and orient long read bacterial genome assemblies
Perl
25
star
15

ekidna

Assembly based core genome SNP alignments for bacteria
Perl
25
star
16

cgmlst-dists

๐Ÿปโ‡”๐Ÿจ Calculate distance matrix from ChewBBACA cgMLST allele call tables
C
23
star
17

sixess

๐Ÿ”ฌ๐Ÿ› Rapid 16s rRNA identification from isolate FASTQ files
Shell
23
star
18

PEAR

Pair-End AssembeR
C
22
star
19

mokka

Annotate your metagenome assemblies
11
star
20

scapper

Whole genome core alignments from multiple draft genomes
Perl
10
star
21

kounta

๐Ÿงฎ ๐Ÿ”ข Generate multi-sample k-mer count matrix from WGS
Perl
9
star
22

snasm

Assembly based core SNP alignments
Perl
7
star
23

trencha

Normalize VCF depth for Illumina GC bias
Perl
7
star
24

legsta

๐Ÿ—โญ In silico Legionella pneumophila Sequence Based Typing
Perl
7
star
25

tseemann.github.io

Torsten Seemann's Home Page
HTML
7
star
26

noary

๐Ÿฃ ๐Ÿฆ A lightweight nucleotide bacterial ortholog clustering tool
Perl
7
star
27

wombac

โ€ผ๏ธ Rapid core genome SNP alignments from multiple bacterial genomes
Perl
7
star
28

klosham

Find closest aligned sequences to a query sequnece
C
6
star
29

kopynumba

Identify copy number variation in bacterial Illumina sequences
6
star
30

spekki

Species prediction from NGS reads
Python
5
star
31

fasterqc

A non-Java alternative to the classic FastQC tool
Perl
5
star
32

ragnarokka

Annotate and correct erro-prone ONT genomes
5
star
33

polisha

Fix small assembly errors using Illumina reads
Perl
5
star
34

polyfix

๐Ÿ”ชโ›“๏ธ Repair nanopore assemblies using related genome(s)
Perl
5
star
35

skrofula

Yet another M.tuberculosis typing and resistance tool, but for the impatient (not in-patient)
Perl
5
star
36

varion

5
star
37

injecta

Insert genes into genomes to aid synthetic test data generation
Perl
4
star
38

kurra

Fast whole genome phylogeny
4
star
39

heterik

Estimate heterozygosity or mixture level of a bacterial WGS sample
4
star
40

bowkaster

cgMLST from FASTQ reads
4
star
41

dehomopolymerate

Collapse sequence homopolymers to a single character
C
4
star
42

babykraken

๐Ÿ‘ถ๐Ÿฆ‘ Very small Kraken2 database for bundling with pipelines
4
star
43

perl-biotool

๐Ÿซ ๐Ÿช Small pure Perl5 libraries for writing command line bioinformatics tools
Perl
3
star
44

anthrakks

Distinguish Bacillus cereus and biovar anthracis (anthrax)
3
star
45

bioinfo-scripts

Collection of bioinformatics utility scripts, mostly written in Bioperl
Perl
3
star
46

gbk2bcfgff

Convert Genbank to GFF compatible with "bcftools csq"
2
star
47

easy-web-blast

2
star
48

mini-outbreak

Small WGS dataset for testing bacterial outbreak analysis pipelines
2
star
49

snippa

Experimental modular bacterial SNP calling pipeline
Perl
2
star
50

vikka

Viral genomics toolkit for pandemics
1
star
51

gard

๐Ÿ† ๐Ÿ’Š Gonococcal Antimicrobial Resistance Detection
1
star
52

wtfq

Duplicate FASTQ reads to address undersequenced regions
C
1
star
53

simuvar

Simulate variants of bacterial genomes for testing SNP callers
1
star
54

coginator

Assign COGs to protein sequences
1
star
55

skrilla

It ain't all about skrilla
1
star
56

arborkart

Phylogenomic trees with maps for the web
1
star
57

assembill

Simple script to clip, assemble, tile and annotate a bacterial genome from Illumina reads
Shell
1
star
58

kroucha

Mock repository for Sanger publications citing Croucher et al
1
star