🔬 ♌ Bacterial ribosomal RNA predictor

License: GPL v3

Barrnap

BAsic Rapid Ribosomal RNA Predictor

Description

Barrnap predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), metazoan mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S).

It takes FASTA DNA sequence as input and writes GFF3 as output. It uses the new nhmmer tool that comes with HMMER 3.1 for HMM searching in RNA:DNA style. Multithreading is supported, and one can expect roughly linear speed-ups with more CPUs.

Installation

Requirements

Conda

Install Conda or Miniconda:

conda install -c bioconda -c conda-forge barrnap

Homebrew

Install Homebrew (macOS) or Linuxbrew (Linux).

brew install brewsci/bio/barrnap

Source

This will install the latest version directly from GitHub. You'll need to add the bin directory to your PATH.

cd $HOME
git clone https://github.com/tseemann/barrnap.git
cd barrnap/bin
./barrnap --help

Usage

% barrnap --quiet examples/small.fna
##gff-version 3
P.marinus	barrnap:0.8	rRNA	353314	354793	0	+	.	Name=16S_rRNA;product=16S ribosomal RNA
P.marinus	barrnap:0.8	rRNA	355464	358334	0	+	.	Name=23S_rRNA;product=23S ribosomal RNA
P.marinus	barrnap:0.8	rRNA	358433	358536	7.5e-07	+	.	Name=5S_rRNA;product=5S ribosomal RNA

% barrnap -q -k mito examples/mitochondria.fna 
##gff-version 3
AF346967.1	barrnap:0.8	rRNA	643	1610	.	+	.	Name=12S_rRNA;product=12S ribosomal RNA
AF346967.1	barrnap:0.8	rRNA	1672	3228	.	+	.	Name=16S_rRNA;product=16S ribosomal RNA
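
The GFF3 lines shown above are tab-separated with nine columns, so they are easy to post-process. The following is a minimal Python sketch, not part of Barrnap itself; the names `RrnaHit` and `parse_barrnap_gff` are hypothetical, and the sample records are copied from the bacterial example output above.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class RrnaHit(NamedTuple):
    """One rRNA feature line from Barrnap's GFF3 output (hypothetical helper type)."""
    seqid: str
    start: int
    end: int
    score: Optional[float]  # None when the score column is "."
    strand: str
    name: str


def parse_barrnap_gff(lines: Iterable[str]) -> Iterator[RrnaHit]:
    """Yield one RrnaHit per nine-column feature line, skipping comments."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequences appended to the GFF follow this marker
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        seqid, _source, _type, start, end, score, strand, _phase, attrs = cols
        # Attributes look like "Name=16S_rRNA;product=16S ribosomal RNA"
        attr = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        yield RrnaHit(
            seqid=seqid,
            start=int(start),
            end=int(end),
            score=None if score == "." else float(score),
            strand=strand,
            name=attr.get("Name", ""),
        )


# Sample records copied from the example output above.
SAMPLE = (
    "##gff-version 3\n"
    "P.marinus\tbarrnap:0.8\trRNA\t353314\t354793\t0\t+\t.\tName=16S_rRNA;product=16S ribosomal RNA\n"
    "P.marinus\tbarrnap:0.8\trRNA\t355464\t358334\t0\t+\t.\tName=23S_rRNA;product=23S ribosomal RNA\n"
    "P.marinus\tbarrnap:0.8\trRNA\t358433\t358536\t7.5e-07\t+\t.\tName=5S_rRNA;product=5S ribosomal RNA\n"
)

hits = list(parse_barrnap_gff(SAMPLE.splitlines()))
for hit in hits:
    # GFF3 coordinates are 1-based and inclusive, hence the +1
    print(hit.name, hit.seqid, hit.end - hit.start + 1)
```
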
  
% barrnap -o rrna.fa < contigs.fa > rrna.gff
% head -n 3 rrna.fa
>16S_rRNA::gi|329138943|tpg|BK006945.2|:455935-456864(-)
ACGGTCGGGGGCATCAGTATTCAATTGTCAGAGGTGAAATTCTTGGATT
TATTGAAGACTAACTACTGCGAAAGCATTTGCCAAGGACGTTTTCATTA
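
The FASTA headers in the `-o` output above appear to follow the pattern `NAME::SEQID:START-END(STRAND)`. Assuming that pattern holds for every record, a header can be split back into its parts with a small Python sketch like the one below. The names `HitHeader` and `parse_hit_header` are hypothetical, and the coordinates are kept exactly as printed, because the example does not say whether they are 0-based or 1-based.

```python
import re
from typing import NamedTuple, Optional


class HitHeader(NamedTuple):
    """Fields recovered from one FASTA header (hypothetical helper type)."""
    name: str
    seqid: str
    start: int
    end: int
    strand: str


# Assumed layout: >NAME::SEQID:START-END(STRAND)
# SEQID may itself contain "|" or ":" characters, so it is matched greedily
# and the final ":START-END(STRAND)" is anchored to the end of the line.
HEADER_RE = re.compile(
    r"^>?(?P<name>[^:]+)::(?P<seqid>.+):(?P<start>\d+)-(?P<end>\d+)"
    r"\((?P<strand>[+.-])\)$"
)


def parse_hit_header(header: str) -> Optional[HitHeader]:
    """Return the parsed header, or None if it does not match the assumed layout."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    return HitHeader(
        name=match.group("name"),
        seqid=match.group("seqid"),
        start=int(match.group("start")),
        end=int(match.group("end")),
        strand=match.group("strand"),
    )


# Header copied from the example output above.
parsed = parse_hit_header(">16S_rRNA::gi|329138943|tpg|BK006945.2|:455935-456864(-)")
print(parsed)
```
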

Options

General

  • --help show help and exit
  • --version print version in form barrnap X.Y and exit
  • --citation print a citation and exit

Search

  • --kingdom is the database to use: Bacteria:bac, Archaea:arc, Eukaryota:euk, Metazoan Mitochondria:mito
  • --threads is how many CPUs to assign to nhmmer search
  • --evalue is the cut-off for nhmmer reporting, before further scrutiny
  • --lencutoff is the proportion of the full length that qualifies as partial match
  • --reject will not include hits below this proportion of the expected length
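
One way to read `--lencutoff` and `--reject` is as two proportion thresholds applied to the hit length divided by the expected gene length. The Python sketch below illustrates that reading only; it is not Barrnap's actual code. The function name `classify_hit`, the threshold values, the expected length of 1500 bp, and the choice of strict `<` comparisons at the boundaries are all assumptions made for illustration.

```python
def classify_hit(hit_length: int, expected_length: int,
                 lencutoff: float, reject: float) -> str:
    """Label a hit as 'full', 'partial' or 'rejected' by its length proportion.

    This is an illustrative reading of the two options, not Barrnap's code:
      proportion <  reject     -> 'rejected' (hit is not reported)
      proportion <  lencutoff  -> 'partial'
      otherwise                -> 'full'
    """
    if expected_length <= 0:
        raise ValueError("expected_length must be positive")
    if not 0.0 <= reject <= lencutoff <= 1.0:
        raise ValueError("need 0 <= reject <= lencutoff <= 1")
    proportion = hit_length / expected_length
    if proportion < reject:
        return "rejected"
    if proportion < lencutoff:
        return "partial"
    return "full"


# Hypothetical thresholds and a hypothetical 1500 bp expected length.
LENCUTOFF = 0.8
REJECT = 0.25
for length in (1480, 600, 300):
    print(length, classify_hit(length, 1500, LENCUTOFF, REJECT))
```
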

Output

  • --quiet will not print any messages to stderr
  • --incseq will include the full input sequences in the output GFF
  • --outseq creates a FASTA file with the hit sequences
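
When Barrnap is called from a pipeline, it can help to assemble the command line in one place. The following Python sketch builds an argument list using only options listed above (`--kingdom`, `--threads`, `--quiet`, `--outseq`). The function name `barrnap_command` and its defaults are hypothetical, and the sketch only constructs the command; it does not run Barrnap.

```python
from typing import List, Optional

# Kingdom codes as listed under the --kingdom option.
KINGDOMS = ("bac", "arc", "euk", "mito")


def barrnap_command(fasta: str, kingdom: str, threads: int = 1,
                    outseq: Optional[str] = None,
                    quiet: bool = True) -> List[str]:
    """Build an argument list for a barrnap run (hypothetical helper)."""
    if kingdom not in KINGDOMS:
        raise ValueError(f"kingdom must be one of {KINGDOMS}, got {kingdom!r}")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    cmd = ["barrnap", "--kingdom", kingdom, "--threads", str(threads)]
    if quiet:
        cmd.append("--quiet")
    if outseq is not None:
        cmd += ["--outseq", outseq]
    cmd.append(fasta)  # input FASTA goes last, as in the usage examples
    return cmd


cmd = barrnap_command("contigs.fa", "mito", threads=4, outseq="rrna.fa")
print(" ".join(cmd))
```

The resulting list could then be passed to `subprocess.run(cmd, check=True, stdout=handle)` with `handle` opened on the desired GFF3 file, since Barrnap writes its GFF3 to standard output.
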

Caveats

Barrnap does not do anything fancy. It has HMM models for each different rRNA gene. They are built from full length seed alignments.

Comparison with RNAmmer

Barrnap is designed to be a substitute for RNAmmer. It was motivated by my desire to remove Prokka's dependency on RNAmmer which is encumbered by a free-for-academic sign-up license, and by RNAmmer's dependence on legacy HMMER 2.x which conflicts with HMMER 3.x that most people are using now.

RNAmmer is more sophisticated than Barrnap, and more accurate because it uses HMMER 2.x in glocal alignment mode whereas NHMMER 3.x currently only supports local alignment (Sean Eddy expected glocal to be supported in 2014, but it still isn't available in 2018).

In practice, Barrnap will find all the typical rRNA genes in a few seconds (in bacteria), but may get the end points out by a few bases and will probably miss weird rRNAs. The HMM models it uses are derived from Rfam, Silva and RefSeq.

Data sources for HMM models

Bacteria (70S)  
        LSU 50S
                5S      RF00001
                23S     SILVA-LSU-Bac
        SSU 30S
                16S     RF00177

Archaea (70S)   
        LSU 50S
                5S      RF00001
                5.8S    RF00002
                23S     SILVA-LSU-Arc
        SSU 30S
                16S     RF01959

Eukarya (80S)   
        LSU 60S
                5S      RF00001
                5.8S    RF00002
                28S     SILVA-LSU-Euk
        SSU 40S
                18S     RF01960

Metazoan Mito
                12S     RefSeq (MT-RNR1, s-rRNA, rns)
                16S     RefSeq (MT-RNR2, l-rRNA, rnl)       

Models I would like to add

Fungi	[Sajeet Haridas]
        LSU 35S ?
                5S
                5.8S
                25S
        SSU ?
                18S
        Mito [http://www.ncbi.nlm.nih.gov/nuccore/NC_001224.1]
                15S 
                21S (multiple exons)
                
Apicoplast [http://www.ncbi.nlm.nih.gov/nuccore/U87145.2]
                LSU ~2500bp 28S ?
                SSU ~1500bp 16S ?

Plant [Shaun Jackman]
	Mito [https://www.ncbi.nlm.nih.gov/nucleotide?cmd=Retrieve&dopt=GenBank&list_uids=26556996]	
		5S	~118 bp  ?	rrn5 	(use RF00001 ?)
		18S	~1935 bp ?	rrn18	(use RF01960 ?)
		26S	~2568 bp ?	rrn26   

Where does the name come from?

The name Barrnap was originally derived from Bacterial/Archaeal Ribosomal RNA Predictor. However, it has since been extended to support mitochondrial and eukaryotic rRNAs, and has been given the new backronym BAsic Rapid Ribosomal RNA Predictor. The project was originally spawned at CodeFest 2013 in Berlin, Germany by Torsten Seemann and Tim Booth.

License

Author

Torsten Seemann
