A fast and sensitive gapped read aligner

License: GPL v3

Overview

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to hundreds or thousands of characters, and at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
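
To give a rough feel for what querying an FM Index involves, here is a toy Python sketch of exact-match counting by backward search over a Burrows-Wheeler transform. It is not Bowtie 2's implementation: the function names `bwt_from_text` and `fm_count` are hypothetical, the transform is built by naively sorting rotations, and occurrence counts are recomputed on every step, so it only suits very small inputs.

```python
def bwt_from_text(text):
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def fm_count(bwt, pattern):
    """Count exact occurrences of pattern using backward search on the BWT."""
    # first[c] = number of characters in the text that sort before c
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # occurrences of ch in bwt[:i]; a real index would precompute these
        return bwt.count(ch, 0, i)

    lo, hi = 0, len(bwt)  # half-open range of matching rows
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


bwt = bwt_from_text("GATTACAGATTACA")
print(fm_count(bwt, "ATTA"))  # 2
print(fm_count(bwt, "TTT"))   # 0
```

The search needs only the transformed string and a few counts, never the original text, which is the property that lets an index of this kind stay compact.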

Obtaining Bowtie2

Bowtie 2 is available from various package managers, notably Bioconda. With Bioconda installed, you should be able to install Bowtie 2 with conda install bowtie2.

Containerized versions of Bowtie 2 are also available via the Biocontainers project (e.g. via Docker Hub).

You can also download Bowtie 2 sources and binaries from the "releases" tab on this page. Binaries are available for Linux, Mac OS X, and Windows. By utilizing the SIMDE project, Bowtie 2 now also supports the following architectures: ARM64, PPC64, and s390x. If you plan to compile Bowtie 2 yourself, make sure you have at least the zlib library and header files installed. See the Building from source section of the manual for details.

Getting started

Looking to try out Bowtie 2? Check out the Bowtie 2 UI (currently in beta).

Alignment

bowtie2 takes a Bowtie 2 index and a set of sequencing read files and outputs a set of alignments in SAM format.

"Alignment" is the process by which we discover how and where the read sequences are similar to the reference sequence. An "alignment" is a result from this process, specifically: an alignment is a way of "lining up" some or all of the characters in the read with some characters from the reference in a way that reveals how they're similar. For example:

  Read:      GACTGGGCGATCTCGACTTCG
             |||||  |||||||||| |||
  Reference: GACTG--CGATCTCGACATCG

Here, dash symbols represent gaps and vertical bars show where aligned characters match.
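
The picture above can be reproduced with a few lines of Python. This is only an illustrative sketch: it takes two already-gapped rows (it does not compute the alignment itself), and the helper names `match_line` and `summarize` are hypothetical.

```python
def match_line(read, ref):
    """Return '|' under each column where the two rows hold the same base."""
    if len(read) != len(ref):
        raise ValueError("gapped rows must have equal length")
    return "".join(
        "|" if r == g and r != "-" else " "
        for r, g in zip(read, ref)
    )


def summarize(read, ref):
    """Count matching, mismatching, and gap columns of a gapped alignment."""
    matches = mismatches = gap_columns = 0
    for r, g in zip(read, ref):
        if r == "-" or g == "-":
            gap_columns += 1
        elif r == g:
            matches += 1
        else:
            mismatches += 1
    return {"matches": matches, "mismatches": mismatches, "gap_columns": gap_columns}


read = "GACTGGGCGATCTCGACTTCG"
ref = "GACTG--CGATCTCGACATCG"
print("  Read:      " + read)
print("             " + match_line(read, ref))
print("  Reference: " + ref)
print(summarize(read, ref))  # {'matches': 18, 'mismatches': 1, 'gap_columns': 2}
```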

We use alignment to make an educated guess as to where a read originated with respect to the reference genome. It's not always possible to determine this with certainty. For instance, if the reference genome contains several long stretches of As (AAAAAAAAA etc.) and the read sequence is a short stretch of As (AAAAAAA), we cannot know for certain exactly where in the sea of As the read originated.
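
The ambiguity described above is easy to see on a toy example. The sketch below uses plain exact string search, not Bowtie 2's algorithm, and the reference and reads are made up for illustration.

```python
def exact_hits(read, reference):
    """Return every 0-based offset where read occurs exactly in reference."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)  # allow overlapping hits
    return hits


# Hypothetical toy reference: a run of 12 As flanked by other bases.
reference = "CGT" + "A" * 12 + "GCT"

print(exact_hits("A" * 7, reference))  # [3, 4, 5, 6, 7, 8]
print(exact_hits("GTAAA", reference))  # [1]
```

The read made only of As fits equally well at six offsets, so its true origin cannot be recovered, while the read that overlaps the flanking bases has a single placement.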

Examples

# Aligning unpaired reads
bowtie2 -x example/index/lambda_virus -U example/reads/longreads.fq

# Aligning paired reads
bowtie2 -x example/index/lambda_virus -1 example/reads/reads_1.fq -2 example/reads/reads_2.fq
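
If you drive bowtie2 from a script, one possible approach is to assemble the argument list in Python and hand it to `subprocess`. The helper `bowtie2_command` below is a hypothetical convenience function, not part of Bowtie 2. It uses only the `-x`, `-U`, `-1`, and `-2` options shown above, plus `-S`, which names the SAM output file.

```python
import shlex


def bowtie2_command(index, unpaired=None, mate1=None, mate2=None, sam_out=None):
    """Build an argument list for bowtie2 (nothing is executed here)."""
    if (mate1 is None) != (mate2 is None):
        raise ValueError("paired reads need both mate1 and mate2")
    if unpaired is None and mate1 is None:
        raise ValueError("give unpaired reads, paired reads, or both")
    cmd = ["bowtie2", "-x", index]
    if unpaired is not None:
        cmd += ["-U", unpaired]
    if mate1 is not None:
        cmd += ["-1", mate1, "-2", mate2]
    if sam_out is not None:
        cmd += ["-S", sam_out]  # without -S, alignments go to standard output
    return cmd


cmd = bowtie2_command(
    "example/index/lambda_virus",
    mate1="example/reads/reads_1.fq",
    mate2="example/reads/reads_2.fq",
    sam_out="eg2.sam",  # hypothetical output file name
)
print(shlex.join(cmd))
# To actually run it (requires bowtie2 on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```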

Building an index

bowtie2-build builds a Bowtie 2 index from a set of DNA sequences. bowtie2-build outputs a set of 6 files with suffixes .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, and .rev.2.bt2. For a large index, these suffixes end in .bt2l instead of .bt2. These files together constitute the index: they are all that is needed to align reads to that reference. The original sequence FASTA files are no longer used by Bowtie 2 once the index is built.
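
Because an index is just these six files sharing a basename, a script can check that an index is complete before starting an alignment. The following is a minimal sketch under that assumption. `index_files` and `missing_index_files` are hypothetical helper names, not Bowtie 2 tools.

```python
from pathlib import Path

# The six per-index suffixes listed above, without the .bt2 / .bt2l ending.
SUFFIXES = (".1", ".2", ".3", ".4", ".rev.1", ".rev.2")


def index_files(basename, large=False):
    """Return the file names expected for an index with this basename."""
    ending = ".bt2l" if large else ".bt2"
    return [f"{basename}{suffix}{ending}" for suffix in SUFFIXES]


def missing_index_files(basename, large=False):
    """Return the expected index files that do not exist on disk."""
    return [name for name in index_files(basename, large) if not Path(name).is_file()]


for name in index_files("example/index/lambda_virus"):
    print(name)
```

Calling `missing_index_files("example/index/lambda_virus")` returns an empty list when all six small-index files are present.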

Bowtie 2's .bt2 index format is different from Bowtie 1's .ebwt format, and they are not compatible with each other.

Examples

# Building a small index
bowtie2-build example/reference/lambda_virus.fa example/index/lambda_virus

# Building a large index
bowtie2-build --large-index example/reference/lambda_virus.fa example/index/lambda_virus

Index inspection

bowtie2-inspect extracts information from a Bowtie 2 index about what kind of index it is and what reference sequences were used to build it. When run without any options, the tool will output a FASTA file containing the sequences of the original references (with all non-A/C/G/T characters converted to Ns). It can also be used to extract just the reference sequence names using the -n/--names option or a more verbose summary using the -s/--summary option.

Examples

# Inspecting a lambda_virus index (small index) and outputting the summary
bowtie2-inspect --summary example/index/lambda_virus

# Inspecting the entire lambda virus index (large index)
bowtie2-inspect --large-index example/index/lambda_virus
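
Since the default output of bowtie2-inspect is FASTA, it can be post-processed with a small parser. The sketch below tallies the length of each sequence in a FASTA string. The function name `fasta_lengths` and the sample text are made up for illustration and are not real bowtie2-inspect output.

```python
def fasta_lengths(text):
    """Map each FASTA record name to the total length of its sequence lines."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Hypothetical FASTA text; real output would come from bowtie2-inspect.
toy = ">seq1\nACGTN\nACG\n>seq2\nNNNN\n"
print(fasta_lengths(toy))  # {'seq1': 8, 'seq2': 4}
```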

Publications

Bowtie 2 Papers

Related Publications

Related Work

Check out the Bowtie 2 UI, a shiny frontend to the Bowtie 2 command line.
