  • Stars: 910
  • Rank 49,850 (Top 1.0 %)
  • Language: C
  • License: MIT License
  • Created almost 11 years ago
  • Updated 5 months ago

Repository Details

bedtools - the swiss army knife for genome arithmetic

Download current version

Documentation

Cheat-sheet from Ilya Levantis

Summary

Collectively, the bedtools utilities are a swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF.

While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
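To make "set theory on the genome" concrete, here is a minimal Python sketch of two such operations, merge and intersect, chained together. It is an illustration only and is not how bedtools is implemented. The function names and the in-memory tuple representation are assumptions made for this example, and intervals are assumed to use BED-style 0-based, half-open coordinates.

```python
from typing import List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the shared portion of every overlapping (a, b) pair.

    Naive all-pairs comparison, chosen for clarity rather than speed.
    """
    out: List[Interval] = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty overlap
                out.append((chrom_a, start, end))
    return out


# Hypothetical toy data: three reads and one exon on chr1.
reads = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 500, 600)]
exons = [("chr1", 180, 520)]

# Combine two simple operations: merge the reads, then intersect with exons.
covered = intersect(merge(reads), exons)
print(covered)
```

Each function does one simple thing, and the analysis comes from composing them, which mirrors how individual bedtools commands are piped together on the command line.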

Performance

As of version 2.18, bedtools is substantially more scalable thanks to improvements we have made in the algorithm used to process datasets that are pre-sorted by chromosome and start position. As the plots below show, speed and memory consumption scale well with sorted data, in contrast to the poor scaling for unsorted data. The current version of bedtools intersect is as fast as (or slightly faster than) the bedops package's bedmap, which uses a similar algorithm for sorted data. The plots below show the results of counting the number of intersecting alignments from exome capture BAM files against CCDS exons. The alignments have been converted to BED to facilitate comparisons to bedops. We compare to the bedmap --ec option because bedtools enforces similar error checking.

Note: with 100 million alignments, bedtools could not complete when using the R-Tree algorithm that is applied to unsorted data.

[Plots: Speed Comparison, Memory Comparison]
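One way to see why pre-sorted input scales so much better is a generic sweep over two sorted interval streams. The Python sketch below illustrates that general idea under stated assumptions and is not the algorithm bedtools actually uses. The function name is hypothetical, both inputs are assumed to be sorted by chromosome (in plain string order) and then by start, and coordinates are assumed to be 0-based and half-open.

```python
from typing import Iterable, Iterator, List, Tuple

# (chrom, start, end) with 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def count_overlaps_sorted(
    a: Iterable[Interval], b: Iterable[Interval]
) -> Iterator[Tuple[Interval, int]]:
    """Yield (query, n) where n is how many intervals in b overlap the query.

    Both inputs are read once, front to back. Only the b intervals that can
    still overlap the current or a later query are held in memory.
    """
    b_iter = iter(b)
    pending = next(b_iter, None)   # next unread interval from b
    window: List[Interval] = []    # b intervals that may still overlap

    for chrom, start, end in a:
        # Read every b interval that starts before this query ends.
        while pending is not None and (pending[0], pending[1]) < (chrom, end):
            window.append(pending)
            pending = next(b_iter, None)

        # Discard b intervals on earlier chromosomes or ending at or before
        # this query's start: later queries start no earlier, so these can
        # never overlap again.
        window = [iv for iv in window if iv[0] == chrom and iv[2] > start]

        # A surviving interval overlaps only if it starts before the query ends.
        yield (chrom, start, end), sum(1 for iv in window if iv[1] < end)


# Hypothetical toy data, already sorted by chromosome and start.
exons = [("chr1", 0, 100), ("chr1", 50, 60), ("chr2", 10, 20)]
alignments = [
    ("chr1", 10, 20), ("chr1", 55, 200), ("chr1", 90, 95),
    ("chr2", 0, 5), ("chr2", 15, 30),
]

for exon, n in count_overlaps_sorted(exons, alignments):
    print(exon, n)
```

Because neither input is ever indexed or re-read, memory use in this sketch depends on how many intervals are "active" at once rather than on the total file size, which is the kind of behavior the sorted-data curves describe.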

Details

First created through urgency and adrenaline by Aaron Quinlan in Spring 2009. Maintained by the Quinlan Laboratory at the University of Virginia.

  1. Lead developers: Aaron Quinlan, Hao Hou, Brent Pedersen, Neil Kindlon
  2. Significant contributions: Hao Hou, John Marshall, Assaf Gordon, Royden Clark, Brent Pedersen, Ryan Dale
  3. Repository: https://github.com/arq5x/bedtools2
  4. Stable releases: https://github.com/arq5x/bedtools2/releases
  5. Documentation: http://bedtools.readthedocs.org
  6. License: Released under the MIT license

Citation

Please cite the following article if you use BEDTools in your research:

  • Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842.

Also, if you use pybedtools, please cite the following.

  • Dale RK, Pedersen BS, and Quinlan AR. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics (2011). doi:10.1093/bioinformatics/btr539

More Repositories

  1. gemini (Python, 316 stars): a lightweight db framework for exploring genetic variation.
  2. lumpy-sv (C, 301 stars): lumpy: a general probabilistic framework for structural variant discovery
  3. poretools (Jupyter Notebook, 239 stars): a toolkit for working with Oxford nanopore data
  4. bedtools (C++, 140 stars): A powerful toolset for genome arithmetic.
  5. grabix (C, 83 stars): a wee tool for random access into BGZF files.
  6. bedtools-protocols (CSS, 78 stars)
  7. cyvcf (Python, 52 stars): A fast Python library for VCF files leveraging Cython for speed.
  8. filo (C++, 44 stars): Useful FILe and stream Operations
  9. ggd (Python, 43 stars)
  10. nanopore-scripts (Python, 34 stars): Various scripts and recipes for working with nanopore data
  11. bedtools-python (C, 28 stars): A Python interface to the BEDTools API using Cython
  12. kway-mergesort (C, 25 stars): A templated C++ API for memory-assisted, k-way merge sorts
  13. piledriver (C++, 24 stars): Basic, no assumptions, multi-pileup
  14. bits (C++, 23 stars): BITS: Binary Interval Search
  15. Hydra (C++, 19 stars)
  16. chrom_sweep (Python, 9 stars): Sweep-line algorithm for genomic features. Detect overlaps on large files w/ minimal memory.
  17. bash_completion (9 stars)
  18. scurgen (Python, 9 stars): A tool for detecting patterns in genomic data with space filling curves
  19. vitae (TeX, 6 stars)
  20. ggd-recipes (6 stars): Recipes for GGD
  21. toy_bottle_bootstrap_app (JavaScript, 5 stars): A demonstration of using bottle with bootstrap for basic web app development
  22. tutorials (CSS, 4 stars)
  23. hawk (Python, 4 stars): header awk: awk with _named_ variables
  24. lab_website (HTML, 4 stars): quinlan lab website (quinlanlab.org)
  25. codachrom (Python, 3 stars): Chromosomal copy number tools.
  26. bedtools-galaxy (3 stars)
  27. knotty (C++, 3 stars): A comprehensive SV discovery suite.
  28. bits_paper (R, 2 stars)
  29. arq5x.github.io (HTML, 1 star)