  • Stars: 297
  • Rank: 140,075 (Top 3%)
  • Language: Python
  • License: Other
  • Created over 14 years ago
  • Updated 4 months ago

Repository Details

Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic")

Overview

The BEDTools suite of programs is widely used for genomic interval manipulation or "genome algebra". pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python.

See full online documentation, including installation instructions, at http://daler.github.io/pybedtools/.

Why pybedtools?

Here is an example that gets the names of genes less than 5 kb away from intergenic SNPs:

from pybedtools import BedTool

snps = BedTool('snps.bed.gz')  # [1]
genes = BedTool('hg19.gff')    # [1]

intergenic_snps = snps.subtract(genes)                       # [2]
nearby = genes.closest(intergenic_snps, d=True, stream=True) # [2, 3]

for gene in nearby:             # [4]
    if int(gene[-1]) < 5000:    # [4]
        print(gene.name)        # [4]

Useful features shown here include:

  • [1] support for all BEDTools-supported formats (here, gzipped BED and GFF);
  • [2] wrapping of all BEDTools programs and arguments (here, subtract and closest, with the -d flag passed to closest);
  • [3] streaming results (like Unix pipes, here specified by stream=True);
  • [4] iterating over results while accessing feature data by index or by attribute (here, [-1] and .name).
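
To make the distance filter in the example concrete, the following is a minimal plain-Python sketch of the underlying idea: for each gene, find the smallest gap to any SNP on the same chromosome and keep the gene if that gap is under 5 kb. It does not use pybedtools or BEDTools. The Interval class and the gap and genes_near_snps functions are hypothetical names introduced only for this illustration, the search is brute force, and the gap convention (number of bases between two half-open intervals) is an assumption that may differ by one from what closest -d reports.

```python
from typing import Iterable, List, NamedTuple, Optional


class Interval(NamedTuple):
    """Hypothetical stand-in for a genomic feature (not a pybedtools class)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str


def gap(a: Interval, b: Interval) -> Optional[int]:
    """Bases separating a and b; 0 if they overlap; None if on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    if a.start < b.end and b.start < a.end:
        return 0
    # Assumed convention: count the bases strictly between the two intervals.
    return max(a.start, b.start) - min(a.end, b.end)


def genes_near_snps(genes: Iterable[Interval],
                    snps: Iterable[Interval],
                    max_gap: int = 5000) -> List[str]:
    """Names of genes whose nearest SNP is closer than max_gap bases."""
    snp_list = list(snps)
    hits = []
    for gene in genes:
        gaps = [g for g in (gap(gene, s) for s in snp_list) if g is not None]
        if gaps and min(gaps) < max_gap:
            hits.append(gene.name)
    return hits


genes = [
    Interval("chr1", 1000, 2000, "geneA"),
    Interval("chr1", 50000, 60000, "geneB"),
    Interval("chr2", 100, 200, "geneC"),
]
snps = [
    Interval("chr1", 4000, 4001, "rs1"),
    Interval("chr1", 30000, 30001, "rs2"),
]
print(genes_near_snps(genes, snps))
```

A brute-force scan like this is quadratic in the number of features. BEDTools works on sorted input and avoids that cost, which is one reason to delegate the search to it on genome-scale data.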

In contrast, here is the same analysis using shell scripting. Note that this requires knowledge of Perl, bash, and awk. The run time is identical to that of the pybedtools version above:

snps=snps.bed.gz
genes=hg19.gff
intergenic_snps=/tmp/intergenic_snps

snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
gene_fields=9
distance_field=$(($gene_fields + $snp_fields + 1))

intersectBed -a $snps -b $genes -v > $intergenic_snps

closestBed -a $genes -b $intergenic_snps -d \
| awk '($'$distance_field' < 5000){print $9;}' \
| perl -ne 'm/(?:ID|Name|gene_id)=(.*?);/; print "$1\n"'

rm $intergenic_snps
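
The last two stages of the pipeline above (the awk distance filter and the Perl attribute extraction) can be sketched in plain Python to show what they do. This is only an illustration under stated assumptions: gene_label and names_within are hypothetical helper names, the input is assumed to be tab-separated closestBed -d output with a 9-column GFF gene record first, and snp_fields=3 is an assumed value that the shell script instead measures from the SNP file. The regular expression uses an alternation group so that only the whole keys ID, Name, or gene_id match, and unlike the Perl one-liner it does not require a trailing semicolon.

```python
import re
from typing import Iterable, Iterator, Optional

# Match a whole attribute key (ID, Name, or gene_id) at the start of the
# attributes string or right after a ';' separator, and capture its value.
ATTR_RE = re.compile(r"(?:^|;)\s*(?:ID|Name|gene_id)=([^;]*)")


def gene_label(attributes: str) -> Optional[str]:
    """Return the first ID/Name/gene_id value in a GFF attributes string, if any."""
    match = ATTR_RE.search(attributes)
    return match.group(1) if match else None


def names_within(lines: Iterable[str],
                 gene_fields: int = 9,
                 snp_fields: int = 3,
                 max_distance: int = 5000) -> Iterator[str]:
    """Yield gene labels from lines whose trailing distance column is < max_distance."""
    # The shell script computes a 1-based column: gene_fields + snp_fields + 1.
    # The equivalent 0-based index is one less.
    distance_index = gene_fields + snp_fields
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if int(fields[distance_index]) < max_distance:
            label = gene_label(fields[8])  # GFF column 9 holds the attributes
            if label is not None:
                yield label


example = [
    "\t".join(["chr1", "src", "gene", "1000", "2000", ".", "+", ".",
               "ID=geneA;biotype=protein_coding",
               "chr1", "4000", "4001", "2000"]),
    "\t".join(["chr1", "src", "gene", "50000", "60000", ".", "+", ".",
               "ID=geneB;biotype=protein_coding",
               "chr1", "30000", "30001", "19999"]),
]
print(list(names_within(example)))
```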

See the Shell script comparison in the docs for more details, or keep reading the full documentation at http://daler.github.io/pybedtools.

More Repositories

  • gffutils (Python, 284 stars): GFF and GTF file manipulation and interconversion
  • metaseq (Python, 87 stars): Framework for integrated analysis and plotting of ChIP/RIP/RNA/*-seq data
  • trackhub (Python, 51 stars): create, manage, and upload track hubs for use in the UCSC genome browser
  • matplotlibrc (Python, 48 stars): some example matplotlibrc files, and a script to display their effects
  • sphinxdoc-test (24 stars): experimenting with the best way to push sphinx-generated docs to gh-pages
  • dotfiles (Shell, 18 stars): dotfiles, batteries included
  • hubward (Python, 18 stars): Manage the visualization of large amounts of other people's [often messy] genomics data
  • pipeline-example (Python, 18 stars): example ruffus pipeline
  • biomartpy (Python, 17 stars): Simple interface to BioMart (Python -> rpy2 -> R/BioConductor's biomaRt)
  • chromhmm-tools (Python, 16 stars): Helpers for working with ChromHMM (http://compbio.mit.edu/ChromHMM/)
  • ucscsession (Python, 11 stars): Python package for managing sessions in the UCSC Genome Browser.
  • blender-for-3d-printing (Python, 11 stars): Material for introductory course on using Blender for 3D printing
  • rdbio-scripts (Python, 10 stars): Unorganized collection of bioinformatics scripts and utilities
  • GFFutils_old (Python, 10 stars): NOTE: see new version at https://github.com/daler/gffutils.
  • encode-dataframe (Python, 9 stars): Convert UCSC's ENCODE metadata into pandas DataFrames
  • enhancer-snakemake-demo (Python, 6 stars): Demos a Snakemake workflow to classify enhancer regions based on publicly available chromatin marks.
  • genomicfeatures (Python, 5 stars)
  • gdc (Python, 5 stars): Genomic Dataset Constructor: create example BED, GFF, SAM, FASTQ files from "ASCII art" definitions
  • ontologization (Python, 5 stars): Wrapper for Ontologizer gene ontology analysis tool, with manipulation and display of downstream results
  • seqprint (Python, 4 stars): pretty-print genomic sequences
  • metaseq-biotrac56 (Python, 4 stars): Materials for the metaseq presentation at NIH FAES Bio-Trac 56 (http://www.biotrac.com/pages/Tracs/Trac56.html)
  • deseq-browser (JavaScript, 3 stars): View DESeq results in a web browser, with filtering and searching
  • shiny-fet (R, 3 stars): Shiny app for visualizing the results of a Fisher's exact test
  • metaseq-example-data (Python, 2 stars): Example data for metaseq
  • chromhmm-enhancers-umel (Python, 2 stars): Identify enhancers. Includes data download, liftover, parallelized workflows, results aggregation, and example output.
  • sphinxleash (Python, 2 stars): Lightweight framework for programmatically generating Sphinx docs.
  • docker-rw2019 (HTML, 2 stars): Materials for docker demonstration for Spring 2019 Reproducibility Workshop
  • feature-by-reads-matrix (Python, 2 stars): Collection of scripts to create a table of genome features with the number of reads per feature for an arbitrary number of samples
  • entabled (JavaScript, 2 stars): Convert text data files to a browser-viewable version that can be searched, filtered, and sorted
  • marginalhists (Python, 1 star): Scatterplots with marginal histograms using matplotlib
  • trackhub-demo (Python, 1 star)
  • build-test (Python, 1 star): sandbox for bioconda-utils
  • hubward-studies (Python, 1 star): Config files for running hubward on published data sets.