  • Stars: 162
  • Rank: 232,284 (Top 5 %)
  • Language: Python
  • License: GNU General Publi...
  • Created: about 10 years ago
  • Updated: about 5 years ago

Repository Details

A set of tools for manipulating and doing calculations on wwPDB macromolecule structure files

pdbtools

PDBtools has recently changed. The original version was a set of Python scripts to be downloaded and used locally on the command line. It has now been reorganized into a Python package, and the scripts are now installed globally. Thus, you can run any of the old pdb_tools from the command line from anywhere on your filesystem.

If you'd like to download the old version of pdbtools, download version v0.1.

Introduction

pdbTools is a set of command line python scripts that manipulate wwPDB protein and nucleic acid structure files. There are many programs, both open source and proprietary, that perform similar tasks; however, most of these tools are buried within programs of larger functionality. Thus, relatively simple calculations often involve learning a new program, compiling modules, and installing libraries. To fill a niche (and get the tasks done that I needed done), I started writing my own toolset. This has evolved into the pdbTools suite. The suite of programs is characterized by the following philosophy:

  • Each program should run as a stand-alone application with a standard, GNU/POSIX style command line interface.
  • Each program should be written in such a way as to allow it to be used as a library of functions for more complex programs.
  • Programs should require a minimum of external dependencies.

Most of the scripts will run "out of the box" using a python interpreter. The command line parser is designed to be flexible. It will take an arbitrarily long list of pdb files, pdb ids, text files with pdb ids, or some mixture of all three. If the pdb file or id is not in the working directory, scripts will attempt to download the pdb file from RCSB. Depending on the type of operation being done, a program will either write output files in the working directory or will print to stdout. All structure outputs are written in standard pdb format. All data outputs are in fixed-width column format. They were designed to be read by the statistics package R; however, they should be easily parsed by other graphing programs.

Note: These scripts are only compatible with Python versions 2.4 through 2.7.

Installation

Install the development version by cloning this repo and running pip from inside the package directory:

pip install -e .

Current functions

Miscellaneous

  • download pdb files from the RCSB database: download.py
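As a rough illustration of what fetching a structure from RCSB involves, here is a minimal sketch. It is not taken from download.py: the helper names `rcsb_url` and `fetch_pdb` are hypothetical, the URL pattern is an assumption based on RCSB's public file-download address, and the code targets Python 3 rather than the Python 2 interpreters the scripts themselves require.

```python
import os
import urllib.request

# Assumed URL pattern for RCSB file downloads; not taken from pdbtools.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def rcsb_url(pdb_id):
    """Build the download URL for a four-character pdb id (hypothetical helper)."""
    pdb_id = pdb_id.strip()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character pdb id, got %r" % pdb_id)
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper())


def fetch_pdb(pdb_id, directory="."):
    """Download <pdb_id>.pdb into `directory` unless it is already there."""
    path = os.path.join(directory, pdb_id.lower() + ".pdb")
    if not os.path.exists(path):
        urllib.request.urlretrieve(rcsb_url(pdb_id), path)
    return path
```

Calling `fetch_pdb("1stn")` would write 1stn.pdb to the working directory if it is missing, which mirrors the download-on-demand behavior described for the command line parser.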

Structure-based calculations

Geometry

Energy calculation

  • calculate coulomb energy: coulomb.py
  • calculate the dipole moment of the protein: moment.py
  • calculate pKa of ionizable groups using the Solvent-Accessibility-modified Tanford-Kirkwood method: satk.py (requires a Fortran compiler)
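To make the first two calculations concrete, the sketch below sums Coulomb's law over point charges and computes a dipole vector as the charge-weighted sum of positions. It is a textbook version under stated assumptions, not the method used in coulomb.py or moment.py: the function names, the default dielectric constant of 1.0, and the choice of units (charges in e, distances in Å, energy in kcal/mol) are all illustrative. It targets Python 3.8 or later.

```python
import math

# Coulomb's constant in kcal*Angstrom/(mol*e^2); units are an assumption here.
COULOMB_CONSTANT = 332.0637


def coulomb_energy(charges, coords, dielectric=1.0):
    """Sum q_i*q_j/(dielectric*r_ij) over all unique pairs, in kcal/mol."""
    energy = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r = math.dist(coords[i], coords[j])
            energy += COULOMB_CONSTANT * charges[i] * charges[j] / (dielectric * r)
    return energy


def dipole_moment(charges, coords):
    """Return the vector sum of q_i * r_i, in e*Angstrom."""
    return tuple(
        sum(q * xyz[axis] for q, xyz in zip(charges, coords)) for axis in range(3)
    )
```

For a molecule with a net charge, the dipole computed this way depends on where the coordinate origin sits, so a real calculation has to choose a reference point such as the center of mass.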

Structure properties

  • extract structure experiment properties: exper.py
  • extract protein sequence from structure: seq.py
  • calculate theoretical pI, MW, fraction titratable residues, charge: param.py
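One common way to pull a sequence out of a structure is to read the CA atom of each residue from the ATOM records, using the fixed column positions of the pdb format. The sketch below does that. It is an assumption about the general approach rather than a copy of seq.py, the function name `sequence_from_atoms` is hypothetical, and it is written for Python 3.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_atoms(pdb_lines):
    """Return {chain_id: one-letter sequence} built from CA ATOM records.

    Hypothetical helper; residues outside the 20 standard ones become "X".
    """
    chains = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only read the first model
        if len(line) < 27 or not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        residue_key = (chain, line[22:27])  # residue number plus insertion code
        if residue_key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(residue_key)
        chains.setdefault(chain, []).append(THREE_TO_ONE.get(line[17:20], "X"))
    return {chain: "".join(residues) for chain, residues in chains.items()}
```

A sequence read this way only covers residues that were resolved in the structure, so it can be shorter than the sequence of the construct that was studied.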

File/structure manipulation

Some of the programs are written as interfaces to other programs, CHARMM and NACCESS (http://www.bioinf.manchester.ac.uk/naccess/), which must be downloaded and installed separately if their functions are desired. To use satk.py, a set of Fortran packages must be compiled.

Usage

Commandline usage

Almost all programs in the pdbTools suite have the same command-line usage:

pdb_XXXX pdb_input optional_args > output

pdb_input can be one of the following (in any arbitrary combination):

  • pdb files
  • directories of pdb files
  • four-character pdb ids
  • text files containing whitespace delimited (i.e. space, tab, carriage return) lists of any combination of the other allowed types of arguments. If the list of arguments contains pdb files or ids that do not exist locally, the parser will attempt to download the files from the RCSB database.
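The rules above can be sketched as a small resolver that flattens a mixed argument list into pdb file paths and pdb ids. This is an illustration of the described behavior, not the parser shipped with pdbtools: the function name `expand_pdb_inputs` is hypothetical, the ".pdb" extension test is an assumption, downloading of missing entries is left out, and the code targets Python 3.

```python
import os


def expand_pdb_inputs(args):
    """Flatten a mixed argument list into pdb file paths and pdb ids.

    Hypothetical helper; entries that do not exist locally are returned
    unchanged so that a caller could try to download them.
    """
    resolved = []
    for arg in args:
        if os.path.isdir(arg):
            # Directory: take every *.pdb file inside it (assumed extension).
            names = sorted(n for n in os.listdir(arg) if n.endswith(".pdb"))
            resolved.extend(os.path.join(arg, n) for n in names)
        elif arg.endswith(".pdb"):
            # pdb file, whether or not it exists locally yet.
            resolved.append(arg)
        elif os.path.isfile(arg):
            # Text file: whitespace-delimited list of further arguments.
            with open(arg) as handle:
                resolved.extend(expand_pdb_inputs(handle.read().split()))
        elif len(arg) == 4 and arg.isalnum():
            # Four-character pdb id.
            resolved.append(arg)
        else:
            raise ValueError("cannot interpret argument %r" % arg)
    return resolved
```

Because text files are expanded recursively, a list file may itself name directories, ids, or other list files, which matches the "any combination" wording above.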

optional_args: Although the arguments to each program are identical, the options are quite different depending on the program requirements. The best way to learn how to use a particular program is to type XXXX.py --help. This will spit out a list of available options. In most cases, the options are actually optional: the program will use a sane default if none is specified. In some cases (notably mutator.py), options must be specified for the program to run.

output: Most scripts dump out a pdb file to standard out. This can be captured using the ">" redirect. Some write an output file that uses the name of the input pdb file as a prefix (e.g. close-contacts.py 1stn.pdb creates a file called 1stn.pdb.close_contacts).

API

Version 0.2 has moved all pdbtools into a set of modules. These can be used to develop new scripts easily.

Note: You can download the original pdbtools scripts (prior to packaging) here.

Third Party Software

Some scripts require installation of third-party programs. These should be installed according to the instructions given by the third party, and their executables placed in a directory listed in the $PATH variable. To use the scripts that require CHARMM, the $CHARMM environment variable must be set to the directory containing the charmm binary, and the $CHARMM_LIB environment variable to the directory containing the charmm parameter files.
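Before running the CHARMM-dependent scripts, it can help to confirm that the environment is set up as described. The following sketch checks the two variables named above and looks a program up on $PATH. The helper names `check_charmm_setup` and `on_path` are hypothetical and are not part of pdbtools, and the code targets Python 3.

```python
import os
import shutil


def check_charmm_setup(environ=None):
    """Return a list of problems with $CHARMM and $CHARMM_LIB (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    problems = []
    for name in ("CHARMM", "CHARMM_LIB"):
        value = environ.get(name)
        if not value:
            problems.append("$%s is not set" % name)
        elif not os.path.isdir(value):
            problems.append("$%s=%s is not a directory" % (name, value))
    return problems


def on_path(program):
    """Report whether an executable called `program` can be found on $PATH."""
    return shutil.which(program) is not None
```

An empty list from `check_charmm_setup()` means both variables point at existing directories. It does not verify that the charmm binary or the parameter files are actually inside them.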

Contributing

If you find a bug or have an idea for a program you'd like in this package, feel free to open an issue. Even better: feel free to make a pull request!

Project Owner

Mike Harms (https://github.com/harmsm, http://harmslab.uoregon.edu)
