Introduction to Biopython

This is a basic introduction to Biopython, intended for a classroom-based workshop. It assumes you have been introduced to both working at the command line and basic Python, for example as covered in Martin Jones' free eBook Python for Biologists.

The Biopython website http://www.biopython.org has more information including the Biopython Tutorial & Cookbook (html, PDF available), which is worth going through once you have mastered the basics of Python. That Tutorial & Cookbook is also available as Jupyter Notebooks, as is another short introductory tutorial.

Workshop Sections

I've broken up the workshop into sections:

This material focuses on Biopython's SeqIO and AlignIO modules (these links include an overview and tables of supported file formats). Each also has a whole chapter in the Biopython Tutorial & Cookbook (PDF), which is worth reading after this workshop to learn more.
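As a rough taste of what SeqIO does, here is a minimal sketch that reads two made-up FASTA records from memory rather than from one of the workshop's data files. It assumes Python 3 and an installed Biopython; the record names, sequences and the summarise helper are illustrative inventions, not part of the workshop material.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up FASTA records, held in memory so the sketch needs no data files.
FASTA_TEXT = """>seq1 first made-up record
ACGTACGTAC
>seq2 second made-up record
GGGCCC
"""


def summarise(handle, file_format="fasta"):
    """Return an (identifier, length) pair for each record read from the handle."""
    return [(record.id, len(record)) for record in SeqIO.parse(handle, file_format)]


summary = summarise(StringIO(FASTA_TEXT))
for identifier, length in summary:
    print("%s has length %i" % (identifier, length))
```

The same call works with a real file handle. Only the format name needs to change for other file types that SeqIO supports.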

Notation

Text blocks starting with $ show something you would type and run at the command line prompt, where the $ itself represents the prompt. For example:

$ python -V
Python 2.7.5

Depending on how your system is configured, rather than just $ you may see your user name and the current working directory. Here you would only type python -V (python, space, minus, capital V) to find out the default version of Python installed.

Lines starting with >>> represent the interactive Python prompt, followed by something you would type inside Python. For example:

$ python
Python 2.7.3 (default, Nov  7 2012, 23:34:47)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 7 * 6
42
>>> quit()

Here you would only need to type 7 * 6 (and enter) into Python; the >>> is already there. To quit the interactive Python prompt use quit() (and enter). This example would usually be shortened to just:

>>> 7 * 6
42

These text blocks are also used for entire short Python scripts, which you can copy and save as a plain text file with the extension .py to run them.
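As a minimal sketch of such a script, the lines below could be saved as a plain text file and run at the command line prompt. The file name answer.py is a hypothetical choice, not one of the workshop's files.

```python
# Hypothetical file name: answer.py
# Run it at the command line prompt with: python answer.py
answer = 7 * 6
print("The answer is %i" % answer)
```

Unlike the interactive prompt, a script only shows values that you explicitly print.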

Sample Solutions

Each workshop section was written in a separate directory. In addition to the main text (named README.rst, which is a plain text file with markup to make it look pretty on GitHub), the folders contain sample solution Python scripts (named as in the text).

Prerequisites & Sample Data

If you are reading this on GitHub.com, you can view, copy/paste or download individual examples from your web browser.

To make a local copy of the entire workshop, you can use the git command line tool:

$ git clone https://github.com/peterjc/biopython_workshop.git

Alternatively, depending on your firewall settings, use:

$ git clone git@github.com:peterjc/biopython_workshop.git

To learn more about git and software version control, I recommend attending a Software Carpentry Workshop or similar course.

This should make a new sub-directory, biopython_workshop/ which we will now change into:

$ cd biopython_workshop

Most of the examples use real biological data files. You should download them now using the provided shell script:

$ bash fetch_sample_data.sh

We assume you have Python and Biopython 1.63 or later installed and working. Biopython 1.63 supports Python 2.6, 2.7 and 3.3 (and should work on more recent versions). The examples here assume you are using Python 2.6 or 2.7, but in general should work with Python 3 with minimal changes. Check this works:

$ python -c "import Bio; print(Bio.__version__)"
1.63
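The same check can be written as a short script that also reports the Python version and copes with Biopython being absent. This is only a sketch: the parse_version and biopython_status helpers and the MINIMUM constant are names made up here, and the version parsing assumes the usual dotted form such as 1.63.

```python
from __future__ import print_function  # print() then behaves the same on Python 2 and 3

import re
import sys

MINIMUM = (1, 63)  # minimum Biopython version assumed by this workshop


def parse_version(text):
    """Turn a version string like '1.63' into a (major, minor) tuple of integers."""
    return tuple(int(number) for number in re.findall(r"\d+", text)[:2])


def biopython_status():
    """Return a short message describing the installed Biopython, if any."""
    try:
        import Bio
    except ImportError:
        return "Biopython is not installed"
    if parse_version(Bio.__version__) >= MINIMUM:
        return "Biopython %s is recent enough" % Bio.__version__
    return "Biopython %s is older than 1.63" % Bio.__version__


print("Python %i.%i" % sys.version_info[:2])
print(biopython_status())
```

Comparing tuples of integers avoids the trap of comparing version strings as text, where "1.7" would sort after "1.63".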

History

This material was first used as part of a two-day course "Introduction to Python for Biologists" (Kathryn Crouch, Peter Cock and Tim Booth), part of a two-week course Keystone Skills in Bioinformatics, held in February 2014 at Centre for Ecology & Hydrology (CEH), Wallingford, UK. In a morning session lasting about 2.5 hours (plus coffee break), we covered all of reading sequence files and writing sequence files - and I quickly talked through alignment files.

I presented much of it again later in February 2014 at the University of Dundee as part of the third year undergraduate course BS32010 Applied Bioinformatics run by Dr David Martin and Dr David Booth. In the two hour slot we covered all of reading sequence files and most of writing sequence files.

I repeated this in March 2015 for the same third year undergraduate course, BS32010 Applied Bioinformatics, at the University of Dundee. In a three hour slot we covered reading sequence files, most of writing sequence files (up to editing sequences, but not filtering by identifier), and the start of multiple-sequence alignments.

Copyright and Licence

Copyright 2014-2015 by Peter Cock, The James Hutton Institute, Dundee, UK. All rights reserved.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA 4.0).

Note this documentation links to and uses external and separately licenced sample data files.
