Python library to facilitate genome assembly, annotation, and comparative genomics

JCVI utility libraries

Collection of Python libraries to parse bioinformatics files, or perform computation related to assembly, annotation, and comparative genomics.

Authors:
  • Haibao Tang (tanghaibao)
  • Vivek Krishnakumar (vivekkrish)
  • Jingping Li (Jingping)
  • Xingtan Zhang (tangerzhang)
Email: [email protected]
License: BSD

Citations

  • If you use the MCscan pipeline for synteny inference, please cite:

    Tang et al. (2008) Synteny and Collinearity in Plant Genomes. Science

  • If you use the ALLMAPS pipeline for genome scaffolding, please cite:

    Tang et al. (2015) ALLMAPS: robust scaffold ordering based on multiple maps. Genome Biology

  • For other uses, please cite the package directly:

    Tang et al. (2015). jcvi: JCVI utility libraries. Zenodo. 10.5281/zenodo.31631

Contents

The following modules are available as generic bioinformatics handling methods.

  • algorithms

    • Linear programming solver with SCIP and GLPK.
    • Supermap: find set of non-overlapping anchors in BLAST or NUCMER output.
    • Longest or heaviest increasing subsequence.
    • Matrix operations.
  • apps

    • GenBank entrez accession, Phytozome, Ensembl and SRA downloader.
    • Calculate (non)synonymous substitution rate between gene pairs.
    • Basic phylogenetic tree construction using PHYLIP, PhyML, or RAxML, and visualization.
    • Wrapper for BLAST+, LASTZ, LAST, BWA, BOWTIE2, CLC, CDHIT, CAP3, etc.
  • formats

    Currently supports .ace format (phrap, cap3, etc.), .agp (goldenpath), .bed format, .blast output, .btab format, .coords format (nucmer output), .fasta format, .fastq format, .fpc format, .gff format, obo format (ontology), .psl format (UCSC blat, GMAP, etc.), .posmap format (Celera assembler output), .sam format (read mapping), .contig format (TIGR assembly format), etc.

  • graphics

    • BLAST or synteny dot plot.
    • Histogram using R and ASCII art.
    • Paint regions on set of chromosomes.
    • Macro-synteny and micro-synteny plots.
  • utils

    • Grouper can be used as disjoint set data structure.
    • range contains common range operations, like overlap and chaining.
    • Miscellaneous cookbook recipes, iterators, decorators, table utilities.
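
To illustrate what "disjoint set data structure" means in the utils entry above, here is a minimal union-find sketch. It is not the library's Grouper class: the class name `DisjointSet` and the methods `join`, `joined` and `groups` are hypothetical names chosen for this illustration only.

```python
class DisjointSet:
    """Minimal union-find sketch (illustration only, not jcvi's Grouper)."""

    def __init__(self):
        # Maps each item to its parent; a root is its own parent.
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own singleton group.
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def join(self, *items):
        # Merge the groups of all given items into the group of the first.
        roots = [self.find(x) for x in items]
        for r in roots[1:]:
            self.parent[r] = roots[0]

    def joined(self, a, b):
        # True when both items currently belong to the same group.
        return self.find(a) == self.find(b)

    def groups(self):
        # Collect members by root and return them in a stable, sorted form.
        out = {}
        for item in self.parent:
            out.setdefault(self.find(item), []).append(item)
        return sorted(sorted(g) for g in out.values())


ds = DisjointSet()
ds.join("a", "b")
ds.join("c", "d")
ds.join("b", "c")   # links the two groups together
ds.join("e")        # a singleton group
print(ds.groups())
```

A structure like this is handy for clustering tasks such as grouping genes into families from pairwise links, where each link simply joins two items.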

Then there are modules that contain domain-specific methods.

  • assembly

    • K-mer histogram analysis.
    • Preparation and validation of tiling path for clone-based assemblies.
    • Scaffolding through ALLMAPS, optical map and genetic map.
    • Pre-assembly and post-assembly QC procedures.
  • annotation

    • Training of ab initio gene predictors.
    • Calculate gene, exon and intron statistics.
    • Wrapper for PASA and EVM.
    • Launch multiple MAKER processes.
  • compara

    • C-score based BLAST filter.
    • Synteny scan (de-novo) and lift over (find nearby anchors).
    • Ancestral genome reconstruction using Sankoff's and PAR method.
    • Ortholog and tandem gene duplicates finder.
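
As a rough sketch of the idea behind a C-score based BLAST filter, the example below assumes the commonly used definition C-score(A, B) = score(A, B) / max(best score of A, best score of B). The function name `cscore_filter`, the tuple layout of the hits, and the 0.7 cutoff are assumptions made for this illustration and are not taken from the library.

```python
from collections import defaultdict


def cscore_filter(hits, cutoff=0.7):
    """Keep hits whose C-score is at least `cutoff`.

    hits: iterable of (query, subject, score) tuples with positive scores.
    Assumes query and subject IDs do not collide across genomes.
    """
    hits = list(hits)
    # Best score seen for each gene, whether it appears as query or subject.
    best = defaultdict(float)
    for query, subject, score in hits:
        best[query] = max(best[query], score)
        best[subject] = max(best[subject], score)

    kept = []
    for query, subject, score in hits:
        cscore = score / max(best[query], best[subject])
        if cscore >= cutoff:
            kept.append((query, subject, score, round(cscore, 3)))
    return kept


example_hits = [
    ("a1", "b1", 100.0),
    ("a1", "b2", 60.0),   # weak relative to a1's best hit
    ("a2", "b2", 80.0),
]
for row in cscore_filter(example_hits):
    print(row)
```

Normalizing by the best hit of either gene makes the filter relative rather than absolute, so a moderately scoring pair can still pass when neither gene has a better partner.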

Applications

Please visit the wiki for full-fledged applications.

Dependencies

Some routines in the library use third-party Python packages. These dependencies are not mandatory, since only a few modules use them.

Other Python modules are imported here and there in various scripts. The simplest approach is to install each one with pip install when you encounter an ImportError.
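
The "install when you see ImportError" approach can be made friendlier with a small helper that tells the user what to install. This is a generic sketch of the optional-dependency pattern, not code from the library; the helper name `optional_import` is hypothetical.

```python
import importlib


def optional_import(module_name, package_name=None):
    """Import a module, or raise ImportError with an install hint.

    package_name is the name to pass to pip when it differs from the
    importable module name (hypothetical helper for illustration).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        package = package_name or module_name
        raise ImportError(
            f"Module '{module_name}' is required for this routine. "
            f"Try: pip install {package}"
        ) from exc


# A standard-library module always imports cleanly.
json = optional_import("json")
print(json.dumps({"ok": True}))
```

Calling the helper inside the function that needs the package, rather than at module import time, keeps the rest of a module usable when an optional package is missing.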

Installation

The easiest way is to install it via PyPI:

pip install jcvi

To install the development version:

pip install git+https://github.com/tanghaibao/jcvi.git

Alternatively, if you want to install manually:

cd ~/code  # or any directory of your choice
git clone https://github.com/tanghaibao/jcvi.git
cd jcvi  # pip install -e . must run inside the cloned repository
pip install -e .

In addition, a few modules might ask for the locations of external programs if the executables cannot be found in your PATH. The external programs that are often used are:

Most of the scripts in this package contain multiple actions. Taking the fasta module as an example:

Usage:
    python -m jcvi.formats.fasta ACTION


Available ACTIONs:
          clean | Remove irregular chars in FASTA seqs
           diff | Check if two fasta records contain same information
        extract | Given fasta file and seq id, retrieve the sequence in fasta format
          fastq | Combine fasta and qual to create fastq file
         filter | Filter the records by size
         format | Trim accession id to the first space or switch id based on 2-column mapping file
        fromtab | Convert 2-column sequence file to FASTA format
           gaps | Print out a list of gap sizes within sequences
             gc | Plot G+C content distribution
      identical | Given 2 fasta files, find all exactly identical records
            ids | Generate a list of headers
           info | Run `sequence_info` on fasta files
          ispcr | Reformat paired primers into isPcr query format
           join | Concatenate a list of seqs and add gaps in between
     longestorf | Find longest orf for CDS fasta
           pair | Sort paired reads to .pairs, rest to .fragments
    pairinplace | Starting from fragment.fasta, find if adjacent records can form pairs
           pool | Pool a bunch of fastafiles together and add prefix
           qual | Generate dummy .qual file based on FASTA file
         random | Randomly take some records
         sequin | Generate a gapped fasta file for sequin submission
       simulate | Simulate random fasta file for testing
           some | Include or exclude a list of records (also performs on .qual file if available)
           sort | Sort the records by IDs, sizes, etc.
        summary | Report the real no of bases and N's in fasta files
           tidy | Normalize gap sizes and remove small components in fasta
      translate | Translate CDS to proteins
           trim | Given a cross_match screened fasta, trim the sequence
      trimsplit | Split sequences at lower-cased letters
           uniq | Remove records that are the same

When you need to use one action, you can just run:

python -m jcvi.formats.fasta extract

This will tell you the options and arguments it expects.
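
The "one script, many actions" layout described above can be sketched in a few lines of plain Python. This is a toy illustration of the pattern, not the library's own dispatcher: the `ACTIONS` table, the `dispatch` and `usage` functions, and the two simplified actions are all hypothetical.

```python
import sys


def ids(lines):
    """Generate a list of headers"""
    return "\n".join(l[1:].strip() for l in lines if l.startswith(">"))


def summary(lines):
    """Report the number of bases and N's"""
    seq = "".join(l.strip() for l in lines if not l.startswith(">"))
    return f"bases={len(seq)} Ns={seq.upper().count('N')}"


# Action name -> function; each docstring doubles as the help text.
ACTIONS = {"ids": ids, "summary": summary}


def usage():
    width = max(len(name) for name in ACTIONS)
    rows = [
        f"{name:>{width}} | {func.__doc__}"
        for name, func in sorted(ACTIONS.items())
    ]
    return "Available ACTIONs:\n" + "\n".join(rows)


def dispatch(argv, lines):
    # With no action or an unknown one, show the action table.
    if not argv or argv[0] not in ACTIONS:
        return usage()
    return ACTIONS[argv[0]](lines)


# Tiny in-memory FASTA used in place of a real input file.
DEMO = [">seq1\n", "ACGTN\n", ">seq2\n", "NGGCC\n"]

if __name__ == "__main__":
    print(dispatch(sys.argv[1:], DEMO))
```

Saved as a script, running it with no arguments prints the action table, and running it with `ids` or `summary` runs that action on the demo records.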

Feel free to check out other scripts in the package; it is not just for FASTA.
