• Stars: 229
• Rank: 174,666 (Top 4 %)
• Language: Python
• License: MIT License
• Created over 8 years ago
• Updated 2 months ago

Repository Details

A Python frontend to (Open Biomedical) Ontologies.

pronto

A Python frontend to ontologies.

🗺️ Overview

Pronto is a Python library to parse, browse, create, and export ontologies, supporting several ontology languages and formats. It implements the specifications of the Open Biomedical Ontologies 1.4 in the form of a safe high-level interface. If you're only interested in parsing OBO or OBO Graphs documents, you may wish to consider fastobo instead.

🏳️ Supported Languages

🔧 Installing

Installing with pip is the easiest:

# pip install pronto          # if you have the admin rights
$ pip install pronto --user   # install it in a user-site directory

There is also a conda recipe in the bioconda channel:

$ conda install -c bioconda pronto

Finally, a development version can be installed from GitHub using setuptools, provided you have the right dependencies installed already:

$ git clone https://github.com/althonos/pronto
$ cd pronto
# python setup.py install

💡 Examples

If you're only reading ontologies, you'll only use the Ontology class, which is the main entry point.

>>> from pronto import Ontology

It can be instantiated from a path to an ontology in one of the supported formats, even if the file is compressed:

>>> go = Ontology("tests/data/go.obo.gz")
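Transparent handling of compressed files is usually done by inspecting the first bytes of the input. The following is only a minimal standard-library sketch of that idea for gzip, not pronto's actual implementation; the helper name open_maybe_gzip is hypothetical:

```python
import gzip
import io

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(raw: bytes) -> io.BufferedIOBase:
    """Return a binary reader that decompresses gzip data when it detects it."""
    buffer = io.BytesIO(raw)
    if raw[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=buffer, mode="rb")
    return buffer


plain = b"format-version: 1.4\n"
packed = gzip.compress(plain)

# Both readers yield the same bytes, whether or not the input was compressed.
print(open_maybe_gzip(plain).read().decode())
print(open_maybe_gzip(packed).read().decode())
```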

Loading a file from a persistent URL is also supported, although you may also want to use the Ontology.from_obo_library method if you're using persistent URLs a lot:

>>> cl = Ontology("http://purl.obolibrary.org/obo/cl.obo")
>>> stato = Ontology.from_obo_library("stato.owl")

🏷️ Get a term by accession

Ontology objects can be used as mappings to access any entity they contain from their identifier in compact form:

>>> cl['CL:0002116']
Term('CL:0002116', name='B220-low CD38-positive unswitched memory B cell')

Note that when loading an OWL ontology, URIs will be compacted to CURIEs whenever possible:

>>> aeo = Ontology.from_obo_library("aeo.owl")
>>> aeo["AEO:0000078"]
Term('AEO:0000078', name='lumen of tube')
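
To illustrate what compaction means, here is a minimal sketch that turns an OBO PURL into a CURIE. It assumes the common OBO convention of a PREFIX_ACCESSION local name under http://purl.obolibrary.org/obo/; the compact function is hypothetical, and pronto's own rules may differ:

```python
OBO_PURL = "http://purl.obolibrary.org/obo/"


def compact(uri: str) -> str:
    """Compact an OBO PURL into a CURIE, or return the URI unchanged."""
    if not uri.startswith(OBO_PURL):
        return uri  # not an OBO PURL: leave it alone
    local = uri[len(OBO_PURL):]
    # Split on the last underscore so prefixes containing "_" stay intact.
    prefix, sep, accession = local.rpartition("_")
    if not sep or not prefix or not accession:
        return uri  # no PREFIX_ACCESSION shape: cannot be compacted
    return f"{prefix}:{accession}"


print(compact("http://purl.obolibrary.org/obo/AEO_0000078"))
print(compact("http://example.org/not-an-obo-term"))
```

The first call prints AEO:0000078; the second URI does not match the assumed prefix and is returned as-is.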

🖊️ Create a new term from scratch

We can load an ontology, and edit it locally. Here, we add a new protein class to the Protein Ontology.

>>> pr = Ontology.from_obo_library("pr.obo")
>>> brh = pr.create_term("PR:XXXXXXXX")
>>> brh.name = "Bacteriorhodopsin"
>>> brh.superclasses().add(pr["PR:000001094"])  # is a rhodopsin-like G-protein
>>> brh.disjoint_from.add(pr["PR:000036194"])   # disjoint from eukaryotic proteins

✏️ Convert an OWL ontology to OBO format

The Ontology.dump method can be used to serialize an ontology to any of the supported formats (currently OBO and OBO JSON):

>>> edam = Ontology("http://edamontology.org/EDAM.owl")
>>> with open("edam.obo", "wb") as f:
...     edam.dump(f, format="obo")

🌿 Find ontology terms without subclasses

The terms method of Ontology instances can be used to iterate over all the terms in the ontology (including the ones that are imported). We can then use the is_leaf method of Term objects to check whether the term is a leaf in the class inclusion graph.

>>> ms = Ontology("ms.obo")
>>> for term in ms.terms():
...     if term.is_leaf():
...         print(term.id)
MS:0000000
MS:1000001
...
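
For readers unfamiliar with the notion, a leaf is a term that no other term names as a superclass. The sketch below shows that definition on a toy graph stored in a plain dictionary; the identifiers and the leaves helper are hypothetical, and this is not how pronto stores its data:

```python
# Toy class inclusion graph: each term maps to its direct superclasses.
superclasses = {
    "EX:root": set(),
    "EX:a": {"EX:root"},
    "EX:b": {"EX:root"},
    "EX:c": {"EX:a"},
}


def leaves(graph):
    """Return the sorted terms that are never used as a superclass."""
    has_subclass = set().union(*graph.values())
    return sorted(term for term in graph if term not in has_subclass)


print(leaves(superclasses))  # ['EX:b', 'EX:c']
```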

🤫 Silence warnings

pronto is explicit about the parts of the code that make non-standard assumptions or lack the capability to handle certain constructs. It does so by raising warnings with the warnings module, which can get quite verbose.

If you are fine with the inconsistencies, you can manually disable warning reports in your consumer code with the filterwarnings function:

import warnings
import pronto
warnings.filterwarnings("ignore", category=pronto.warnings.ProntoWarning)
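
If you would rather not change the filters for the whole process, the standard library's warnings.catch_warnings context manager restores the previous filters when the block exits. The sketch below uses a hypothetical DemoWarning class and noisy_load function as stand-ins, so that it runs without pronto installed; in real code you would pass pronto.warnings.ProntoWarning as the category:

```python
import warnings


class DemoWarning(UserWarning):
    """Hypothetical stand-in for a library-specific warning category."""


def noisy_load():
    """Hypothetical loader that emits a warning before returning."""
    warnings.warn("non-standard construct encountered", DemoWarning)
    return "loaded"


# record=True collects the warnings that pass the filters into a list.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Added last, so this filter is checked first and hides DemoWarning.
    warnings.filterwarnings("ignore", category=DemoWarning)
    result = noisy_load()

print(result, len(caught))
```

Outside the with block, the filters are back to what they were before.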

📖 API Reference

A complete API reference can be found in the online documentation, or directly from the command line using pydoc:

$ pydoc pronto.Ontology

📜 License

This library is provided under the open-source MIT license. If you use it in a scientific context, please cite it with the following DOI: 10.5281/zenodo.595572
