OBO-formatted ontologies → networkx (Python 3)

obonet: load OBO-formatted ontologies into networkx

Read OBO-formatted ontologies in Python. obonet is

  • user friendly
  • succinct
  • pythonic
  • modern
  • simple and tested
  • lightweight
  • networkx leveraging

This Python package loads OBO-serialized ontologies into networks. The function obonet.read_obo() takes an .obo file and returns a networkx.MultiDiGraph representation of the ontology. The parser was designed for versions 1.2 and 1.4 of the OBO specification.
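
To make the mapping from OBO text to a graph more concrete, here is a minimal sketch of how [Term] stanzas could be turned into nodes and edges. It uses only the standard library and is not obonet's actual parser: the function name parse_terms, the EX: identifiers, and the sample text are all made up for illustration, and the sketch handles only the id, name, and is_a tags.

```python
# Illustrative only: a tiny stanza reader, NOT obonet's actual parser.
# The EX: identifiers and names below are invented sample data.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:0000001
name: root rank

[Term]
id: EX:0000002
name: child rank
is_a: EX:0000001 ! root rank
"""


def parse_terms(text):
    """Return (nodes, edges) built from the [Term] stanzas of an OBO string."""
    nodes = {}  # term ID -> {"name": ...}
    edges = []  # (subterm ID, superterm ID, relationship type)
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are handled here.
            in_term = line == "[Term]"
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop a trailing "! comment"
        if tag == "id":
            current = value
            nodes[current] = {}
        elif current is None:
            continue
        elif tag == "name":
            nodes[current]["name"] = value
        elif tag == "is_a":
            # Edge direction: subterm -> superterm.
            edges.append((current, value, "is_a"))
    return nodes, edges


nodes, edges = parse_terms(OBO_TEXT)
```

Each term becomes a node carrying its metadata, and each is_a line becomes a directed edge from the subterm to the superterm, which is the direction the usage example relies on.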

Usage

See pyproject.toml for the minimum Python version required and the dependencies. OBO files can be read from a path, URL, or open file handle. Compression is inferred from the path's extension. See example usage below:

import networkx
import obonet

# Read the taxrank ontology
url = 'https://github.com/dhimmel/obonet/raw/main/tests/data/taxrank.obo'
graph = obonet.read_obo(url)

# Or read the xz-compressed taxrank ontology
url = 'https://github.com/dhimmel/obonet/raw/main/tests/data/taxrank.obo.xz'
graph = obonet.read_obo(url)

# Number of nodes
len(graph)

# Number of edges
graph.number_of_edges()

# Check if the ontology is a DAG
networkx.is_directed_acyclic_graph(graph)

# Mapping from term ID to name
id_to_name = {id_: data.get('name') for id_, data in graph.nodes(data=True)}
id_to_name['TAXRANK:0000006']  # TAXRANK:0000006 is species

# Find all superterms of species. Note that networkx.descendants gets
# superterms, while networkx.ancestors returns subterms.
networkx.descendants(graph, 'TAXRANK:0000006')
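
The edge direction behind that last note can be checked on a small hand-built graph, without downloading an ontology. The sketch below assumes, as the note implies, that edges point from subterm to superterm. It also assumes that each edge key holds the relationship type (here is_a). The EX: identifiers are invented stand-ins for real term IDs.

```python
import networkx

# Hand-built stand-in for a graph returned by obonet.read_obo().
# Assumption: edges run subterm -> superterm, keyed by relationship type.
graph = networkx.MultiDiGraph()
graph.add_edge("EX:species", "EX:genus", key="is_a")
graph.add_edge("EX:genus", "EX:family", key="is_a")

# Following outgoing edges reaches the more general terms.
superterms = networkx.descendants(graph, "EX:genus")

# Following incoming edges reaches the more specific terms.
subterms = networkx.ancestors(graph, "EX:genus")

# Direct superterms together with the relationship type stored as the key.
direct = list(graph.out_edges("EX:genus", keys=True))
```

Here superterms contains only EX:family and subterms contains only EX:species, which is the inverse of what the function names suggest.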

For a more detailed tutorial, see the Gene Ontology example notebook.
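
As noted under Usage, compression is inferred from the path's extension. The following is a minimal sketch of how that kind of inference can be done with the standard library. It is not obonet's implementation: the helper name open_maybe_compressed and the set of extensions in OPENERS are assumptions made for illustration.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Hypothetical mapping; the extensions obonet actually supports may differ.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_maybe_compressed(path):
    """Open a text file, choosing a decompressor from the file extension."""
    # Fall back to the built-in open() for unrecognized extensions.
    opener = OPENERS.get(Path(path).suffix.lower(), open)
    return opener(path, mode="rt", encoding="utf-8")
```

With a helper like this, a call such as open_maybe_compressed("taxrank.obo.xz") returns a text handle that decompresses on the fly, while "taxrank.obo" is opened as plain text.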

Comparison

This package specializes in reading OBO files into a networkx.MultiDiGraph. A more general ontology-to-NetworkX reader is available in the Python nxontology package via the nxontology.imports.pronto_to_multidigraph function. This function takes a pronto.Ontology object, which can be loaded from an OBO file, OBO Graphs JSON file, or Ontology Web Language 2 RDF/XML file (OWL). Using pronto_to_multidigraph allows creating a MultiDiGraph similar to the one created by obonet, with some differences in the amount of metadata retained.

The primary focus of the nxontology package is to provide an NXOntology class for representing ontologies based around a networkx.DiGraph. NXOntology provides optimized implementations for computing node similarity and other intrinsic ontology metrics. There are two important differences between a DiGraph for NXOntology and the MultiDiGraph produced by obonet:

  1. NXOntology is based on a DiGraph that does not allow multiple edges between the same two nodes. Multiple edges between the same two nodes must therefore be collapsed. By default, it only considers is a / rdfs:subClassOf relationships, but using pronto_to_multidigraph to create the NXOntology allows for retaining additional relationship types, like part of in the case of the Gene Ontology.

  2. NXOntology reverses the direction of relationships so edges go from superterm to subterm. Traditionally in ontologies, the is a relationships go from subterm to superterm, which is confusing when applying graph functions: with that direction, ancestors returns subterms. NXOntology reverses edges so functions such as ancestors refer to more general concepts and descendants refer to more specific concepts.

The nxontology.imports.multidigraph_to_digraph function converts from a MultiDiGraph, like the one produced by obonet, to a DiGraph by filtering to the desired relationship types, reversing edges, and collapsing parallel edges.
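
The three steps named above (filtering, reversing, collapsing) can be sketched with networkx alone. This is an illustration of the idea, not the nxontology implementation: the helper to_reversed_digraph, its rel_types parameter, and the EX: identifiers are hypothetical. The sketch assumes the relationship type is stored as the edge key.

```python
import networkx


def to_reversed_digraph(multigraph, rel_types=("is_a",)):
    """Hypothetical helper: filter edges by key, reverse them, and collapse."""
    digraph = networkx.DiGraph()
    digraph.add_nodes_from(multigraph.nodes(data=True))
    for subterm, superterm, key in multigraph.edges(keys=True):
        if key in rel_types:
            # Reversed direction: superterm -> subterm.
            # A DiGraph keeps one edge per node pair, so parallel edges collapse.
            digraph.add_edge(superterm, subterm)
    return digraph


mdg = networkx.MultiDiGraph()
mdg.add_edge("EX:child", "EX:parent", key="is_a")
mdg.add_edge("EX:child", "EX:parent", key="part_of")  # parallel edge
mdg.add_edge("EX:child", "EX:other", key="regulates")  # filtered out

dg = to_reversed_digraph(mdg, rel_types=("is_a", "part_of"))
```

After the conversion, networkx.descendants(dg, "EX:parent") yields the more specific term EX:child, matching the convention NXOntology adopts.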

Installation

The recommended approach is to install the latest release from PyPI using:

pip install obonet

However, if you'd like to install the most recent version from GitHub, use:

pip install git+https://github.com/dhimmel/obonet.git#egg=obonet

Contributing

We welcome feature suggestions and community contributions. Currently, only reading OBO files is supported.

Develop

Some development commands:

# create virtual environment
python3 -m venv ./env

# activate virtual environment
source env/bin/activate

# editable installation for development
pip install --editable ".[dev]"

# install pre-commit hooks
pre-commit install

# run all pre-commit checks
pre-commit run --all

# run tests
pytest

# generate changelog for release notes
git fetch --tags origin main
OLD_TAG=$(git describe --tags --abbrev=0)
git log --oneline --decorate=no --reverse $OLD_TAG..HEAD

Maintainers can make a new release at https://github.com/dhimmel/obonet/releases/new.
