

Materials for Machine Learning with Ontologies

This repository contains all the materials for our "Machine learning with biomedical ontologies" manuscript. We provide the Jupyter Notebooks to reproduce our experimental results and the benchmark datasets based on predicting protein-protein interactions. Furthermore, we make a set of slides available (as PDF and source code in LaTeX Beamer) that may be useful for teaching or presentations.

Notebooks

We provide several Jupyter notebooks.

PPI Benchmark

We provide two benchmark datasets for the protein--protein interaction prediction task. The datasets can be downloaded using the following link: DOI

The archive contains two benchmark datasets for evaluating machine learning methods on the task of predicting protein--protein interaction networks. The original data was downloaded from the StringDB database of protein--protein interactions and from the Gene Ontology Resource. The archive includes:

  • Protein--protein interactions for human and yeast organisms
  • Gene Ontology in OBO and OWL format
  • Gene Ontology Annotations for human and yeast proteins
  • Protein aliases files with ID mappings between StringDB proteins and other databases.

We filter out interactions with a confidence score below 700 and treat the remaining interactions as symmetric. We randomly split each dataset into training and testing sets in an 80/20 ratio by number of interactions, and hold out 20% of the training set as a validation set.
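The filtering and splitting described above can be sketched as follows. This is a minimal illustration, not the code used to build the benchmark: the function names, the `(protein_a, protein_b, score)` input format, the fixed seed, and the choice to deduplicate unordered pairs before splitting are all assumptions.

```python
import random


def prepare_splits(interactions, min_score=700, seed=0):
    """Filter interactions by score and split them into train/valid/test.

    `interactions` is an iterable of (protein_a, protein_b, score) tuples.
    This input format is an assumption made for the sketch.
    """
    # Keep interactions scoring at least `min_score`, and store each
    # unordered pair once so that (a, b) and (b, a) count as one interaction.
    pairs = sorted(
        {tuple(sorted((a, b))) for a, b, score in interactions if score >= min_score}
    )
    rng = random.Random(seed)
    rng.shuffle(pairs)

    # 80/20 split into training and testing sets by number of interactions.
    n_test = int(0.2 * len(pairs))
    test, train_all = pairs[:n_test], pairs[n_test:]

    # Hold out 20% of the training set for validation.
    n_valid = int(0.2 * len(train_all))
    valid, train = train_all[:n_valid], train_all[n_valid:]
    return train, valid, test


def symmetrize(pairs):
    """Return both directions of every pair, treating interactions as symmetric."""
    return list(pairs) + [(b, a) for a, b in pairs]
```

Splitting unordered pairs first and symmetrizing afterwards keeps both directions of an interaction in the same split, which avoids leaking test pairs into the training set.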

Dependencies

Please install the following software to run our notebooks:

  • Groovy 2.0+
  • Raptor RDF Syntax Library
  • Python 3.6+
  • Install Python dependencies with: pip install -r requirements.txt
  • Load submodules with: git submodule update --init --recursive

Running the notebooks

Run jupyter notebook and then open the notebook files.

Current benchmark results (yeast)

| Method | Raw Hits@10 | Filtered Hits@10 | Raw Hits@100 | Filtered Hits@100 | Raw Mean Rank | Filtered Mean Rank | Raw AUC | Filtered AUC |
|---|---|---|---|---|---|---|---|---|
| TransE | 0.06 | 0.13 | 0.32 | 0.40 | 1125.4 | 1074.8 | 0.82 | 0.83 |
| SimResnik | 0.09 | 0.17 | 0.38 | 0.48 | 757.8 | 706.9 | 0.86 | 0.87 |
| SimLin | 0.08 | 0.15 | 0.33 | 0.41 | 875.4 | 824.5 | 0.84 | 0.85 |
| SiameseNN | 0.06 | 0.17 | 0.46 | 0.68 | 674.27 | 622.20 | 0.89 | 0.90 |
| SiameseNN (Ont) | 0.08 | 0.19 | 0.50 | 0.72 | 543.56 | 491.56 | 0.91 | 0.92 |
| EL Embeddings | 0.08 | 0.17 | 0.44 | 0.62 | 451.29 | 394.04 | 0.92 | 0.93 |
| Onto2Vec | 0.08 | 0.15 | 0.35 | 0.48 | 641.1 | 587.9 | 0.79 | 0.80 |
| OPA2Vec | 0.06 | 0.13 | 0.39 | 0.58 | 523.3 | 466.6 | 0.87 | 0.88 |
| Random walk | 0.06 | 0.13 | 0.31 | 0.40 | 612.6 | 587.4 | 0.87 | 0.88 |
| Node2Vec | 0.07 | 0.15 | 0.36 | 0.46 | 589.1 | 522.4 | 0.87 | 0.88 |

Current benchmark results (human)

| Method | Raw Hits@10 | Filtered Hits@10 | Raw Hits@100 | Filtered Hits@100 | Raw Mean Rank | Filtered Mean Rank | Raw AUC | Filtered AUC |
|---|---|---|---|---|---|---|---|---|
| TransE | 0.05 | 0.11 | 0.24 | 0.29 | 3960.4 | 3890.6 | 0.78 | 0.79 |
| SimResnik | 0.05 | 0.09 | 0.25 | 0.30 | 1933.6 | 1864.4 | 0.88 | 0.89 |
| SimLin | 0.04 | 0.08 | 0.20 | 0.23 | 2287.9 | 2218.7 | 0.86 | 0.87 |
| SiameseNN | 0.05 | 0.15 | 0.41 | 0.64 | 1881.10 | 1808.77 | 0.90 | 0.89 |
| SiameseNN (Ont) | 0.05 | 0.13 | 0.38 | 0.59 | 1838.31 | 1766.34 | 0.89 | 0.89 |
| EL Embeddings | 0.01 | 0.02 | 0.22 | 0.26 | 1679.72 | 1637.65 | 0.90 | 0.90 |
| Onto2Vec | 0.05 | 0.08 | 0.24 | 0.31 | 2434.6 | 2391.2 | 0.77 | 0.77 |
| OPA2Vec | 0.03 | 0.07 | 0.23 | 0.26 | 1809.7 | 1767.6 | 0.86 | 0.88 |
| Random walk | 0.04 | 0.10 | 0.28 | 0.34 | 1942.6 | 1958.6 | 0.85 | 0.86 |
| Node2Vec | 0.03 | 0.07 | 0.22 | 0.28 | 1860.5 | 1813.1 | 0.86 | 0.87 |
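
The benchmark tables report raw and filtered ranking metrics. The sketch below shows one way such metrics could be computed. It assumes the convention common in link prediction, where the "filtered" rank ignores other candidates already known to interact with the query protein; this README does not define the terms, so treat that reading as an assumption. The function names and the `score_fn` interface are hypothetical, and the AUC computation is omitted.

```python
def rank_of(scores, target, exclude=()):
    """1-based rank of `target` when candidates are sorted by descending score.

    Candidates listed in `exclude` (other than the target) are skipped.
    Ties are resolved in favour of the target.
    """
    target_score = scores[target]
    rank = 1
    for candidate, score in scores.items():
        if candidate == target or candidate in exclude:
            continue
        if score > target_score:
            rank += 1
    return rank


def summarize(ranks):
    """Hits@10, Hits@100, and mean rank for a list of ranks."""
    n = len(ranks)
    return {
        "hits@10": sum(r <= 10 for r in ranks) / n,
        "hits@100": sum(r <= 100 for r in ranks) / n,
        "mean_rank": sum(ranks) / n,
    }


def evaluate(test_pairs, score_fn, candidates, known):
    """Return (raw, filtered) metric summaries.

    `score_fn(p, c)` is a hypothetical scoring function (higher means more
    likely to interact); `known` maps each protein to the set of partners
    known from the training and testing data.
    """
    raw, filtered = [], []
    for p, q in test_pairs:
        scores = {c: score_fn(p, c) for c in candidates}
        raw.append(rank_of(scores, q))
        filtered.append(rank_of(scores, q, exclude=known.get(p, set())))
    return summarize(raw), summarize(filtered)
```

Under this reading, the filtered rank can never be worse than the raw rank for the same prediction, because filtering only removes competing candidates.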

Adding to the benchmark

To add your own results to the benchmark, please send us a pull request with a link to the source repository that contains the code to reproduce the results. Alternatively, please create an issue on the issue tracker and we will add your results.

Slides

We provide slides that can be used to present some of this work. The slides were created as part of an Ontology Tutorial that was developed and taught over several years at various events. All methods in the slides are also implemented with examples in our Jupyter Notebooks.

  1. Introduction
  2. Ontologies and Graphs -- basic introduction to ontologies, Description Logic, and how they can give rise to graph-based representations
  3. Semantic Similarity -- different semantic similarity measures on ontologies
  4. Ontology Embeddings -- methods to generate embeddings for ontologies, including syntactic, graph-based, and model-based approaches.

Resources

Processing and pre-processing ontologies

  • OWLAPI: Reference library to process OWL ontologies, supports most OWL reasoners.
  • funowl: Python library to process OWL ontologies.
  • owlready2: Python library to process OWL ontologies.
  • Apache Jena: RDF library with OWL support.
  • rdflib: Python RDF library with OWL support, in particular infixowl.
  • Protege: Comprehensive ontology editor and knowledge engineering environment.

Computing entailments, reasoning

  • ELK: Very fast reasoner for the OWL 2 EL profile with polynomial worst-case time complexity.
  • HermiT: Automated reasoner supporting most OWL axioms, with exponential worst-case complexity.
  • Pellet: OWL reasoner supporting most of the OWL constructs and supporting several additional features.

Generating graphs from ontologies

  • OBOGraphs: Syntactic conversion of ontologies to graphs, targeted at OBO ontologies.
  • Onto2Graph: Semantic conversion of OWL ontologies to graphs using automated reasoning, following the axiom patterns of the OBO Relation Ontology.

Computing Semantic Similarity

  • Semantic Measures Library: Comprehensive Java library to compute semantic similarity measures over ontologies.
  • sematch: Python library to compute semantic similarity on knowledge graphs.
  • DiShIn: Python library to compute semantic similarity on ontologies.
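
As a small illustration of what these libraries compute, the sketch below implements Resnik and Lin similarity (the measures behind the SimResnik and SimLin benchmark entries) on a toy ontology. The class names, annotations, and function names are made up for the example, and information content is estimated from annotation frequency, which is one common choice among several.

```python
import math

# Hypothetical toy ontology: class -> list of direct superclasses.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}

# Hypothetical annotations: entity -> set of classes it is annotated with.
ANNOTATIONS = {"p1": {"A1"}, "p2": {"A2"}, "p3": {"B"}, "p4": {"A1"}}


def ancestors(c):
    """Return `c` together with all of its superclasses."""
    result = {c}
    for parent in PARENTS[c]:
        result |= ancestors(parent)
    return result


def information_content():
    """IC(c) = log(N / n_c), with n_c the entities annotated to c or a subclass."""
    counts = {c: 0 for c in PARENTS}
    for classes in ANNOTATIONS.values():
        closed = set().union(*(ancestors(c) for c in classes))
        for c in closed:
            counts[c] += 1
    n = len(ANNOTATIONS)
    return {c: math.log(n / k) for c, k in counts.items() if k > 0}


IC = information_content()


def resnik(c1, c2):
    """IC of the most informative common ancestor of c1 and c2."""
    return max(IC[a] for a in ancestors(c1) & ancestors(c2))


def lin(c1, c2):
    """Resnik similarity normalized by the IC of both classes."""
    denom = IC[c1] + IC[c2]
    # Both classes carry no information (for example, the root): return 0.
    return 2 * resnik(c1, c2) / denom if denom > 0 else 0.0
```

In this toy example, `resnik("A1", "A2")` equals the information content of their shared superclass `A`, while `resnik("A1", "B")` is 0 because the only common ancestor is the root.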

Embedding graphs

  • OWL2Vec: Graph-based ontology embedding method that combines generation of graphs from ontologies, random walks on the generated graphs, and generation of embeddings using Word2Vec. Supports most OWL axioms.
  • DL2Vec: Graph-based ontology embedding method that combines generation of graphs from ontologies, random walks on the generated graphs, and generation of embeddings using Word2Vec. Supports most OWL axioms.
  • Walking RDF & OWL: Graph-based ontology embedding method that combines generation of graphs from ontologies, random walks on the generated graphs, and generation of embeddings using Word2Vec. Supports only the ontology taxonomy.
  • RDF2Vec (pyRDF2Vec): Method to embed RDF graphs.
  • Node2Vec: Method to embed graphs using biased random walks.
  • PyKEEN and BioKEEN: Toolkit for generating knowledge graph embeddings using several different approaches.
  • OpenKE: Library and toolkit for generating knowledge graph embeddings.
  • PyTorch Geometric: Library for graph neural networks which can be used to generate graph embeddings.
  • OWL2Vec*: Graph-based ontology embedding method that combines generation of graphs from ontologies, random walks on the generated graphs, and generation of embeddings using Word2Vec. Supports OWL 2 axioms and annotation axioms. Extension of OWL2Vec.
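
Several of the methods above share the same pipeline: generate a graph from the ontology, run random walks on it, and feed the walks to Word2Vec. The sketch below illustrates only the random-walk step on a hand-written toy graph. The edge list, function names, and parameters are hypothetical and do not reproduce any of the listed tools.

```python
import random

# Hypothetical toy graph of (source, edge label, target) triples, such as
# might be generated from ontology axioms and annotations.
EDGES = [
    ("A1", "subClassOf", "A"),
    ("A2", "subClassOf", "A"),
    ("A", "subClassOf", "root"),
    ("p1", "hasFunction", "A1"),
    ("p2", "hasFunction", "A2"),
]


def build_adjacency(edges):
    """Map every node to its outgoing (label, target) edges."""
    adjacency = {}
    for source, label, target in edges:
        adjacency.setdefault(source, []).append((label, target))
        adjacency.setdefault(target, [])
    return adjacency


def random_walks(adjacency, walks_per_node=5, walk_length=4, seed=0):
    """Generate walks as token lists that alternate nodes and edge labels."""
    rng = random.Random(seed)
    corpus = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_length):
                if not adjacency[node]:
                    break  # dead end: the node has no outgoing edges
                label, node = rng.choice(adjacency[node])
                walk += [label, node]
            corpus.append(walk)
    return corpus


corpus = random_walks(build_adjacency(EDGES))
```

Each walk is a "sentence" of node and edge-label tokens, so the resulting `corpus` can be passed to a Word2Vec implementation to obtain an embedding for every node.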

Embedding axioms

  • Onto2Vec: Embeddings based on treating logical axioms as a text corpus.
  • OPA2Vec: Embeddings that combine logical axioms with annotation properties.
  • EL Embeddings: Embeddings that approximate the interpretation function and preserve semantics for intersection, existential quantifiers, and bottom class.

Ontology-based constrained learning

  • DeepGO and DeepPheno: Implement ontology-based hierarchical classifiers for function and phenotype prediction. The hierarchical classification modules are generic and can be used with other ontologies and applications.
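
To illustrate what a hierarchical constraint means, the sketch below post-processes prediction scores so that every class scores at least as high as each of its subclasses. This is one simple way to make predictions consistent with an ontology hierarchy; it is not a description of how DeepGO or DeepPheno implement their classifiers, and the toy hierarchy and function names are hypothetical.

```python
# Hypothetical toy hierarchy: class -> list of direct superclasses.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A", "B"]}


def superclasses(c):
    """Return all direct and indirect superclasses of `c` (excluding `c`)."""
    result = set()
    for parent in PARENTS[c]:
        result.add(parent)
        result |= superclasses(parent)
    return result


def propagate(scores):
    """Raise each class score to the maximum score among its subclasses.

    After propagation, predicting a class implies predicting all of its
    superclasses with at least the same score.
    """
    consistent = dict(scores)
    for c, score in scores.items():
        for ancestor in superclasses(c):
            consistent[ancestor] = max(consistent.get(ancestor, 0.0), score)
    return consistent
```

For example, if `A1` is predicted with score 0.9 while `A` has only 0.2, propagation lifts `A` and `root` to 0.9, so the output respects the subclass relation.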

Publication

If you like our work, please cite our paper:

@article{machine-learning-with-ontologies,
    author = {Kulmanov, Maxat and Smaili, Fatima Zohra and Gao, Xin and Hoehndorf, Robert},
    title = {Semantic similarity and machine learning with ontologies},
    journal = {Briefings in Bioinformatics},
    year = {2020},
    month = {10},
    abstract = {Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge and they are employed in almost every major biological database. Recently, ontologies are increasingly being used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview over the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.},
    issn = {1477-4054},
    doi = {10.1093/bib/bbaa199},
    url = {https://doi.org/10.1093/bib/bbaa199},
    note = {bbaa199},
    eprint = {https://academic.oup.com/bib/advance-article-pdf/doi/10.1093/bib/bbaa199/33875255/bbaa199.pdf},
}
