  • Language
    Python
  • License
    GNU General Publi...
  • Created over 9 years ago
  • Updated about 2 months ago

Repository Details

Python module for prior knowledge integration. Builds databases of signaling pathways, enzyme-substrate interactions, complexes, annotations and intercellular communication roles.

pypath: A Python module for molecular signaling prior knowledge processing

Demo

OmniPath

Are you interested in OmniPath data? Check out our R package OmnipathR, the most popular and most versatile access point to OmniPath, a database built from more than 150 original resources. If you use Python and don't need to build the database yourself, try our Python client. Read more about the web service here.

Do you need pypath?

Pypath is the database builder of OmniPath. For most users the data distributed in OmniPath is sufficient (see above), and they don't need pypath itself. Typically you need pypath to:

  • Build a custom or very fresh version of the OmniPath database(s)
  • Use one of the utilities such as ID translation, homology translation, etc. (see the utils module)
  • Access the raw or preprocessed data directly from the original resources (see the inputs module)

Installation

From PyPI:

pip install pypath-omnipath

From Git:

pip install git+https://github.com/saezlab/pypath.git
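To confirm that the installation succeeded, one option is to ask Python's packaging metadata for the installed version. The sketch below uses only the standard library; the helper name installed_version is ours (it is not part of pypath), and the distribution name pypath-omnipath is taken from the PyPI command above.

```python
from importlib import metadata


def installed_version(dist_name="pypath-omnipath"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version or "pypath-omnipath is not installed in this environment")
```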

Docs

Read the reference documentation or check out the tutorials. The most comprehensive guide to pypath is The Pypath Book.

Get help

Should you have a question or experience an issue, please write to us on the Github issues page.

Features

pypath is a Python module for processing molecular biology data resources, combining them into databases and providing a versatile interface in Python as well as exporting the data for access through other platforms such as R, web service, Cytoscape and BEL (Biological Expression Language).

pypath provides access to more than 100 resources! It builds 5 major combined databases and within these we can distinguish different datasets. The 5 major databases are interactions (molecular interaction network or pathways), enzyme-substrate relationships, protein complexes, molecular annotations (functional roles, localizations, and more) and inter-cellular communication roles.

pypath consists of a number of submodules and each of them again contains a number of submodules. Overall pypath consists of around 100 modules. The most important higher level submodules:

  • pypath.core: contains the database classes, e.g. network, complex, annotations, etc.
  • pypath.inputs: contains the resource-specific methods which directly download and preprocess data from the original sources
  • pypath.omnipath: higher level applications, e.g. a database manager, a web server
  • pypath.utils: standalone utilities, e.g. identifier translator, Gene Ontology processor, BioPax processor, etc.

Integrated databases

In the beginning the primary aim of pypath was to build networks from multiple sources, using an igraph object as the foundation of the integrated data structure. From versions 0.7 and 0.8 this design principle started to change. Today pypath builds a number of different databases and exposes them through a rich API; each of them can be converted to a pandas.DataFrame. The modules and classes responsible for the integrated databases are located in pypath.core. The five main databases are the following:

  • network - core.network
  • enzyme-substrate - core.enz_sub
  • complexes - core.complex
  • annotations - core.annot
  • intercell - core.intercell

Some of the databases have different variants (e.g. PPI and transcriptional network) and all can be customized by many parameters.

Database management

The databases above can be loaded by calling the appropriate classes. However, building the databases requires time and memory, so we want to avoid building them more often than necessary or keeping more than one copy in memory. Some of the modules listed above have a get_db method which ensures that only one instance of the database is loaded. A more full-featured database management system is also available in the pypath.omnipath module. This module builds the databases, automatically saves them to pickle files and loads them from there in subsequent sessions. pypath comes with a number of database definitions, and users can add more. By default the pickle files are located in the ~/.pypath/pickles/ directory. With the omnipath module it's easy to get an instance of a database. For example, to get the omnipath PPI network dataset:

from pypath import omnipath
op = omnipath.db.get_db('omnipath')

Important: Building the databases for the first time requires downloading several MB or GB of data from the original resources. This normally takes a long time and is prone to errors (e.g. truncated or empty downloads due to an interrupted HTTP connection). If a build fails, check the log to find the path of the problematic cache file, inspect the contents of this file to find out the reason, and if necessary delete it so that the download is attempted again the next time you build the database. Sometimes the original resources change their content or go offline. If you encounter such a case, please open an issue at https://github.com/saezlab/pypath/issues so we can fix it in pypath. Once all the necessary contents are downloaded and stored in the cache, the database builds are much faster, but can still take minutes.
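
When hunting for empty or truncated downloads, a quick scan of the cache directory for suspiciously small files can help. The following is a minimal sketch using only the standard library. The helper suspicious_cache_files is a hypothetical name, not a pypath function, and the cache directory must be supplied by you (for example the directory of the cache file path reported in the log).

```python
from pathlib import Path


def suspicious_cache_files(cache_dir, min_bytes=1):
    """List files in `cache_dir` smaller than `min_bytes` (empty files by default)."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    return sorted(
        path
        for path in cache_dir.iterdir()
        if path.is_file() and path.stat().st_size < min_bytes
    )
```

The function only reports candidates and deletes nothing. After inspecting a file, it can be removed manually (or with path.unlink()) so that the next build attempts the download again.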

Further modules in pypath

Apart from the databases, pypath has many submodules with standalone functionality which can be used in other modules and scripts. Below we present a few of these.

ID conversion

The ID conversion module utils.mapping translates between a large variety of gene, protein, miRNA and small molecule ID types. It can translate secondary UniProt ACs to primary ones, and TrEMBL ACs to SwissProt, using primary Gene Symbols to find the connections. The module automatically loads and stores the necessary conversion tables. Many tables are predefined, such as all the IDs in the UniProt mapping service, and users can load any table from file using the classes provided in the input_formats module. An example of how to translate identifiers:

from pypath.utils import mapping
mapping.map_name('P00533', 'uniprot', 'genesymbol')
# {'EGFR'}
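
The result above is a set because identifier translation is in general one-to-many. The toy sketch below illustrates this with a plain dictionary. It does not use pypath at all: the table, the made-up identifier FAKE01 and the functions toy_map_name and toy_map_names are invented for illustration, and how pypath itself treats unknown identifiers should be checked in its reference documentation.

```python
# Toy translation table, NOT pypath data: source ID -> set of target IDs.
TOY_TABLE = {
    "P00533": {"EGFR"},
    "FAKE01": {"GENE_A", "GENE_B"},  # made-up ambiguous entry
}


def toy_map_name(name, table=TOY_TABLE):
    """Translate one ID; an unknown ID gives an empty set in this toy version."""
    return set(table.get(name, ()))


def toy_map_names(names, table=TOY_TABLE):
    """Translate several IDs and return the union of all their targets."""
    result = set()
    for name in names:
        result |= toy_map_name(name, table)
    return result
```

Returning a set for every query keeps the calling code uniform: unambiguous, ambiguous and missing identifiers are all handled by iterating over the result.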

Homology translation

The pypath.utils.homology module finds the orthologs of genes between two organisms. It uses data from NCBI HomoloGene, Ensembl and UniProt. The module is simple to use:

from pypath.utils import homology
homology.translate('P00533', 10090) # translating the human EGFR to mouse
# ['Q01279'] # it returns the mouse Egfr UniProt AC

It is able to handle any ID type supported by pypath.utils.mapping. Alternatively, you can access a complete dictionary of orthologous genes, or translate columns in a pandas data frame.

FAQ

Does it run on my old Python?

Most likely it doesn't. The oldest supported version, currently 3.9, is defined in our pyproject.toml.
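
A quick way to check an interpreter against this requirement is sketched below. The constant MIN_PYTHON simply copies the version stated above and may go out of date (pyproject.toml remains authoritative); the function name python_is_supported is ours, not part of pypath.

```python
import sys

# Copied from the FAQ text above; pyproject.toml is the authoritative source.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= minimum


if __name__ == "__main__":
    print("supported" if python_is_supported() else "Python is too old for pypath")
```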

Is there something similar in R?

OmniPath's R client, besides accessing data from OmniPath, provides many services similar to those in pypath: ID translation, homology translation, taxonomy support, GO support, and many more.

Questions about OmniPath

Contact

We prefer to keep all communication within the Github issues. For private or sensitive matters, feel free to contact us at [email protected].

Impressum

The development of pypath is coordinated by Dénes Türei in the Saez Lab, with the contribution of developers and scientists from other groups:

  • Erva Ulusoy, Melih Darcan, Ömer Kaan Vural, Tennur Kılıç, Elif Çevrim, Bünyamin Şen and Atabey Ünlü in the HU Biological Data Science Lab (PI: Tunca Doğan) created many new input modules in pypath;
  • Leila Gul, Dezső Módos, Márton Ölbei and Tamás Korcsmáros in the Korcsmaros Lab contributed to the overall design of OmniPath, the design and implementation of the intercellular communication database, and with various case studies and tutorials;
  • Michael Klein from the group of Fabian Theis developed the Python client for the OmniPath web service;
  • Charles Tapley Hoyt and Daniel Domingo-Fernández added the BEL export module.
  • From the Saez Lab, Olga Ivanova introduced the resource manager in pypath, Sophia Müller-Dott added the CollecTRI gene regulatory network, while Nicolàs Palacio, Sebastian Lobentanzer and Ahmet Rifaioglu have done various maintenance and refactoring works. Aurelien Dugourd and Christina Schmidt helped with the design of the metabolomics related datasets and services.
  • The R package and the Cytoscape app are developed and maintained by Francesco Ceccarelli, Attila Gábor, Alberto Valdeolivas, Dénes Türei and Nicolàs Palacio;
  • The first logo of OmniPath has been designed by Jakob Wirbel (Saez Lab), the current logo by Dénes Türei, while the cover graphics for Nature Methods is the work of Spencer Phillips from EMBL-EBI.

History and releases

See here a bird's-eye view of pypath's development history. For more details about recent developments, see the Github releases.

More Repositories

1

decoupleR

R package to infer biological activities from omics data using a collection of methods.
R
192
star
2

liana

LIANA: a LIgand-receptor ANalysis frAmework
R
181
star
3

decoupler-py

Python package to perform enrichment analysis from omics data.
Python
158
star
4

liana-py

LIANA+: an all-in-one framework for cell-cell communication
Python
156
star
5

dorothea

R package to access DoRothEA's regulons
R
132
star
6

OmnipathR

R client for the OmniPath web service
R
101
star
7

progeny

R package for Pathway RespOnsive GENe activity inference
R
93
star
8

visium_heart

Spatial transcriptomics of heart tissue
Jupyter Notebook
70
star
9

CARNIVAL

CAusal Reasoning for Network Identification with integer VALue programming in R
R
57
star
10

cosmosR

COSMOS (Causal Oriented Search of Multi-Omic Space) is a method that integrates phosphoproteomics, transcriptomics, and metabolomics data sets.
R
56
star
11

transcriptutorial

This is a tutorial to guide the analysis of RNAseq dataset using footprint based tools such as DOROTHEA, PROGENY and CARNIVAL
R
55
star
12

CollecTRI

Gene regulatory network containing signed transcription factor-target gene interactions
R
53
star
13

mistyR

Multiview Intercellular SpaTial modeling framework
R
43
star
14

omnipath

Python client for the OmniPath web service
Python
37
star
15

FootprintMethods_on_scRNAseq

Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data
R
26
star
16

footprints

Analysis code for "Perturbation-response genes reveal signaling footprints in cancer gene expression"
R
20
star
17

corneto

Unified knowledge-driven network inference from omics data
Python
16
star
18

lipyd

Python module for lipidomics LC MS/MS data analysis
Python
15
star
19

flop

FunctionaL Omics Processing platform
R
13
star
20

ShinyFUNKI

FUNctional toolKIt platform for multi-omic functional analysis. A standardised pipeline to analyse transcriptomic, proteomic, phosphoproteomic and metabolomic datasets.
R
13
star
21

progeny-py

PROGENY Python implementation
Jupyter Notebook
12
star
22

dorothea-py

Dorothea package in Python
Jupyter Notebook
11
star
23

MOFAcell

Code used for multicellular factor analysis
Jupyter Notebook
11
star
24

ocean

R package for metabolic enzyme enrichment analysis
R
11
star
25

kinact

Toolbox for Kinase Activity Scoring based on phosphoproteomic data
Python
10
star
26

BioCypher

A unified language for biomedical research knowledge graphs
Python
9
star
27

CellNOptR

Training of boolean logic models of signalling networks using prior knowledge networks and perturbation data.
R
9
star
28

SLAPenrich

Sample Level Analysis of Pathway Alteration Enrichments
R
8
star
29

DOT

DOT
R
8
star
30

decoupleRBench

Package to benchmark methods from decoupleR
R
8
star
31

MOFAcellulaR

R package to infer multicellular programs from single-cell data using multi-omics factor analysis (MOFA)
R
8
star
32

MetaProViz

R-package to perform metabolomics pre-processing, differential metabolite analysis, metabolite clustering and custom visualisations.
HTML
8
star
33

decoupleR_manuscript

Code to reproduce the results from decoupleR's manuscript
R
7
star
34

eccb2022_sc_funcomics

Functional analysis of single-cell transcriptomics
HTML
7
star
35

PHONEMeS

PHONEMeS (PHOsphorylation NEtworks for Mass Spectrometry) is an R package to model signalling networks based on untargeted phosphoproteomics
R
7
star
36

Omnipath_Cytoscape

a plug-in to access Omnipath from Cytoscape
Java
7
star
37

Factor_COSMOS

Formatting NCI60 data into cosmos ready inputs and generation of testable hypothesis connecting cell-line specific TF and metabolic deregulations.
HTML
7
star
38

scell_hfpef

single cell RNAseq analysis of HFpEF mice model
R
6
star
39

kasumi_bench

R
5
star
40

ligrec_decouple

Systematic Comparison of Cell-Cell Communication Tools and Resources
R
5
star
41

HF_meta-analysis

Code that generates results and figures from: "A Consensus Transcriptional Landscape of Human End-Stage Heart Failure"
R
5
star
42

CKG-BioCypher

Python
4
star
43

Covid19

We use our tools to analyse Covid19 RNAseq datasets
4
star
44

VisiumMS

Study of Multiple Sclerosis (MS) using paired snRNA-seq and Visium transcriptomics datasets.
Python
4
star
45

PerMedCoE_summer_school_2023

PerMedCoE summer school 2023
Jupyter Notebook
4
star
46

cellnopt

Tool for training of logic models of signalling networks using prior knowledge networks and perturbation data.
4
star
47

CPT_QSPtutorial

Supplementary material for CPT tutorial on logic modeling for quantitative systems pharmacology
Python
4
star
48

visium_colon_si

ST pipelines on mouse colon and small intestine
R
3
star
49

CNORode

add-on for CellNOptR using logic based differential equations
C
3
star
50

cyrface

Bridging Cytoscape with R
Java
3
star
51

FUNKI

FUNctional analysis worKflows Interface
Python
3
star
52

protein_attenuation

Proteogenomics analysis of protein attenuation in tumours
Python
3
star
53

scheduling

Repository to collect issues for events to be scheduled
Python
2
star
54

CellNOptR-MaBoSS

CellNOptR with MaBoSS simulation
R
2
star
55

CNORode2017

modified version of CNORode including: steady state penalty, L1 regularisation, bootstrap, new transfer function
C
2
star
56

liverx

Analysis of liver proteomics data from Aebersold lab
Python
2
star
57

neo4j-utils

Rich interface on top of the official Neo4j driver
Python
2
star
58

cytocopter

CellNOptR in Cytoscape
Java
2
star
59

TumorDeconvolution

Estimate tumor purities from gene expression data
HTML
2
star
60

Singlecell_course_2022

Teaching material for the single-cell course 2022
2
star
61

ShinyCNOR

Shiny application for the CellNOptR packages
R
2
star
62

2023-SysBioCourse-ACSB

HTML
2
star
63

Microbiome_analysis_course_2022

Mobi microbiome course materials
HTML
2
star
64

PerMedCoE_tools_virtual_course_2023

Material for the PerMedCoE virtual course: transcriptomics to mechanistic models of signalling.
Jupyter Notebook
2
star
65

OTAR-BioCypher

Python
2
star
66

snk-tutorial

Python
2
star
67

NicheNet_Omnipath

Building and Training of the NicheNet Method exclusively using OmniPath resources. SARS-CoV-2 case study
2
star
68

teaching_material

teaching material for various courses
HTML
2
star
69

process_rnaseq_cellines

Processing RNAseq data from Cell Lines. From raw data to normalised, voom and ComBat batch-correction
R
2
star
70

TFbenchmark

This repository contains the code used to benchmark TF-target datasets via TF activities in 3 benchmark datasets
R
2
star
71

2024_EBI_GRN

Materials for the 2024 course at EMBL-EBI: "Modelling gene regulation from transcriptomics and chromatin accessibility single-cell data".
Jupyter Notebook
2
star
72

kinase_tf_mini_tuto

This is a short tutorial to show in parallel how to estimate TF and kinase activities from transcriptomic and phosphoproteomic data
R
2
star
73

kinase_tf_mini_tuto_simple

R
1
star
74

Xu_tubuloid

HTML
1
star
75

2022-SysBioCourse-ACSB

Teaching material for 2022 SysBio Course
HTML
1
star
76

NetworkModeling_course_2020

teaching materials for the network modeling course
HTML
1
star
77

gene-network-inference-in-R

Mutual information-based and bicor-based methods for genome-wide reverse engineering of gene regulatory networks in R.
1
star
78

liver-disease-atlas

Transcriptomic cross-species analysis of chronic liver disease reveals consistent regulation between humans and mice
R
1
star
79

liver-disease-atlas-app

R
1
star
80

ccc_protocols

LIANA x Tensor-cell2cell Protocols
Jupyter Notebook
1
star
81

MOON_example

HTML
1
star
82

omnipath_analysis

analysis and visualization workflows for the OmniPath 2 paper
Python
1
star
83

meta_PKN_BIGG

R
1
star
84

Macau_Synergy_Prediction

Target functional similarity based workflows for drug synergy prediction and stratification
R
1
star
85

MedInfNetworks2022

HTML
1
star
86

breastCancerCytof

modelling breast cancer cytof data with logic ODEs
HTML
1
star
87

BiRewire

An R package implementing high-performing routines for the randomisation of bipartite graphs preserving their node degrees.
1
star
88

hepatic-microenviroment

Gut microbiota fuels HCC development by shaping the hepatic inflammatory microenvironment
R
1
star
89

network_tools

Collection of Python functions to run network-based analysis in signed and directed networks.
Jupyter Notebook
1
star
90

Meta_PKN

integration of omnipath causal network with STITCH and Recon3D
R
1
star
91

biocypher-project-template

Template for creating a BioCypher-driven knowledge graph
Python
1
star
92

CARNIVAL-Bioconductor-Dev

Provisional repository for the development of CARNIVAL package for Bioconductor
R
1
star
93

MedInfNetworks2021

HTML
1
star
94

CKD_Landscape

R
1
star
95

insilico_tissue_simulator

simulator used for the Misty paper
R
1
star
96

CNORprob

Probabilistic logic version of CellNOpt (derived from FALCON)
R
1
star
97

PHONEMeS-ILP

ILP implementation of PHONEMeS
R
1
star
98

recon3D_BIGG

Jupyter Notebook
1
star
99

DrugVsDisease

DvD: An R and Cytoscape plug-in for comparing Drug and Disease profiles
R
1
star
100

astromouse

Spatial space data from mouse
Jupyter Notebook
1
star