scikit-fusion: Data fusion via collective latent factor models

scikit-fusion

Build: passing. License: BSD.

scikit-fusion is a Python module for data fusion and learning over heterogeneous datasets. At its core are recent collective latent factor models and large-scale joint matrix factorization algorithms.

[News:] Fast CPU and GPU-accelerated implementations of some of our methods.

[News:] Scikit-fusion, collective latent factor models, matrix factorization for data fusion and learning over hetnets.

[News:] fastGNMF, fast implementation of graph-regularized non-negative matrix factorization using Facebook FAISS.

Dependencies

scikit-fusion is tested to work under Python 3.

Building the software requires NumPy >= 1.7, SciPy >= 0.12 and Joblib >= 0.8.4. PyGraphviz >= 1.3 is needed only for drawing data fusion graphs.
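
To see whether an environment already satisfies these minimum versions, a small standard-library check along the following lines can help. This is only a sketch: the helper names (`version_tuple`, `meets_minimum`, `check`) are made up for illustration and are not part of scikit-fusion, and the version parsing is deliberately simple (it compares leading numeric components only).

```python
from importlib import metadata

# Minimum versions quoted in the Dependencies section.
# PyGraphviz is treated as optional: it is only needed for drawing fusion graphs.
REQUIRED = {"numpy": "1.7", "scipy": "0.12", "joblib": "0.8.4"}
OPTIONAL = {"pygraphviz": "1.3"}


def version_tuple(version):
    """Turn '1.7.0rc1' into (1, 7, 0); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check(requirements):
    """Return {name: 'ok' | 'too old (x.y)' | 'missing'} for each requirement."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in {**check(REQUIRED), **check(OPTIONAL)}.items():
        print(f"{name}: {status}")
```

A "missing" entry for `pygraphviz` is harmless unless you want to draw fusion graphs.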

Install

This package uses distutils, which is the default way of installing Python modules. To install in your home directory, use:

python setup.py install --user

To install for all users on Unix/Linux:

python setup.py build
sudo python setup.py install

For development mode use:

python setup.py develop

Use

Let's generate three random data matrices, each relating a pair of three different object types:

 >>> import numpy as np
 >>> R12 = np.random.rand(50, 100)
 >>> R13 = np.random.rand(50, 40)
 >>> R23 = np.random.rand(100, 40)

Next, we define our data fusion graph:

 >>> from skfusion import fusion
 >>> t1 = fusion.ObjectType('Type 1', 10)
 >>> t2 = fusion.ObjectType('Type 2', 20)
 >>> t3 = fusion.ObjectType('Type 3', 30)
 >>> relations = [fusion.Relation(R12, t1, t2),
                  fusion.Relation(R13, t1, t3),
                  fusion.Relation(R23, t2, t3)]
 >>> fusion_graph = fusion.FusionGraph()
 >>> fusion_graph.add_relations_from(relations)

and then collectively infer the latent data model:

 >>> fuser = fusion.Dfmf()
 >>> fuser.fuse(fusion_graph)
 >>> print(fuser.factor(t1).shape)
 (50, 10)
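
The shape `(50, 10)` is what one would expect from a collective matrix tri-factorization, in which each relation is approximated as R_ij ≈ G_i S_ij G_j^T: G_i has one row per object of type i and one column per latent component, and the number of components is the second argument given to `fusion.ObjectType`. The NumPy-only sketch below illustrates just the shapes. It uses random, unfitted factors and hypothetical variable names, and it does not call scikit-fusion or reproduce its optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Object counts come from the relation matrices in the example above;
# ranks are the second argument passed to fusion.ObjectType.
n_objects = {"Type 1": 50, "Type 2": 100, "Type 3": 40}
ranks = {"Type 1": 10, "Type 2": 20, "Type 3": 30}

# One factor matrix per object type, shared by every relation the type appears in.
G = {t: rng.random((n_objects[t], ranks[t])) for t in n_objects}

# One small matrix S_ij per relation, linking the two latent spaces.
pairs = [("Type 1", "Type 2"), ("Type 1", "Type 3"), ("Type 2", "Type 3")]
S = {(a, b): rng.random((ranks[a], ranks[b])) for a, b in pairs}


def reconstruct(a, b):
    """Approximate relation matrix R_ab as G_a @ S_ab @ G_b.T."""
    return G[a] @ S[(a, b)] @ G[b].T


for a, b in pairs:
    # Prints (50, 100), (50, 40) and (100, 40), matching R12, R13 and R23.
    print(a, "x", b, reconstruct(a, b).shape)
```

Because `G["Type 1"]` is shared by the first two relations, R12 and R13 must have the same number of rows, which is why both were generated with 50 rows.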

Afterwards, new data might arrive:

 >>> new_R12 = np.random.rand(10, 100)
 >>> new_R13 = np.random.rand(10, 40)

for which we define the fusion graph:

 >>> new_relations = [fusion.Relation(new_R12, t1, t2),
                      fusion.Relation(new_R13, t1, t3)]
 >>> new_graph = fusion.FusionGraph(new_relations)

and transform new objects to the latent space induced by the fuser:

 >>> transformer = fusion.DfmfTransform()
 >>> transformer.transform(t1, new_graph, fuser)
 >>> print(transformer.factor(t1).shape)
 (10, 10)
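
To build intuition for why ten new objects end up as a `(10, 10)` matrix, the idea can be sketched as a projection onto a fixed basis: the factors of Type 2 and Type 3 stay as they are, and only the rows for the new Type 1 objects are estimated. The sketch below uses ordinary least squares for that step. This is an assumption made for illustration, not the update rule used by `DfmfTransform`; the factors are random stand-ins rather than a fitted model, and `project` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

k1, k2, k3 = 10, 20, 30   # ranks of Type 1, Type 2 and Type 3
n2, n3 = 100, 40          # numbers of Type 2 and Type 3 objects

# Stand-ins for the factors a fitted model would provide (random here).
G2, G3 = rng.random((n2, k2)), rng.random((n3, k3))
S12, S13 = rng.random((k1, k2)), rng.random((k1, k3))

# Fixed basis built from both relations that involve Type 1: shape (k1, n2 + n3).
B = np.hstack([S12 @ G2.T, S13 @ G3.T])


def project(new_R12, new_R13):
    """Estimate latent rows for new Type 1 objects by least squares."""
    new_R = np.hstack([new_R12, new_R13])              # (n_new, n2 + n3)
    # Solve B.T @ X = new_R.T; column j of X holds the factors of new object j.
    X, *_ = np.linalg.lstsq(B.T, new_R.T, rcond=None)
    return X.T                                         # (n_new, k1)


new_G1 = project(rng.random((10, n2)), rng.random((10, n3)))
print(new_G1.shape)  # (10, 10): ten new objects, ten latent components
```

Plain least squares may return negative entries, so its values can differ from what a constrained factorization method would produce. Only the shapes and the "fixed basis, new rows" idea carry over.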

scikit-fusion also includes datasets from several applications of data fusion, for example:

>>> from skfusion import datasets
>>> dicty = datasets.load_dicty()
>>> print(dicty)
FusionGraph(Object types: 3, Relations: 3)
>>> print(dicty.object_types)
{ObjectType(GO term), ObjectType(Experimental condition), ObjectType(Gene)}
>>> print(dicty.relations)
{Relation(ObjectType(Gene), ObjectType(GO term)),
 Relation(ObjectType(Gene), ObjectType(Gene)),
 Relation(ObjectType(Gene), ObjectType(Experimental condition))}
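
The printed summary can be read as a small graph whose nodes are object types and whose edges are relations. The plain-Python sketch below rebuilds that structure from the three relations listed above to show where the counts come from. It is a hypothetical illustration and does not use the `FusionGraph` class.

```python
from collections import defaultdict

# Relations as printed for the dicty graph above: (row type, column type).
relations = [
    ("Gene", "GO term"),
    ("Gene", "Gene"),
    ("Gene", "Experimental condition"),
]

# Every type that appears in at least one relation is a node of the graph.
object_types = {t for pair in relations for t in pair}

# Undirected neighbourhood of each object type.
neighbours = defaultdict(set)
for row_type, col_type in relations:
    neighbours[row_type].add(col_type)
    neighbours[col_type].add(row_type)

print(f"Object types: {len(object_types)}, Relations: {len(relations)}")
# Object types: 3, Relations: 3
print(sorted(neighbours["Gene"]))
# ['Experimental condition', 'GO term', 'Gene']
```

Gene takes part in every relation, including one with itself, so it acts as the hub through which the other two object types are connected.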

Selected publications (Methods)

Selected publications (Applications)

Tutorials

  • Large-scale data fusion by collective matrix factorization, Basel Computational Biology Conference, [BC]^2
  • Data fusion of everything, 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC
