
μKG: A Library for Multi-source Knowledge Graph Embeddings and Applications, ISWC 2022

μKG is an open-source Python library for representation learning over knowledge graphs. μKG supports joint representation learning over multi-source knowledge graphs (and also a single knowledge graph), multiple deep learning libraries (PyTorch and TF2), multiple embedding tasks (link prediction, entity alignment, entity typing, and multi-source link prediction), and multiple parallel computing modes (multi-process and multi-GPU computing).

Table of contents

  1. Introduction of μKG 📃
    1. Overview
    2. Package Description
  2. Getting Started 🚀
    1. Dependencies
    2. Installation
    3. Usage
  3. Models hub 🏠
    1. KGE models
    2. EA models
    3. ET models
  4. Datasets hub 🏠
    1. KGE datasets
    2. EA datasets
    3. ET datasets
  5. Utils 📂
    1. Sampler
    2. Evaluator
    3. Multi-GPU and multi-processing computation
  6. Running Experiments 🔬
  7. License
  8. Citation

Introduction of μKG 📃

Overview

We use Python, TensorFlow, and PyTorch to develop the basic framework of μKG, and Ray for distributed training. The software architecture is illustrated in the following figure.

[Figure: software architecture of μKG]

Compared with other existing KG systems, μKG has the following competitive features.

👍Comprehensive. μKG is a full-featured Python library for representation learning over a single KG or multi-source KGs. It is compatible with the two widely-used deep learning libraries PyTorch and TensorFlow 2, and can therefore be easily integrated into downstream applications. It integrates a variety of KG embedding models and supports four KG tasks including link prediction, entity alignment, entity typing, and multi-source link prediction.

Fast and scalable. μKG provides advanced implementations of KG embedding techniques with the support of multi-process and multi-GPU parallel computing, making it fast and scalable to large KGs.

🤳Easy-to-use. μKG provides simplified pipelines of KG embedding tasks for easy use. Users can interact with μKG with both method APIs and the command line. It also has high-quality documentation.

😀Continuously updated. Our team will keep up-to-date on new related techniques and integrate new (multi-source) KG embedding models, tasks, and datasets into μKG. We will also keep improving existing implementations.

Package Description

μKG/
├── src/
│   ├── py/: a Python-based toolkit used for the upper layer of μKG
│   │   ├── data/: a collection of datasets used for knowledge graph reasoning
│   │   ├── args/: JSON files used for configuring hyperparameters of the training process
│   │   ├── evaluation/: package of the implementations for supported downstream tasks
│   │   ├── load/: toolkit used for data loading and processing
│   │   ├── base/: package of the implementations for different initializers, losses and optimizers
│   │   └── util/: package of the implementations for checking the virtual environment
│   ├── tf/: package of the implementations for KGE models, EA models and ET models in TensorFlow 2
│   └── torch/: package of the implementations for KGE models, EA models and ET models in PyTorch

Getting Started 🚀

Dependencies

μKG requires Python 3 and supports both the PyTorch and TensorFlow 2 deep learning libraries. Users can choose either one according to their preference.

  • PyTorch 1.10.2 | TensorFlow 2.x
  • Ray 1.12.0
  • SciPy
  • NumPy
  • igraph
  • pandas
  • scikit-learn
  • Gensim
  • tqdm

Installation 🔧

We suggest creating a new conda environment first. We provide two sets of installation instructions, one for tensorflow-gpu (tested on 2.3.0) and one for pytorch (tested on 1.10.2). Note that Ray 1.10.0 and Ray 1.12.0 differ in batch generation. Ray 1.12.0 is used in the examples below.

# command for Tensorflow
conda create -n muKG python=3.8
conda activate muKG
conda install tensorflow-gpu==2.3.0
conda install -c conda-forge python-igraph
pip install -U ray==1.12.0

To install PyTorch, you must install Anaconda and follow the instructions on the PyTorch website. For example, if you’re using CUDA version 11.3, use the following command:

# command for PyTorch
conda create -n muKG python=3.8
conda activate muKG
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install -c conda-forge python-igraph
pip install -U ray==1.12.0

The latest code can be installed with the following commands:

git clone https://github.com/nju-websoft/muKG.git muKG
cd muKG
pip install -e .

Usage 📝

Currently, there are two ways to run μKG: from the command line, or by editing a Python file to configure your model. We provide a tutorial for both. The following example shows how to use μKG in Python. Here you can choose the task, select a specific model, and change the mode (training or evaluation). The hyperparameter files are stored in the args subfolder and contain the complete details of the training process.

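# load_args, read_kgs_from_folder, ea_models, kge_models and et_models are provided by μKG.
# The quoted strings below are placeholders to be replaced with your own values.
model_name = 'model name'
kg_task = 'selected KG task'  # 'ea', 'lp', or 'et'

# Load the hyperparameters of the selected task.
if kg_task == 'ea':
    args = load_args("hyperparameter file folder of entity alignment task")
elif kg_task == 'lp':
    args = load_args("hyperparameter file folder of link prediction task")
else:
    args = load_args("hyperparameter file folder of entity typing task")

# Read the knowledge graphs.
kgs = read_kgs_from_folder()

# Build the model wrapper that matches the task.
if kg_task == 'ea':
    model = ea_models(args, kgs)
elif kg_task == 'lp':
    model = kge_models(args, kgs)
else:
    model = et_models(args, kgs)

# Select the model by name, then train and test it.
model.get_model(model_name)
model.run()
model.test()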
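
Since the hyperparameters live in JSON files, it can help to see how such a file might be turned into an attribute-style object. The following is only a minimal sketch: it is not μKG's load_args, and the file name and the keys (embedding_dim, learning_rate, hits_k) are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def load_hyperparameters(path):
    """Read a JSON file and expose its top-level keys as attributes."""
    with open(path, encoding="utf-8") as f:
        return SimpleNamespace(**json.load(f))


# Hypothetical hyperparameter file, written to a temporary folder for the demo.
demo_file = Path(tempfile.mkdtemp()) / "transe_demo_args.json"
demo_file.write_text(
    json.dumps({"embedding_dim": 100, "learning_rate": 0.01, "hits_k": [1, 3, 10]}),
    encoding="utf-8",
)

demo_args = load_hyperparameters(demo_file)
print(demo_args.embedding_dim, demo_args.learning_rate, demo_args.hits_k)
```

Check the JSON files shipped in the args subfolder for the keys that μKG actually reads.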

To run a model on a dataset, use the following command line. We show an example of training TransE on FB15K here. The hyperparameters default to the corresponding JSON file in the args_kge folder.

# -t: task (lp, ea, et)   -m: selected model name   -o: train and valid   -d: selected dataset
python main_args.py -t lp -m transe -o train -d data/FB15K

Models hub 🏠

μKG has implemented 26 KG models. The citation for each model corresponds to the paper describing the model. We divide the models into three categories according to their downstream KG tasks. You can add your own models under one of the three folders.

KGE models

Name      Citation
TransE    Bordes et al., 2013
TransR    Lin et al., 2015
TransD    Ji et al., 2015
TransH    Wang et al., 2014
TuckER    Balažević et al., 2019
RotatE    Sun et al., 2019
SimplE    Kazemi et al., 2018
RESCAL    Nickel et al., 2011
ComplEx   Trouillon et al., 2016
Analogy   Liu et al., 2017
DistMult  Yang et al., 2014
HolE      Nickel et al., 2016
ConvE     Dettmers et al., 2018

EA models

Name       Citation
MTransE    Chen et al., 2017
IPTransE   Zhu et al., 2017
BootEA     Sun et al., 2018
JAPE       Sun et al., 2017
IMUSE      He et al., 2019
RDGCN      Wu et al., 2019
AttrE      Trisedya et al., 2019
SEA        Pei et al., 2019
GCN-Align  Wang et al., 2018
RSN4EA     Guo et al., 2019

ET models

Name    Citation
TransE  Bordes et al., 2013
RESCAL  Nickel et al., 2011
HolE    Nickel et al., 2016

Datasets hub 🏠

μKG has built in 16 KG datasets for different downstream tasks. Here we list the numbers of entities, relations, training triples, validation triples, and test triples for these datasets. You can also prepare your own datasets in the Datasets hub. First, create a subfolder named after your dataset in the data folder, then put your train.txt, valid.txt, and test.txt files in this folder. The data should be in the triple format.
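
As a rough illustration of the folder layout described above, the sketch below creates a dataset subfolder and writes the three split files with one triple per line. The tab separator and the (head, relation, tail) column order are assumptions, as are the dataset name and the toy triples; compare with a built-in dataset before relying on them.

```python
import tempfile
from pathlib import Path


def write_dataset(data_root, name, splits, sep="\t"):
    """Create data_root/name/ and write one triple per line for each split."""
    folder = Path(data_root) / name
    folder.mkdir(parents=True, exist_ok=True)
    for split, triples in splits.items():
        lines = [sep.join(triple) for triple in triples]
        (folder / f"{split}.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return folder


# Toy triples (head, relation, tail); the entity and relation names are made up.
toy_splits = {
    "train": [("alice", "knows", "bob"), ("bob", "knows", "carol")],
    "valid": [("alice", "knows", "carol")],
    "test": [("carol", "knows", "alice")],
}

# A temporary folder stands in for the data folder of the repository.
dataset_dir = write_dataset(tempfile.mkdtemp(), "my_dataset", toy_splits)
print(sorted(p.name for p in dataset_dir.iterdir()))
```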

KGE datasets

Dataset     Entities  Relations  Train    Valid  Test     Citation
FB15K       14951     1345       483142   50000  59071    Bordes et al., 2013
FB15K237    14541     237        272115   17535  20466    Bordes et al., 2013
WN18RR      40943     11         86835    3034   3134     Toutanova et al., 2015
WN18        40943     18         141442   5000   5000     Bordes et al., 2013
WN11        38588     11         112581   2609   10544    Socher et al., 2013
DBpedia50   49900     654        23288    399    10969    Shi et al., 2017
DBpedia500  517475    654        3102677  10000  1155937
Countries   271       2          1111     24     24       Bouchard et al., 2015
FB13        75043     13         316232   5908   23733    Socher et al., 2013
Kinship     104       25         8544     1086   1074     Kemp et al., 2006
Nations     14        55         1592     199    201      ZhenfengLei/KGDatasets
NELL-995    75492     200        149678   543    3992     Nathani et al., 2019
UMLS        75492     135        5216     652    661      ZhenfengLei/KGDatasets

EA datasets

Dataset           Entities  Relations  Triples  Citation
OpenEA supported  15000     248        38265    Sun et al., 2020

ET datasets

Dataset   Entities  Relations  Triples  Types  Citation
FB15K-ET  15000     248        38265    3851   Moon et al., 2017

Utils 📂

Sampler

Negative sampler:

μKG includes several negative sampling methods to randomly generate negative examples.

  • Uniform negative sampling: This method replaces an entity in a triple or an alignment pair with another randomly-sampled entity to generate a negative example. It gives each entity the same replacement probability.
  • Self-adversarial negative sampling: This method samples negative triples according to the current embedding model.
  • Truncated negative sampling: This method seeks to generate hard negative examples.
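
To make the first two strategies concrete, here is a minimal, library-independent sketch. It is not μKG's sampler: the function names are hypothetical, filtering out known positive triples is an extra step added for the demo, and the self-adversarial weights assume that a higher score means a more plausible triple.

```python
import math
import random


def uniform_negative(triple, entities, known_triples, rng):
    """Corrupt the head or the tail with a uniformly sampled entity."""
    head, relation, tail = triple
    while True:
        candidate = rng.choice(entities)
        if rng.random() < 0.5:
            negative = (candidate, relation, tail)
        else:
            negative = (head, relation, candidate)
        # Reject corruptions that are actually known positive triples.
        if negative not in known_triples:
            return negative


def self_adversarial_weights(negative_scores, temperature=1.0):
    """Softmax over model scores: higher-scoring negatives get more weight."""
    scaled = [temperature * s for s in negative_scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
entities = ["a", "b", "c", "d"]
positives = {("a", "r", "b"), ("b", "r", "c")}

negative = uniform_negative(("a", "r", "b"), entities, positives, rng)
weights = self_adversarial_weights([2.0, 0.5, -1.0])
print(negative, weights)
```

In the self-adversarial setting, such weights are typically used to scale each negative example's contribution to the loss, so that negatives the current model finds plausible matter more.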

Path sampler: The path sampler supports embedding models that are built by modeling the paths of KGs, such as IPTransE and RSN4EA. It can generate relational paths like (e_1, r_1, e_2, r_2, e_3), entity paths like (e_1, e_2, e_3), and relation paths like (r_1, r_2).
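
The three path types can be illustrated with a small sketch that chains triples sharing an intermediate entity. This is an assumption-laden toy, not μKG's path sampler: the function names are hypothetical and only two-hop paths are enumerated.

```python
from collections import defaultdict


def relational_paths(triples):
    """Enumerate two-hop paths (e1, r1, e2, r2, e3) by chaining triples."""
    outgoing = defaultdict(list)
    for head, relation, tail in triples:
        outgoing[head].append((relation, tail))
    paths = []
    for e1, r1, e2 in triples:
        for r2, e3 in outgoing.get(e2, []):
            paths.append((e1, r1, e2, r2, e3))
    return paths


def entity_path(path):
    """Keep the entities only: (e1, e2, e3)."""
    return path[0::2]


def relation_path(path):
    """Keep the relations only: (r1, r2)."""
    return path[1::2]


toy_triples = [("a", "r1", "b"), ("b", "r2", "c"), ("c", "r3", "d")]
paths = relational_paths(toy_triples)
print(paths)
print(entity_path(paths[0]), relation_path(paths[0]))
```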

Subgraph sampler: The subgraph sampler supports GNN-based embedding models like GCN-Align and AliNet. It can generate both first-order (i.e., one-hop) and high-order (i.e., multi-hop) neighborhood subgraphs of entities.
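
One-hop and multi-hop neighborhoods can be collected with a breadth-first search over the triples. The sketch below is a generic illustration under the assumption that edges are treated as undirected; the function name is hypothetical and it does not reproduce μKG's subgraph sampler.

```python
from collections import defaultdict, deque


def k_hop_neighbors(triples, entity, k):
    """Return the entities within k hops of `entity` (edges treated as undirected)."""
    adjacency = defaultdict(set)
    for head, _, tail in triples:
        adjacency[head].add(tail)
        adjacency[tail].add(head)

    distance = {entity: 0}
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        if distance[current] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return {e for e, d in distance.items() if d > 0}


chain = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d")]
one_hop = k_hop_neighbors(chain, "a", 1)
two_hop = k_hop_neighbors(chain, "a", 2)
print(one_hop, two_hop)
```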

Evaluator

(Joint) link prediction & entity typing: This module is inspired by TorchKGE, a PyTorch-based library for efficient training and evaluation of KG embedding. It uses the energy function to compute the plausibility of a candidate triple. The implemented metrics for assessing the performance of embedding tasks include Hits@K, mean rank (MR), and mean reciprocal rank (MRR). The hyperparameter JSON file stored in the args subfolder lets you set K for Hits@K.
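
For reference, the three metrics can be computed from the rank of the correct answer in each test query. This is a minimal sketch with a hypothetical function name and made-up ranks, not μKG's evaluator.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and Hits@K from 1-based ranks of the correct answers."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Made-up ranks of the correct entity for four test triples.
demo_metrics = ranking_metrics([1, 2, 5, 20])
print(demo_metrics)
```

Lower is better for MR, while higher is better for MRR and Hits@K.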

Entity alignment: It provides several metrics to measure entity embedding similarities, such as cosine similarity, inner product, Euclidean distance, and cross-domain similarity local scaling (CSLS). The evaluation process can be accelerated using multiprocessing.
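
Of these metrics, cross-domain similarity local scaling is the least obvious, so here is a small pure-Python sketch of it. It follows the usual definition, 2·cos(x, y) minus the mean cosine similarity of x to its k nearest target neighbors and of y to its k nearest source neighbors. The function names and the toy embeddings are hypothetical, and a real implementation would use matrix operations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def csls_matrix(source, target, k=1):
    """CSLS score for every (source, target) pair of embeddings."""
    sim = [[cosine(s, t) for t in target] for s in source]

    def mean_top_k(values):
        top = sorted(values, reverse=True)[:k]
        return sum(top) / len(top)

    # Mean similarity of each source entity to its k nearest targets, and vice versa.
    r_source = [mean_top_k(row) for row in sim]
    r_target = [mean_top_k([row[j] for row in sim]) for j in range(len(target))]
    return [
        [2 * sim[i][j] - r_source[i] - r_target[j] for j in range(len(target))]
        for i in range(len(source))
    ]


source_emb = [[1.0, 0.0], [0.0, 1.0]]
target_emb = [[1.0, 0.1], [0.1, 1.0]]
scores = csls_matrix(source_emb, target_emb, k=1)
best_match = [row.index(max(row)) for row in scores]
print(best_match)
```

Subtracting the neighborhood terms penalizes entities that are similar to many candidates, which is the motivation for using CSLS instead of plain cosine similarity.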

Multi-GPU and multi-processing computation

We use Ray to provide a uniform and easy-to-use interface for multi-GPU and multi-processing computation. The following figure shows our Ray-based implementation for parallel computing and the code snippet to use it. Users can set the number of CPUs or GPUs used for model training.

[Figure: Ray-based implementation for parallel computing and the code snippet to use it]

Use the following command line to train your model with multi-GPU and multi-processing computation. First check the number of resources on your machine (GPUs or CPUs), and then specify the number of parallel workers. The system will automatically allocate resources to each worker.

# When you run on one or more GPUs, use os.environ['CUDA_VISIBLE_DEVICES'] to set GPU id list first 
python main_args.py -t lp -m transe -o train -d data/FB15K -r gpu:2 -w 2  

Running Experiments 🔬

Instruction

We have provided the hyper-parameters of some models for the critical experiments in the paper. These scripts can be found in the experiments folder. You can simply select the specific model in the corresponding Python file to reproduce the experiments. We recommend checking the available GPU resources before running the efficiency experiments, and then adding the following code to set the GPU IDs for all Ray workers.

os.environ['CUDA_VISIBLE_DEVICES'] = "GPU IDs set"
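
The value is a comma-separated list of GPU IDs. The helper below is a small sketch of how it could be built from a Python list; the function name is hypothetical, and the variable should be set before the deep learning framework initializes CUDA, otherwise it has no effect on the running process.

```python
import os


def set_visible_gpus(gpu_ids):
    """Restrict CUDA to the given GPU IDs through CUDA_VISIBLE_DEVICES."""
    value = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Example: expose only the first two GPUs of the machine.
visible = set_visible_gpus([0, 1])
print(visible)
```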

Efficiency of multi-GPU training

We give the evaluation results of the efficiency of the proposed library μKG here. The experiments were conducted on a server with an Intel Xeon Gold 6240 2.6GHz CPU, 512GB of memory and four NVIDIA Tesla V100 GPUs. The following figure compares the training time of RotatE and ConvE on FB15K-237 when using different numbers of GPUs.

[Figure: training time of RotatE and ConvE on FB15K-237 with different numbers of GPUs]

Training time comparison of different libraries

We further compare the training time used by μKG with LibKGE and PyKEEN. The backbone of μKG in this experiment is also PyTorch. We use the same hyper-parameter settings (e.g., batch size and maximum training epochs) for each model in the three libraries. The following table gives the training time of ConvE and RotatE on FB15K-237 with a single GPU for calculation.

Models  μKG    LibKGE   PyKEEN
RotatE  639 s  3,260 s  1,085 s
ConvE   824 s  1,801 s  961 s
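
To read the table as relative speedups, the training times can be divided by the μKG time for the same model. The snippet below only restates the numbers from the table; the variable and function names are hypothetical.

```python
# Training times in seconds, copied from the table above.
training_seconds = {
    "RotatE": {"μKG": 639, "LibKGE": 3260, "PyKEEN": 1085},
    "ConvE": {"μKG": 824, "LibKGE": 1801, "PyKEEN": 961},
}


def speedups(times, baseline="μKG"):
    """Ratio of each library's training time to the baseline's time."""
    return {
        library: round(seconds / times[baseline], 2)
        for library, seconds in times.items()
        if library != baseline
    }


speedup_table = {model: speedups(times) for model, times in training_seconds.items()}
print(speedup_table)
```

A ratio above 1 means the other library took longer than μKG for that model under the shared settings.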

License

This project is licensed under the GPL. See the LICENSE file for details.

Citation

@inproceedings{muKG,
  author    = {Xindi Luo and
               Zequn Sun and
               Wei Hu},
  title     = {μKG: A Library for Multi-source Knowledge Graph Embeddings and Applications},
  booktitle = {ISWC},
  year      = {2022}
}
