
EAkit

Entity Alignment toolkit (EAkit), a lightweight, easy-to-use and highly extensible PyTorch implementation of many entity alignment algorithms. The algorithm list is from Entity_Alignment_Papers.

Table of Contents

  1. Design
  2. Organization
  3. Usage
    1. Run an implemented model
      1. Semantic Matching Models
      2. GNN-based Models
      3. KE-based Models
      4. Results
    2. Write a new model
  4. Dataset
  5. Requirements
  6. TODO
  7. Acknowledgement

Design

We sort out the existing entity alignment algorithms, modularize their composition, and define an abstract structure of 1 Encoder - N Decoder(s), in which different modules are regarded as specific implementations of different encoders and decoders, so that the structures of the original algorithms can be reproduced.

Framework of EAkit
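
To make the abstraction concrete, the following is a minimal PyTorch sketch of the 1 Encoder - N Decoder(s) composition. The class names are illustrative only and do not correspond to EAkit's actual classes in models.py: a shared encoder produces entity embeddings, and each decoder contributes a loss term that is summed into the training objective.

import torch
import torch.nn as nn

class GraphEncoder(nn.Module):
    # Illustrative encoder: maps entity ids to embeddings (a GNN would also use the KG structure).
    def __init__(self, num_ents, dim):
        super().__init__()
        self.emb = nn.Embedding(num_ents, dim)

    def forward(self, ent_ids):
        return self.emb(ent_ids)

class AlignDecoder(nn.Module):
    # Illustrative decoder: margin loss pulling aligned entity pairs together.
    def __init__(self, margin=1.0):
        super().__init__()
        self.margin = margin

    def forward(self, emb, pos_pairs, neg_pairs):
        pos = (emb[pos_pairs[:, 0]] - emb[pos_pairs[:, 1]]).norm(p=2, dim=1)
        neg = (emb[neg_pairs[:, 0]] - emb[neg_pairs[:, 1]]).norm(p=2, dim=1)
        return torch.relu(pos - neg + self.margin).mean()

class OneEncoderNDecoders(nn.Module):
    # 1 Encoder - N Decoder(s): the training loss is the (weighted) sum of decoder losses.
    def __init__(self, encoder, decoders, weights=None):
        super().__init__()
        self.encoder = encoder
        self.decoders = nn.ModuleList(decoders)
        self.weights = weights or [1.0] * len(decoders)

    def forward(self, ent_ids, batches):
        emb = self.encoder(ent_ids)            # shared entity embeddings
        return sum(w * dec(emb, *batch)        # each decoder scores its own batch
                   for w, dec, batch in zip(self.weights, self.decoders, batches))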

Organization

./EAkit
├── README.md                           # Doc of EAkit
├── _runs                               # Tensorboard log dir
├── data                                # Datasets. (unzip data.zip)
│   └── DBP15K
├── examples                            # Shell scripts of implemented algorithms
│   ├── Tensorboard.sh                  # Start Tensorboard visualization
│   ├── run_BootEA.sh
│   ├── run_ComplEx.sh
│   ├── run_ConvE.sh
│   ├── run_DistMult.sh
│   ├── run_GCN-Align.sh
│   ├── run_HAKE.sh
│   ├── run_KECG.sh
│   ├── run_MMEA.sh
│   ├── run_MTransE.sh
│   ├── run_NAEA.sh
│   ├── run_RotatE.sh
│   ├── run_TransE.sh
│   ├── run_TransEdge.sh
│   ├── run_TransH.sh
│   └── run_TransR.sh
├── load_data.py                        # Load datasets. (data adapter)
├── models.py                           # Encoders & Decoders
├── run.py                              # Main
├── semi_utils.py                       # Bootstrap strategy
└── utils.py                            # Sampling methods, ...

Usage

Run an implemented model

  1. Start TensorBoard for metrics visualization (run under examples/):
./Tensorboard.sh
  2. Modify and run a script as follows (examples are under examples/):
CUDA_VISIBLE_DEVICES=0 python3 run.py --log gcnalign \
                                    --data_dir "data/DBP15K/zh_en" \
                                    --rate 0.3 \
                                    --epoch 1000 \
                                    --check 10 \
                                    --update 10 \
                                    --train_batch_size -1 \
                                    --encoder "GCN-Align" \
                                    --hiddens "100,100,100" \
                                    --decoder "Align" \
                                    --sampling "N" \
                                    --k "25" \
                                    --margin "1" \
                                    --alpha "1" \
                                    --feat_drop 0.0 \
                                    --lr 0.005 \
                                    --train_dist "euclidean" \
                                    --test_dist "euclidean"

In detail, the following methods are currently implemented:

Semantic Matching Models

Method | Encoder | Decoder
MTransE from Chen et al. (IJCAI 2017) [sh], [origin] | None | TransE, MTransE_Align
BootEA from Sun et al. (IJCAI 2018) [sh], [origin] | None | AlignEA
TransEdge from Sun et al. (ISWC 2019) [sh], [origin] | None | TransEdge
MMEA from Shi et al. (EMNLP 2019) [sh], [origin] | None | MMEA

GNN-based Models

Method | Encoder | Decoder
GCN-Align from Wang et al. (EMNLP 2018) [sh], [origin] | GCN-Align | Align
NAEA from Zhu et al. (IJCAI 2019) [sh], [origin] | NAEA | [N_TransE], N_TransE, N_R_Align
KECG from Li et al. (EMNLP 2019) [sh], [origin] | KECG | TransE, Align

KE-based Models

Method | Encoder | Decoder
TransE from Bordes et al. (NIPS 2013) [sh] | None | TransE
TransH from Wang et al. (AAAI 2014) [sh] | None | TransH
TransR from Lin et al. (AAAI 2015) [sh] | None | TransR
RotatE from Sun et al. (ICLR 2019) [sh] | None | RotatE
HAKE from Zhang et al. (AAAI 2020) [sh] | None | HAKE
DistMult from Yang et al. (ICLR 2015) [sh] | None | DistMult
ComplEx from Trouillon et al. (ICML 2016) [sh] | None | ComplEx
ConvE from Dettmers et al. (AAAI 2018) [sh] | None | ConvE

Results

Results on DBP15K (zh_en, ja_en, fr_en).

Method | zh_en: Hits@1 Hits@10 MRR | ja_en: Hits@1 Hits@10 MRR | fr_en: Hits@1 Hits@10 MRR
MTransE | 0.419 0.753 0.535 | 0.433 0.773 0.549 | 0.407 0.751 0.526
BootEA | 0.490 0.793 0.593 | 0.499 0.813 0.605 | 0.515 0.838 0.623
TransEdge | 0.519 0.813 0.621 | 0.526 0.825 0.632 | 0.397 0.824 0.543
MMEA | 0.405 0.672 0.499 | 0.397 0.680 0.496 | 0.442 0.749 0.550
GCN-Align | 0.410 0.756 0.527 | 0.442 0.810 0.566 | 0.430 0.813 0.557
NAEA | 0.323 0.481 0.381 | 0.311 0.457 0.363 | 0.307 0.460 0.362
KECG | 0.467 0.815 0.586 | 0.485 0.843 0.605 | 0.479 0.844 0.602
TransE | 0.343 0.634 0.441 | 0.365 0.710 0.480 | 0.374 0.735 0.493
TransH | 0.436 0.735 0.540 | 0.450 0.778 0.561 | 0.485 0.821 0.599
TransR | 0.371 0.697 0.481 | 0.368 0.709 0.484 | 0.378 0.741 0.497
RotatE | 0.423 0.754 0.534 | 0.448 0.785 0.561 | 0.439 0.800 0.560
HAKE | 0.288 0.588 0.391 | 0.319 0.607 0.421 | 0.319 0.638 0.428
DistMult | 0.180 0.400 0.255 | 0.058 0.179 0.099 | 0.095 0.285 0.157
ComplEx | 0.115 0.265 0.166 | 0.063 0.251 0.146 | 0.141 0.332 0.206
ConvE | 0.210 0.466 0.299 | 0.339 0.556 0.415 | 0.350 0.602 0.439
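
The reported metrics are the standard Hits@k and MRR computed from pairwise distances between the two entity sets. Below is a minimal sketch of how they can be computed; it is illustrative rather than EAkit's actual evaluation code in run.py, and it assumes the i-th test entity on one side should align with the i-th on the other.

import numpy as np
from scipy.spatial.distance import cdist

def evaluate_alignment(left_emb, right_emb, ks=(1, 10)):
    # left_emb[i] is assumed to align with right_emb[i]; smaller distance = better match
    dist = cdist(left_emb, right_emb, metric="euclidean")
    # rank of the true counterpart among all right-side candidates (1 = best)
    ranks = np.array([int(np.where(np.argsort(dist[i]) == i)[0][0]) + 1
                      for i in range(dist.shape[0])])
    hits = {k: float((ranks <= k).mean()) for k in ks}
    mrr = float((1.0 / ranks).mean())
    return hits, mrr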

Write a new model

  1. Divide the algorithm at the abstract level to obtain the structure of 1 (or 0) Encoder and 1 (or more) Decoder(s).
  2. Register the modules and add extra parameters in the top-level encoder (class Encoder) and top-level decoder (class Decoder) in models.py.
  3. Implement the concrete encoding module (class Encoder_Instance) and decoding module(s) (class Decoder_Instance) following the given template (see the sketch after this list).
  4. Write an execution script (XXX.sh) with parameter settings to run the new model.
  5. (If needed, adapt a new dataset in load_data.py and add a new sampling strategy in utils.py.)
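
The sketch below illustrates steps 2-3 under stated assumptions: the registry dictionaries and the class names (MyEncoder, MyDecoder) are hypothetical, and the real template in models.py may look different.

import torch
import torch.nn as nn

# Step 3 (illustrative): a concrete encoding module that produces entity embeddings.
class MyEncoder(nn.Module):
    def __init__(self, num_ents, hiddens):
        super().__init__()
        self.emb = nn.Embedding(num_ents, hiddens[0])

    def forward(self, ent_ids, edges=None):
        # a real EAkit encoder (e.g. GCN-Align) would also propagate over `edges`
        return self.emb(ent_ids)

# Step 3 (illustrative): a concrete decoding module that turns embeddings into a loss.
class MyDecoder(nn.Module):
    def __init__(self, margin=1.0):
        super().__init__()
        self.loss = nn.MarginRankingLoss(margin)

    def forward(self, emb, pos_pairs, neg_pairs):
        pos = (emb[pos_pairs[:, 0]] - emb[pos_pairs[:, 1]]).norm(p=2, dim=1)
        neg = (emb[neg_pairs[:, 0]] - emb[neg_pairs[:, 1]]).norm(p=2, dim=1)
        # want: pos + margin <= neg for every training pair
        return self.loss(neg, pos, torch.ones_like(pos))

# Step 2 (illustrative): register the new modules so --encoder / --decoder can select them by name.
ENCODERS = {"my-encoder": MyEncoder}
DECODERS = {"my-decoder": MyDecoder}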

Example of writing a new model

Dataset

(Currently, EAkit only supports DBP15K, but it is easy to adapt to other datasets.)

  • DBP15K comes from the "mapping" folder of JAPE (but "ref_ent_ids" and "sup_ent_ids" need to be combined into a single file named "ill_ent_ids"; see the sketch below).
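
If you rebuild DBP15K from JAPE yourself, the combination step amounts to concatenating the two ID files. A minimal sketch (the directory path is illustrative; point it at your copy of the dataset):

import pathlib

data_dir = pathlib.Path("data/DBP15K/zh_en")  # illustrative path
lines = []
for name in ("ref_ent_ids", "sup_ent_ids"):
    lines += (data_dir / name).read_text(encoding="utf-8").splitlines()
(data_dir / "ill_ent_ids").write_text("\n".join(lines) + "\n", encoding="utf-8")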

Here, you can directly unpack the data file after downloading:

unzip data.zip

Requirements

  • Python3 (tested on 3.7.7)
  • PyTorch (tested on 1.4.0)
  • PyTorch Geometric (PyG) (tested on 1.4.3)
  • TensorBoard (tested on 2.0.2)
  • Numpy
  • Scipy
  • Scikit-learn
  • Graph-tool (if using bootstrapping)

TODO

  • The results of BootEA, TransEdge, MMEA, and NAEA are not yet satisfactory; they need debugging (possibly in the bootstrapping process).

There are still many algorithms that need to be implemented (integrated):

  • Semantic Matching Models: NTAM, AttrE, CEAFF, ...
  • GNN-based Models: AVR-GCN, AliNet, MRAEA, CG-MuAlign, RDGCN, HGCN, GMNN, ...
  • KE-based Models: TransD, CapsE, ...
  • GAN-based Models: SEA, AKE, ...
  • Other Models: OTEA, ...

More algorithms can be found in Entity_Alignment_Papers.

Pull requests for implementing algorithms & updating (reproducible) results with shell scripts are welcome!

Acknowledgement

We refer to some code from the following repos, and we appreciate their great contributions: PyTorch Geometric, BootEA, TransEdge, AliNet, TuckER. If we missed any, please let us know in Issues.

This project is mainly contributed by Chengjiang Li, Kaisheng Zeng, Lei Hou, Juanzi Li.

Citation

If you use the code, please cite the following paper:

@article{zeng2021comprehensive,
  title={A comprehensive survey of entity alignment for knowledge graphs},
  author={Zeng, Kaisheng and Li, Chengjiang and Hou, Lei and Li, Juanzi and Feng, Ling},
  journal={AI Open},
  volume={2},
  pages={1--13},
  year={2021},
  publisher={Elsevier}
}
