Falcon 2.0 is a joint entity and relation linking tool over Wikidata.

FALCON 2.0

Falcon 2.0 is an entity and relation linking tool over Wikidata (accepted at CIKM 2020). The full CIKM paper is available here: Falcon 2.0 Paper

It leverages fundamental principles of English morphology (e.g., N-Gram Tiling and N-Gram Splitting) to accurately map entities and relations in short texts to resources in Wikidata. Falcon 2.0 is available as a Web API and can be queried using curl:

curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"text":"Who painted The Storm on the Sea of Galilee?"}' \
  "https://labs.tib.eu/falcon/falcon2/api?mode=long"
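The same request can also be issued from Python. This is a minimal sketch using only the standard library; the helper names are ours, and the response format is whatever the API returns (typically JSON listing the linked Wikidata entities and relations):

```python
import json
import urllib.request

API_URL = "https://labs.tib.eu/falcon/falcon2/api?mode=long"

def build_request(text):
    """Build the JSON POST request expected by the Falcon 2.0 API."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def query_falcon(text):
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(text)) as response:
        return json.load(response)
```

Calling query_falcon("Who painted The Storm on the Sea of Galilee?") should then return the linked entities and relations.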

This API is the first resource of this repository. The second resource, the background knowledge dump, is described in the Elastic Search and Background Knowledge section below.

Implementation

To begin with, install the libraries listed in the requirements.txt file as follows:

pip install -r requirements.txt

The Falcon 2.0 codebase has three main parts: Elasticsearch indexing, the linking algorithm, and evaluation.

Elastic Search and Background Knowledge

Before working with the Wikidata dump, we first need to connect to an Elasticsearch endpoint and a Wikidata SPARQL endpoint. The Elasticsearch endpoint is used to interact with our cluster through the Elasticsearch API. The Elasticsearch dump (also known as R2: Background Knowledge) for Falcon 2.0 can be downloaded from this link: https://doi.org/10.6084/m9.figshare.11362883

To import the Elasticsearch dump, please use elasticdump and execute the following commands:

elasticdump --output=http://localhost:9200/wikidataentityindex/ --input=wikidataentity.json --type=data

elasticdump --output=http://localhost:9200/wikidatapropertyindex/ --input=wikidatapropertyindex.json --type=data

To change your Elasticsearch endpoint, make changes in Elastic/searchIndex.py and Elastic/addIndex.py:

es = Elasticsearch(['http://localhost:9200'])

The Wikidata SPARQL endpoint lets us quickly search and analyze large volumes of data stored in the knowledge graph (here, Wikidata). To change the Wikidata endpoint, make changes in main.py:

wikidataSPARQL = " "
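For illustration, a SPARQL query can be sent to such an endpoint over HTTP with the query passed as a URL parameter. The endpoint value below (the public Wikidata query service) and the helper name are examples only, not values taken from the repository:

```python
import urllib.parse

def build_sparql_request(endpoint, sparql):
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return endpoint + "?" + params

# Example values only: the repository leaves wikidataSPARQL blank, so set it
# to whichever endpoint you run against (the public Wikidata query service
# shown here is one option).
ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = 'SELECT ?item WHERE { ?item rdfs:label "operating income"@en } LIMIT 5'
url = build_sparql_request(ENDPOINT, QUERY)
```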

We then create indices for property search and entity search over Wikidata. Refer to the following two functions in Elastic/addIndex.py for the code:

def propertyIndexAdd(): ...
def entitiesIndexAdd(): ...
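As a rough illustration of what index population involves, here is a hypothetical sketch (not the actual code from Elastic/addIndex.py). FakeES stands in for a real Elasticsearch client, and the property id/label pair is an example:

```python
class FakeES:
    """Stand-in for a real Elasticsearch client, for illustration only."""
    def __init__(self):
        self.docs = []

    def index(self, index, body):
        self.docs.append((index, body))

def property_index_add(es, properties, index_name="wikidatapropertyindex"):
    """Hypothetical mirror of propertyIndexAdd(): store each property as a
    small JSON document whose label field the search queries match against."""
    for wikidata_id, label in properties:
        es.index(index=index_name, body={"uri": wikidata_id, "label": label})

fake_es = FakeES()
property_index_add(fake_es, [("P3362", "operating income")])  # example pair
```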

Furthermore, we need to execute a search query and get back the search hits that match it. The search query is used to determine whether a mention is an entity or a property in Wikidata. Note that Elasticsearch uses JSON as the serialization format for documents. The Elasticsearch query used to retrieve candidates is as follows:

{
  "query": {
    "match" : { "label" : "operating income" }
  }
}

Search queries over Wikidata are implemented in Elastic/searchIndex.py. Refer to the following two functions in the same file for entity search and property search in Wikidata:

def entitySearch(query): ...
def propertySearch(query): ...
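The pattern can be sketched as follows; the helper names here are hypothetical (the real entitySearch() and propertySearch() live in Elastic/searchIndex.py), and the query body is the match query shown above:

```python
def build_match_query(mention):
    """Build the Elasticsearch match query shown above for a given mention."""
    return {"query": {"match": {"label": mention}}}

def entity_search(es, mention, index="wikidataentityindex"):
    """Hypothetical search helper: return (document, score) pairs for the
    candidate entities whose label matches the mention."""
    hits = es.search(index=index, body=build_match_query(mention))["hits"]["hits"]
    return [(hit["_source"], hit["_score"]) for hit in hits]
```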

Algorithm

main.py contains the code for automatic entity and relation linking to resources in Wikidata using rule-based learning. Falcon 2.0 uses the same approach for the Wikidata knowledge graph as Falcon does for DBpedia (https://labs.tib.eu/falcon/). The rules that represent the English morphology are maintained in a catalog; a forward-chaining inference process is performed on top of the catalog during extraction and linking. Falcon 2.0 also comprises several modules that identify and link entities and relations to the Wikidata knowledge graph. These modules implement POS Tagging, Tokenization & Compounding, N-Gram Tiling, Candidate List Generation, Matching & Ranking, Query Classifier, and N-Gram Splitting. The modules are reused from the implementation of Falcon.
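To give an intuition for the N-Gram Tiling and N-Gram Splitting steps, the following sketch enumerates the candidate surface forms of a token list from longest to shortest. It illustrates the idea only and is not the repository's implementation:

```python
def ngrams(tokens, n_max=None):
    """Enumerate all contiguous n-grams of a token list, longest first.
    Tiling tries long candidates first; on a lookup miss, splitting falls
    back to shorter ones."""
    n_max = n_max or len(tokens)
    out = []
    for n in range(n_max, 0, -1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out
```

For example, ngrams(["operating", "income"]) yields ["operating income", "operating", "income"], so the compound label is tried before its parts.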

Evaluation

Usage

To run Falcon 2.0, call the function process_text_E_R(question), where question is the short text to be processed by Falcon 2.0.

For evaluating Falcon 2.0, we relied on three different question answering datasets: the SimpleQuestions dataset for Wikidata, WebQSP-WD, and LC-QuAD 2.0.

To reproduce the results, use evaluateFalconAPI.py and evaluateFalconAPI_entities.py:

evaluateFalconAPI_entities.py evaluates entity linking.

evaluateFalconAPI.py evaluates entity and relation linking.

Experimental Results for Entity Linking

SimpleQuestions dataset

The SimpleQuestions dataset contains 5622 test questions that are answerable using Wikidata as the underlying knowledge graph. Falcon 2.0 reports a precision of 0.56, recall of 0.64, and F-score of 0.60 on this dataset.

LC-QuAD 2.0 dataset

LC-QuAD 2.0 contains 6046 test questions that are mostly complex (more than one entity and relation per question). On this dataset, Falcon 2.0 reports a precision of 0.50, recall of 0.56, and F-score of 0.53.

WebQSP-WD dataset

WebQSP-WD contains 1639 test questions with a single entity and relation per question. Falcon 2.0 outperforms all other baselines on this dataset, with the highest precision (0.80), recall (0.84), and F-score (0.82).

Experimental Results for Relation Linking

SimpleQuestions dataset

Falcon 2.0 reports a precision of 0.35, recall of 0.44, and F-score of 0.39 on the SimpleQuestions dataset for the relation linking task.

LC-QuAD 2.0

Falcon 2.0 reports a precision of 0.44, recall of 0.37, and F-score of 0.40 on the LC-QuAD 2.0 dataset.

Cite our work

@inproceedings{10.1145/3340531.3412777,
author = {Sakor, Ahmad and Singh, Kuldeep and Patel, Anery and Vidal, Maria-Esther},
title = {Falcon 2.0: An Entity and Relation Linking Tool over Wikidata},
year = {2020},
isbn = {9781450368599},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3340531.3412777},
doi = {10.1145/3340531.3412777},
booktitle = {Proceedings of the 29th ACM International Conference on Information \& Knowledge Management},
pages = {3141–3148},
numpages = {8},
keywords = {wikidata, dbpedia, relation linking, nlp, english morphology, entity linking, background knowledge},
location = {Virtual Event, Ireland},
series = {CIKM '20}
}
