Context-Aware Representations for Knowledge Base Relation Extraction
Relation extraction on an open-domain knowledge base
Accompanying repository for our EMNLP 2017 paper (full paper). It contains the code to replicate the experiments and the pre-trained models for sentence-level relation extraction. See below for links to other work on knowledge bases, question answering and graph neural networks.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
Please use the following citation:
@inproceedings{TUD-CS-2017-0119,
title = {{Context-Aware Representations for Knowledge Base Relation Extraction}},
author = {Sorokin, Daniil and Gurevych, Iryna},
booktitle = {Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
pages = {1784-1789},
year = {2017},
location = {Copenhagen, Denmark},
publisher = {Association for Computational Linguistics},
doi = {10.18653/v1/D17-1188}
}
Paper abstract:
We demonstrate that for sentence-level relation extraction it is beneficial to consider other relations in the sentential context while predicting the target relation. Our architecture uses an LSTM-based encoder to jointly learn representations for all relations in a single sentence. We combine the context representations with an attention mechanism to make the final prediction. We use the Wikidata knowledge base to construct a dataset of multiple relations per sentence and to evaluate our approach. Compared to a baseline system, our method results in an average error reduction of 24% on a held-out set of relations.
Please refer to the paper for more details.
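As a rough illustration of the idea in the abstract, the sketch below weights the representations of the other (context) relations in a sentence by attention against the target relation and appends the weighted sum to the target vector. It is not the code from this repository: the function names are hypothetical, the dot-product scoring is an assumption (the paper's scoring function may differ, for example by using learned parameters), and the final concatenation is likewise an assumption made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend_over_context(target, contexts):
    """Summarise the context relation vectors with attention weights.

    Assumption: each context vector is scored by a plain dot product
    with the target vector. Returns (summary_vector, weights).
    """
    if not contexts:
        # No other relations in the sentence: use a zero summary.
        return [0.0] * len(target), []
    weights = softmax([dot(c, target) for c in contexts])
    summary = [
        sum(w * c[i] for w, c in zip(weights, contexts))
        for i in range(len(target))
    ]
    return summary, weights


def final_representation(target, contexts):
    """Concatenate the target vector with the attended context summary
    (assumption: concatenation is how the two are combined)."""
    summary, _ = attend_over_context(target, contexts)
    return list(target) + summary


if __name__ == "__main__":
    target = [1.0, 0.0]
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    print(final_representation(target, contexts))
```

In a trained model the vectors would come from the LSTM-based encoder and the combined representation would feed a classifier over relation types; here they are hand-written lists so that the example stays self-contained.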
The dataset described in the paper can be found here:
Contacts:
If you have any questions regarding the code, please don't hesitate to contact the authors or, better, report an issue here.
- Daniil Sorokin, personal page
- https://www.informatik.tu-darmstadt.de/ukp/
- https://www.tu-darmstadt.de
Demo:
You can try out the relation extraction model on single sentences in our demo:
http://semanticparsing.ukp.informatik.tu-darmstadt.de:5000/relation-extraction/
UKP Lab's work on knowledge bases:
If you came here looking for our other work on linking text to Wikidata, you may also find the following links useful:
- Wikidata Entity Linking: https://github.com/UKPLab/starsem2018-entity-linking
- Graph Neural Networks for Knowledge Base Question Answering: https://github.com/UKPLab/coling2018-graph-neural-networks-question-answering
- Question Answering Demo UI: https://github.com/UKPLab/emnlp2018-question-answering-interface
Wikipedia-Wikidata sentence-level relation data set
- Download the data set from the paper here. See the data set ReadMe for more information on the format, and see the paper for details on how the data set was constructed.
Project structure:
relation_extraction/
├── eval.py
├── model-train-and-test.py
├── notebooks
├── optimization_space.py
├── core
│   ├── parser.py
│   ├── embeddings.py
│   ├── entity_extraction.py
│   └── keras_models.py
├── relextserver
│   └── server.py
├── graph
│   ├── graph_utils.py
│   ├── io.py
│   └── vis_utils.py
├── stanford_tag_dataset.py
└── evaluation
    └── metrics.py
resources/
├── properties-with-labels.txt
└── property_blacklist.txt
File | Description |
---|---|
relation_extraction/ | Main Python module |
relation_extraction/core | Models for joint relation extraction |
relation_extraction/relextserver | The code for the web demo. |
relation_extraction/graph | IO and processing for relation graphs |
relation_extraction/evaluation | Evaluation metrics |
resources/ | Necessary resources |
data/curves/ | The precision-recall curves for each model on the held-out data |
Setup:
- We recommend that you set up a new virtual environment first: http://docs.python-guide.org/en/latest/dev/virtualenvs/
- Check out the repository and run:
pip3 install -r requirements.txt
- Set the Keras (deep learning library) backend to TensorFlow with the following command:
export KERAS_BACKEND=tensorflow
You can also change the Keras backend permanently (read more: https://keras.io/backend/). Note that to reproduce the experiments in the paper you have to use Theano as the backend instead.
- Download the data if you want to replicate the experiments from the paper. Extract the archive inside emnlp2017-relation-extraction/data/wikipedia-wikidata/. The data was preprocessed using Stanford CoreNLP 3.7.0 models. See stanford_tag_dataset.py for more information.
- Download the GloVe embeddings (glove.6B.zip) and put them into the folder emnlp2017-relation-extraction/resources/glove/. You can change the path to the word embeddings in the model_params.json file if needed.
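To check that the GloVe download is in place, it can help to read a few vectors by hand. The sketch below parses the plain-text GloVe format (one word per line, followed by its space-separated vector components). The function name load_glove is hypothetical and this is not the loader in core/embeddings.py; the file name in the usage comment is one of the files contained in glove.6B.zip, and which of them your setup expects is an assumption you should check against model_params.json.

```python
import io


def load_glove(lines, vocabulary=None):
    """Parse GloVe text lines ("word v1 v2 ... vn") into {word: [floats]}.

    lines: any iterable of strings, e.g. an open text file.
    vocabulary: optional set of words to keep; all words are kept if None.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        if vocabulary is not None and word not in vocabulary:
            continue
        embeddings[word] = [float(v) for v in values]
    return embeddings


if __name__ == "__main__":
    # Tiny in-memory sample so the example runs without the download.
    sample = io.StringIO("the 0.1 0.2\ncat -0.3 0.4\n")
    vectors = load_glove(sample)
    print(sorted(vectors), len(vectors["the"]))

    # With the real file (path is an assumption, adjust to your setup):
    # with open("resources/glove/glove.6B.50d.txt", encoding="utf-8") as f:
    #     vectors = load_glove(f, vocabulary={"the", "cat"})
```

Passing a vocabulary keeps memory use low when only the words of a particular data set are needed.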
Pre-trained models:
- You can download the models that were used in the experiments here.
- See Using pre-trained models.ipynb for a detailed example of how to use the pre-trained models in your code.
Reproducing the experiments from the paper
To reproduce the experiments, please refer to the version of the code that was published with the paper: tag emnlp17.
In any other case, we recommend using the most recent version.
- Complete the setup above.
- Run python model_train.py in emnlp2017-relation-extraction/relation_extraction/ to see the list of parameters.
- If you put the data into the default folders, you can train the ContextWeighted model with the following command:
python model_train.py model_ContextWeighted train ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-training.02_06.json ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-validation.02_06.json
- Run the following command to compute the precision-recall curves:
python precision_recall_curves.py model_ContextWeighted ../data/wikipedia-wikidata/enwiki-20160501/semantic-graphs-filtered-held-out.02_06.json
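For readers unfamiliar with precision-recall curves, the sketch below shows how such a curve can be computed from scored predictions. It is a generic illustration and not the implementation in precision_recall_curves.py: the function name, its arguments, and the convention of emitting one point per prediction are assumptions made for the example.

```python
def precision_recall_points(scores, correct, total_gold):
    """Compute (precision, recall) points by sweeping a confidence threshold.

    scores: model confidence for each prediction.
    correct: booleans, True where the prediction matches the gold relation.
    total_gold: number of gold relations that should have been found.
    Predictions are ranked by descending score; the i-th point describes
    the system that keeps only the top i predictions.
    """
    ranked = sorted(zip(scores, correct), key=lambda pair: pair[0], reverse=True)
    points = []
    true_positives = 0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            true_positives += 1
        precision = true_positives / rank
        recall = true_positives / total_gold if total_gold else 0.0
        points.append((precision, recall))
    return points


if __name__ == "__main__":
    for precision, recall in precision_recall_points(
        scores=[0.9, 0.8, 0.3],
        correct=[True, False, True],
        total_gold=2,
    ):
        print(f"precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold admits more predictions, so recall can only grow along the curve while precision may rise or fall.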
Notes
- The web demo code is provided for information only. It is not meant to be run elsewhere.
Requirements:
- Python 3.6
- Keras 2.1.5
- TensorFlow 1.6.0
- See requirements.txt for library requirements.
License:
- Apache License Version 2.0