  • Stars: 408
  • Rank: 102,741 (top 3%)
  • Language: Python
  • License: Apache License 2.0
  • Created: about 4 years ago
  • Updated: 2 months ago



EmbedKGQA

This is the code for our ACL 2020 paper Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings (Slides).

UPDATE: Code for relation matching has been added. Please see the Readme for details on how to use it.

Instructions

Data and pre-trained models

To run the code, first download data.zip and pretrained_model.zip from here, then unzip them in the main directory.
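Assuming the archives have been downloaded into the repository root, the unzip step can be scripted as below. The archive names follow the instructions above; the helper itself is only a convenience sketch, not part of the repository.

```python
import zipfile
from pathlib import Path

def extract_archives(archive_names=("data.zip", "pretrained_model.zip"), dest="."):
    """Unzip each archive into dest, skipping any that are missing."""
    dest = Path(dest)
    extracted = []
    for name in archive_names:
        path = Path(name)
        if not path.exists():
            print(f"warning: {name} not found, skipping")
            continue
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)  # recreates the archive's directory layout under dest
        extracted.append(name)
    return extracted

if __name__ == "__main__":
    extract_archives()
```

After extraction, the data/ and pretrained model directories should sit directly in the main directory, as the training commands below expect.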

UPDATE: There was an issue with the WebQSP test set containing 43 fewer questions (issue #86). This has been fixed, and the file qa_test_webqsp_fixed.txt should be placed in the directory data/QA_data/WebQuestionsSP.

MetaQA

Change to the directory ./KGQA/LSTM. The following is an example command to run the QA training code:

python3 main.py --mode train --relation_dim 200 --hidden_dim 256 \
--gpu 2 --freeze 0 --batch_size 128 --validate_every 5 --hops 2 --lr 0.0005 --entdrop 0.1 --reldrop 0.2  --scoredrop 0.2 \
--decay 1.0 --model ComplEx --patience 5 --ls 0.0 --kg_type half

WebQuestionsSP

Change to the directory ./KGQA/RoBERTa. The following is an example command to run the QA training code:

python3 main.py --mode train --relation_dim 200 --do_batch_norm 1 \
--gpu 2 --freeze 1 --batch_size 16 --validate_every 10 --hops webqsp_half --lr 0.00002 --entdrop 0.0 --reldrop 0.0 --scoredrop 0.0 \
--decay 1.0 --model ComplEx --patience 20 --ls 0.05 --l3_reg 0.001 --nb_epochs 200 --outfile half_fbwq

Note: This runs the code in the vanilla setting without relation matching; relation matching has to be done separately. Details on relation matching can be found here. The numbers in Table 3 are after relation matching.

Also, please note that this implementation uses embeddings created through libkge (https://github.com/uma-pi1/kge). This is a very helpful library, and I would suggest training embeddings through it, since it supports sparse embeddings and shared negative sampling, which speed up learning for large KGs like Freebase.
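For reference, the ComplEx scoring function that these pretrained embeddings feed into can be sketched in a few lines. The layout below, with each vector storing its real half followed by its imaginary half, is a common convention but an assumption about the particular checkpoint format, not something this repository guarantees.

```python
import numpy as np

def complex_score(head, rel, tail):
    """ComplEx score Re(<h, r, conj(t)>), assuming each vector is laid out
    as [real half | imaginary half]."""
    re_h, im_h = np.split(head, 2)
    re_r, im_r = np.split(rel, 2)
    re_t, im_t = np.split(tail, 2)
    # Expansion of the real part of the trilinear product h * r * conj(t)
    return float(
        np.sum(re_h * re_r * re_t
               + im_h * re_r * im_t
               + re_h * im_r * im_t
               - im_h * im_r * re_t)
    )
```

A higher score means the (head, relation, tail) triple is considered more plausible; in EmbedKGQA the relation slot is filled by an embedding derived from the question.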

Dataset creation

MetaQA

KG dataset

There are 2 datasets: MetaQA_full and MetaQA_half. The full dataset contains the original kb.txt as train.txt, with duplicate triples removed. The half dataset contains only 50% of the triples, randomly selected without replacement.

The train.txt of the half dataset contains some lines of the form 'entity NOOP entity'. These exist because removing triples can leave an entity with no triples at all, in which case a KG embedding implementation would not learn any vector for it from train.txt. The 'NOOP' triples add no information about such entities from the KG; they are included only so that any embedding implementation can still generate a (random) vector for them.
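The half-KG construction described above can be sketched as follows. The triple format (head, relation, tail) matches train.txt, but this is an illustration of the procedure, not the authors' actual script.

```python
import random

def make_half_kg(triples, seed=0):
    """Keep a random 50% of the triples (without replacement), then add a
    'NOOP' self-loop for every entity that lost all of its triples, so that
    any KG-embedding code still allocates a vector for it."""
    rng = random.Random(seed)
    half = rng.sample(triples, len(triples) // 2)
    all_entities = {e for h, _, t in triples for e in (h, t)}
    kept_entities = {e for h, _, t in half for e in (h, t)}
    half += [(e, "NOOP", e) for e in sorted(all_entities - kept_entities)]
    return half
```

The resulting list can be written back out one tab-separated triple per line to serve as the half dataset's train.txt.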

QA Dataset

There are 5 files for each dataset (1, 2 and 3 hop):

  • qa_train_{n}hop_train.txt
  • qa_train_{n}hop_train_half.txt
  • qa_train_{n}hop_train_old.txt
  • qa_dev_{n}hop.txt
  • qa_test_{n}hop.txt

Out of these, qa_dev, qa_test and qa_train_{n}hop_old are exactly the same as the original MetaQA dev, test and train files, respectively.

For qa_train_{n}hop_train and qa_train_{n}hop_train_half, we have added triples (h, r, t) in the form (head entity, question, answer). This prevents the model from 'forgetting' the entity embeddings while the QA model is being trained on the QA dataset. qa_train.txt contains all triples, while qa_train_half.txt contains only the triples from MetaQA_half.
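The augmentation step can be sketched as below. MetaQA QA lines put the topic entity in square brackets and separate the question from the answer with a tab, but the phrasing used here to turn a relation name into a pseudo-question is a hypothetical template, not the authors' exact rendering.

```python
def triple_to_qa_line(head, relation, tail):
    """Render a KG triple (h, r, t) as a MetaQA-style QA line: the head
    entity in brackets inside a pseudo-question, a tab, then the answer.
    The question template is a hypothetical illustration."""
    question = f"[{head}] {relation.replace('_', ' ')}"
    return f"{question}\t{tail}"
```

Lines produced this way would be appended to the ordinary question lines so the QA training data keeps exercising every entity embedding.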

WebQuestionsSP

KG dataset

There are 2 datasets: fbwq_full and fbwq_half

Creating fbwq_full: We restrict the KB to the subset of Freebase that contains all facts within 2 hops of any entity mentioned in the questions of WebQuestionsSP. We further prune it to contain only those relations that are mentioned in the dataset. This smaller KB has 1.8 million entities and 5.7 million triples.
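The 2-hop restriction can be sketched as a breadth-first expansion from the question entities followed by relation pruning. This is a small in-memory illustration of the procedure; the actual script for building fbwq_full from Freebase is not included here.

```python
def restrict_kb(triples, question_entities, allowed_relations, hops=2):
    """Keep facts whose entities lie within `hops` hops of any question
    entity, then prune to the allowed relations (those mentioned in the
    dataset)."""
    frontier = set(question_entities)
    reachable = set(frontier)
    for _ in range(hops):
        nxt = set()
        for h, _, t in triples:  # expand one hop in both edge directions
            if h in frontier:
                nxt.add(t)
            if t in frontier:
                nxt.add(h)
        frontier = nxt - reachable
        reachable |= nxt
    return [
        (h, r, t) for h, r, t in triples
        if h in reachable and t in reachable and r in allowed_relations
    ]
```

On Freebase-scale data this would be done with indexed adjacency lists rather than repeated scans, but the logic is the same.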

Creating fbwq_half: We randomly sample 50% of the edges from fbwq_full.

QA Dataset

Same as the original WebQuestionsSP QA dataset.

How to cite

If you use our work or find it helpful, please use the following citation:

@inproceedings{saxena2020improving,
  title={Improving multi-hop question answering over knowledge graphs using knowledge base embeddings},
  author={Saxena, Apoorv and Tripathi, Aditay and Talukdar, Partha},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  pages={4498--4507},
  year={2020}
}
