OpenIE6 System

This repository contains the code for the paper:
OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction
Keshav Kolluru*, Vaibhav Adlakha*, Samarth Aggarwal, Mausam and Soumen Chakrabarti
EMNLP 2020

* denotes equal contribution

Installation

conda create -n openie6 python=3.6
conda activate openie6
pip install -r requirements.txt
python -m nltk.downloader stopwords
python -m nltk.downloader punkt 

All results were obtained on a V100 GPU with CUDA 10.0.

NOTE: HuggingFace transformers==2.6.0 is required. Later versions change the way the tokenizer is used in the code; they will not raise an error but will silently give wrong results!
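If you want the code to fail fast instead of silently misbehaving, you can assert the pinned version at runtime; a minimal guard (illustrative, not part of the original codebase):

import transformers

# Guard against the silent tokenizer change described above (illustrative check).
assert transformers.__version__ == "2.6.0", (
    "OpenIE6 requires transformers==2.6.0; found " + transformers.__version__
)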

Download Resources

Download Data (50 MB)

zenodo_get 4054476
tar -xvf openie6_data.tar.gz

Download Models (6.6 GB)

zenodo_get 4055395
tar -xvf openie6_models.tar.gz

Running Model

Command:

python run.py --mode splitpredict --inp sentences.txt --out predictions.txt --rescoring --task oie --gpus 1 --oie_model models/oie_model/epoch=14_eval_acc=0.551_v0.ckpt --conj_model models/conj_model/epoch=28_eval_acc=0.854.ckpt --rescore_model models/rescore_model --num_extractions 5 

Expected models:
models/conj_model: Performs coordination analysis
models/oie_model: Performs OpenIE extraction
models/rescore_model: Performs the final rescoring

--inp sentences.txt - File with one sentence per line
--out predictions.txt - File containing the generated extractions

--gpus - 0 for no GPU, 1 for a single GPU

Additional flags:

--type labels // outputs word-level aligned labels to the file path `out`+'.labels'
--type sentences // outputs decomposed sentences to the file path `out`+'.sentences'
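
If you prefer to drive the pipeline from Python rather than the shell, here is a minimal wrapper around the command documented above (the wrapper itself is not part of the repo; flags and checkpoint paths are exactly the ones shown):

import subprocess

# Illustrative wrapper around the documented CLI invocation.
def extract(inp="sentences.txt", out="predictions.txt", gpus=1):
    subprocess.run([
        "python", "run.py",
        "--mode", "splitpredict", "--task", "oie", "--rescoring",
        "--inp", inp, "--out", out,
        "--gpus", str(gpus), "--num_extractions", "5",
        "--oie_model", "models/oie_model/epoch=14_eval_acc=0.551_v0.ckpt",
        "--conj_model", "models/conj_model/epoch=28_eval_acc=0.854.ckpt",
        "--rescore_model", "models/rescore_model",
    ], check=True)

extract()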

Additional Notes:

  1. The model is trained on tokenized sentences and hence requires tokenized input at prediction time as well. The code currently uses nltk tokenization for this, so the sentences in the output will differ from the raw input sentences (they will be the tokenized versions). If this is not desirable, comment out the nltk tokenization in data.py and make sure your sentences are tokenized beforehand (see the sketch after these notes).
  2. Due to an artifact of the conjunction model's training data, input sentences must end with a full stop for the model to function correctly.
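
A minimal helper covering both notes, assuming the nltk punkt models from the installation step (the helper and its name prepare_input are illustrative, not part of the repo):

import nltk

# Illustrative pre-processing (not part of the repo): tokenize each sentence
# with nltk (Note 1), make sure it ends with a full stop (Note 2), and write
# one sentence per line, as expected by --inp.
def prepare_input(raw_sentences, path="sentences.txt"):
    with open(path, "w") as f:
        for sent in raw_sentences:
            tokens = nltk.word_tokenize(sent)
            if not tokens or tokens[-1] != ".":
                tokens.append(".")
            f.write(" ".join(tokens) + "\n")

prepare_input(["The cat sat on the mat and the dog barked"])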

Training Model

Warmup Model

Training:

python run.py --save models/warmup_oie_model --mode train_test --model_str bert-base-cased --task oie --epochs 30 --gpus 1 --batch_size 24 --optimizer adamW --lr 2e-05 --iterative_layers 2

Testing:

python run.py --save models/warmup_oie_model --mode test --batch_size 24 --model_str bert-base-cased --task oie --gpus 1

Carb F1: 52.4, Carb AUC: 33.8

Predicting

python run.py --save models/warmup_oie_model --mode predict --model_str bert-base-cased --task oie --gpus 1 --inp sentences.txt --out predictions.txt

Time (Approx): 142 extractions/second

Constrained Model

Training

python run.py --save models/oie_model --mode resume --model_str bert-base-cased --task oie --epochs 16 --gpus 1 --batch_size 16 --optimizer adam --lr 5e-06 --iterative_layers 2 --checkpoint models/warmup_oie_model/epoch=13_eval_acc=0.544.ckpt --constraints posm_hvc_hvr_hve --save_k 3 --accumulate_grad_batches 2 --gradient_clip_val 1 --multi_opt --lr 2e-5 --wreg 1 --cweights 3_3_3_3 --val_check_interval 0.1

Testing

python run.py --save models/oie_model --mode test --batch_size 16 --model_str bert-base-cased --task oie --gpus 1 

Carb F1: 54.0, Carb AUC: 35.7

Predicting

python run.py --save models/oie_model --mode predict --model_str bert-base-cased --task oie --gpus 1 --inp sentences.txt --out predictions.txt

Time (Approx): 142 extractions/second

NOTE: Due to a bug in the code (link), we end up using a loss function based only on the constrained loss term, not the original cross-entropy (CE) loss. It still seems to work well, since the warmup model is already trained with the CE loss and the constrained training is initialized from the warmup model.
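
For intuition, a simplified picture of what the bug changes (hypothetical functions; the repo's actual loss code is more involved):

import torch
import torch.nn.functional as F

# Simplified illustration, not the repo's code: the intended objective adds the
# constraint penalty to cross-entropy; the bug drops the CE term entirely.
def intended_loss(logits, labels, constraint_penalty, cweight=3.0):
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
    return ce + cweight * constraint_penalty

def buggy_loss(logits, labels, constraint_penalty, cweight=3.0):
    return cweight * constraint_penalty  # cross-entropy term silently omitted

logits = torch.randn(2, 4, 6)            # (batch, tokens, label classes)
labels = torch.randint(0, 6, (2, 4))
penalty = torch.tensor(0.1)
print(intended_loss(logits, labels, penalty).item(),
      buggy_loss(logits, labels, penalty).item())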

Running Coordination Analysis

python run.py --save models/conj_model --mode train_test --model_str bert-large-cased --task conj --epochs 40 --gpus 1 --batch_size 32 --optimizer adamW --lr 2e-05 --iterative_layers 2

F1: 87.8

Final Model

Running

python run.py --mode splitpredict --inp carb/data/carb_sentences.txt --out models/results/final --rescoring --task oie --gpus 1 --oie_model models/oie_model/epoch=14_eval_acc=0.551_v0.ckpt --conj_model models/conj_model/epoch=28_eval_acc=0.854.ckpt --rescore_model models/rescore_model --num_extractions 5 
python utils/oie_to_allennlp.py --inp models/results/final --out models/results/final.carb
python carb/carb.py --allennlp models/results/final.carb --gold carb/data/gold/test.tsv --out /dev/null

Carb F1: 52.7, Carb AUC: 33.7
Time (Approx): 31 extractions/second

Evaluate using other metrics (Carb(s,s), Wire57 and OIE-16)

bash carb/evaluate_all.sh models/results/final.carb carb/data/gold/test.tsv

Carb(s,s): F1: 46.4, AUC: 26.8
Carb(s,m) ==> Carb: F1: 52.7, AUC: 33.7
OIE16: F1: 65.6, AUC: 48.4
Wire57: F1: 40.0

CITE

If you use this code in your research, please cite:

@inproceedings{kolluru-etal-2020-openie6,
    title = "{O}pen{IE}6: {I}terative {G}rid {L}abeling and {C}oordination {A}nalysis for {O}pen {I}nformation {E}xtraction",
    author = "Kolluru, Keshav and
      Adlakha, Vaibhav and
      Aggarwal, Samarth and
      Mausam and
      Chakrabarti, Soumen",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics"
}

LICENSE

Note that the license is the full GPL, which allows many free uses but does not permit use in proprietary software that is distributed to others. If you distribute proprietary software, you can contact us for commercial licensing.

CONTACT

In case of any issues, please send an email to keshav.kolluru (at) gmail (dot) com.
