Open IE 5.1

This project contains the principal Open Information Extraction (Open IE) system from the University of Washington (UW) and Indian Institute of Technology, Delhi (IIT Delhi). An Open IE system runs over sentences and creates extractions that represent relations in text. For example, consider the following sentence.

The U.S. president Barack Obama gave his speech on Tuesday and Wednesday to thousands of people.

There are many binary relations in this sentence that can be expressed as a triple (A, R, B), where A and B are arguments and R is the relation between them. Since Open IE is not aligned with an ontology, the relation is a phrase of text. Here is a possible list of the binary relations in the sentence above:

(Barack Obama, is the president of, United States)
(Barack Obama, gave, his speech)
(Barack Obama, gave his speech, on Tuesday)
(Barack Obama, gave his speech, on Wednesday)
(Barack Obama, gave his speech, to thousands of people)

The first extraction in the list above is a "noun-mediated extraction", because its relation phrase comes from the noun "president". The other four extractions share the same first argument and the verb "gave", so they can be represented more informatively as a single n-ary extraction. An n-ary extraction can have 0 or more secondary arguments. Here is a possible list of the n-ary relations in the sentence:

(Barack Obama, is the president of, United States)
(Barack Obama, gave, [his speech, on Tuesday, on Wednesday, to thousands of people])
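The relationship between the binary and n-ary lists can be made concrete. The following is a minimal Python sketch, assuming a hypothetical NaryExtraction class (not part of Open IE's API), that flattens an n-ary extraction back into the binary triples listed earlier: the first secondary argument attaches to the bare relation, and each later one attaches to the relation extended with that first argument.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class NaryExtraction:
    """Hypothetical container: a first argument, a relation phrase and 0 or more secondary arguments."""
    arg1: str
    rel: str
    arg2s: List[str] = field(default_factory=list)

    def to_binary(self) -> List[Triple]:
        """Flatten into (arg1, relation, arg2) triples.

        The first secondary argument is paired with the bare relation; every
        later one is paired with the relation extended by that first argument.
        An extraction with no secondary arguments has no binary form here.
        """
        if not self.arg2s:
            return []
        first, rest = self.arg2s[0], self.arg2s[1:]
        triples: List[Triple] = [(self.arg1, self.rel, first)]
        triples.extend((self.arg1, f"{self.rel} {first}", arg) for arg in rest)
        return triples


gave = NaryExtraction(
    "Barack Obama", "gave",
    ["his speech", "on Tuesday", "on Wednesday", "to thousands of people"],
)
for triple in gave.to_binary():
    print(triple)
```

This flattening rule is only a heuristic that reproduces the example above; it is not a description of how Open IE itself converts between the two forms.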

Extractions can also include more than the arguments and relation. For example, we might want to know whether the extraction is a negative or a positive assertion, or whether it is conditional in some way. Consider the following sentence:

Some people say Barack Obama was born in Kenya.

We would not want to extract (Barack Obama, was born in, Kenya) on its own, because the sentence only reports what some people say; it does not assert this as true. However, if we keep the condition as well, the extraction is correct:

Some people say:(Barack Obama, was born in, Kenya)
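One way to keep such a condition attached to its triple is sketched below in Python. The Extraction class, its context field and the as_fact helper are hypothetical names for illustration only; the string form simply mirrors the "condition:(triple)" notation used above, not necessarily Open IE's own output format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Extraction:
    """Hypothetical extraction with an optional condition (context) such as an attribution."""
    arg1: str
    rel: str
    arg2: str
    context: Optional[str] = None

    def __str__(self) -> str:
        triple = f"({self.arg1}, {self.rel}, {self.arg2})"
        # Prefix the condition with a colon, mirroring the notation used above.
        return f"{self.context}:{triple}" if self.context else triple

    def as_fact(self) -> Optional[Tuple[str, str, str]]:
        """Return the bare triple only when no condition qualifies it."""
        if self.context:
            return None
        return (self.arg1, self.rel, self.arg2)


claim = Extraction("Barack Obama", "was born in", "Kenya", context="Some people say")
print(claim)            # Some people say:(Barack Obama, was born in, Kenya)
print(claim.as_fact())  # None
```

A consumer that only wants unconditional facts can then skip any extraction whose as_fact() returns None, instead of silently dropping the condition.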

To see an example of Open IE being used, please visit http://openie.cs.washington.edu/.

Improvements over Open IE 4.0

Firstly, Open IE 5.1 improves upon extractions from numerical sentences. For example, consider the following sentence.

Barack Obama is 6 feet tall.

Open IE 5.1 gives the following extractions:

(Barack Obama, has height of, 6 feet)
(Barack Obama, is, 6 feet tall)
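To illustrate the shape of these numerical extractions, here is a toy Python sketch. It handles only sentences of the form "<arg1> is <number> <unit> <adjective>." and maps the adjective to an attribute through a hand-written table. Both the pattern and the ADJ_TO_ATTRIBUTE table are assumptions for illustration; BONIE bootstraps its patterns rather than relying on a fixed lookup like this.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical adjective -> attribute table, for illustration only.
ADJ_TO_ATTRIBUTE = {"tall": "height", "long": "length", "wide": "width", "deep": "depth"}

# Toy pattern: "<arg1> is <number> <unit> <adjective>."
PATTERN = re.compile(
    r"^(?P<arg1>.+?) is (?P<num>\d+(?:\.\d+)?) (?P<unit>\w+) (?P<adj>\w+)\.?$"
)


def numeric_extractions(sentence: str) -> List[Triple]:
    """Return an attribute extraction (if the adjective is known) and a copula extraction."""
    m = PATTERN.match(sentence.strip())
    if m is None:
        return []
    quantity = f"{m['num']} {m['unit']}"
    results: List[Triple] = []
    attribute = ADJ_TO_ATTRIBUTE.get(m["adj"])
    if attribute is not None:
        # e.g. "tall" -> (arg1, "has height of", "6 feet")
        results.append((m["arg1"], f"has {attribute} of", quantity))
    # Keep the plain copula reading as well: (arg1, "is", "6 feet tall")
    results.append((m["arg1"], "is", f"{quantity} {m['adj']}"))
    return results


print(numeric_extractions("Barack Obama is 6 feet tall."))
```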

Open IE 5.1 can also extract implicit numerical relations from units in sentences. For example, consider the following sentence.

James Valley has 5 sq kms of fruit orchards.

The extractions are the following:

(James Valley, has area of fruit orchards, 5 sq kms)
(James Valley, has, 5 sq kms of fruit orchards)
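In the implicit case, the attribute comes from the unit rather than from an adjective. The following toy Python sketch reproduces the two extractions above. It assumes sentences of the form "<arg1> has <number> <unit> of <noun phrase>." and uses a hypothetical UNIT_TO_ATTRIBUTE table, not BONIE's actual resource.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical unit -> attribute table, not BONIE's actual resource.
UNIT_TO_ATTRIBUTE = {
    "sq kms": "area",
    "sq km": "area",
    "hectares": "area",
    "kms": "length",
    "kg": "weight",
}

# Try longer units first so that "sq kms" wins over any shorter alternative.
_UNITS = "|".join(re.escape(u) for u in sorted(UNIT_TO_ATTRIBUTE, key=len, reverse=True))

# Toy pattern: "<arg1> has <number> <unit> of <noun phrase>."
PATTERN = re.compile(
    rf"^(?P<arg1>.+?) has (?P<num>\d+(?:\.\d+)?) (?P<unit>{_UNITS}) of (?P<np>.+?)\.?$"
)


def implicit_unit_extractions(sentence: str) -> List[Triple]:
    """Derive an attribute relation from the unit, plus the plain 'has' extraction."""
    m = PATTERN.match(sentence.strip())
    if m is None:
        return []
    quantity = f"{m['num']} {m['unit']}"
    attribute = UNIT_TO_ATTRIBUTE[m["unit"]]
    return [
        (m["arg1"], f"has {attribute} of {m['np']}", quantity),
        (m["arg1"], "has", f"{quantity} of {m['np']}"),
    ]


print(implicit_unit_extractions("James Valley has 5 sq kms of fruit orchards."))
```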

Secondly, Open IE 5.1 improves extraction from conjunctive sentences by splitting conjunctions in arguments to generate multiple extractions. For example, consider the following sentence.

Jack and Jill visited India, Japan and South Korea.

Open IE 5.1 gives the following extractions:

(Jack, visited, India)
(Jill, visited, India)
(Jack, visited, Japan)
(Jill, visited, Japan)
(Jack, visited, South Korea)
(Jill, visited, South Korea)
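The output above is the cross product of the subject conjuncts and the object conjuncts. The following Python sketch shows that distribution step with a deliberately naive, regex-based conjunct splitter. The helper names are hypothetical, and the splitter is an assumption for illustration, not CALMIE's method.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Split on ", and", " and " or a bare comma. The ", and" alternative comes
# first so that an Oxford comma does not leave a stray "and" behind.
_CONJ = re.compile(r",?\s+and\s+|,\s*")


def split_conjuncts(phrase: str) -> List[str]:
    """Naively split a coordinated phrase such as "India, Japan and South Korea"."""
    return [part.strip() for part in _CONJ.split(phrase) if part.strip()]


def split_conjunctive_extraction(arg1: str, rel: str, arg2: str) -> List[Triple]:
    """Distribute the relation over every (subject, object) pair of conjuncts."""
    subjects = split_conjuncts(arg1)
    objects = split_conjuncts(arg2)
    # Iterate objects in the outer loop to match the order of the list above.
    return [(s, rel, o) for o in objects for s in subjects]


for triple in split_conjunctive_extraction("Jack and Jill", "visited", "India, Japan and South Korea"):
    print(triple)
```

Splitting every conjunction this way is wrong for collective readings such as "Jack and Jill are siblings", and this toy splitter has no way to detect them.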

Citing Open IE 5.1

Open IE 5.1 is a combination of CALMIE (extraction from conjunctive sentences), BONIE (extraction from numerical sentences), RelNoun (noun relation extraction) and SRLIE. The relevant papers are:

  1. CALMIE - Swarnadeep Saha, Mausam. "Open Information Extraction from Conjunctive Sentences." International Conference on Computational Linguistics (COLING). Santa Fe, NM, USA. August 2018.

  2. BONIE - Swarnadeep Saha, Harinder Pal, Mausam. "Bootstrapping for Numerical Open IE." Annual Meeting of the Association for Computational Linguistics (ACL). Vancouver, Canada. August 2017.

  3. RelNoun - Harinder Pal, Mausam. "Demonyms and Compound Relational Nouns in Nominal Open IE." Workshop on Automated Knowledge Base Construction (AKBC) at NAACL. San Diego, CA, USA. June 2016.

  4. SRLIE - Janara Christensen, Mausam, Stephen Soderland, Oren Etzioni. "An Analysis of Open Information Extraction based on Semantic Role Labeling." International Conference on Knowledge Capture (KCAP). Banff, Alberta, Canada. June 2011.

A survey paper summarizing about ten years of progress in Open IE:

Mausam. "Open Information Extraction Systems and Downstream Applications." Invited Paper for Early Career Spotlight Track. International Joint Conference on Artificial Intelligence (IJCAI). New York, NY. July 2016.

Research

Open IE 5.1 is the successor to Open IE 4.x, and Open IE 4.x is the successor to Ollie. Open IE 5.1 improves extractions from noun relations (RelNoun), numerical sentences (BONIE) and conjunctive sentences (CALMIE). Whereas Ollie used bootstrapped dependency parse paths to extract relations (see Open Language Learning for Information Extraction), Open IE 4.x applies argument and relation expansion heuristics similar to Ollie's to SRL frames to create Open IE extractions. Open IE 4.x also extends the definition of Open IE extractions to include n-ary extractions (extractions with 0 or more secondary arguments).

Building

First, download the standalone jar for BONIE from here and place it inside a lib folder (create the lib folder parallel to the src folder).

Also, download the standalone jar for CALMIE from here and place it inside the lib folder.

CALMIE uses the Berkeley Language Model. Download the language model file from here and place it inside a data folder (create the data folder parallel to the src folder).

openie uses java-8-openjdk & the sbt build system, so downloading dependencies and compiling is simple:

  1. Add sbt/bin to your path.
  2. Run compile.sh

Open IE is built with Scala 2.10.2. If you run into a version mismatch problem, make sure you are using Scala 2.10.2.

Using pre-compiled OpenIE standalone jar

If you are unable to compile the jar locally, you can use the pre-compiled jar from here instead. You still need the language model file and the WordNet folder in the correct locations.

This jar was compiled on an Ubuntu machine, so it might not work on a different platform (or version). In that case, build the jar locally.

Running

You can run openie with sbt or create a stand-alone jar. openie requires substantial memory. sbt is configured to use these options by default:

-Xmx10G -XX:+UseConcMarkSweepGC

OpenIE's large memory requirement is largely due to the Berkeley Language Model that it currently loads in the background.

Running with sbt

To run without building a jar:

sbt 'runMain edu.knowitall.openie.OpenIECli'

Running from a stand-alone jar

First create the stand-alone jar.

sbt clean compile assembly

You may need to add the above memory options.

sbt -J-Xmx10000M clean compile assembly

Then you can run the resulting jar file as normal.

java -jar openie-assembly.jar

You may need to add the above memory options.

java -Xmx10g -XX:+UseConcMarkSweepGC -jar openie-assembly.jar

When running the standalone openie jar, the WordNet folder and the data/languageModel file must be placed in the same directory as the jar.
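Since the jar needs these resources at run time, a quick check before launching it can save a failed start. The following Python sketch looks for the data/languageModel file and the WordNet folder next to a given jar path. The folder name "WordNet" and the helper name missing_runtime_resources are assumptions, so adjust them to match your downloads. If the program resolves these paths relative to the working directory rather than the jar (this is not stated here), run java from the jar's directory as well.

```python
from pathlib import Path
from typing import List

# "data/languageModel" is taken from the text above; the folder name "WordNet"
# is an assumption, so change it if your WordNet download is named differently.
REQUIRED_FILES = [Path("data", "languageModel")]
REQUIRED_DIRS = [Path("WordNet")]


def missing_runtime_resources(jar_path: str) -> List[str]:
    """Return the required resources that are absent from the jar's directory."""
    base = Path(jar_path).resolve().parent
    missing = [p.as_posix() for p in REQUIRED_FILES if not (base / p).is_file()]
    missing += [p.as_posix() for p in REQUIRED_DIRS if not (base / p).is_dir()]
    return missing


problems = missing_runtime_resources("openie-assembly.jar")
if problems:
    print("Missing next to the jar:", ", ".join(problems))
```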

Running as HTTP Server

OpenIE 5.1 can also be run as an HTTP server. In this mode, the server port must be given as an argument:

java -jar openie-assembly.jar --httpPort 8000

To run the server with the memory options:

java -Xmx10g -XX:+UseConcMarkSweepGC -jar openie-assembly.jar --httpPort 8000

To get extractions from the server, send a POST request to the '/getExtraction' endpoint with the sentence as the body of the HTTP request. An example curl request:

curl -X POST http://localhost:8000/getExtraction -d 'The U.S. president Barack Obama gave his speech on Tuesday to thousands of people.'

The response is a JSON list of extractions.
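For programmatic access, the curl call can be mirrored with Python's standard library. This is a minimal sketch assuming the server was started with --httpPort 8000 on the same machine. get_extractions is a hypothetical helper name, and because the fields inside each JSON extraction are not documented here, it returns the decoded list without interpreting it.

```python
import json
import urllib.request
from typing import Any, List

# Ignore http_proxy/https_proxy environment settings; the server is usually local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def get_extractions(
    sentence: str,
    url: str = "http://localhost:8000/getExtraction",
    timeout: float = 120.0,
) -> List[Any]:
    """POST one sentence as the raw request body and return the decoded JSON list."""
    # Like `curl -d`, urllib labels a bytes body as application/x-www-form-urlencoded.
    request = urllib.request.Request(url, data=sentence.encode("utf-8"), method="POST")
    with _OPENER.open(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, get_extractions("The U.S. president Barack Obama gave his speech on Tuesday to thousands of people.") sends the same request as the curl example.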

Running with Python

A Python wrapper for OpenIE 5.1, along with instructions and examples, can be found here.

Command Line Interface

openie takes one sentence per line unless --split is specified, in which case the input text is split into sentences. You can pipe input from standard input, specify an input file (an optional first argument), or type sentences interactively. Output is written to standard output unless a second optional argument specifies an output file.

openie takes a number of command line arguments. To see them all, run java -jar openie-assembly.jar --usage. Of particular interest are --ignore-errors, which continues running even if an exception is encountered; --binary, which gives binary (triple) output; and --split, which splits the input document text into sentences.

There are two output formats: a simple format made for ease of reading and a columnated format used for machine processing. The format can be specified with either --format simple or --format column. The simple format is chosen by default.
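When openie is driven from a script, it can help to assemble the command line in one place. The following Python sketch builds an argument list from the options described above: the positional input and output files, --split, --binary, --ignore-errors, --format, and the memory options. The function name openie_command and its defaults are hypothetical, and placing options before the positional file arguments is an assumption. Check --usage to confirm what the jar accepts.

```python
import shlex
from typing import List, Optional


def openie_command(
    input_file: Optional[str] = None,
    output_file: Optional[str] = None,
    *,
    split: bool = False,
    binary: bool = False,
    ignore_errors: bool = False,
    fmt: str = "simple",
    jar: str = "openie-assembly.jar",
    heap: str = "10g",
) -> List[str]:
    """Build an argv list for the standalone jar using the options described above."""
    if fmt not in ("simple", "column"):
        raise ValueError(f"unsupported format: {fmt!r}")
    if output_file is not None and input_file is None:
        # The output file is the second positional argument, so it needs an input file.
        raise ValueError("output_file requires input_file")
    # CMS flag as in the memory options above; it was removed in JDK 14, so this assumes Java 8.
    cmd = ["java", f"-Xmx{heap}", "-XX:+UseConcMarkSweepGC", "-jar", jar, "--format", fmt]
    if split:
        cmd.append("--split")
    if binary:
        cmd.append("--binary")
    if ignore_errors:
        cmd.append("--ignore-errors")
    # Optional positional arguments: input file first, then output file.
    cmd.extend(path for path in (input_file, output_file) if path is not None)
    return cmd


print(shlex.join(openie_command("sentences.txt", "extractions.txt", split=True, fmt="column")))
```

The returned list can be passed directly to subprocess.run(cmd, check=True). When no files are given, openie reads from standard input, so input can be supplied through subprocess.run's input parameter.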
