Quality information extraction at web scale.

** DEPRECATED! ** Please see https://github.com/dair-iitd/OpenIE-standalone, which has combined multiple projects into a single project and maintains the latest version of Open IE (Open IE 5). It is based on another repository https://github.com/allenai/openie-standalone, which has an older version of Open IE.

Open IE

This project contains the principal Open Information Extraction (Open IE) system from the University of Washington (UW). An Open IE system runs over sentences and creates extractions that represent relations in text. For example, consider the following sentence.

The U.S. president Barack Obama gave his speech on Tuesday to thousands of people.

There are many binary relations in this sentence that can be expressed as a triple (A, B, C), where A and C are arguments and B is the relation between them. Since Open IE is not aligned with an ontology, the relation is a phrase of text. Here is a possible list of the binary relations in the above sentence:

(Barack Obama, is the president of, the U.S.)
(Barack Obama, gave, his speech)
(Barack Obama, gave his speech, on Tuesday)
(Barack Obama, gave his speech, to thousands of people)

The first extraction in the above list is a "noun-mediated extraction", because its relation phrase is described by the noun "president". The other extractions are very similar to one another. In fact, they can be represented more informatively as a single n-ary extraction. An n-ary extraction can have 0 or more secondary arguments. Here is a possible list of the n-ary relations in the sentence:

(Barack Obama, is the president of, the U.S.)
(Barack Obama, gave, [his speech, on Tuesday, to thousands of people])
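The relationship between the n-ary list and the earlier binary list can be sketched in a few lines of Scala. This is only an illustration of the pattern visible in the example, not the code Open IE uses internally; the names `Binary`, `Nary`, and `toBinary` are hypothetical.

```scala
object NaryExample {
  // A binary extraction: (arg1, relation phrase, arg2).
  final case class Binary(arg1: String, rel: String, arg2: String)

  // An n-ary extraction: one first argument, one relation phrase,
  // and 0 or more secondary arguments.
  final case class Nary(arg1: String, rel: String, arg2s: List[String])

  // Expand an n-ary extraction into binary triples, following the pattern
  // in the example: the first secondary argument pairs with the bare
  // relation, and each later one pairs with "relation + first argument".
  // (Assumption: this mirrors the example only, not Open IE's own logic.)
  def toBinary(n: Nary): List[Binary] = n.arg2s match {
    case Nil => Nil
    case first :: rest =>
      Binary(n.arg1, n.rel, first) ::
        rest.map(arg => Binary(n.arg1, s"${n.rel} $first", arg))
  }
}
```

Applied to `Nary("Barack Obama", "gave", List("his speech", "on Tuesday", "to thousands of people"))`, `toBinary` yields the three "gave" triples from the binary list.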

Extractions can also carry more than just the arguments and the relation. For example, we might be interested in whether the extraction is a negative assertion or a positive assertion, or if it is conditional in some way. Consider the following sentence:

Some people say Barack Obama was born in Kenya.

We would not want to extract (Barack Obama, was born in, Kenya) alone, because this is not true. However, if we keep the condition as well, we have a correct extraction.

Some people say:(Barack Obama, was born in, Kenya)
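One way to picture an extraction that carries its condition is as a triple with an optional context field. The sketch below is an assumption made for illustration: the `Extraction` type and `render` function are hypothetical and are not taken from the Open IE code base.

```scala
object ContextExample {
  // A triple plus an optional context such as "Some people say".
  final case class Extraction(
      arg1: String,
      rel: String,
      arg2: String,
      context: Option[String] = None
  )

  // Render in the notation used above: "context:(arg1, rel, arg2)",
  // or just the triple when there is no context.
  def render(e: Extraction): String = {
    val triple = s"(${e.arg1}, ${e.rel}, ${e.arg2})"
    e.context.fold(triple)(c => s"$c:$triple")
  }
}
```

Keeping the context next to the triple lets a downstream consumer decide whether to treat the extraction as an unconditional assertion.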

To see an example of Open IE being used, please visit http://openie.cs.washington.edu/.

Citing Open IE 4

Open IE 4 is a combination of SRLIE and Relnoun. The closest papers for these two are:

  1.  Janara Christensen, Mausam, Stephen Soderland, Oren Etzioni. "An Analysis of Open Information Extraction based on Semantic Role Labeling". International Conference on Knowledge Capture (KCAP). Banff, Alberta, Canada. June 2011.
    
  2.  Harinder Pal, Mausam. "Demonyms and Compound Relational Nouns in Nominal Open IE". Workshop on Automated Knowledge Base Construction (AKBC) at NAACL. San Diego, CA, USA. June 2016.
    

A survey paper summarizing about ten years of progress in Open IE:

  1.  Mausam. "Open Information Extraction Systems and Downstream Applications". Invited Paper for Early Career Spotlight Track. International Joint Conference on Artificial Intelligence (IJCAI). New York, NY. July 2016.
    

Notifications

Google Group

Research

Open IE 4.x is the successor to Ollie. Whereas Ollie used bootstrapped dependency parse paths to extract relations (see Open Language Learning for Information Extraction), Open IE 4.x uses similar argument and relation expansion heuristics to create Open IE extractions from SRL frames. Open IE 4.x also extends the definition of Open IE extractions to include n-ary extractions (extractions with 0 or more second arguments, or "arg2s").

Building

openie uses java-7-openjdk and the sbt build system, so downloading dependencies and compiling is simple. Just run:

sbt compile

Running

You can run openie with sbt or create a stand-alone jar. openie requires substantial memory. sbt is configured to use these options by default:

-Xmx4G -XX:+UseConcMarkSweepGC

Running with sbt

sbt 'run-main edu.knowitall.openie.OpenIECli'

Running from a stand-alone jar

First create the stand-alone jar.

sbt clean compile assembly

You may need to give sbt more memory for this step, for example:

sbt -J-Xmx2700M clean compile assembly

Then you can run the resulting jar file as normal.

java -jar openie-assembly.jar

You may need to add the above memory options.

java -Xmx4g -XX:+UseConcMarkSweepGC -jar openie-assembly.jar

Command Line Interface

openie takes one sentence per line unless --split is specified. If --split is specified, the input text will be split into sentences. You can either pipe input from Standard Input, specify an input file (an optional first argument), or type sentences interactively. Output will be written to Standard Output unless a second optional argument is specified for an output file.

openie takes a number of command line arguments. To see them all, run java -jar openie-assembly.jar --usage. Of particular interest are --ignore-errors, which continues running even if an exception is encountered; --binary, which gives the binary (triples) output; and --split, which splits the input document text into sentences.

There are two formats--a simple format made for ease of reading and a columnated format used for machine processing. The format can be specified with either --format simple or --format column. The simple format is chosen by default.
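The options described above can be assembled into a full command line. The following Scala sketch builds the argument list without running anything. The helper `OpenIeCommand.build` is hypothetical, and placing the flags before the file arguments is an assumption; check `--usage` for the authoritative syntax.

```scala
object OpenIeCommand {
  // Build the argument list for running the stand-alone jar.
  // Only flags named in this README are used. Their placement before
  // the optional input/output files is an assumption.
  def build(
      jar: String = "openie-assembly.jar",
      format: Option[String] = None, // "simple" or "column"
      split: Boolean = false,
      binary: Boolean = false,
      ignoreErrors: Boolean = false,
      input: Option[String] = None,
      output: Option[String] = None
  ): Seq[String] = {
    // The output file is the second positional argument, so it needs an input.
    require(output.isEmpty || input.isDefined, "an output file requires an input file")
    require(format.forall(Set("simple", "column")), "format must be simple or column")

    // Memory options taken from the "Running" section.
    val jvm = Seq("java", "-Xmx4g", "-XX:+UseConcMarkSweepGC", "-jar", jar)
    val flags =
      format.toSeq.flatMap(f => Seq("--format", f)) ++
        (if (split) Seq("--split") else Nil) ++
        (if (binary) Seq("--binary") else Nil) ++
        (if (ignoreErrors) Seq("--ignore-errors") else Nil)

    jvm ++ flags ++ input.toSeq ++ output.toSeq
  }
}
```

With `import scala.sys.process._` in scope, the resulting sequence could be executed with `.!`, provided the jar has been built as described in the "Running" section.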

Java Demo

A simple Java demo that uses Open IE is available at https://github.com/OpenIE-HelperCodes/OpenIEDemo1.

Simple Format

> John ran down the road to fetch a pail of water.
John ran down the road to fetch a pail of water.
0.86 (John; ran; down the road; to fetch a pail of water)
0.82 John ran:(John; ran down the road to fetch; a pail of water)
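For downstream processing, lines in the simple format can be read back with a small parser. The sketch below is based only on the two sample lines above: it assumes that fields are separated by "; " and that an optional context precedes the parenthesis and ends with a colon. The names `SimpleFormat`, `Line`, and `parse` are hypothetical.

```scala
object SimpleFormat {
  final case class Line(
      confidence: Double,
      context: Option[String],
      arg1: String,
      rel: String,
      arg2s: List[String]
  )

  // "<confidence> [<context>:](<arg1>; <rel>; <arg2>; ...)"
  private val Pattern = """^(\d+(?:\.\d+)?) (?:(.+?):)?\((.*)\)$""".r

  // Returns None for lines that are not extractions, such as the echoed sentence.
  // Limitation: an argument that itself contains "; " would be split incorrectly.
  def parse(line: String): Option[Line] = line.trim match {
    case Pattern(conf, ctx, body) =>
      body.split("; ").toList match {
        case arg1 :: rel :: arg2s =>
          // ctx is null when the optional context group did not match.
          Some(Line(conf.toDouble, Option(ctx), arg1, rel, arg2s))
        case _ => None
      }
    case _ => None
  }
}
```

The first sample line parses into an extraction with two secondary arguments and no context; the second parses with the context "John ran".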

Columnated Format

Columns are separated by tabs, which makes the output hard to read in this README.

0.8576784836790008	John	ran	down the road; to fetch a pail of water	John ran down the road to fetch a pail of water.
0.8195727266148489	John ran	John	ran down the road to fetch	a pail of water	John ran down the road to fetch a pail of water.
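The columnated output can be split on tabs for machine processing. The column layout used below (confidence, context, arg1, relation, secondary arguments joined by "; ", sentence) is inferred from the two sample rows and is an assumption, as is treating a five-column row as one with no context. The names `ColumnFormat`, `Row`, and `parse` are hypothetical.

```scala
import scala.util.Try

object ColumnFormat {
  final case class Row(
      confidence: Double,
      context: Option[String],
      arg1: String,
      rel: String,
      arg2s: List[String],
      sentence: String
  )

  def parse(line: String): Option[Row] = {
    // Limit -1 keeps empty columns, e.g. an empty context field.
    val cols = line.split("\t", -1).toList

    // Assumption: a row with five columns simply has no context column.
    val normalized: Option[List[String]] = cols.length match {
      case 5 => Some(cols.head :: "" :: cols.tail)
      case 6 => Some(cols)
      case _ => None
    }

    normalized.flatMap {
      case List(conf, ctx, arg1, rel, arg2s, sentence) =>
        Try(conf.toDouble).toOption.map { c =>
          Row(
            c,
            Some(ctx).filter(_.nonEmpty),
            arg1,
            rel,
            arg2s.split("; ").toList.filter(_.nonEmpty),
            sentence
          )
        }
      case _ => None
    }
  }
}
```

Rows that do not fit the assumed layout, or whose first column is not a number, yield `None` rather than an exception.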
