  • Stars
    99
  • Rank 343,315 (Top 7 %)
  • Language
    Python
  • License
    GNU General Publi...
  • Created over 9 years ago
  • Updated over 7 years ago

Repository Details

Automatic Entity Recognition and Typing for Domain-Specific Corpora (KDD'15)

ClusType

Source code for the SIGKDD'15 paper ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering (Slides).

Given a text corpus (e.g., a collection of news articles), ClusType automatically performs entity extraction and typing using distant supervision (i.e., examples from external knowledge bases such as Freebase). For example, from the sentence "The best BBQ I’ve tasted in Phoenix", the system will recognize "BBQ" as food and "Phoenix" as location. More background can be found in our WWW'16 tutorial.

ClusType works on coarse-grained entity types (e.g., Person, Location, Organization); for more fine-grained entity typing, please use AFET (Ren et al., EMNLP'16).

Data

  • NYT:
    • Corpus: 110k New York Times news articles (download)
    • Seed entities: entity linking result by DBpediaSpotlight (download)
  • Yelp:
  • Tweet:
    • Corpus: 302k tweets from May 2011 (download)
    • Seed entities: entity linking result by DBpediaSpotlight (download)

System Output & Evaluation

The system output on the NYT dataset can be downloaded from here. We evaluated the result against a ~1k gold-standard set (20,874 annotated entity mentions). Sample output on 50k Yelp reviews can be downloaded from here.

Evaluate the result:

python src/evaluation.py -ResultPath -GroundTruthPath
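
The README does not describe how src/evaluation.py scores a result file. As a rough illustration only, the sketch below computes exact-match precision, recall and F1 over (docId, mention, type) triples, which is one common way to score entity typing output. The function name and the sample triples are hypothetical and are not taken from the repository.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match scores over (doc_id, mention, type) triples.

    Assumption: a prediction counts as correct only if the same triple
    appears in the gold set. The real evaluation script may differ.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / float(len(predicted)) if predicted else 0.0
    recall = true_positives / float(len(gold)) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical sample data.
predicted = [("1", "BBQ", "food"), ("1", "Phoenix", "person")]
gold = [("1", "BBQ", "food"), ("1", "Phoenix", "location")]
print(precision_recall_f1(predicted, gold))
```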

Dependencies

  • python 2.7
  • numpy, scipy, scikit-learn, lxml, TextBlob and related corpora
$ sudo pip install numpy scipy scikit-learn lxml textblob
$ sudo python -m textblob.download_corpora

Default Run

$ ./run.sh  

run.sh - File path setup

We take Yelp dataset as an example.

Input: text corpus path.

RawText='data/yelp/yelp_230k.txt'
  • format: "docId \TAB document \n"
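
To make the input format concrete, here is a minimal sketch of a reader for tab-separated "docId, document" lines. The function name read_corpus and the sample documents are hypothetical; the sketch is written for Python 3 and is not part of the repository.

```python
import io


def read_corpus(lines):
    """Yield (doc_id, text) pairs from lines shaped like 'docId<TAB>document'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the document survive.
        doc_id, _, text = line.partition("\t")
        yield doc_id, text


# Hypothetical two-document corpus standing in for data/yelp/yelp_230k.txt.
sample = io.StringIO("0\tThe best BBQ I've tasted in Phoenix\n1\tGreat tacos downtown\n")
docs = list(read_corpus(sample))
print(docs)
```

For a real file, passing an open file handle in place of the StringIO object should work the same way.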

Input: type mapping file path.

TypeFile='data/yelp/type_tid.txt'
  • format: "type name \TAB typeId \n". "NIL" means "Not-of-Interest"
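
The type mapping can be loaded into a dictionary in a few lines. The sketch below assumes one "type name, typeId" pair per line and keeps the IDs as strings, since the README does not say whether they are numeric. The function names and the sample types and IDs are hypothetical.

```python
def load_type_map(lines):
    """Map each type name to its type ID (kept as a string)."""
    type_to_id = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, type_id = line.split("\t")
        type_to_id[name] = type_id
    return type_to_id


def types_of_interest(type_to_id):
    """Return all type names except 'NIL' (Not-of-Interest), sorted."""
    return sorted(name for name in type_to_id if name != "NIL")


# Hypothetical contents standing in for data/yelp/type_tid.txt.
sample_lines = ["food\t0\n", "location\t1\n", "NIL\t2\n"]
type_map = load_type_map(sample_lines)
print(types_of_interest(type_map))
```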

Input: mapping between Freebase and DBpedia entities.

FreebaseMap='data/freebase_links.nt'

Output: output file from candidate generation (format: "docId \TAB segmented sentence \n").

SegmentOutFile='result/segment.txt'
  • Segments are separated by ",". Entity mention candidates are marked with ":EP". Relation phrases are marked with ":RP".
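
A segmented line can be split back into entity mention candidates and relation phrases. This sketch assumes the ":EP" and ":RP" markers appear as suffixes on each segment, which the README implies but does not state exactly. The function name and the sample line are hypothetical.

```python
def parse_segments(line):
    """Split 'docId<TAB>segmented sentence' into candidates and relation phrases.

    Assumption: markers are segment suffixes, e.g. 'BBQ:EP' or 'tasted in:RP'.
    Segments without a marker are ignored.
    """
    doc_id, _, sentence = line.rstrip("\n").partition("\t")
    entities, relations = [], []
    for segment in sentence.split(","):
        segment = segment.strip()
        if segment.endswith(":EP"):
            entities.append(segment[:-len(":EP")].strip())
        elif segment.endswith(":RP"):
            relations.append(segment[:-len(":RP")].strip())
    return doc_id, entities, relations


# Hypothetical line in the style of result/segment.txt.
sample = "12\tThe best,BBQ:EP,I 've,tasted in:RP,Phoenix:EP\n"
print(parse_segments(sample))
```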

Output: entity linking result (please download the corresponding seed entity files).

SeedFile='data/yelp/seed_yelp.txt'
  • Format: "docId \TAB entity name \TAB Original Freebase Type \TAB Refined Type \TAB Freebase EntityID \TAB Similarity Score \TAB Relative Rank \n".
  • NOTE: Our entity linking module calls the DBpedia Spotlight web service, which limits query speed. This step can be greatly accelerated by installing the tool on your local machine (Link).
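
The seven tab-separated fields of a seed line map naturally onto a named tuple. In the sketch below the field names are my own labels for the columns listed above, all values are kept as strings because the README does not specify numeric formats, and the sample line is invented for illustration.

```python
from collections import namedtuple

# Field names are hypothetical labels for the seven documented columns.
SeedEntity = namedtuple(
    "SeedEntity",
    ["doc_id", "name", "freebase_type", "refined_type",
     "freebase_id", "similarity", "relative_rank"],
)


def parse_seed_line(line):
    """Parse one seed-file line into a SeedEntity, checking the column count."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(SeedEntity._fields):
        raise ValueError("expected 7 tab-separated fields, got %d" % len(fields))
    return SeedEntity(*fields)


# Invented example line; real values come from the downloaded seed files.
sample = "3\tPhoenix\t/location/citytown\tlocation\tm.0d35y\t0.98\t0.0\n"
seed = parse_seed_line(sample)
print(seed.name, seed.refined_type)
```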

Output: entity mentions found in each document.

ResultFile='result/yelp/results.txt'
  • Format: "docId \TAB entity mention \TAB entity type \n".
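
A quick way to sanity-check a result file is to count how many mentions were assigned to each type. The sketch below assumes the three-column format described above; the function name and the sample lines are hypothetical.

```python
from collections import Counter


def count_mentions_by_type(lines):
    """Count result lines per entity type, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # not 'docId<TAB>mention<TAB>type'
        _doc_id, _mention, entity_type = parts
        counts[entity_type] += 1
    return counts


# Hypothetical lines in the style of result/yelp/results.txt.
sample_lines = ["1\tBBQ\tfood\n", "1\tPhoenix\tlocation\n", "2\ttacos\tfood\n"]
type_counts = count_mentions_by_type(sample_lines)
print(type_counts.most_common())
```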

Output: In-text annotation of entity mentions.

ResultFileInText='result/yelp/resultsInText.txt'

run.sh - Model parameters

Threshold on significance score for candidate generation.

significance="2"

Switch on capitalization feature for candidate generation.

capitalize="1"

Maximal phrase length for candidate generation.

maxLength='4'

Minimal support of phrases for candidate generation.

minSup='30'

Number of relation phrase clusters.

NumRelationPhraseClusters='500'
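
To give a feel for what maxLength and minSup control, the sketch below counts word n-grams up to a maximum length and keeps those that occur at least a minimum number of times. This is a simplified stand-in, not ClusType's candidate generation, which also applies the significance threshold and capitalization feature listed above. The function name, the whitespace tokenization and the sample documents are assumptions.

```python
from collections import Counter


def frequent_phrases(docs, max_length=4, min_sup=30):
    """Return {phrase: count} for n-grams of up to max_length words
    that occur at least min_sup times across all documents."""
    counts = Counter()
    for text in docs:
        tokens = text.split()  # assumption: plain whitespace tokenization
        for n in range(1, max_length + 1):
            for start in range(len(tokens) - n + 1):
                counts[" ".join(tokens[start:start + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_sup}


# Tiny hypothetical corpus with small thresholds so the effect is visible.
docs = ["new york times"] * 3 + ["new york"]
phrases = frequent_phrases(docs, max_length=2, min_sup=4)
print(sorted(phrases))
```

With these toy settings only "new", "york" and "new york" reach the support threshold; "times" and "york times" occur three times and are dropped.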

Reference

@inproceedings{ren2015clustype,
  title={Clustype: Effective entity recognition and typing by relation phrase-based clustering},
  author={Ren, Xiang and El-Kishky, Ahmed and Wang, Chi and Tao, Fangbo and Voss, Clare R and Han, Jiawei},
  booktitle={Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  pages={995--1004},
  year={2015},
  organization={ACM}
}

More Repositories

1

RE-Net

Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs (EMNLP 2020)
Python
436
star
2

USC-DS-RelationExtraction

Distantly Supervised Relation Extraction
C++
419
star
3

KagNet

Knowledge-Aware Graph Networks for Commonsense Reasoning (EMNLP-IJCNLP 19)
Python
271
star
4

MHGRN

Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering (EMNLP 2020)
Python
246
star
5

TriggerNER

TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
Python
173
star
6

CommonGen

A Constrained Text Generation Challenge Towards Generative Commonsense Reasoning
Python
139
star
7

AlpacaTag

AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging (ACL 2019 Demo)
HTML
137
star
8

CrossFit

Code for paper "CrossFit 🏋️: A Few-shot Learning Challenge for Cross-task Generalization in NLP" (https://arxiv.org/abs/2104.08835)
Python
102
star
9

temporal-gcn-lstm

Code for Characterizing and Forecasting User Engagement with In-App Action Graphs: A Case Study of Snapchat
Python
77
star
10

AFET

AFET: Automatic Fine-Grained Entity Typing (EMNLP'16)
Python
57
star
11

CPL

Collaborative Policy Learning for Open Knowledge Graph Reasoning (EMNLP 2019)
Python
56
star
12

PLE

Label Noise Reduction in Entity Typing (KDD'16)
C++
53
star
13

NERO

Source Code for paper "NERO: A Neural Rule Grounding Framework for Label-Efficient Relation Extraction", WWW 2020
Python
47
star
14

fewNER

Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER (ACL 2022)
Python
43
star
15

StructMineDataPipeline

Performs entity detection, distant supervision, candidate generation, and produces JSON files for typing systems (PLE, AFET, CoType)
C++
43
star
16

shifted-label-distribution

Source code for paper "Looking Beyond Label Noise: Shifted Label Distribution Matters in Distantly Supervised Relation Extraction" (EMNLP 2019)
C++
39
star
17

DualRE

Source code for paper: "Learning Dual Retrieval Module for Semi-supervised Relation Extraction"
Python
36
star
18

hierarchical-explanation-neural-sequence-models

Source code for "Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models", ICLR 2020.
Python
30
star
19

CALM

Source code for ICLR 2021 paper : Pre-training Text-to-Text Transformers for Concept-Centric Common Sense
Python
27
star
20

ReQuest

Indirect Supervision for Relation Extraction Using Question-Answer Pairs (WSDM'18)
C++
24
star
21

DIG

Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021)
Python
24
star
22

LEAN-LIFE

Label Efficient Learning From Explanations
Python
23
star
23

XCSR

Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"
Python
22
star
24

ReCross

ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation
Python
22
star
25

VisCOLL

Code and data for the project "Visually grounded continual learning of compositional semantics"
Python
21
star
26

DArtNet

Temporal Attribute Prediction via Joint Modeling of Multi-Relational Structure Evolution
Python
19
star
27

NumerSense

The data and code for NumerSense (EMNLP2020)
Python
19
star
28

NExT

Source Code for paper "Learning from Explanations with Neural Execution Tree", ICLR 2020
Python
18
star
29

GMED

Source code for "Gradient Based Memory Editing for Task-Free Continual Learning", 4th Lifelong ML Workshop@ICML 2020
Python
17
star
30

HGN

Learning Contextualized Knowledge Structures for Commonsense Reasoning
Python
17
star
31

SalKG

This is the official PyTorch implementation of our NeurIPS 2021 paper: "SalKG: Learning From Knowledge Graph Explanations for Commonsense Reasoning"
Python
14
star
32

FaiRR

FaiRR: Faithful and Robust Deductive Reasoning over Natural Language (ACL 2022)
Python
14
star
33

hypter

Zero-shot Learning by Generating Task-specific Adapters
Python
14
star
34

FiD-ICL

"FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning" (ACL 2023)
Python
13
star
35

IsoBN

IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Python
13
star
36

sparse-distillation

Code for "Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models"
Python
12
star
37

expl-refinement

Code for the paper "Refining Language Model with Compositional Explanation" (NeurIPS 2021)
Python
12
star
38

RiddleSense

RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge
Python
12
star
39

ConNet

Python
12
star
40

entity-robustness

Code and data for paper "On the Robustness of Reading Comprehension Models to Entity Renaming" (NAACL'22)
Python
11
star
41

mrc-explanation

Source Code for "Teaching Machine Comprehension with Compositional Explanations" (Findings of EMNLP 2020)
Python
11
star
42

Reflect

Data and Code for Paper "Reflect Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality" (EMNLP 2022)
Python
11
star
43

rockner

Python
10
star
44

BITE

Code and data for paper "BITE: Textual Backdoor Attacks with Iterative Trigger Injection"
Python
9
star
45

CLIF

Code for Findings at EMNLP 2021 paper: "Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning"
Python
8
star
46

G-PlanET

Python
8
star
47

procedural-extraction

Code for paper Eliciting Knowledge from Experts: Automatic Transcript Parsing for Cognitive Task Analysis, in proceedings of ACL 2019
Python
8
star
48

XMD

XMD: An End-to-End Framework for Interactive Explanation-Based Debugging of NLP Models
Vue
7
star
49

RobustLR

A Diagnostic Benchmark for Evaluating Logical Robustness of Deductive Reasoners
Python
7
star
50

RationaleMultiRewardDistillation

Code and Dataset for preprint titled "Tailoring Self-Rationalizers with Multi-Reward Distillation"
Python
6
star
51

LINK

Code for paper "In Search of the Long-Tail: Systematic Generation of Long-Tail Knowledge via Logical Rule Guided Search"
Python
6
star
52

Upstream-Bias-Mitigation

Code and data for NAACL 2021 paper "On Transferability of Bias Mitigation Effects in Language Model Fine-Tuning"
Python
5
star
53

RationaleHumanUtility

Codebase for Human Utility of FTRs at ACL 2023
Python
5
star
54

Lifelong-ICL

Code for paper "Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack"
Jupyter Notebook
4
star
55

PE2

Code for paper "Prompt Engineering a Prompt Engineer" (https://arxiv.org/abs/2311.05661)
Python
4
star
56

deceive-KG-models

An implementation of the experiments on KG robustness
Python
4
star
57

ER-Test

Code for ER-Test, accepted to the Findings of EMNLP 2022
Python
3
star
58

get-started-on-dl-experiments

2
star
59

ink-usc.github.io

INK Research Lab Website
JavaScript
2
star
60

CrossTaskMoE

Code for paper "Eliciting and Understanding Cross-task Skills with Task-level Mixture-of-Experts" (Findings of EMNLP 2022)
Python
2
star
61

predicting-big-bench

Code for paper "How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench"
Python
2
star
62

bias-mitigation-via-transfer-learning

Source code for Arxiv paper: Efficiently Mitigating Classification Bias via Transfer Learning
2
star
63

Controllable-AV-Explanations

Python
1
star
64

lm-forgetting-prediction-code

Python
1
star
65

MACROSCORE

MACROSCORE - Scoring Scientific Research
Jupyter Notebook
1
star