Named Entity Evaluation as in SemEval 2013 task 9.1
My own implementation, with lots of input from Matt Upson, of the Named-Entity Recognition evaluation metrics as defined by the SemEval 2013 task 9.1.
These evaluation metrics go beyond a simple token/tag-based schema: they consider different scenarios based on whether or not all the tokens that belong to a named entity were classified as such, and also on whether the correct entity type was assigned.
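As a rough, hypothetical illustration of the idea (this is not the package's actual API; the function and names below are invented), the sketch compares entities as (start, end, type) spans and counts correct, incorrect, missed and spurious predictions under a "strict" schema, which requires boundaries and type to match, and an "exact" schema, which only requires the boundaries to match. Partially overlapping spans are left out for brevity.

```python
# A minimal sketch, not this package's actual API: entities are compared as
# (start, end, type) spans, and each prediction is counted as correct,
# incorrect, missed or spurious under two schemas. Partially overlapping
# spans are left out here for brevity.

def strict_and_exact_counts(true_entities, pred_entities):
    """Count correct/incorrect/missed/spurious predictions.

    Both arguments are lists of (start, end, type) tuples.
    "strict" requires boundaries and type to match; "exact" only boundaries.
    """
    true_spans = {(s, e): t for s, e, t in true_entities}
    pred_spans = {(s, e): t for s, e, t in pred_entities}

    counts = {
        "strict": {"correct": 0, "incorrect": 0, "missed": 0, "spurious": 0},
        "exact": {"correct": 0, "incorrect": 0, "missed": 0, "spurious": 0},
    }

    for span, pred_type in pred_spans.items():
        if span in true_spans:
            # boundaries match: "exact" is satisfied either way, while
            # "strict" also requires the entity type to be correct
            counts["exact"]["correct"] += 1
            key = "correct" if true_spans[span] == pred_type else "incorrect"
            counts["strict"][key] += 1
        else:
            # no golden annotation with these boundaries
            counts["strict"]["spurious"] += 1
            counts["exact"]["spurious"] += 1

    for span in true_spans:
        if span not in pred_spans:
            counts["strict"]["missed"] += 1
            counts["exact"]["missed"] += 1

    return counts


if __name__ == "__main__":
    true_ents = [(0, 2, "PER"), (5, 7, "LOC")]
    pred_ents = [(0, 2, "ORG"), (10, 12, "LOC")]
    print(strict_and_exact_counts(true_ents, pred_ents))
    # strict: 1 incorrect, 1 missed, 1 spurious
    # exact:  1 correct,   1 missed, 1 spurious
```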
You can find a more detailed explanation in the following blog post:
Notes:
In scenarios IV and VI the entity type of the true and predicted entities does not match; in both cases we score only against the true entity, not the predicted one. You could argue that the predicted entity should also be scored as spurious, but according to the definition of spurious:
- Spurious (SPU): system produces a response which doesn't exist in the golden annotation;
In this case an annotation does exist, just with a different entity type, so we count it only as incorrect.
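To make this concrete, here is a tiny hypothetical case (offsets and types invented for illustration): the prediction's boundaries coincide with a golden annotation, so it is not spurious; only the type is wrong, and it is scored once as incorrect against the true entity.

```python
# Hypothetical illustration of the note above (offsets and types invented).
true_entity = (10, 14, "LOC")   # golden annotation
pred_entity = (10, 14, "ORG")   # same boundaries, different entity type

same_boundaries = true_entity[:2] == pred_entity[:2]
same_type = true_entity[2] == pred_entity[2]

# A golden annotation with these boundaries exists, so the prediction is not
# spurious; it is counted once as incorrect, scored against the true entity.
assert same_boundaries and not same_type
```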
Example:
You can see a working example in the following notebook:
Note that in order to run that example you need to have the following installed (see also the sketch after this list):
- sklearn
- nltk
- sklearn_crfsuite
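Since sklearn_crfsuite predicts one BIO tag per token while this evaluation works at the entity level, the tag sequences have to be collapsed into entity spans first. The helper below is a minimal sketch of that step; it is not taken from the example notebook, and the function name is invented.

```python
# A minimal, hypothetical helper (not taken from the example notebook):
# collapse a BIO tag sequence into (start, end, type) entity spans, with
# `end` exclusive, e.g. ["B-PER", "I-PER", "O"] -> [(0, 2, "PER")].

def bio_to_spans(tags):
    spans = []
    start, ent_type = None, None
    for i, tag in enumerate(tags):
        inside = (
            tag.startswith("I-") and ent_type is not None and tag[2:] == ent_type
        )
        if inside:
            continue  # the current entity keeps growing
        if ent_type is not None:
            # anything else closes the entity that is currently open
            spans.append((start, i, ent_type))
            start, ent_type = None, None
        if tag != "O":
            # a B- tag (or a stray I- tag) opens a new entity
            start, ent_type = i, tag[2:]
    if ent_type is not None:
        spans.append((start, len(tags), ent_type))
    return spans


if __name__ == "__main__":
    pred_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-ORG"]
    print(bio_to_spans(pred_tags))
    # [(0, 2, 'PER'), (3, 4, 'LOC'), (4, 5, 'ORG')]
```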
For testing you will need:
- pytest
- coverage
These dependencies can be installed by running pip3 install -r requirements.txt
Code tests and test coverage:
To run tests:
coverage run --rcfile=setup.cfg -m pytest
To produce a coverage report:
coverage report