  • Stars
    213
  • Rank 185,410 (Top 4 %)
  • Language
    Python
  • License
    MIT License
  • Created over 6 years ago
  • Updated 5 months ago

Repository Details

An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9: not at the tag/token level, but considering all the tokens that are part of the named entity.

Named Entity Evaluation as in SemEval 2013 task 9.1

My own implementation, with lots of input from Matt Upson, of the Named-Entity Recognition evaluation metrics as defined by the SemEval 2013 - 9.1 task.

These evaluation metrics go beyond a simple token/tag based schema, and consider different scenarios based on whether all the tokens that belong to a named entity were classified or not, and also whether the correct entity type was assigned.
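To make the difference between token-level and entity-level evaluation concrete, here is a minimal sketch of grouping per-token tags into whole entities before comparing them. It assumes BIO-style tags (`B-PER`, `I-PER`, `O`), which this README does not confirm; `Entity` and `collect_entities` are hypothetical names used only for illustration and are not necessarily the API of this repository.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    e_type: str
    start: int  # index of the first token
    end: int    # index of the last token (inclusive)


def collect_entities(tags: List[str]) -> List[Entity]:
    """Group BIO tags (an assumed tagging scheme) into whole-entity spans."""
    entities = []
    current_type, start = None, None
    for i, tag in enumerate(tags):
        # An I- tag of the same type continues the open entity.
        if tag.startswith("I-") and current_type == tag[2:]:
            continue
        # Anything else closes the open entity, if there is one.
        if current_type is not None:
            entities.append(Entity(current_type, start, i - 1))
            current_type, start = None, None
        # A B- tag (or a stray I- tag) opens a new entity.
        if tag != "O":
            current_type, start = tag[2:], i
    # Close an entity that runs to the end of the sequence.
    if current_type is not None:
        entities.append(Entity(current_type, start, len(tags) - 1))
    return entities


true_tags = ["O", "B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["O", "B-PER", "O", "O", "B-LOC"]

true_entities = collect_entities(true_tags)
pred_entities = collect_entities(pred_tags)
```

In this example four of the five tokens carry the right tag, yet only the `LOC` entity is recovered with exact boundaries: the predicted `PER` entity covers one token where the true one covers two. That gap is what an entity-level evaluation is meant to expose.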

You can find a more detailed explanation in the following blog post:

Notes:

In scenarios IV and VI the entity types of the true and predicted annotations do not match; in both cases we only score against the true entity, not the predicted one. You can argue that the predicted entity could also be scored as spurious, but according to the definition of spurious:

  • Spurious (SPU): system produces a response which doesn't exist in the golden annotation;

In this case an annotation does exist, only with a different entity type, so we assume it is only incorrect.
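The distinction drawn in this note can be sketched roughly as follows. This is a simplified illustration, not the repository's implementation: `Entity` and `score_prediction` are hypothetical names, spans are assumed to be inclusive token indices, and any overlapping prediction that is not an exact match is labelled incorrect, so partial matches are not modelled.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    e_type: str
    start: int  # index of the first token
    end: int    # index of the last token (inclusive)


def score_prediction(pred: Entity, gold: List[Entity]) -> str:
    """Label one predicted entity against the gold annotations (simplified)."""
    for true in gold:
        # Two inclusive spans overlap when each starts before the other ends.
        overlaps = pred.start <= true.end and true.start <= pred.end
        if not overlaps:
            continue
        same_span = (pred.start, pred.end) == (true.start, true.end)
        if same_span and pred.e_type == true.e_type:
            return "correct"
        # A gold annotation exists here, so the prediction is wrong
        # rather than spurious, even when the entity type differs.
        return "incorrect"
    # No gold annotation overlaps the prediction at all.
    return "spurious"


gold = [Entity("PER", 1, 2)]
type_mismatch = score_prediction(Entity("LOC", 1, 2), gold)
no_annotation = score_prediction(Entity("ORG", 5, 6), gold)
```

Here `type_mismatch` comes out as incorrect because a gold entity occupies the same span, while `no_annotation` comes out as spurious because nothing in the gold annotation overlaps it.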

Example:

You can see a working example on the following notebook:

Note that in order to run that example you need to have installed:

  • sklearn
  • nltk
  • sklearn_crfsuite

For testing you will need:

  • pytest
  • coverage

These dependencies can be installed by running pip3 install -r requirements.txt

Code tests and test coverage:

To run tests:

coverage run --rcfile=setup.cfg -m pytest

To produce a coverage report:

coverage report

More Repositories

1

Annotated-Semantic-Relationships-Datasets

A collection of public and free annotated datasets of relationships between entities/nominals (Portuguese and English)
683
star
2

NER-datasets

Datasets to train supervised classifiers for Named-Entity Recognition in different languages (Portuguese, German, Dutch, French, English)
Python
337
star
3

Snowball

Implementation with some extensions of the paper "Snowball: Extracting Relations from Large Plain-Text Collections" (Agichtein and Gravano, 2000)
Python
177
star
4

BREDS

"Bootstrapping Relationship Extractors with Distributional Semantics" (Batista et al., 2015) in EMNLP'15 - Python implementation
Python
145
star
5

Aspect-Based-Sentiment-Analysis

Aspect-Based Sentiment Analysis Experiments
Python
133
star
6

text-classification

An example of how to train supervised classifiers for multi-label text classification using sklearn pipelines
Jupyter Notebook
110
star
7

ConvNets-for-Sentence-Classification

"Convolutional Neural Networks for Sentence Classification" (Kim 2014) - https://www.aclweb.org/anthology/D14-1181
Jupyter Notebook
54
star
8

machine-learning-notebooks

Assorted exercises and proof-of-concepts to understand and study machine learning and statistical learning theory
Jupyter Notebook
44
star
9

lexicons

Dictionaries of names, surnames, acronyms and their extensions, stop-words, etc., which I gathered for different experiments.
29
star
10

TAC-Entity-Linking

An entity linking prototype, developed using the datasets from the TAC-KBP sub-task.
Java
28
star
11

awesome-Portuguese-NLP

A list of libraries and NLP projects for Portuguese
19
star
12

information-extraction-PT

An example of triples extraction with PoS-tags using ReVerb
Python
16
star
13

REACTION-resources

Resources developed by and for the project REACTION (Retrieval, Extraction and Aggregation Computing Technology for Integrating and Organizing News) an initiative for developing a computational journalism platform (mostly) for Portuguese.
9
star
14

StanfordNER-experiments

Python
8
star
15

SLANG-Sequence-LAbeliNG

Sequence LAbeliNG with Neural Networks: "Neural Architectures for Named Entity Recognition" (Lample et al., 2016) and "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF" (Ma, 2016)
Jupyter Notebook
7
star
16

Toponym-Disambiguation-Using-Ontology-Based-Semantic-Similarity

Toponym Disambiguation using Ontology-based Semantic Similarity.
Python
6
star
17

coding-exercises

A repository of coding interview questions and solutions
Python
6
star
18

MuSICo

A Minwise Hashing Method for Addressing Relationship Extraction from Text
Java
5
star
19

bash-shell-utils

bash scripts, sed examples, and other stuff that I need from time to time
Shell
4
star
20

Temporal-Information-Datasets

3
star
21

minhash-classifier

supervised relationship extraction based on min-hash and locality sensitive hashing
Python
3
star
22

NER-English-Gigaword-LDC

Python scripts to parse the Gigaword collection and perform NER tagging with StanfordNER
Python
3
star
23

dbpedia-webapps

Simple webapps, relying on DBpedia as a data-source.
Python
3
star
24

GermEval-2019-Task_1

GermEval 2019 Task 1 - Shared Task on Hierarchical Classification of Blurbs
Python
2
star
25

ml-report-kit

A plug-in to generate various evaluation metrics and reports (PR curves, classification reports, confusion matrix) for supervised machine learning models using only two lines of code.
Python
2
star
26

nostalgia

Old projects of mine, done during high-school or university and found in old hard-drives
HTML
1
star
27

politiquices

Explore relations of support and opposition between political personalities, expressed in news headlines preserved in arquivo.pt
1
star
28

Snowball-Java

Snowball: Extracting Relations from Large Plain-Text Collections
Java
1
star
29

GermEval-2017-Aspect-Based-Sentiment-Analysis

Python
1
star
30

davidsbatista.net

my personal homepage and blog
HTML
1
star