• This repository has been archived on 19/Apr/2021

Repository Details

Framework to learn Named Entity Recognition models without labelled data using weak supervision.

Weak supervision for NER

BIG FAT WARNING: This codebase is now deprecated and has been replaced by our brand-new skweak framework, please check it out!

Source code associated with the paper "Named Entity Recognition without Labelled Data: a Weak Supervision Approach" accepted to ACL 2020.

Requirements:

You should first make sure that the following Python packages are installed:

  • spacy (version >= 2.2)
  • hmmlearn
  • snips-nlu-parsers
  • pandas
  • numba
  • scikit-learn

You should also install the en_core_web_sm and en_core_web_md models for spaCy.
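As a quick sanity check before running anything, the sketch below reports which of the packages listed above cannot be imported in the current environment. The helper name `missing_modules` and the mapping from pip names to import names are illustrative assumptions, not part of this codebase; the check only tests importability and does not verify version constraints such as spacy >= 2.2.

```python
from importlib.util import find_spec

# Assumed mapping from pip package name to importable module name.
# The two spaCy models are included because `spacy download` installs
# them as regular Python packages.
REQUIRED = {
    "spacy": "spacy",
    "hmmlearn": "hmmlearn",
    "snips-nlu-parsers": "snips_nlu_parsers",
    "pandas": "pandas",
    "numba": "numba",
    "scikit-learn": "sklearn",
    "en_core_web_sm": "en_core_web_sm",
    "en_core_web_md": "en_core_web_md",
}


def missing_modules(modules):
    """Return the pip names whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in modules.items()
        if find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Running the script prints the packages that still need to be installed, if any.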

To run the neural models in ner.py, you also need pytorch, cupy, keras and tensorflow installed.

To run the baselines, you will also need to have snorkel installed.

Finally, you also need to download the following files and add them to the data directory:

Quick start

You should first convert your corpus to spaCy DocBin format.
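A minimal sketch of such a conversion is shown here, assuming the corpus is available as a list of raw strings. The function names `texts_to_docbin` and `docbin_to_docs` are hypothetical helpers, not part of this codebase, and the on-disk layout (the serialized bytes of a single DocBin) is an assumption; the notebook remains the reference for the format the labelling functions expect. A blank English pipeline is used only to keep the example self-contained. For real annotation you would most likely load one of the installed models instead.

```python
from pathlib import Path

import spacy
from spacy.tokens import DocBin


def texts_to_docbin(texts, output_path, nlp=None):
    """Tokenize raw texts and write them to disk as a serialized DocBin.

    Returns the number of documents written.
    """
    # Assumption: a blank pipeline (tokenizer only) is enough for this sketch.
    nlp = nlp or spacy.blank("en")
    doc_bin = DocBin(store_user_data=True)
    for doc in nlp.pipe(texts):
        doc_bin.add(doc)
    Path(output_path).write_bytes(doc_bin.to_bytes())
    return len(doc_bin)


def docbin_to_docs(input_path, nlp=None):
    """Read the documents back from a file written by texts_to_docbin."""
    nlp = nlp or spacy.blank("en")
    doc_bin = DocBin().from_bytes(Path(input_path).read_bytes())
    return list(doc_bin.get_docs(nlp.vocab))
```

For example, `texts_to_docbin(["Oslo is in Norway."], "corpus.docbin")` writes a one-document file that `docbin_to_docs("corpus.docbin")` can read back.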

Then, to run all labelling functions on your corpus, you can simply run:

import annotations
annotator = annotations.FullAnnotator().add_all()
annotator.annotate_docbin('path_to_your_docbin_corpus')

You can then estimate an HMM model that aggregates all sources:

import labelling
hmm = labelling.HMMAnnotator()
hmm.train('path_to_your_docbin_corpus')

And run it on your corpus to get the aggregated labels:

hmm.annotate_docbin('path_to_your_docbin_corpus')

Step-by-step instructions

More detailed instructions with a step-by-step example are available in the Jupyter Notebook Weak Supervision.ipynb. Don't forget to run it using Jupyter to get the visualisation for the NER annotations.
