Implementation of Nested Named Entity Recognition using BERT

Some files are part of NeuroNLP2.

Requirements

We tested this library with the following libraries:

Running experiments

Testing this library with sample data

  1. Run gen_data.py to generate the processed data files for training. They are written to the "./data/" directory.
    python gen_data.py
  2. Run train.py to start training.
    python train.py
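
The two steps above can also be chained from a single script so that training does not start if data generation fails. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes the scripts are invoked from the repository root with no extra arguments, as in the commands above.

```python
import subprocess
import sys


def build_commands(scripts, python=sys.executable):
    """Return one argv list per script, in the order given."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts, runner=subprocess.run):
    """Run each script in order.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit code, so later steps are skipped after a failure.
    """
    for argv in build_commands(scripts):
        runner(argv, check=True)


# Script names are taken from the steps above.
SAMPLE_STEPS = ["gen_data.py", "train.py"]
```

Calling `run_pipeline(SAMPLE_STEPS)` from the repository root would then run data generation followed by training, using the same interpreter that runs the helper.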

Reproducing our experiment on the ACE-2004 dataset

  1. Put the ACE-2004 corpus into the "../ACE2004/" directory.
  2. Put this .tgz file into "../" and extract it.
  3. Run parse_ace2004.py to extract sentences for training. They are written to "./data/ace2004/".
    python parse_ace2004.py
  4. Run gen_data_for_ace2004.py to prepare the processed data files for training. They are written to the "./data/" directory.
    python gen_data_for_ace2004.py
  5. Run train.py to start training.
    python train.py

Reproducing our experiment on the ACE-2005 dataset

  1. Put the ACE-2005 corpus into the "../ACE2005/" directory.
  2. Put this .tgz file into "../" and extract it.
  3. Run parse_ace2005.py to extract sentences for training. They are written to "./data/ace2005/".
    python parse_ace2005.py
  4. Run gen_data_for_ace2005.py to prepare the processed data files for training. They are written to the "./data/" directory.
    python gen_data_for_ace2005.py
  5. Run train.py to start training.
    python train.py

Reproducing our experiment on the GENIA dataset

  1. Put the GENIA corpus into the "../GENIA/" directory.
  2. Run parse_genia.py to extract sentences for training. They are written to "./data/genia/".
    python parse_genia.py
  3. Run gen_data_for_genia.py to prepare the processed data files for training. They are written to the "./data/" directory.
    python gen_data_for_genia.py
  4. Run train.py to start training.
    python train.py
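
The three reproduction recipes share the same shape: a corpus directory next to the repository, a parse script, a data-generation script, and train.py. The sketch below records that layout in one table and checks that the corpus directory exists before any script is run. The directory and script names are copied from the steps above; the `DATASETS` table and the functions `check_corpus` and `scripts_for` are hypothetical helpers, not part of the repository, and the check does not cover the .tgz file required for the ACE datasets.

```python
from pathlib import Path

# Corpus locations and script order as listed in the steps above.
# The table itself is an illustrative assumption, not repository code.
DATASETS = {
    "ace2004": {
        "corpus": "../ACE2004",
        "scripts": ["parse_ace2004.py", "gen_data_for_ace2004.py", "train.py"],
    },
    "ace2005": {
        "corpus": "../ACE2005",
        "scripts": ["parse_ace2005.py", "gen_data_for_ace2005.py", "train.py"],
    },
    "genia": {
        "corpus": "../GENIA",
        "scripts": ["parse_genia.py", "gen_data_for_genia.py", "train.py"],
    },
}


def check_corpus(name, base="."):
    """Return the expected corpus directory for a dataset.

    `base` is the repository root; raises FileNotFoundError if the
    corpus directory is missing.
    """
    corpus = Path(base) / DATASETS[name]["corpus"]
    if not corpus.is_dir():
        raise FileNotFoundError(f"expected corpus directory at {corpus}")
    return corpus


def scripts_for(name):
    """Return the scripts to run for a dataset, in order."""
    return list(DATASETS[name]["scripts"])
```

For example, `check_corpus("genia")` run from the repository root fails early with a clear message if "../GENIA/" is missing, and `scripts_for("genia")` gives the order in which the scripts would then be run.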

Configuration

The model and training settings are defined in config.py.

Citation

Please cite our paper:

@article{shibuya-hovy-2020-nested,
  title = "Nested Named Entity Recognition via Second-best Sequence Learning and Decoding",
  author = "Shibuya, Takashi and Hovy, Eduard",
  journal = "Transactions of the Association for Computational Linguistics",
  volume = "8",
  year = "2020",
  doi = "10.1162/tacl_a_00334",
  pages = "605--620",
}