Chinese NER Using Lattice LSTM
Lattice LSTM for Chinese NER: a character-based LSTM that takes lattice (word) embeddings as additional input.
Models and results can be found in our ACL 2018 paper, Chinese NER Using Lattice LSTM. The model achieves 93.18% F1 on the MSRA dataset, a state-of-the-art result on the Chinese NER task.
Details will be updated soon.
Requirement:
Python: 2.7
PyTorch: 0.3.0
(For PyTorch 0.3.1, please refer to issue #8 for a slight modification.)
Input format:
CoNLL format (BIOES tag scheme preferred), with each character and its label on one line. Sentences are separated by a blank line.
美 B-LOC
国 E-LOC
的 O
华 B-PER
莱 I-PER
士 E-PER
我 O
跟 O
他 O
谈 O
笑 O
风 O
生 O
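Since a blank line marks a sentence boundary, a minimal reader for this format can be sketched as follows (this is an illustrative sketch, not the repo's own code; `read_conll` is a hypothetical helper):

```python
def read_conll(lines):
    """Group (character, label) pairs into sentences,
    splitting on blank lines as in the example above."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
        else:
            char, label = line.split()
            current.append((char, label))
    if current:
        # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences

sample = """美 B-LOC
国 E-LOC
的 O

华 B-PER
莱 I-PER
士 E-PER
""".splitlines()

parsed = read_conll(sample)
```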
Pretrained Embeddings:
The pretrained character and word embeddings are the same as the embeddings used in the baseline of RichWordSegmentor.
Character embeddings (gigaword_chn.all.a2b.uni.ite50.vec): Google Drive or Baidu Pan
Word(Lattice) embeddings (ctb.50d.vec): Google Drive or Baidu Pan
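Both files are assumed here to be plain-text word2vec-style vectors, one token per line followed by its 50 float values; under that assumption, a loader might look like the following (`load_embeddings` is a hypothetical helper, not part of the repo):

```python
def load_embeddings(lines, dim=50):
    """Map each token to its vector, skipping any header
    or malformed lines (assumed text word2vec format)."""
    table = {}
    for line in lines:
        parts = line.rstrip().split()
        if len(parts) != dim + 1:
            continue  # header or malformed line
        token = parts[0]
        table[token] = [float(v) for v in parts[1:]]
    return table

# Tiny synthetic stand-in for ctb.50d.vec:
demo = ["中国 " + " ".join(["0.1"] * 50)]
vectors = load_embeddings(demo)
```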
How to run the code?
- Download the character and word embeddings and put them in the data folder.
- Modify run_main.py or run_demo.py by adding your train/dev/test file directories.
- Run sh run_main.py or sh run_demo.py.
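Before launching, the first step above can be verified with a small shell check (an illustrative sketch; `check_embeddings` is a hypothetical helper, file names as listed in the Pretrained Embeddings section):

```shell
# Verify that the embedding files are in the data folder.
check_embeddings() {
    for f in "$@"; do
        # Print the first missing file and fail early.
        [ -f "$f" ] || { echo "missing: $f"; return 1; }
    done
    echo "all embeddings present"
}

check_embeddings data/gigaword_chn.all.a2b.uni.ite50.vec data/ctb.50d.vec || true
```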
Resume NER data
Crawled from Sina Finance, it includes the resumes of senior executives from listed companies on the Chinese stock market. Details can be found in our paper.
Cite:
Please cite our ACL 2018 paper:
@inproceedings{zhang2018chinese,
  title={Chinese NER Using Lattice LSTM},
  author={Zhang, Yue and Yang, Jie},
  booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2018}
}