Flat-Lattice-Transformer

Code for the ACL 2020 paper FLAT: Chinese NER Using Flat-Lattice Transformer.

Models and results are described in the paper.

Requirements:

Python: 3.7.3
PyTorch: 1.2.0
FastNLP: 0.5.0
Numpy: 1.16.4
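
Because the versions above are pinned exactly, it can help to compare them with what is installed before training. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes each package exposes a `__version__` attribute under the import names `torch`, `fastNLP` and `numpy`.

```python
from importlib import import_module

# Versions listed in the requirements above, keyed by import name.
REQUIRED = {"torch": "1.2.0", "fastNLP": "0.5.0", "numpy": "1.16.4"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it is unavailable."""
    try:
        module = import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(required, lookup=installed_version):
    """Map each mismatching module name to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        # Ignore local build suffixes such as "1.2.0+cu92" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(REQUIRED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running the script prints one line per package whose installed version differs from the pinned one, and prints nothing when everything matches.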

See the FastNLP project page to learn more about FastNLP.

How to run the code?

  1. Download the character embeddings and word embeddings.

    Character and Bigram embeddings (gigaword_chn.all.a2b.{'uni' or 'bi'}.ite50.vec) : Google Drive or Baidu Pan

    Word(Lattice) embeddings:

    yj, (ctb.50d.vec) : Google Drive or Baidu Pan

    ls, (sgns.merge.word.bz2) : Baidu Pan

  2. Edit paths.py so that it points to the pretrained embeddings and the datasets.

  3. Run the following commands:

python preprocess.py  # add --clip_msra if you need to train FLAT on the MSRA NER dataset
cd V0  # without BERT; use V1 for the version with BERT
python flat_main.py --dataset <dataset_name>  # ontonotes, msra, weibo or resume
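
For step 2, the sketch below shows one way the entries in paths.py might be laid out and checked before running preprocess.py. It is an illustration only: the variable names, the placeholder directories and the `missing_paths` helper are hypothetical, and the real paths.py defines its own names, which should be kept. Only the embedding file names come from step 1.

```python
import os

# Hypothetical placeholders: replace with the directories where the
# downloaded embeddings and the NER datasets actually live.
EMBEDDING_DIR = "/path/to/embeddings"
DATASET_DIR = "/path/to/datasets"

# File names taken from step 1; the variable names are illustrative only.
unigram_embedding_path = os.path.join(EMBEDDING_DIR, "gigaword_chn.all.a2b.uni.ite50.vec")
bigram_embedding_path = os.path.join(EMBEDDING_DIR, "gigaword_chn.all.a2b.bi.ite50.vec")
word_embedding_path = os.path.join(EMBEDDING_DIR, "ctb.50d.vec")
weibo_ner_path = os.path.join(DATASET_DIR, "WeiboNER")  # hypothetical folder name


def missing_paths(paths):
    """Return the sorted names of entries whose path does not exist on disk."""
    return sorted(name for name, path in paths.items() if not os.path.exists(path))


if __name__ == "__main__":
    configured = {
        "unigram": unigram_embedding_path,
        "bigram": bigram_embedding_path,
        "word": word_embedding_path,
        "weibo": weibo_ner_path,
    }
    for name in missing_paths(configured):
        print(f"missing: {name} -> {configured[name]}")
```

Checking the paths up front surfaces a wrong directory immediately rather than partway through preprocessing.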

To record experiment results, you can use fitlog:

pip install fitlog
fitlog init V0
cd V0
fitlog log logs

Then set use_fitlog = True in flat_main.py.

See the Fitlog project page to learn more about Fitlog.
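
To make the role of a flag like use_fitlog concrete, here is a minimal sketch of a logger that forwards to fitlog only when the flag is set. It is not the code in flat_main.py: the `ExperimentLog` class is hypothetical, and the sketch assumes the fitlog calls `set_log_dir`, `add_hyper`, `add_metric` and `finish` behave as described in the fitlog documentation.

```python
use_fitlog = False  # set to True after running `fitlog init` and `fitlog log logs`


class ExperimentLog:
    """Record hyperparameters and metrics, forwarding to fitlog when enabled."""

    def __init__(self, enabled, log_dir="logs"):
        self.enabled = enabled
        self.hyper = {}
        self.metrics = []
        if enabled:
            # Imported lazily so the script also runs without fitlog installed.
            import fitlog

            self._fitlog = fitlog
            fitlog.set_log_dir(log_dir)

    def add_hyper(self, params):
        """Store a dict of hyperparameters."""
        self.hyper.update(params)
        if self.enabled:
            self._fitlog.add_hyper(params)

    def add_metric(self, value, step, name):
        """Store one metric value observed at a training step."""
        self.metrics.append((step, name, value))
        if self.enabled:
            self._fitlog.add_metric(value, step=step, name=name)

    def finish(self):
        """Mark the run as finished in fitlog, if it is in use."""
        if self.enabled:
            self._fitlog.finish()


if __name__ == "__main__":
    log = ExperimentLog(use_fitlog)
    log.add_hyper({"dataset": "weibo"})
    log.add_metric(0.5, step=1, name="f1")  # illustrative value, not a real result
    log.finish()
    print(log.hyper, log.metrics)
```

With the flag off, everything stays in memory, so the same training script runs whether or not fitlog has been initialised.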

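Update notes:

The 5.7 update committed two versions. V2 uses tensor.unique() to remove duplicate combinations among the relative positions (denoted Flat_unique). V3 replaces the relative position encoding in FLAT with a scalar (denoted Flat_scalar). See the FLAT Slimming Diary (FLAT瘦身日记) for details.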
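
The idea behind Flat_unique can be sketched in plain Python. The code below is an illustration, not the repository's implementation: it assumes that a "combination" is the tuple of four head/tail offsets between two spans, and the helper names are hypothetical. `unique_with_inverse` mimics what `tensor.unique(return_inverse=True)` provides in PyTorch, namely the sorted distinct values plus the indices needed to rebuild the original sequence.

```python
def relative_position_tuples(heads, tails):
    """For every ordered pair of spans (i, j), return the four head/tail offsets."""
    n = len(heads)
    return [
        (heads[i] - heads[j], heads[i] - tails[j],
         tails[i] - heads[j], tails[i] - tails[j])
        for i in range(n)
        for j in range(n)
    ]


def unique_with_inverse(items):
    """Return (unique, inverse) such that unique[inverse[k]] == items[k]."""
    unique = sorted(set(items))
    index = {item: k for k, item in enumerate(unique)}
    return unique, [index[item] for item in items]


def encode_all(items, encode):
    """Call `encode` once per distinct item and reuse the result for repeats."""
    unique, inverse = unique_with_inverse(items)
    encoded = [encode(item) for item in unique]
    return [encoded[k] for k in inverse], len(unique)


if __name__ == "__main__":
    # Three characters plus one lattice word covering characters 0-1.
    heads = [0, 1, 2, 0]
    tails = [0, 1, 2, 1]
    pairs = relative_position_tuples(heads, tails)
    _, n_unique = encode_all(pairs, encode=sum)  # `sum` stands in for the real encoder
    print(len(pairs), "pairs,", n_unique, "distinct combinations")
```

For a sequence of n single-character spans there are n * n pairs but only 2n - 1 distinct offset tuples, which is why deduplication saves more as sequences grow.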
The GPU memory usage of the two methods is shown in the table below (batch_size = 10):

seq_len            50      100     150     200     250     300
Flat               1096MB  1668MB  2734MB  4118MB  5938MB  8374MB
Flat_unique        964MB   1204MB  1610MB  2166MB  2922MB  3940MB
Flat_scalar        878MB   916MB   1028MB  1062MB  1148MB  1322MB
Bert+Flat          1605MB  2237MB  3333MB  4725MB  6571MB  9039MB
Bert+Flat_unique   1495MB  1685MB  2129MB  2697MB  3453MB  4585MB
Bert+Flat_scalar   1409MB  1481MB  1565MB  1617MB  1705MB  2051MB
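
To read the table as relative savings, the short sketch below expresses each variant's memory as a percentage of its baseline. The numbers are copied from the table above; the helper name `relative_usage` is hypothetical.

```python
SEQ_LENS = [50, 100, 150, 200, 250, 300]

# GPU memory in MB at batch_size = 10, copied from the table above.
MEMORY_MB = {
    "Flat": [1096, 1668, 2734, 4118, 5938, 8374],
    "Flat_unique": [964, 1204, 1610, 2166, 2922, 3940],
    "Flat_scalar": [878, 916, 1028, 1062, 1148, 1322],
    "Bert+Flat": [1605, 2237, 3333, 4725, 6571, 9039],
    "Bert+Flat_unique": [1495, 1685, 2129, 2697, 3453, 4585],
    "Bert+Flat_scalar": [1409, 1481, 1565, 1617, 1705, 2051],
}


def relative_usage(variant, baseline):
    """Memory of `variant` as a percentage of `baseline`, per sequence length."""
    return [
        round(100 * v / b, 1)
        for v, b in zip(MEMORY_MB[variant], MEMORY_MB[baseline])
    ]


if __name__ == "__main__":
    for variant, baseline in [
        ("Flat_unique", "Flat"),
        ("Flat_scalar", "Flat"),
        ("Bert+Flat_unique", "Bert+Flat"),
        ("Bert+Flat_scalar", "Bert+Flat"),
    ]:
        print(variant, dict(zip(SEQ_LENS, relative_usage(variant, baseline))))
```

At seq_len 300, Flat_unique uses about 47.1% of the memory of Flat and Flat_scalar about 15.8%. With BERT the corresponding figures are about 50.7% and 22.7%.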