  • Stars: 2,062
  • Rank: 22,268 (Top 0.5%)
  • Language: Python
  • License: MIT License
  • Created over 5 years ago
  • Updated over 1 year ago

Repository Details

Chinese NER (Named Entity Recognition) using BERT (Softmax, CRF, Span)

Chinese NER using BERT

BERT for Chinese NER.

Update: some other approaches that may be useful for reference, including Biaffine and GlobalPointer, are available in examples.

dataset list

  1. cner: datasets/cner
  2. CLUENER: https://github.com/CLUEbenchmark/CLUENER

model list

  1. BERT+Softmax
  2. BERT+CRF
  3. BERT+Span

requirement

  1. 1.1.0 <= PyTorch < 1.5.0
  2. CUDA 9.0
  3. Python 3.6+
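
The PyTorch constraint above is a half-open range, which is easy to get wrong by one release. The following is a minimal sketch of a check for it; the helper names `parse_version` and `is_supported_pytorch` are hypothetical and not part of this repository, and the parser assumes a version string that starts with `major.minor[.patch]`.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '1.4.0+cu92'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_supported_pytorch(version: str) -> bool:
    """Return True when 1.1.0 <= version < 1.5.0 (lower bound inclusive)."""
    return (1, 1, 0) <= parse_version(version) < (1, 5, 0)
```

In an environment with PyTorch installed, the check could be called as `is_supported_pytorch(torch.__version__)`.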

input format

The input is character-level (the BIOS tag scheme is preferred): each line holds one character and its label, and sentences are separated by a blank line.

美	B-LOC
国	I-LOC
的	O
华	B-PER
莱	I-PER
士	I-PER

我	O
跟	O
他	O
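
The format above can be read with a few lines of Python. This is a minimal sketch, not the repository's own loader: the function names `read_char_label_file` and `bios_entities` are hypothetical, the separator between character and label is assumed to be whitespace, and `S-` is assumed to mark a single-character entity in the BIOS scheme.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (character, label) pairs


def read_char_label_file(lines: Iterable[str]) -> List[Sentence]:
    """Parse 'character label' lines; a blank line ends the current sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'character label', got: {raw!r}")
        current.append((parts[0], parts[1]))
    if current:  # the last sentence may not be followed by a blank line
        sentences.append(current)
    return sentences


def bios_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (text, type) entities from B-/I-/S- tags; O closes an entity."""
    entities: List[Tuple[str, str]] = []
    chars: List[str] = []
    etype = None
    for ch, tag in sentence:
        if tag.startswith("I-") and etype == tag[2:]:
            chars.append(ch)  # continue the open entity
            continue
        if chars:  # any other tag closes the open entity
            entities.append(("".join(chars), etype))
        chars, etype = [], None
        if tag.startswith("S-"):
            entities.append((ch, tag[2:]))
        elif tag.startswith("B-"):
            chars, etype = [ch], tag[2:]
    if chars:
        entities.append(("".join(chars), etype))
    return entities


SAMPLE = "美\tB-LOC\n国\tI-LOC\n的\tO\n华\tB-PER\n莱\tI-PER\n士\tI-PER\n\n我\tO\n跟\tO\n他\tO\n"
sentences = read_char_label_file(SAMPLE.splitlines())
print(bios_entities(sentences[0]))
```

An `I-` tag that does not continue an open entity of the same type is dropped here; a stricter loader might raise an error instead.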

run the code

  1. Modify the configuration in run_ner_xxx.py or run_ner_xxx.sh.
  2. Run sh scripts/run_ner_xxx.sh

note: file structure of the model

├── prev_trained_model
|  └── bert_base
|  |  └── pytorch_model.bin
|  |  └── config.json
|  |  └── vocab.txt
|  |  └── ......
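
Before launching a run, it can help to confirm that the model directory matches the layout above. The sketch below checks only the three files named in the tree; treating exactly these as required is an assumption, and `missing_model_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import List

# File names taken from the tree above; other files ("......") are not checked.
REQUIRED_FILES = ("pytorch_model.bin", "config.json", "vocab.txt")


def missing_model_files(model_dir: str) -> List[str]:
    """Return the names from REQUIRED_FILES that are absent in model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For the layout shown, the call would be `missing_model_files("prev_trained_model/bert_base")`; an empty list means all three files were found.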

CLUENER result

Overall performance of BERT on the dev set:

Model                          Accuracy (entity)   Recall (entity)   F1 score (entity)
BERT+Softmax                   0.7897              0.8031            0.7963
BERT+CRF                       0.7977              0.8177            0.8076
BERT+Span                      0.8132              0.8092            0.8112
BERT+Span+adv                  0.8267              0.8073            0.8169
BERT-small(6 layers)+Span+kd   0.8241              0.7839            0.8051
BERT+Span+focal_loss           0.8121              0.8008            0.8064
BERT+Span+label_smoothing      0.8235              0.7946            0.8088
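
For reference, F1 is usually computed as the harmonic mean of precision and recall. For the first three rows of the table, the reported F1 matches the harmonic mean of the two preceding columns to four decimals, which suggests that the "Accuracy (entity)" column plays the role of entity-level precision; that reading is an inference from the numbers, not something the source states. A minimal sketch of the arithmetic, with `f1_score` as a hypothetical helper name:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (first column, recall, reported F1) copied from the first three table rows.
rows = {
    "BERT+Softmax": (0.7897, 0.8031, 0.7963),
    "BERT+CRF": (0.7977, 0.8177, 0.8076),
    "BERT+Span": (0.8132, 0.8092, 0.8112),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1_score(p, r):.4f}, reported {reported:.4f}")
```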

ALBERT for CLUENER

Overall performance of ALBERT on the dev set:

Model   Version        Accuracy (entity)   Recall (entity)   F1 (entity)   Train time/epoch
albert  base_google    0.8014              0.6908            0.7420        0.75x
albert  large_google   0.8024              0.7520            0.7763        2.1x
albert  xlarge_google  0.8286              0.7773            0.8021        6.7x
bert    google         0.8118              0.8031            0.8074        -----
albert  base_bright    0.8068              0.7529            0.7789        0.75x
albert  large_bright   0.8152              0.7480            0.7802        2.2x
albert  xlarge_bright  0.8222              0.7692            0.7948        7.3x

Cner result

Overall performance of BERT on the dev set, with test-set results in parentheses:

Model                       Accuracy (entity)   Recall (entity)   F1 score (entity)
BERT+Softmax                0.9586(0.9566)      0.9644(0.9613)    0.9615(0.9590)
BERT+CRF                    0.9562(0.9539)      0.9671(0.9644)    0.9616(0.9591)
BERT+Span                   0.9604(0.9620)      0.9617(0.9632)    0.9611(0.9626)
BERT+Span+focal_loss        0.9516(0.9569)      0.9644(0.9681)    0.9580(0.9625)
BERT+Span+label_smoothing   0.9566(0.9568)      0.9624(0.9656)    0.9595(0.9612)

More Repositories

  1. awesome-pretrained-chinese-nlp-models (Python, 4,712 stars): Awesome Pretrained Chinese NLP Models, a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
  2. Bert-Multi-Label-Text-Classification (Python, 860 stars): This repo contains a PyTorch implementation of a pretrained BERT model for multi-label text classification.
  3. albert_pytorch (Python, 708 stars): A Lite Bert For Self-Supervised Learning Language Representations
  4. NeZha_Chinese_PyTorch (Python, 261 stars): NEZHA: Neural Contextualized Representation for Chinese Language Understanding
  5. lookahead_pytorch (Python, 188 stars): PyTorch implementation of the Lookahead optimizer
  6. TorchBlocks (Python, 151 stars): A PyTorch-based toolkit for natural language processing
  7. daguan_2019_rank9 (Python, 130 stars): datagrand 2019 information extraction competition rank9
  8. BiLSTM-CRF-NER-PyTorch (Python, 120 stars): This repo contains a PyTorch implementation of a BiLSTM-CRF model for named entity recognition task.
  9. Deep_Learning_For_Computer_Vision_With_Python (Python, 118 stars): Deep Learning For Computer Vision With Python
  10. BERT-chinese-text-classification-pytorch (Python, 99 stars): This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
  11. electra_pytorch (Python, 91 stars): ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
  12. CLUE_pytorch (Python, 73 stars): CLUE baseline pytorch, the PyTorch version of the CLUE baselines
  13. MobileBert_PyTorch (Python, 61 stars): MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
  14. BERT-Attribute-Value-Extract (Python, 59 stars): A Pytorch implementation of "Scaling Up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Product Title" (ACL 2019).
  15. multi-sample_dropout_pytorch (Python, 56 stars): A simple PyTorch implementation of Multi-Sample Dropout
  16. BERT-SDA (Python, 56 stars): A PyTorch implementation of "Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation"
  17. ERNIE-text-classification-pytorch (Python, 54 stars): This repo contains a PyTorch implementation of a pretrained ERNIE model for text classification.
  18. chinese-word2vec-pytorch (Python, 53 stars): word2vec implementation for skip-gram in pytorch
  19. bert-sentence-similarity-pytorch (Python, 49 stars): This repo contains a PyTorch implementation of a pretrained BERT model for sentence similarity task.
  20. label_smoothing_pytorch (Python, 32 stars): PyTorch implementation of Label Smoothing
  21. EvoNorms_PyTorch (Python, 19 stars): Evolving Normalization-Activation Layers
  22. NovoGrad-pytorch (Python, 18 stars): PyTorch implementation of the NovoGrad optimizer
  23. cw2vec-pytorch (Python, 17 stars): cw2vec implementation in pytorch
  24. train-bert-pytorch (Python, 15 stars)
  25. knowledge-driven-dialogue-lic2019-rank5 (Python, 14 stars): 5th-place solution for the 2019 Language and Intelligence Challenge
  26. 2021-GAIIC-Track1-idea (10 stars): Global Artificial Intelligence Technology Innovation Competition, Track 1
  27. pytorch_fashionMNIST_practice (Python, 9 stars): A template for training image models with PyTorch
  28. keras_learning (Jupyter Notebook, 9 stars)
  29. Contextual-Chinese-Strokes-Embeddings (Python, 8 stars): Implementation of the language model for Contextual chinese strokes Embeddings with PyTorch
  30. lonePatient.github.io (HTML, 6 stars)
  31. kaggle-camera-model-identification (Python, 6 stars): IEEE's Signal Processing Society - Camera Model Identification
  32. tensorflow-eager-examples (Python, 6 stars): Examples of Eager Execution in tensorflow
  33. char-cnn-text-classification (Python, 3 stars): This repo contains a PyTorch implementation of a char-level CNN model for text classification.