awesome-chinese-ner
Chinese Named Entity Recognition
Further reading
- Survey of Chinese pre-trained models
https://www.jsjkx.com/CN/10.11896/jsjkx.211200018
- Download links for Chinese pre-trained models
https://github.com/lonePatient/awesome-pretrained-chinese-nlp-models
- Download links for Chinese word vectors
https://github.com/Embedding/Chinese-Word-Vectors
- How to tune hyperparameters for BiLSTM-CRF?
https://arxiv.org/pdf/1707.06799.pdf
- Information extraction (entities, relations, events) with ChatGPT
Zero-Shot Information Extraction via Chatting with ChatGPT
Demo: http://124.221.16.143:5000/
https://arxiv.org/pdf/2302.10205.pdf
https://github.com/cocacola-lab/ChatIE
- GPT for Information Extraction
https://github.com/cocacola-lab/GPT4IE
- Evaluation-of-ChatGPT-on-Information-Extraction
https://github.com/RidongHan/Evaluation-of-ChatGPT-on-Information-Extraction
- Filed here under further reading:
Unified Text Structuralization with Instruction-tuned Language Models
2023
https://arxiv.org/pdf/2303.14956v2.pdf
- GPT-NER: Named Entity Recognition via Large Language Models
2023
https://arxiv.org/pdf/2304.10428v1.pdf
https://github.com/ShuheWang1998/GPT-NER
- EasyInstruct: An Easy-to-use Framework to Instruct Large Language Models
https://github.com/zjunlp/EasyInstruct
- CODEIE: Large Code Generation Models are Better Few-Shot Information Extractors
Entity and relation extraction expressed as code
2023
https://arxiv.org/pdf/2305.05711v1.pdf
https://github.com/dasepli/CodeIE
- PromptNER: Prompting For Named Entity Recognition
2023
https://arxiv.org/pdf/2305.15444v2.pdf
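Named entity recognition surveys (Chinese)
- A survey of recent research progress in deep-learning-based Chinese named entity recognition (基于深度学习的中文命名实体识别最新研究进展综述)
2022, Journal of Chinese Information Processing (中文信息学报)
http://61.175.198.136:8083/rwt/125/http/GEZC6MJZFZZUPLSSGM3B/Qikan/Article/Detail?id=7107633068
- A survey of named entity recognition methods (命名实体识别方法研究综述)
2022, Journal of Frontiers of Computer Science and Technology (计算机科学与探索)
http://fcst.ceaj.org/CN/10.3778/j.issn.1673-9418.2112109
- A survey of Chinese named entity recognition (中文命名实体识别综述)
2021, Journal of Frontiers of Computer Science and Technology (计算机科学与探索)
http://fcst.ceaj.org/CN/abstract/abstract2902.shtml
- Chinese named entity recognition: The state of the art
Neurocomputing 2022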
Models
- Attack Named Entity Recognition by Entity Boundary Interference
2023
https://arxiv.org/pdf/2305.05253v1.pdf
- Token Relation Aware Chinese Named Entity Recognition
ACM Transactions on Asian and Low-Resource Language Information Processing 2023
https://dl.acm.org/doi/10.1145/3531534
- PUnifiedNER: a Prompting-based Unified NER System for Diverse Datasets
AAAI 2023
https://arxiv.org/pdf/2211.14838.pdf
https://github.com/GeorgeLuImmortal/PUnifiedNER
- END-TO-END ENTITY DETECTION WITH PROPOSER AND REGRESSOR
Borrows ideas from object detection
2022
https://arxiv.org/pdf/2210.10260v2.pdf
https://github.com/Rosenberg37/EntityDetection
- DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition
Multilingual named entity recognition
2022
https://arxiv.org/pdf/2203.00545.pdf
https://github.com/Alibaba-NLP/KB-NER
- PCBERT: Parent and Child BERT for Chinese Few-shot NER
COLING 2022
https://aclanthology.org/2022.coling-1.192.pdf
- GNN-SL: Sequence Labeling Based on Nearest Examples via GNN
2022
https://arxiv.org/pdf/2212.02017.pdf
https://github.com/ShuheWang1998/GNN-SL
- EiCi: A New Method of Dynamic Embedding Incorporating Contextual Information in Chinese NER
The idea seems similar to AMBERT
2022
https://openreview.net/pdf?id=0TKg4UlnEEQ
- Deep Span Representations for Named Entity Recognition
2022
https://arxiv.org/pdf/2210.04182v1.pdf
- Mulco: Recognizing Chinese Nested Named Entities Through Multiple Scopes
2022
https://arxiv.org/pdf/2211.10854.pdf
- Unsupervised Boundary-Aware Language Model Pretraining for Chinese Sequence Labeling
EMNLP 2022
https://arxiv.org/pdf/2210.15231.pdf
http://github.com/modelscope/adaseq/examples/babert
- Domain-Specific NER via Retrieving Correlated Samples
COLING 2022
https://arxiv.org/pdf/2208.12995.pdf
- Two Languages Are Better than One: Bilingual Enhancement for Chinese Named Entity Recognition
COLING 2022
https://aclanthology.org/2022.coling-1.176.pdf
- A hybrid Transformer approach for Chinese NER with features augmentation
Expert Syst. Appl 2022
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4087645
- Adaptive Threshold Selective Self-Attention for Chinese NER
COLING 2022
https://aclanthology.org/2022.coling-1.157.pdf
- Improving Chinese Named Entity Recognition by Search Engine Augmentation
2022
https://arxiv.org/pdf/2210.12662.pdf
- Robust Self-Augmentation for Named Entity Recognition with Meta Reweighting
NAACL 2022
https://arxiv.org/pdf/2204.11406.pdf
https://github.com/LindgeW/MetaAug4NER
- Boundary Smoothing for Named Entity Recognition
ACL 2022
https://arxiv.org/pdf/2204.12031v1.pdf
https://github.com/syuoni/eznlp
- NFLAT: Non-Flat-Lattice Transformer for Chinese Named Entity Recognition
2022
https://arxiv.org/pdf/2205.05832.pdf
- Unified Structure Generation for Universal Information Extraction
(Unifies entity recognition, relation extraction, event extraction, and sentiment analysis); Baidu UIE
ACL 2022
https://arxiv.org/pdf/2203.12277.pdf
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/uie
https://github.com/universal-ie/UIE
The following paper is also a universal approach, but it is English-only, with no experiments on Chinese data:
- DEEPSTRUCT: Pretraining of Language Models for Structure Prediction
2022
https://arxiv.org/pdf/2205.10475v1.pdf
https://github.com/cgraywang/deepstruct
- Parallel Instance Query Network for Named Entity Recognition
2022
https://arxiv.org/pdf/2203.10545v1.pdf
- Delving Deep into Regularity: A Simple but Effective Method for Chinese Named Entity Recognition
NAACL 2022
https://arxiv.org/pdf/2204.05544.pdf
- TURNER: The Uncertainty-based Retrieval Framework for Chinese NER
2022
https://arxiv.org/pdf/2202.09022
- kNN-NER: Named Entity Recognition with Nearest Neighbor Search
2022
https://arxiv.org/pdf/2203.17103
https://github.com/ShannonAI/KNN-NER
- Unified Named Entity Recognition as Word-Word Relation Classification
AAAI 2022
https://arxiv.org/abs/2112.10070
https://github.com/ljynlp/W2NER.git
- MarkBERT: Marking Word Boundaries Improves Chinese BERT
2022
https://arxiv.org/pdf/2203.06378
- MFE-NER: Multi-feature Fusion Embedding for Chinese Named Entity Recognition
2021
https://arxiv.org/pdf/2109.07877
- AdaK-NER: An Adaptive Top-K Approach for Named Entity Recognition with Incomplete Annotations
2021
https://arxiv.org/pdf/2109.05233
- ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information
ACL 2021
https://arxiv.org/pdf/2106.16038
https://github.com/ShannonAI/ChineseBert
- Enhanced Language Representation with Label Knowledge for Span Extraction
EMNLP 2021
https://aclanthology.org/2021.emnlp-main.379.pdf
https://github.com/Akeepers/LEAR
- Lex-BERT: Enhancing BERT based NER with lexicons
ICLR 2021
https://arxiv.org/pdf/2101.00396v1.pdf
- Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter
ACL 2021
https://arxiv.org/pdf/2105.07148.pdf
https://github.com/liuwei1206/LEBERT
- MECT: Multi-Metadata Embedding based Cross-Transformer for Chinese Named Entity Recognition
ACL 2021
https://arxiv.org/pdf/2107.05418v1.pdf
https://github.com/CoderMusou/MECT4CNER
- Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition
ACL 2021
https://arxiv.org/pdf/2105.06804v2.pdf
https://github.com/tricktreat/locate-and-label
- Dynamic Modeling Cross- and Self-Lattice Attention Network for Chinese NER
AAAI 2021
https://ojs.aaai.org/index.php/AAAI/article/view/17706/17513
https://github.com/zs50910/DCSAN-for-Chinese-NER
- Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information
EMNLP 2020
https://arxiv.org/pdf/2010.15466
https://github.com/cuhksz-nlp/AESINER
- ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations
ACL 2020
https://arxiv.org/pdf/1911.00720v1.pdf
https://github.com/sinovation/ZEN
- A Unified MRC Framework for Named Entity Recognition
ACL 2020
https://arxiv.org/pdf/1910.11476v6.pdf
https://github.com/ShannonAI/mrc-for-flat-nested-ner
- Simplify the Usage of Lexicon in Chinese NER
ACL 2020
https://arxiv.org/pdf/1908.05969.pdf
https://github.com/v-mipeng/LexiconAugmentedNER
- A Boundary Regression Model for Nested Named Entity Recognition
2020
https://arxiv.org/pdf/2011.14330v3.pdf
https://github.com/yuelfei/BR
- Dice Loss for Data-imbalanced NLP Tasks
ACL 2020
https://arxiv.org/pdf/1911.02855v3.pdf
https://github.com/ShannonAI/dice_loss_for_NLP
- Porous Lattice Transformer Encoder for Chinese NER
COLING 2020
https://aclanthology.org/2020.coling-main.340.pdf
- FLAT: Chinese NER Using Flat-Lattice Transformer
ACL 2020
https://arxiv.org/pdf/2004.11795v2.pdf
https://github.com/LeeSureman/Flat-Lattice-Transformer
- FGN: Fusion Glyph Network for Chinese Named Entity Recognition
2020
https://arxiv.org/pdf/2001.05272v6.pdf
https://github.com/AidenHuen/FGN-NER
- SLK-NER: Exploiting Second-order Lexicon Knowledge for Chinese NER
2020
https://arxiv.org/pdf/2007.08416v1.pdf
https://github.com/zerohd4869/SLK-NER
- Entity Enhanced BERT Pre-training for Chinese NER
EMNLP 2020
https://aclanthology.org/2020.emnlp-main.518.pdf
https://github.com/jiachenwestlake/Entity_BERT
- Named Entity Recognition for Social Media Texts with Semantic Augmentation
EMNLP 2020
https://arxiv.org/pdf/2010.15458v1.pdf
https://github.com/cuhksz-nlp/SANER
- CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese
2020
https://arxiv.org/pdf/2001.04351v4.pdf
https://github.com/CLUEbenchmark/CLUENER2020
- ERNIE: Enhanced Representation through Knowledge Integration
2019
https://arxiv.org/pdf/1904.09223v1.pdf
https://github.com/PaddlePaddle/ERNIE
- TENER: Adapting Transformer Encoder for Named Entity Recognition
2019
https://arxiv.org/pdf/1911.04474v3.pdf
https://github.com/fastnlp/TENER
- Chinese NER Using Lattice LSTM
ACL 2018
https://arxiv.org/pdf/1805.02023v4.pdf
https://github.com/jiesutd/LatticeLSTM
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
2019
https://arxiv.org/pdf/1907.12412v2.pdf
https://github.com/PaddlePaddle/ERNIE
- Glyce: Glyph-vectors for Chinese Character Representations
NeurIPS 2019
https://arxiv.org/pdf/1901.10125v5.pdf
https://github.com/ShannonAI/glyce
- CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition
NAACL 2019
https://arxiv.org/pdf/1904.02141v3.pdf
https://aclanthology.org/N19-1342/
https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NER
- Neural Chinese Named Entity Recognition via CNN-LSTM-CRF and Joint Training with Word Segmentation
2019
https://arxiv.org/pdf/1905.01964v1.pdf
https://github.com/rxy007/cnn-lstm-crf
- Chinese Named Entity Recognition Augmented with Lexicon Memory
2019
https://arxiv.org/pdf/1912.08282v2.pdf
https://github.com/dugu9sword/LEMON
- Exploiting Multiple Embeddings for Chinese Named Entity Recognition
2019
https://arxiv.org/pdf/1908.10657v1.pdf
https://github.com/WHUIR/ME-CNER
- Dependency-Guided LSTM-CRF for Named Entity Recognition
EMNLP-IJCNLP 2019
https://arxiv.org/pdf/1909.10148v1.pdf
https://github.com/allanj/ner_with_dependency
- CNN-Based Chinese NER with Lexicon Rethinking
IJCAI 2019
https://www.ijcai.org/proceedings/2019/0692.pdf
- Leverage Lexical Knowledge for Chinese Named Entity Recognition via Collaborative Graph Network
EMNLP-IJCNLP 2019
https://aclanthology.org/D19-1396.pdf
https://github.com/DianboWork/Graph4CNER
- Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning
COLING 2018
https://aclanthology.org/C18-1183.pdf
https://github.com/rainarch/DSNER
- Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism
EMNLP 2018
https://aclanthology.org/D18-1017.pdf
https://github.com/CPF-NLPR/AT4ChineseNER
Non-Chinese models
These papers have no experiments on Chinese, but their ideas are worth borrowing:
- DiffusionNER: Boundary Diffusion for Named Entity Recognition
2023
https://arxiv.org/pdf/2305.13298v1.pdf
https://github.com/tricktreat/DiffusionNER
- Learning In-context Learning for Named Entity Recognition
ACL 2023
https://arxiv.org/pdf/2305.11038v1.pdf
https://github.com/chen700564/metaner-icl
- UniEX: An Effective and Efficient Framework for Unified Information Extraction via a Span-extractive Perspective
2023
https://arxiv.org/pdf/2305.10306v1.pdf
- Easy-to-Hard Learning for Information Extraction
2023
https://arxiv.org/pdf/2305.09193v1.pdf
https://github.com/DAMO-NLP-SG/IE-E2H
- UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction
2023
https://openreview.net/pdf?id=cRQwl-59CU8
https://github.com/yhcc/utcie
- Deep Span Representations for Named Entity Recognition
Same authors as Boundary Smoothing for Named Entity Recognition
ACL 2023
https://github.com/syuoni/eznlp
https://arxiv.org/pdf/2210.04182v2.pdf
- NER-to-MRC: Named-Entity Recognition Completely Solving as Machine Reading Comprehension
2023
https://arxiv.org/pdf/2305.03970v1.pdf
- RexUIE: A Recursive Method with Explicit Schema Instructor for Universal Information Extraction
Universal information extraction; compared against USM
2023
https://arxiv.org/pdf/2304.14770.pdf
- InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction
(Another universal information extraction paper; compared against Baidu UIE and USM)
2023
https://arxiv.org/pdf/2304.08085v1.pdf
https://github.com/BeyonderXX/InstructUIE
- Universal Information Extraction as Unified Semantic Matching
Universal information extraction covering entities, relations, and events (no experiments on Chinese data); abbreviated USM
AAAI 2023
https://arxiv.org/pdf/2301.03282.pdf
- MULTI-TASK TRANSFORMER WITH RELATION-ATTENTION AND TYPE-ATTENTION FOR NAMED ENTITY RECOGNITION
2023
https://arxiv.org/pdf/2303.10870v1.pdf
- DEEPSTRUCT: Pretraining of Language Models for Structure Prediction
Universal information extraction
ACL 2022
https://arxiv.org/pdf/2205.10475v2.pdf
https://github.com/cgraywang/deepstruct
- TOE: A Grid-Tagging Discontinuous NER Model Enhanced by Embedding Tag/Word Relations and More Fine-Grained Tags
Improves on the W2NER model
IEEE TASLP (Transactions on Audio, Speech and Language Processing)
https://arxiv.org/pdf/2211.00684.pdf
https://github.com/solkx/TOE
- OPTIMIZING BI-ENCODER FOR NAMED ENTITY RECOGNITION VIA CONTRASTIVE LEARNING
ICLR 2023
https://arxiv.org/pdf/2208.14565v2.pdf
https://github.com/microsoft/binder
- One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER
2023
https://arxiv.org/pdf/2301.10410v2.pdf
https://github.com/zjunlp/DeepKE/tree/main/example/ner/cross
- QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition
2022
https://arxiv.org/pdf/2203.01543.pdf
- A Unified Generative Framework for Various NER Subtasks
(Uses the BART generative model for named entity recognition)
ACL-IJCNLP 2021
https://arxiv.org/pdf/2106.01223.pdf
https://github.com/yhcc/BARTNER
(The following four papers cover prompt-based named entity recognition)
- Template-Based Named Entity Recognition Using BART
https://arxiv.org/abs/2106.01760
https://github.com/Nealcly/templateNER
- Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER
https://arxiv.org/abs/2110.08454
https://github.com/INK-USC/fewNER
- LightNER: A Lightweight Generative Framework with Prompt-guided Attention for Low-resource NER
https://arxiv.org/abs/2109.00720
https://github.com/zjunlp/DeepKE/blob/main/example/ner/few-shot/README_CN.md
- Template-free Prompt Tuning for Few-shot NER
https://arxiv.org/abs/2109.13532
https://github.com/rtmaww/EntLM/
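Datasets
- MSRA
- resume
- ontonotes4
- ontonotes5
- A dataset provided by a company, covering person names, place names, organization names, and proper nouns.
- People's Daily Online (2004)
- Entity-annotated data for film/TV, music, and books
- Chinese medical text named entity recognition, CCKS 2020
- Yidu Cloud entity recognition dataset
- CLUENER2020
- Collection of Chinese datasets for different tasks
- Medical datasets
- Roundup of 30+ NER datasets
- Roundup of Chinese entity recognition datasets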
Pre-trained language models
- ChineseBert ACL2021
- MacBert 2020
- SpanBert
- XLNet
- Roberta
- Bert
- StructBert
- WoBert
- ELECTRA
- Ernie1.0
- Ernie2.0
- Ernie3.0
- ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding
- NeZha
- MengZi
- ZEN
- ALBERT
- roformer
- roformer-v2
- Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words
- PERT: Pre-Training BERT with Permuted Language Model
- RoChBert: Towards Robust BERT Fine-tuning for Chinese EMNLP2022
- MarkBERT: Marking Word Boundaries Improves Chinese BERT 2022
- MVP-BERT: REDESIGNING VOCABULARIES FOR CHINESE BERT AND MULTI-VOCAB PRETRAINING 2022
- LERT: A Linguistically-motivated Pre-trained Language Model 2022
- AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization 2022
- BURT: BERT-inspired Universal Representation from Learning Meaningful Segment 2021
- Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
- Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence
- AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models (multiple methods)
- TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning NAACL 2022
- Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models 2023
- MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model 2023
- sikuGPT (classical Chinese model) 2023
- UniIE (universal information extraction)
NER tools
- Stanza
- LAC
- Ltp (Harbin Institute of Technology)
- Hanlp
- foolnltk
- NLTK
- BosonNLP
- FudanNlp (Fudan University)
- Jionlp
- HarvestText
- fastHan
- EasyNLP (Alibaba)
- PaddleNLP (Baidu)
- AliceMind (Alibaba)
- spacy
- DeepKE
- coreNlp (Java/Python)
- opennlp (Java)
- NLPIR
- trankit (multilingual)
- HugIE (universal information extraction)
- EasyInstruct
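Competitions
- Data released for the CCKS 2017 evaluation on Chinese electronic medical records.
Evaluation task 1: https://biendata.com/competition/CCKS2017_1/
Evaluation task 2: https://biendata.com/competition/CCKS2017_2/
- CCKS 2018 entity recognition task in the music domain.
Evaluation task: https://biendata.com/competition/CCKS2018_2/
- (CoNLL 2002) Annotated Corpus for Named Entity Recognition.
Link: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus
- NLPCC 2018 evaluation on spoken language understanding in task-oriented dialogue systems.
Link: http://tcci.ccf.org.cn/conference/2018/taskdata.php
- Privacy information recognition in unstructured business text
Link: https://www.datafountain.cn/competitions/472/datasets
- Product title recognition
Link: https://www.heywhale.com/home/competition/620b34ed28270b0017b823ad/content/3
- CCKS 2021 Chinese NLP address element parsing
Link: https://tianchi.aliyun.com/competition/entrance/531900/introduction
- CAIL 2022 information extraction track
Link: http://cail.cipsc.org.cn/task6.html?raceID=6&cail_tag=2022
- 2019 discovery of new internet finance entities
- CHIP 2020: entity recognition challenge on traditional Chinese medicine package inserts
- CHIP 2020: Chinese medical text named entity recognition
- CCKS 2020: named entity recognition for test and evaluation
- CCKS 2020: medical entity and event extraction for Chinese electronic medical records, subtask 1: medical named entity recognition
- LAIC 2022: entity recognition in criminal facts
- SemEval-2023 Task 2: Fine-grained Multilingual Named Entity Recognition (MultiCoNER 2)
- New Power System AI Application Competition, task 2: multi-mode information extraction for power production knowledge graphs
- CCKS 2022 universal information extraction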