zhoujx4/NLP-Series-relation-extraction

Stars: 101
Rank: 338,166 (top 7%)
Language: Python
Created over 3 years ago
Updated over 3 years ago

NLP relation extraction: sequence labeling, cascaded pointer network, Multi-head Selection, Deep Biaffine Attention

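Project description:

This project reimplements several relation extraction models.
It covers the following four methods:

Sequence labeling
Cascaded pointer network (subject-aware)
Multi-head Selection
Deep Biaffine Attention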

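The data comes from the extraction track of Baidu's 2021 language technology competition. The table below shows the results of the four methods; for more detail, see my Zhihu post: https://zhuanlan.zhihu.com/p/381894616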

Method	F1
Official baseline	64.69
Cascaded pointer network (subject-aware)	61.22
Multi-head Selection	67.90
Deep Biaffine Attention	68.45
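
For readers unfamiliar with how an F1 score is usually obtained in relation extraction, here is a minimal sketch that computes micro-averaged precision, recall, and F1 over exact-match (subject, predicate, object) triples. It is an assumption about the metric, not the competition's official scorer; the function name `triple_f1` and the sample triples are made up for illustration.

```python
def triple_f1(gold, pred):
    """Micro-averaged precision, recall and F1 over exact-match triples.

    gold, pred: lists with one set of (subject, predicate, object)
    tuples per sentence, aligned by position.
    """
    correct = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical data: two sentences, three gold triples, three predictions.
gold = [{("A", "founder", "B"), ("A", "born_in", "C")}, {("D", "author", "E")}]
pred = [{("A", "founder", "B")}, {("D", "author", "E"), ("D", "author", "F")}]
print(triple_f1(gold, pred))
```

A triple counts as correct only when subject, predicate, and object all match, so a prediction with the right entities but the wrong relation lowers both precision and recall.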

Environment

python=3.6
torch=1.7
transformers=4.5.0

Usage examples

Sequence labeling

python3 run_baseline.py \
  --max_len=200 \
  --model_name_or_path=/path/to/pretrained_model \
  --per_gpu_train_batch_size=80 \
  --per_gpu_eval_batch_size=100 \
  --learning_rate=1e-4 \
  --num_train_epochs=40 \
  --output_dir="./output" \
  --weight_decay=0.01 \
  --early_stop=2
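
A sequence labeling model outputs one tag per token, and entity spans are recovered by decoding those tags. The sketch below assumes a plain BIO scheme with hypothetical labels `SUBJ` and `OBJ`; the tag set actually used by `run_baseline.py` may differ, and `decode_bio` is an illustrative name, not a function from this repository.

```python
def decode_bio(tags):
    """Return (label, start, end) spans from BIO tags; end is exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if its label matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the open span, if there is one.
        if label is not None:
            spans.append((label, start, i))
            label, start = None, None
        # A B- tag opens a new span; stray I- tags and O are ignored.
        if tag.startswith("B-"):
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


print(decode_bio(["B-SUBJ", "I-SUBJ", "O", "B-OBJ", "O"]))
# [('SUBJ', 0, 2), ('OBJ', 3, 4)]
```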

Cascaded pointer network (subject-aware)

python3 run_mpn.py \
  --max_len=200 \
  --model_name_or_path=/path/to/pretrained_model \
  --per_gpu_train_batch_size=100 \
  --per_gpu_eval_batch_size=100 \
  --learning_rate=1e-4 \
  --num_train_epochs=40 \
  --output_dir="./output" \
  --weight_decay=0.01 \
  --early_stop=2
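
Pointer-style models predict, for every token, the probability that it starts or ends a span. One common decoding heuristic, sketched below, pairs each start position with the nearest end position at or after it. This is an assumption for illustration and not necessarily what `run_mpn.py` does; `decode_pointers` and the 0.5 threshold are hypothetical.

```python
def decode_pointers(start_probs, end_probs, threshold=0.5):
    """Pair each start position with the nearest end at or after it.

    Returns (start, end) token indices, both inclusive.
    """
    ends = [j for j, p in enumerate(end_probs) if p > threshold]
    spans = []
    for i, p in enumerate(start_probs):
        if p <= threshold:
            continue
        # ends is sorted ascending, so the first match is the nearest one.
        nearest = next((j for j in ends if j >= i), None)
        if nearest is not None:
            spans.append((i, nearest))
    return spans


start = [0.9, 0.1, 0.2, 0.8, 0.1]
end = [0.1, 0.7, 0.1, 0.1, 0.9]
print(decode_pointers(start, end))
# [(0, 1), (3, 4)]
```

In a cascaded setup the same decoding would typically run twice: once to find subjects, then once per subject and relation to find the matching objects.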

Multi-head Selection

python3 run_mhs.py \
  --max_len=200 \
  --model_name_or_path=/data/zhoujx/prev_trained_model/rbt3 \
  --per_gpu_train_batch_size=25 \
  --per_gpu_eval_batch_size=30 \
  --learning_rate=1e-4 \
  --num_train_epochs=40 \
  --output_dir="./output" \
  --weight_decay=0.01 \
  --early_stop=2
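
Multi-head selection scores every (token, relation, token) combination independently, so one token can take part in several relations. The sketch below shows one way to turn such a score tensor into triples, assuming the scores have already been passed through a sigmoid. The tensor layout `probs[i][r][j]`, the relation names, and `decode_selection` are illustrative assumptions, not the repository's API.

```python
def decode_selection(probs, relations, threshold=0.5):
    """Collect every (token i, relation, token j) whose probability passes the threshold.

    probs[i][r][j]: probability that token i is linked to token j by relation r.
    """
    triples = []
    for i, per_relation in enumerate(probs):
        for r, per_token in enumerate(per_relation):
            for j, p in enumerate(per_token):
                if p > threshold:
                    triples.append((i, relations[r], j))
    return triples


# Hypothetical example: 3 tokens, 2 relations, two links above threshold.
relations = ["founder", "author"]
probs = [[[0.0] * 3 for _ in relations] for _ in range(3)]
probs[0][1][2] = 0.9
probs[0][0][1] = 0.7
print(decode_selection(probs, relations))
# [(0, 'founder', 1), (0, 'author', 2)]
```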

Deep Biaffine Attention

python3 run_mhs_biaffine.py \
  --max_len=200 \
  --model_name_or_path=/data/zhoujx/prev_trained_model/rbt3 \
  --per_gpu_train_batch_size=15 \
  --per_gpu_eval_batch_size=20 \
  --learning_rate=1e-4 \
  --num_train_epochs=40 \
  --output_dir="./output" \
  --weight_decay=0.01 \
  --early_stop=2 \
  --overwrite_cache=True
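
A biaffine scorer combines a bilinear term and a linear term over a pair of token representations. The sketch below uses one common form, s = x^T U y + w . [x; y] + b, in plain Python for a single relation. It is meant only to show the arithmetic; the parameter shapes, the exact form used by `run_mhs_biaffine.py`, and the name `biaffine_score` are assumptions.

```python
def biaffine_score(x, y, U, w, b):
    """Score a pair of vectors: x^T U y + w . [x; y] + b.

    x: list of m floats, y: list of n floats,
    U: m x n matrix (list of rows), w: list of m + n floats, b: float.
    """
    bilinear = sum(
        x[a] * U[a][c] * y[c] for a in range(len(x)) for c in range(len(y))
    )
    # x + y concatenates the two lists, matching [x; y] in the formula.
    linear = sum(wi * vi for wi, vi in zip(w, x + y))
    return bilinear + linear + b


def pairwise_scores(X, Y, U, w, b):
    """Score every pair (X[i], Y[j]); returns a len(X) x len(Y) matrix."""
    return [[biaffine_score(xi, yj, U, w, b) for yj in Y] for xi in X]


x, y = [1.0, 2.0], [3.0, 4.0]
U = [[1.0, 0.0], [0.0, 1.0]]
w = [0.5, 0.5, 0.5, 0.5]
print(biaffine_score(x, y, U, w, 1.0))
# 17.0
```

In a real model there would be one U, w, and b per relation, and the vectors would come from separate projections of the encoder output.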

NLP-Series-sentence-embeddings

NLP sentence encoding, sentence embeddings, and semantic similarity: BERT_avg, BERT_whitening, SBERT, SimCSE

NLP-Series-text-cls

Text classification baselines: BERT, semi-supervised learning with UDA, adversarial training, data augmentation

Jupyter Notebook

DuEE

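torch baseline for the event extraction part of the multi-format information extraction track, Baidu 2021 Language and Intelligence Technology Competition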

NLP-Data-Augmentation

Two approaches to NLP text augmentation: synonym replacement (using a word2vec vocabulary) and back-translation

DuReader-Checklist-BASELINE

torch baseline for the machine reading comprehension track, Baidu 2021 Language and Intelligence Technology Competition

DuIE

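torch baseline for the relation extraction part of the multi-format information extraction track, Baidu 2021 Language and Intelligence Technology Competition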

NLP-Series-NewWordsMining-PTMPretraining

NLP experiment: new word discovery plus continued pre-training of a pretrained model

NLP-Series-text-generation-PGN

Text generation with a pointer network

Crawls

Collection of web crawlers: Scrapy and selenium scrapers for 房天下, 3房网, 选哪儿网, 土流网, and other sites

NLP-topic_model

Building Chinese topic models

Jupyter Notebook

NLP-Series-Unified-IE

Unified information extraction in practice

Knowledge-Graph

Knowledge graph examples

Spark

A KMeans clustering algorithm implemented with pyspark