Repository Details

Experiments with several semantic matching models and a comparison of their results.

A Collection of Chinese Text Semantic Matching Models

Data description

Dataset   Training set (count)   Validation set (count)   Test set (count)
ATEC      62477                  20000                    20000
BQ        100000                 10000                    10000
LCQMC     238766                 8802                     12500
PAWSX     49401                  2000                     2000
STS-B     5231                   1458                     1361

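Evaluation metrics

  • Pearson correlation coefficient (pearsonr): measures the linear relationship between two continuous variables.
  • Spearman correlation coefficient (spearmanr): measures the monotonic relationship between two variables. The two variables change together, but not necessarily at the same rate, so the relationship is not necessarily linear.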
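
To make the difference between the two metrics concrete, here is a minimal pure-Python sketch. The helper names (`pearson`, `average_ranks`, `spearman`) and the toy data are illustrative assumptions, not code from the repository; in practice the `pearsonr` and `spearmanr` functions in `scipy.stats` would normally be used instead.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
pearson_xy = pearson(x, y)
spearman_xy = spearman(x, y)
print(f"pearson={pearson_xy:.4f} spearman={spearman_xy:.4f}")
```

Because `y` increases whenever `x` does, the rank-based Spearman score is 1, while the Pearson score falls slightly below 1 since the relationship is not a straight line.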

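Experimental results:

No dedicated hyperparameter tuning was done. The unsupervised models were trained on 10,000 examples randomly sampled from the training set. The results below are on the test set. The learning rate has the largest effect on the final results; keep it as small as possible.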
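
The random sampling step for the unsupervised models could look roughly like the sketch below. The function name `sample_training_pairs`, the fixed seed, the choice of sampling without replacement, and the toy sentence pairs are all assumptions made for illustration; the source only states that 10,000 examples were drawn at random from the training set.

```python
import random


def sample_training_pairs(train_pairs, k=10000, seed=42):
    """Draw k examples without replacement (seed fixed for repeatability)."""
    if len(train_pairs) <= k:
        return list(train_pairs)
    rng = random.Random(seed)  # local generator, leaves global state untouched
    return rng.sample(train_pairs, k)


# Hypothetical stand-in for a training set the size of ATEC (62477 pairs).
toy_train = [(f"sentence a {i}", f"sentence b {i}") for i in range(62477)]
subset = sample_training_pairs(toy_train)
print(len(subset))
```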

Spearman correlation (spearmanr) comparison:

Model                ATEC      BQ        LCQMC     PAWSX     STS-B     Avg
Word2Vec (unsup)     16.4936   24.6732   29.8313   7.4375    23.5855   20.4042
SimCSE (unsup)       30.8634   49.1813   68.9802   9.5895    71.3976   46.0024
PromptBERT (unsup)   34.9434   48.7067   67.7634   14.3244   71.4191   47.4314
GS-infoNCE (unsup)   28.9731   46.3247   67.3204   11.2317   73.2998   45.4299
ESimCSE (unsup)      31.8443   48.0718   66.8673   9.1819    65.1843   44.2299
ConSERT (unsup)      29.7437   46.7806   67.5121   8.1442    74.1097   45.2580
SentenceBert (sup)   48.5157   67.8545   79.6023   60.1675   71.0148   65.4309
CoSENT (sup)         50.5969   72.5191   79.3777   60.5475   80.4344   68.6951
SimCSE (sup)         **        **        **        **        **        **
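
The Avg column appears to be the unweighted mean of the five per-dataset scores. The short check below reproduces it for two rows copied from the Spearman table; the variable names are illustrative, and the reading of Avg as a plain mean is an inference from the numbers rather than something the source states.

```python
# Per-dataset Spearman scores (ATEC, BQ, LCQMC, PAWSX, STS-B) from the table.
spearman_scores = {
    "Word2Vec (unsup)": [16.4936, 24.6732, 29.8313, 7.4375, 23.5855],
    "CoSENT (sup)": [50.5969, 72.5191, 79.3777, 60.5475, 80.4344],
}

# Unweighted mean over the five datasets, rounded to four decimals.
averages = {
    name: round(sum(scores) / len(scores), 4)
    for name, scores in spearman_scores.items()
}

for name, avg in averages.items():
    print(f"{name}: {avg:.4f}")
```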

Pearson correlation (pearsonr) comparison:

Model                ATEC      BQ        LCQMC     PAWSX     STS-B     Avg
Word2Vec (unsup)     14.2917   18.1433   24.7312   10.6328   12.9765   16.1551
SimCSE (unsup)       33.1678   49.0413   57.5075   9.9956    72.8918   44.5207
PromptBERT (unsup)   35.6218   48.6450   59.8181   13.5495   71.7247   45.8718
GS-infoNCE (unsup)   30.3781   46.2700   57.2458   10.3298   74.4048   43.7257
ESimCSE (unsup)      32.6815   47.9271   52.8407   10.5426   65.2000   41.8383
ConSERT (unsup)      31.1873   46.6954   60.7141   8.2408    75.3964   44.4468
SentenceBert (sup)   45.4922   66.3670   75.2732   57.7105   71.4540   63.2593
CoSENT (sup)         50.4301   72.5830   77.6607   57.6305   78.5165   67.3641
SimCSE (sup)         **        **        **        **        **        **

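  • The Word2Vec model was not trained on a large-scale corpus; it was trained only on the training set and then used for inference on the test set.
  • All of the unsupervised deep learning models above use the "CLS" vector.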
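
As a rough illustration of what using the "CLS" vector means, the sketch below takes the hidden state of the first token as the sentence representation and compares two sentences with cosine similarity. The toy hidden states, the helper names, and the use of cosine similarity as the matching score are assumptions for illustration; the source does not describe how the repository scores sentence pairs.

```python
import math


def cls_vector(token_states):
    """Use the hidden state of the first token ([CLS]) as the sentence vector."""
    return token_states[0]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical encoder outputs: one row per token, first row is [CLS].
sent_a = [[1.0, 0.0, 1.0], [0.3, 0.2, 0.1]]
sent_b = [[1.0, 0.0, 1.0], [0.9, 0.9, 0.9], [0.1, 0.1, 0.1]]
sent_c = [[0.0, 1.0, 0.0], [0.3, 0.2, 0.1]]

sim_ab = cosine_similarity(cls_vector(sent_a), cls_vector(sent_b))
sim_ac = cosine_similarity(cls_vector(sent_a), cls_vector(sent_c))
print(f"sim(a, b)={sim_ab:.4f} sim(a, c)={sim_ac:.4f}")
```

Only the first row of each sentence influences the score here: `sent_a` and `sent_b` share the same first-token vector and so score close to 1, while `sent_a` and `sent_c` have orthogonal first-token vectors and score 0, regardless of the remaining token states.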

More Repositories

1. NLP_pytorch_project (Python, 556 stars): Embedding, NMT, Text_Classification, Text_Generation, NER, etc.
2. CoSENT_Pytorch (Python, 161 stars): CoSENT, STS, SentenceBERT.
3. DeepCTR-pytorch (Python, 61 stars): Implementations of CTR models, for example FM, DeepFM, and xDeepFM.
4. Python-Library-Learning (Python, 61 stars): Notes on learning a variety of interesting Python libraries.
5. NLP-Project (Python, 36 stars): Small projects done in the process of learning NLP.
6. NLP_tensorflow_project (xBase, 34 stars): NLP projects implemented with TensorFlow, e.g. classification, chatbot, NER, attention, QA, etc.
7. Text-Classification-Pytorch (Python, 34 stars): Summary and comparison of Chinese classification models.
8. Keras-Learning-Summary (Python, 19 stars): Summary of Keras knowledge points.
9. Text-Generation-Chinese-Pytorch (Python, 13 stars)
10. NER-Pytorch (Python, 7 stars)
11. Community-Detection (Python, 7 stars): Summary of community detection algorithms.
12. GAN-pytorch (Python, 6 stars): Implementation of GAN.
13. GraphNeuralNetwork (Python, 6 stars): Implementations of GNN, GAT, GCN, GraphSAGE, PinSAGE, and other algorithms.
14. MinProject (Python, 5 stars): Some small PyQt projects.
15. Pytorch-Learning-Summary (Python, 4 stars): PyTorch learning summary.
16. Algorithm (Python, 4 stars): Summary of algorithm problems from interviews.
17. Tensorflow-Learning-Summary (Python, 3 stars): Describes some basic knowledge about TensorFlow.
18. Python_Crawler (Python, 3 stars): Summary of Python crawler practice.
19. Weather-forecasting-system (Python, 3 stars): A weather forecast system involving PyQt5, a crawler, and SQLite database operations.
20. CV_pytorch_project (Python, 2 stars): CV projects, including image classification, object detection, semantic segmentation, instance segmentation, etc.
21. Competition-Summary (Python, 1 star): Summaries of various competitions, with shared code.
22. notebook (1 star): c++, data_analysis, deep_learning, docker, git, python, etc.
23. DeepLearning-with-CV (Python, 1 star): A CV project summarizing some skills for CV processing.