shawroad/Semantic-Textual-Similarity-Pytorch

Stars: 153
Rank: 243,368 (Top 5%)
Language: Python
Created almost 3 years ago
Updated over 1 year ago

Experiments with several semantic matching models and a comparison of their results.

A collection of Chinese text semantic matching models

Dataset description

Dataset	Train (count)	Validation (count)	Test (count)
ATEC	62477	20000	20000
BQ	100000	10000	10000
LCQMC	238766	8802	12500
PAWSX	49401	2000	2000
STS-B	5231	1458	1361

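Evaluation metrics

Pearson correlation coefficient (pearsonr): measures the linear correlation between two continuous variables.
Spearman rank correlation coefficient (spearmanr): measures the monotonic relationship between two variables. The two variables change together but not necessarily at the same rate, so the relationship need not be linear.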
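
To make the difference between the two metrics concrete, here is a minimal pure-Python sketch. The function names and the toy data are assumptions for illustration and are not taken from the repository, whose metric names suggest it uses a library implementation. The tables further down appear to report these coefficients multiplied by 100.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: gold grows as the cube of predicted, so the
# relationship is monotonic but not linear.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
gold = [1.0, 8.0, 27.0, 64.0, 125.0]

print("spearman:", spearman(predicted, gold))
print("pearson:", pearson(predicted, gold))
```

On this toy data the Spearman coefficient is 1 because the ordering is perfectly preserved, while the Pearson coefficient is below 1 because the relationship is not linear.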

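Experimental results:

No dedicated hyperparameter tuning was done. The unsupervised models were trained on 10,000 examples randomly sampled from the training set. The results below are on the test set. The learning rate has the largest effect on the final results; keep it as small as practical.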
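
The random sampling step could look like the following minimal sketch. The function name, the fixed seed, and sampling without replacement are assumptions for illustration; the source only states that 10,000 examples were drawn at random from the training set.

```python
import random


def sample_unsupervised_subset(train_sentences, k=10000, seed=42):
    """Draw k distinct training sentences (assumed: without replacement)."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    k = min(k, len(train_sentences))  # guard against small training sets
    return rng.sample(train_sentences, k)


# Hypothetical stand-in for a training set of ATEC's size (62,477 rows).
train_sentences = [f"sentence {i}" for i in range(62477)]
subset = sample_unsupervised_subset(train_sentences)
print(len(subset))
```

Fixing the seed keeps the sampled subset identical across runs, which makes comparisons between models easier to reproduce.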

Spearman correlation (spearmanr) comparison:

Model	ATEC	BQ	LCQMC	PAWSX	STS-B	Avg
Word2Vec (unsup)	16.4936	24.6732	29.8313	7.4375	23.5855	20.4042
SimCSE (unsup)	30.8634	49.1813	68.9802	9.5895	71.3976	46.0024
PromptBERT (unsup)	34.9434	48.7067	67.7634	14.3244	71.4191	47.4314
GS-infoNCE (unsup)	28.9731	46.3247	67.3204	11.2317	73.2998	45.4299
ESimCSE (unsup)	31.8443	48.0718	66.8673	9.1819	65.1843	44.2299
ConSERT (unsup)	29.7437	46.7806	67.5121	8.1442	74.1097	45.2580
SentenceBert (sup)	48.5157	67.8545	79.6023	60.1675	71.0148	65.4309
CoSENT (sup)	50.5969	72.5191	79.3777	60.5475	80.4344	68.6951
SimCSE (sup)	**	**	**	**	**	**
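
The Avg column appears to be the unweighted mean of the five per-dataset scores. This is an inference from the numbers, not something the source states. As a quick check, the CoSENT (sup) Spearman row can be recomputed:

```python
# Per-dataset Spearman scores for CoSENT (sup), copied from the table above.
cosent_spearman = {
    "ATEC": 50.5969,
    "BQ": 72.5191,
    "LCQMC": 79.3777,
    "PAWSX": 60.5475,
    "STS-B": 80.4344,
}

# Assumed definition of Avg: the plain mean over the five datasets.
avg = sum(cosent_spearman.values()) / len(cosent_spearman)
print(round(avg, 4))
```

The result agrees with the reported Avg of 68.6951 to four decimal places.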

Pearson correlation (pearsonr) comparison:

Model	ATEC	BQ	LCQMC	PAWSX	STS-B	Avg
Word2Vec (unsup)	14.2917	18.1433	24.7312	10.6328	12.9765	16.1551
SimCSE (unsup)	33.1678	49.0413	57.5075	9.9956	72.8918	44.5207
PromptBERT (unsup)	35.6218	48.6450	59.8181	13.5495	71.7247	45.8718
GS-infoNCE (unsup)	30.3781	46.2700	57.2458	10.3298	74.4048	43.7257
ESimCSE (unsup)	32.6815	47.9271	52.8407	10.5426	65.2000	41.8383
ConSERT (unsup)	31.1873	46.6954	60.7141	8.2408	75.3964	44.4468
SentenceBert (sup)	45.4922	66.3670	75.2732	57.7105	71.4540	63.2593
CoSENT (sup)	50.4301	72.5830	77.6607	57.6305	78.5165	67.3641
SimCSE (sup)	**	**	**	**	**	**

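Notes

The Word2Vec model was not pretrained on a large-scale corpus; it was trained only on the training set and then run for inference on the test set.
All of the unsupervised deep learning models above use the "CLS" vector.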
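
As an illustration of "CLS" pooling, the sketch below takes the vector at the first token position of an encoder output as the sentence embedding and scores a sentence pair with cosine similarity. The toy embeddings, the helper names, and the use of cosine similarity as the pair score are assumptions; the source only states that the "CLS" vector is used.

```python
import math


def cls_vector(token_embeddings):
    """Sentence embedding = vector at the first ([CLS]) token position."""
    return token_embeddings[0]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical encoder outputs: one row per token, [CLS] first.
sent_a = [[1.0, 0.0, 1.0], [0.2, 0.9, 0.1]]
sent_b = [[2.0, 0.0, 2.0], [0.7, 0.1, 0.3]]

score = cosine_similarity(cls_vector(sent_a), cls_vector(sent_b))
print(score)
```

The two toy [CLS] vectors point in the same direction, so the score is approximately 1 even though their lengths differ.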

NLP_pytorch_project

Embedding, NMT, Text_Classification, Text_Generation, NER etc.

CoSENT_Pytorch

CoSENT, STS, SentenceBERT

DeepCTR-pytorch

Models for CTR prediction, for example FM, DeepFM, xDeepFM, etc.

Python-Library-Learning

Learning notes on a variety of interesting Python libraries.

NLP-Project

A collection of small projects I did while learning NLP.

NLP_tensorflow_project

Uses TensorFlow to implement some NLP projects, e.g. classification, chatbot, NER, attention, QA, etc.

Text-Classification-Pytorch

Summary and comparison of Chinese classification models

Keras-Learning-Summary

Summary of Keras knowledge points.

Text-Generation-Chinese-Pytorch

NER-Pytorch

Community-Detection

Summary of community detection algorithms

GAN-pytorch

Implementation of GAN

GraphNeuralNetwork

This repository includes implementations of GNN, GAT, GCN, GraphSAGE, PinSAGE, and other algorithms.

MinProject

Some small projects using PyQt

Pytorch-Learning-Summary

Summary of PyTorch learning notes

Algorithm

Summary of algorithm problems from interviews

Tensorflow-Learning-Summary

This describes some basic knowledge about TensorFlow.

Python_Crawler

Summary of Python crawler practice.

Weather-forecasting-system

The weather forecast system involves pyqt5 + crawler + SQLite database operation.

CV_pytorch_project

A collection of CV projects, including image classification, object detection, semantic segmentation, instance segmentation, etc.

Competition-Summary

Summaries of various competitions entered, with shared code

notebook

c++, data_analysis, deep_learning, docker, git, python etc.

DeepLearning-with-CV

This is a project about CV. I will summarize some techniques for CV processing.