• This repository has been archived on 17/Nov/2020
  • Language
    Python
  • Created over 4 years ago
  • Updated about 4 years ago


Repository Details

This repository contains various ways to calculate sentence vector similarity using NLP models

Sentence Similarity Calculator

This repo contains various ways to calculate the similarity between source and target sentences. You can choose which pre-trained model to use: ELMo, BERT, or Universal Sentence Encoder (USE).

You can also choose the method used to compute the similarity:

1. Cosine similarity
2. Manhattan distance
3. Euclidean distance
4. Angular distance
5. Inner product
6. TS-SS score
7. Pairwise-cosine similarity
8. Pairwise-cosine similarity + IDF
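The vector-level methods (1 to 6) can be sketched with NumPy as below. This is a minimal sketch, not the repository's own code: the function names are hypothetical, angular distance is assumed to be arccos(cosine) / π, and TS-SS follows its commonly cited formulation (the angle widened by 10 degrees, Triangle Area multiplied by Sector Area), which may differ in detail from the repository's implementation.

```python
import math

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def manhattan_distance(a: np.ndarray, b: np.ndarray) -> float:
    """L1 distance: sum of absolute coordinate differences."""
    return float(np.sum(np.abs(a - b)))


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """L2 (straight-line) distance between a and b."""
    return float(np.linalg.norm(a - b))


def angular_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Angle between a and b, scaled to [0, 1] by dividing by pi."""
    # Clip so rounding error cannot push the cosine outside acos's domain.
    cos = float(np.clip(cosine_similarity(a, b), -1.0, 1.0))
    return math.acos(cos) / math.pi


def inner_product(a: np.ndarray, b: np.ndarray) -> float:
    """Raw dot product; unlike cosine, it grows with vector magnitude."""
    return float(np.dot(a, b))


def ts_ss(a: np.ndarray, b: np.ndarray) -> float:
    """Triangle Area Similarity x Sector Area Similarity (0 for identical vectors)."""
    cos = float(np.clip(cosine_similarity(a, b), -1.0, 1.0))
    theta = math.degrees(math.acos(cos)) + 10.0  # angle widened by 10 degrees
    norm_a = float(np.linalg.norm(a))
    norm_b = float(np.linalg.norm(b))
    # TS: area of the triangle spanned by the two vectors at angle theta.
    triangle = norm_a * norm_b * math.sin(math.radians(theta)) / 2
    # SS: sector whose radius is the Euclidean distance plus the magnitude difference.
    radius = euclidean_distance(a, b) + abs(norm_a - norm_b)
    sector = math.pi * radius ** 2 * theta / 360
    return triangle * sector
```

For orthogonal unit vectors such as [1, 0] and [0, 1], cosine similarity is 0, Manhattan distance is 2, and angular distance is 0.5. Note the direction of each score: higher means more similar for cosine similarity and inner product, while lower means more similar for the distances and for TS-SS.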

You can experiment with every combination of model and method (3 models × 8 methods = 24 combinations)!
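Methods 7 and 8 in the list above work on token-level embeddings rather than a single sentence vector. The sketch below assumes they follow the BERTScore-style greedy matching scheme (an assumption; this README does not spell it out): build the pairwise cosine matrix between source and target tokens, match each token to its most similar counterpart, and combine recall and precision into an F1 score, optionally weighting tokens by IDF. The function names and the smoothed IDF formula log((M + 1) / (df + 1)) are illustrative choices, not the repository's code.

```python
import math
from collections import Counter

import numpy as np


def idf_weights(tokenized_corpus):
    """Smoothed IDF per token: log((M + 1) / (df + 1)) over M sentences."""
    m = len(tokenized_corpus)
    # Document frequency: in how many sentences each token occurs at least once.
    df = Counter(tok for sentence in tokenized_corpus for tok in set(sentence))
    return {tok: math.log((m + 1) / (count + 1)) for tok, count in df.items()}


def _weights(n, idf):
    """Uniform weights when no IDF is given or when all IDF weights are zero."""
    if idf is None:
        return np.ones(n)
    w = np.asarray(idf, dtype=float)
    return w if w.sum() > 0 else np.ones(n)


def pairwise_cosine_f1(src, tgt, src_idf=None, tgt_idf=None):
    """Greedy-matching F1 over the pairwise cosine matrix of token embeddings.

    src: (m, d) array of source-token embeddings.
    tgt: (n, d) array of target-token embeddings.
    src_idf, tgt_idf: optional per-token weights (pairwise-idf); uniform if None.
    """
    # L2-normalise each token vector so that dot products are cosines.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = src @ tgt.T  # (m, n) pairwise cosine matrix

    src_w = _weights(len(src), src_idf)
    tgt_w = _weights(len(tgt), tgt_idf)

    # Recall: each source token matched to its best target token.
    recall = float(np.sum(src_w * sim.max(axis=1)) / src_w.sum())
    # Precision: each target token matched to its best source token.
    precision = float(np.sum(tgt_w * sim.max(axis=0)) / tgt_w.sum())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

To use IDF weighting, compute idf_weights over the tokenized corpus and pass the per-token values, in order, as src_idf and tgt_idf. Tokens that appear in every sentence (often function words) receive zero weight, so content words dominate the score.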


Installation

  • This project was developed in a conda environment
  • After cloning this repository, you can install all the dependencies listed in requirements.txt by running bash install.sh:
conda create -n sensim python=3.7
conda activate sensim
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
bash install.sh

Usage

  • To test your own sentences, fill corpus.txt with one sentence per line, as below:
I ate an apple.
I went to the Apple.
I ate an orange.
...
  • Then, choose the model and method used to calculate the similarity between the source and target sentences:
python sensim.py
    --model    MODEL_NAME  [use, bert, elmo]
    --method   METHOD_NAME [cosine, manhattan, euclidean, inner,
                            ts-ss, angular, pairwise, pairwise-idf]
    --verbose  LOG_OPTION (bool)
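The command-line options above can be mirrored with Python's standard argparse module, which is useful if you want to call the same model/method choices from your own script. This is a hypothetical reconstruction, not the contents of sensim.py: the helper names (parse_args, str_to_bool), the default values, and the assumption that --verbose takes an explicit true/false value are all illustrative.

```python
import argparse

# Choices copied from the usage text above.
MODELS = ["use", "bert", "elmo"]
METHODS = ["cosine", "manhattan", "euclidean", "inner",
           "ts-ss", "angular", "pairwise", "pairwise-idf"]


def str_to_bool(value: str) -> bool:
    """Parse strings such as 'True' or 'false' into a bool for --verbose."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sentence similarity calculator")
    # Defaults below are assumptions; the real sensim.py may choose differently.
    parser.add_argument("--model", choices=MODELS, default="use",
                        help="pre-trained encoder")
    parser.add_argument("--method", choices=METHODS, default="cosine",
                        help="similarity method")
    parser.add_argument("--verbose", type=str_to_bool, default=False,
                        help="print intermediate logs (bool)")
    return parser.parse_args(argv)
```

For example, parse_args(["--model", "bert", "--method", "ts-ss"]) selects BERT embeddings with the TS-SS score. Passing the lists as choices makes argparse reject an unsupported model or method name with a usage error instead of failing later.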

Examples

  • This section shows example results of sentence-similarity
  • There is no silver bullet that can calculate a perfect similarity between sentences
  • You should run various experiments on your own dataset
    • Caution: the TS-SS score may not suit the sentence similarity task, since the method was originally devised to calculate the similarity between long documents
  • Result:


