  • Language: HTML
  • License: MIT License
  • Created about 8 years ago
  • Updated almost 6 years ago



Examine two sentences and determine whether they have the same meaning.

Paraphrase-Identification-Task

Paraphrase detection is the task of examining two text entities (e.g., sentences) and determining whether they have the same meaning. Obtaining high accuracy on this task requires thorough syntactic and semantic analysis of both text entities.

What is a Paraphrase?

In simple words, a paraphrase is an alternative representation of the same meaning.

Classification of Paraphrases

According to granularity, paraphrases are of two types.

  • Surface paraphrases
    • Lexical level
      • Example - "solve" and "resolve"
    • Phrase level
      • Example - "look after" and "take care of"
    • Sentence level
      • Example - "The table was set up in the carriage shed" and "The table was laid under the cart-shed"
    • Discourse level
  • Structural paraphrases
    • Pattern level
      • Example - "[X] considers [Y]" and "[X] takes [Y] into consideration"
    • Collocation level
      • Example - (turn on, OBJ light) and (switch on, OBJ light)
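
The pattern-level example above can be sketched in code. This is a minimal illustration, not a method from this repository: the two regular expressions and the helper names `match_slots` and `is_pattern_paraphrase` are assumptions made for the sketch. Two sentences count as pattern-level paraphrases here only if each matches one of the two templates and their [X] and [Y] slot fillers agree.

```python
import re

# Hypothetical templates built from the example pair above; X and Y are slots.
PATTERN_CONSIDERS = re.compile(r"^(?P<X>.+?) considers (?P<Y>.+)$")
PATTERN_TAKES = re.compile(r"^(?P<X>.+?) takes (?P<Y>.+) into consideration$")


def match_slots(sentence):
    """Return lowercased (X, Y) slot fillers if a template matches, else None."""
    text = sentence.strip().rstrip(".")
    # Try the longer, more specific template first.
    for pattern in (PATTERN_TAKES, PATTERN_CONSIDERS):
        m = pattern.match(text)
        if m:
            return m.group("X").lower(), m.group("Y").lower()
    return None


def is_pattern_paraphrase(s1, s2):
    """True if both sentences match a template and share the same slot fillers."""
    slots1, slots2 = match_slots(s1), match_slots(s2)
    return slots1 is not None and slots1 == slots2


if __name__ == "__main__":
    print(is_pattern_paraphrase(
        "The committee considers the proposal.",
        "The committee takes the proposal into consideration.",
    ))
```

A real system would need many such templates and a more robust matcher (for example, over dependency parses rather than raw strings); the sketch only shows how slots tie the two surface forms together.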

According to paraphrase style, they can be classified into five types.

  • Trivial change
    • Example - "all the members of" and "all members of"
  • Phrase replacement
    • Example - "There will be major cuts in the salaries of high-level civil servants" and "There will be major cuts in the salaries of senior officials"
  • Phrase reordering
    • Example - "Last night, I saw Tom in the shopping mall" and "I saw Tom in the shopping mall last night"
  • Sentence split & merge
    • Example - "He bought a computer which is very expensive" and "(1) He bought a computer. (2) The computer is very expensive."
  • Complex paraphrase
    • Example - "He said there will be major cuts in the salaries of high-level civil servants" and "He claimed to implement huge salary cut to senior civil servants"
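
Two of the simpler styles above, phrase reordering and trivial change, can be probed with plain token counts. The sketch below is an illustration under assumptions, not code from this repository: the helper names `tokens`, `is_reordering`, and `token_diff` are hypothetical, and the tokenizer is a crude lowercase alphanumeric split.

```python
import re
from collections import Counter


def tokens(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def is_reordering(s1, s2):
    """True if both sentences use exactly the same multiset of tokens."""
    return Counter(tokens(s1)) == Counter(tokens(s2))


def token_diff(s1, s2):
    """Return (tokens only in s1, tokens only in s2) as Counters."""
    c1, c2 = Counter(tokens(s1)), Counter(tokens(s2))
    return c1 - c2, c2 - c1


if __name__ == "__main__":
    print(is_reordering(
        "Last night, I saw Tom in the shopping mall",
        "I saw Tom in the shopping mall last night",
    ))
    print(token_diff("all the members of", "all members of"))
```

Identical token counts are only a hint, not proof, of a paraphrase: "dog bites man" and "man bites dog" share all their tokens but differ in meaning. The remaining styles (phrase replacement, split and merge, complex paraphrase) need semantic knowledge that token counts cannot supply.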

Applications of Paraphrase Identification

  • Machine Translation
    • Simplify input sentences
    • Alleviate data sparseness
  • Question Answering
    • Question reformulation
  • Information Extraction
    • IE pattern expansion
  • Information Retrieval
    • Query reformulation
  • Summarization
    • Sentence clustering
    • Automatic evaluation
  • Natural Language Generation
    • Sentence rewriting
  • Others
    • Changing writing style
    • Text simplification
    • Identifying plagiarism

Relevant Research Topics

  • Textual Entailment
  • Semantic Textual Similarity

Research on Paraphrasing

  • Paraphrase identification
  • Paraphrase extraction
  • Paraphrase generation
  • Paraphrase applications

Paraphrase Identification

  • Refers specifically to sentential paraphrase identification
    • Given any pair of sentences, automatically identify whether the two sentences are paraphrases
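
The task statement above amounts to a binary decision over a sentence pair. As a minimal sketch of that interface, the following baseline uses Jaccard overlap of token sets with a fixed threshold. This is a simple lexical baseline chosen for illustration, not the method used in this repository; the function names and the 0.5 threshold are assumptions.

```python
import re


def tokenize(sentence):
    """Return the set of lowercase alphanumeric tokens in the sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(s1, s2):
    """Jaccard similarity of the two token sets, in [0, 1]."""
    a, b = tokenize(s1), tokenize(s2)
    if not a and not b:
        return 1.0  # two empty inputs are treated as identical
    return len(a & b) / len(a | b)


def is_paraphrase(s1, s2, threshold=0.5):
    """Label the pair as a paraphrase if lexical overlap reaches the threshold.

    The threshold is an arbitrary assumption; in practice it would be tuned
    on labeled sentence pairs.
    """
    return jaccard(s1, s2) >= threshold


if __name__ == "__main__":
    pair = (
        "The table was set up in the carriage shed",
        "The table was laid under the cart-shed",
    )
    print(jaccard(*pair), is_paraphrase(*pair))
```

On the sentence-level example from earlier in this document, the two sentences share only 4 of 11 distinct tokens, so this baseline rejects a genuine paraphrase. That failure illustrates why purely lexical overlap is insufficient and why syntactic and semantic analysis is needed.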

Overview of Paraphrase Identification Methods

Further discussion of previous work is documented here.

