  • Stars: 725
  • Rank: 62,504 (Top 2 %)
  • Language: Python
  • License: MIT License
  • Created almost 11 years ago
  • Updated over 2 years ago

Repository Details

Python Implementations of Word Sense Disambiguation (WSD) Technologies.

pywsd

Python Implementations of Word Sense Disambiguation (WSD) technologies:

  • Lesk algorithms

    • Original Lesk (Lesk, 1986)
    • Adapted/Extended Lesk (Banerjee and Pedersen, 2002/2003)
    • Simple Lesk (with definition, example(s) and hyper+hyponyms)
    • Cosine Lesk (use cosines to calculate overlaps instead of using raw counts)
  • Maximizing Similarity (see also, Pedersen et al. (2003))

    • Path similarity (Wu-Palmer, 1994; Leacock and Chodorow, 1998)
    • Information Content (Resnik, 1995; Jiang and Conrath, 1997; Lin, 1998)
  • Baselines
    • Random sense
    • First NLTK sense
    • Highest lemma counts
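
The difference between raw-count overlap and cosine overlap in the Lesk variants listed above can be illustrated with a small, self-contained sketch. This is not pywsd's implementation: the tokenizer is deliberately naive, the two sense signatures are hypothetical toy glosses rather than WordNet data, and the function names (`count_overlap`, `cosine_overlap`, `pick_sense`) are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def count_overlap(context, signature):
    """Raw-count overlap: number of distinct words shared by both sides."""
    return len(set(context) & set(signature))


def cosine_overlap(context, signature):
    """Cosine similarity between the two bag-of-words count vectors."""
    a, b = Counter(context), Counter(signature)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy signatures, NOT taken from WordNet.
TOY_SIGNATURES = {
    'bank (financial)': tokenize('a financial institution that accepts deposits of money'),
    'bank (river)': tokenize('sloping land beside a body of water such as a river'),
}


def pick_sense(sentence, scorer):
    """Return the toy sense whose signature scores highest against the sentence."""
    context = tokenize(sentence)
    return max(TOY_SIGNATURES, key=lambda sense: scorer(context, TOY_SIGNATURES[sense]))


sentence = 'I went to the bank to deposit my money'
best_by_count = pick_sense(sentence, count_overlap)
best_by_cosine = pick_sense(sentence, cosine_overlap)
print(best_by_count, best_by_cosine)
```

Both scorers pick the financial sense here because "money" is the only shared word. Note that "deposit" and "deposits" do not match in this sketch, since it does no stemming or lemmatization.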

NOTE: PyWSD only supports Python 3 now (pywsd>=1.2.0). If you're using Python 2, the last possible version is pywsd==1.1.7.

Install

pip install -U nltk
python -m nltk.downloader 'popular'
pip install -U pywsd

Usage

$ python
>>> from pywsd.lesk import simple_lesk
>>> sent = 'I went to the bank to deposit my money'
>>> ambiguous = 'bank'
>>> answer = simple_lesk(sent, ambiguous, pos='n')
>>> print(answer)
Synset('depository_financial_institution.n.01')
>>> print(answer.definition())
a financial institution that accepts deposits and channels the money into lending activities

For all-words WSD, try:

>>> from pywsd import disambiguate
>>> from pywsd.similarity import max_similarity as maxsim
>>> disambiguate('I went to the bank to deposit my money')
[('I', None), ('went', Synset('run_low.v.01')), ('to', None), ('the', None), ('bank', Synset('depository_financial_institution.n.01')), ('to', None), ('deposit', Synset('deposit.v.02')), ('my', None), ('money', Synset('money.n.03'))]
>>> disambiguate('I went to the bank to deposit my money', algorithm=maxsim, similarity_option='wup', keepLemmas=True)
[('I', 'i', None), ('went', 'go', Synset('sound.v.02')), ('to', 'to', None), ('the', 'the', None), ('bank', 'bank', Synset('bank.n.06')), ('to', 'to', None), ('deposit', 'deposit', Synset('deposit.v.02')), ('my', 'my', None), ('money', 'money', Synset('money.n.01'))]
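
The all-words output pairs every token with either a synset or None, so a common next step is to keep only the tokens that received a sense. Below is a minimal sketch, assuming results shaped like the lists above. The synset names are plain strings standing in for Synset objects so that the snippet runs without NLTK or pywsd, and `sensed_tokens` is a hypothetical helper, not part of pywsd.

```python
# Stand-in for the list returned by disambiguate(): (token, sense) pairs,
# with None for tokens that were not disambiguated. Synset names are plain
# strings here instead of Synset objects.
tagged = [
    ('I', None),
    ('went', 'run_low.v.01'),
    ('to', None),
    ('the', None),
    ('bank', 'depository_financial_institution.n.01'),
    ('to', None),
    ('deposit', 'deposit.v.02'),
    ('my', None),
    ('money', 'money.n.03'),
]


def sensed_tokens(items):
    """Keep only the entries whose last element (the sense) is not None.

    Checking the last element means the same helper also handles the
    three-element (token, lemma, sense) tuples produced with keepLemmas=True.
    """
    return [item for item in items if item[-1] is not None]


content_words = sensed_tokens(tagged)
for token, sense in content_words:
    print(f'{token}\t{sense}')
```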

To read pre-computed signatures per synset:

>>> from pywsd.lesk import cached_signatures
>>> cached_signatures['dog.n.01']['simple']
{'canid', 'belgian_griffon', 'breed', 'barker', ... , 'genus', 'newfoundland'}
>>> cached_signatures['dog.n.01']['adapted']
{'canid', 'belgian_griffon', 'breed', 'leonberg', ... , 'newfoundland', 'pack'}

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('dog')[0]
Synset('dog.n.01')
>>> dog = wn.synsets('dog')[0]
>>> dog.name()
'dog.n.01'
>>> cached_signatures[dog.name()]['simple']
{'canid', 'belgian_griffon', 'breed', 'barker', ... , 'genus', 'newfoundland'}
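
Because the signatures shown above are plain Python sets, ordinary set operations can be used to compare them. The sketch below uses only the members visible in the truncated output; the elided entries are deliberately not filled in, so the result says nothing about the full signatures. The variable names are illustrative.

```python
# Only the members visible in the truncated output above; the elided
# entries ("...") are intentionally left out rather than guessed.
simple_visible = {'canid', 'belgian_griffon', 'breed', 'barker', 'genus', 'newfoundland'}
adapted_visible = {'canid', 'belgian_griffon', 'breed', 'leonberg', 'newfoundland', 'pack'}

shared = simple_visible & adapted_visible         # words present in both subsets
only_simple = simple_visible - adapted_visible    # visible only in the 'simple' subset
only_adapted = adapted_visible - simple_visible   # visible only in the 'adapted' subset

print(sorted(shared))
print(sorted(only_simple))
print(sorted(only_adapted))
```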

Cite

To cite pywsd:

Liling Tan. 2014. Pywsd: Python Implementations of Word Sense Disambiguation (WSD) Technologies [software]. Retrieved from https://github.com/alvations/pywsd

In bibtex:

@misc{pywsd14,
author =   {Liling Tan},
title =    {Pywsd: Python Implementations of Word Sense Disambiguation (WSD) Technologies [software]},
howpublished = {https://github.com/alvations/pywsd},
year = {2014}
}

References

  • Michael Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th annual international conference on Systems documentation (SIGDOC '86), Virginia DeBuys (Ed.). ACM, New York, NY, USA, 24-26. DOI=10.1145/318723.318728 http://doi.acm.org/10.1145/318723.318728

  • Satanjeev Banerjee and Ted Pedersen. 2002. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing (CICLing '02), Alexander F. Gelbukh (Ed.). Springer-Verlag, London, UK, 136-145.

  • Satanjeev Banerjee and Ted Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 805–810, Acapulco.

  • Jay J. Jiang and David W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan.

  • Claudia Leacock and Martin Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. In Fellbaum 1998, pp. 265–283.

  • Lee, Yoong Keok, Hwee Tou Ng, and Tee Kiah Chia. "Supervised word sense disambiguation with support vector machines and multiple knowledge sources." Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text. 2004.

  • Dekang Lin. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, Madison, WI.

  • Linlin Li, Benjamin Roth and Caroline Sporleder. 2010. Topic Models for Word Sense Disambiguation and Token-based Idiom Detection. The 48th Annual Meeting of the Association for Computational Linguistics (ACL). Uppsala, Sweden.

  • Andrea Moro, Roberto Navigli, Francesco Maria Tucci and Rebecca J. Passonneau. 2014. Annotating the MASC Corpus with BabelNet. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland.

  • Zhi Zhong and Hwee Tou Ng. 2010. It makes sense: a wide-coverage word sense disambiguation system for free text. In Proceedings of the ACL 2010 System Demonstrations (ACLDemos '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 78-83.

  • Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python (1st ed.). O'Reilly Media, Inc.

  • Eneko Agirre and Aitor Soroa. 2009. Personalizing PageRank for Word Sense Disambiguation. Proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics (EACL-2009). Athens, Greece.
