  • Stars: 1,740
  • Rank: 26,760 (Top 0.6 %)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created almost 8 years ago
  • Updated over 5 years ago

Repository Details

awesome-embedding-models

A curated list of awesome embedding models tutorials, projects and communities. Please feel free to open pull requests to add links.

Papers

Word Embeddings

Word2vec, GloVe, FastText

  • Efficient Estimation of Word Representations in Vector Space (2013), T. Mikolov et al. [pdf]
  • Distributed Representations of Words and Phrases and their Compositionality (2013), T. Mikolov et al. [pdf]
  • word2vec Parameter Learning Explained (2014), Xin Rong [pdf]
  • word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method (2014), Yoav Goldberg, Omer Levy [pdf]
  • GloVe: Global Vectors for Word Representation (2014), J. Pennington et al. [pdf]
  • Improving Word Representations via Global Context and Multiple Word Prototypes (2012), EH Huang et al. [pdf]
  • Enriching Word Vectors with Subword Information (2016), P. Bojanowski et al. [pdf]
  • Bag of Tricks for Efficient Text Classification (2016), A. Joulin et al. [pdf]

Language Model

  • Semi-supervised sequence tagging with bidirectional language models (2017), Peters, Matthew E., et al. [pdf]
  • Deep contextualized word representations (2018), Peters, Matthew E., et al. [pdf]
  • Contextual String Embeddings for Sequence Labeling (2018), Akbik, Alan, Duncan Blythe, and Roland Vollgraf. [pdf]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018), J. Devlin et al. [pdf]

Embedding Enhancement

  • Sentence Embedding: Learning Semantic Sentence Embeddings using Pair-wise Discriminator (2018), Patro et al. [Project Page] [Paper]
  • Retrofitting Word Vectors to Semantic Lexicons (2014), M. Faruqui et al. [pdf]
  • Better Word Representations with Recursive Neural Networks for Morphology (2013), T. Luong et al. [pdf]
  • Dependency-Based Word Embeddings (2014), Omer Levy, Yoav Goldberg [pdf]
  • Not All Neural Embeddings are Born Equal (2014), F. Hill et al. [pdf]
  • Two/Too Simple Adaptations of Word2Vec for Syntax Problems (2015), W. Ling [pdf]

Comparing count-based vs. predict-based methods

  • Linguistic Regularities in Sparse and Explicit Word Representations (2014), Omer Levy, Yoav Goldberg [pdf]
  • Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors (2014), M. Baroni [pdf]
  • Improving Distributional Similarity with Lessons Learned from Word Embeddings (2015), Omer Levy [pdf]

Evaluation, Analysis

  • Evaluation methods for unsupervised word embeddings (2015), T. Schnabel [pdf]
  • Intrinsic Evaluation of Word Vectors Fails to Predict Extrinsic Performance (2016), B. Chiu [pdf]
  • Problems With Evaluation of Word Embeddings Using Word Similarity Tasks (2016), M. Faruqui [pdf]
  • Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure (2016), Oded Avraham, Yoav Goldberg [pdf]
  • Evaluating Word Embeddings Using a Representative Suite of Practical Tasks (2016), N. Nayak [pdf]

Phrase, Sentence and Document Embeddings

Sentence

Document

Sense Embeddings

Neural Language Models

Researchers

Courses and Lectures

Datasets

Training

Evaluation

Pre-Trained Language Models

Below are pre-trained ELMo models. The ELMo authors report that adding ELMo to existing NLP systems significantly improved the state of the art on every task they considered.

Below are pre-trained sent2vec models.

Pre-Trained Word Vectors

Convenient downloader for pre-trained word vectors:

Links for pre-trained word vectors:

Implementations and Tools

Word2vec

GloVe
