Word2Vec Explorer
This tool helps you visualize, query and explore Word2Vec models. Word2Vec is a deep learning technique that feeds massive amounts of text into a shallow neural net which can then be used to solve a variety of NLP and ML problems.
Word2Vec Explorer uses Gensim to list and compare vectors and it uses t-SNE to visualize a dimensional reduction of the vector space. Scikit-Learn is used for K-Means clustering.
The UI is built using React, Babel, Browserify, StandardJS, D3 and Three.js.
Setup
To install all Python depenencies:
pip install -r requirements.txt
Usage
Load the explorer with a Word2Vec model:
./explore GoogleNews-vectors-negative300.bin
Now point your browser at localhost:8080 to load the explorer!
Obtaining Pre-Trained Models
A classic example of Word2Vec is the Google News model trained on 600M sentences: GoogleNews-vectors-negative300.bin.gz
[More pre-trained models]](https://github.com/3Top/word2vec-api#where-to-get-a-pretrained-models)
Development
In order to make changes to the user interface you will need some NPM dependencies:
npm install
npm start
The command npm start
will automatically transpile and bundle any code changes in the ui/
folder. All backend code can be found in explorer.py
and ./explore
.
Before submitting code changes make sure all code is compliant with StandardJS as well as Pep8:
standard
pep8 --max-line-length=100 *.py explore
Todo
- 3D GPU/WebGL view (on branch
3d
) - Make sure axes stay when zooming/panning scatterplot
- Autocomplete in query interface
- Look into supporting other high dimensional data models (go beyond word vectors)
- Drill-down of vector that shows real distance between neighbors
- Improved sample rated view that takes into account term counts and connectedness