  • Stars: 107
  • Rank 323,587 (Top 7 %)
  • Language: Python
  • Created over 15 years ago
  • Updated about 9 years ago

Repository Details

2-d visualization of high-dimensional input: Python code for rendering t-SNE code with text labels for each point
textSNE
=======

Python code for rendering t-SNE code with text labels for each point.

See test-output.expected.png for an example of the sort of visualization
this code will produce.

t-SNE is:
    van der Maaten, L. J. P. and Hinton, G. E. (2008)
    Visualizing Data using t-SNE.
    Journal of Machine Learning Research, Vol 9, (Nov) pp 2579-2605.

Where noted in header code or by directory name, I have included 3rd-party code.

My main change from the original t-SNE implementation is that I
disable PCA as a preprocessing step, unless it is explicitly requested
by a function parameter. Since my data is high-dimensional and sparse,
PCA is painfully slow.

To get started:

1) Unpack the original tSNE package:
    cd 3rd-party/t-SNE_files/
    tar zxvf tSNE_linux.tar.gz 
If you are on a different architecture, you will have to unzip another package.

Alternatively, you can use the pure Python implementation of t-SNE by
replacing all code that reads:
    from calc_tsne import tsne
with the following code:
    from tsne import tsne
You will need matplotlib to run the pure Python implementation. However, 

2) (Optional) Edit render.py and change DEFAULT_FONT to a TTF file
containing a font you like.

3) Run ./test.py to test your installation.
This will generate file 'test-output.rendered.png'.
Note that 'test-output.rendered.png' and 'test-output.expected.png'
are different, because each invocation of tSNE_linux uses a different
random initialization.
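
As a rough sketch of the import swap described in step 1, the snippet
below tries the wrapper module first and falls back to the pure Python
one, so the calling code does not need to be edited by hand. The helper
name load_tsne is hypothetical and not part of this repository; only the
module names calc_tsne and tsne, and the function name tsne, come from
the instructions above. Note that a successful import of calc_tsne does
not prove that the tSNE binary has been unpacked.

```python
import importlib


def load_tsne(candidates=("calc_tsne", "tsne")):
    """Return the first importable `tsne` function, or None.

    `candidates` lists module names to try, in order of preference.
    This helper is an illustration only; it is not part of textSNE.
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is not installed or not on sys.path; try the next one.
            continue
        func = getattr(module, "tsne", None)
        if callable(func):
            return func
    return None


if __name__ == "__main__":
    tsne = load_tsne()
    if tsne is None:
        print("Neither calc_tsne nor tsne could be imported.")
    else:
        print("Using tsne from module:", tsne.__module__)
```

Run from the repository root so that both modules are on the import
path. Reordering the candidates tuple changes which implementation is
preferred.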

=======

REQUIREMENTS:
    imagemagick:
        We use convert at the end of render.render, to flatten an image.
        Type:
            which convert
        as a test to see if you have this executable.
        You could perhaps remove this image flattening step, if you like.
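
The same check can be done from Python before rendering. This is a
minimal sketch using the standard library's shutil.which, which searches
the PATH in the same way as the shell command above. The function name
find_convert is hypothetical; render.py is not known to contain such a
check.

```python
import shutil


def find_convert(executable="convert"):
    """Return the full path to the executable, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_convert()
    if path is None:
        print("convert not found; install imagemagick or skip the flattening step.")
    else:
        print("Found convert at:", path)
```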

More Repositories

1

neural-language-model

Implementation of neural language models, in particular Collobert + Weston (2008) and a stochastic margin-based version of Mnih's LBL.
Python
178
star
2

topia.termextract

Updates to Zope's keyphrase extractor (forked from 1.1.0)
Python
67
star
3

crfchunking-with-wordrepresentations

Train a CRF for syntactic chunking (CoNLL2000), and use word representations
Python
43
star
4

common

Common Python library, especially for text processing and controlling experimental runs
Python
42
star
5

kea-service

KEA 5.0 (keyphrase extraction software), modified to be an XML-RPC service
Shell
42
star
6

pytextpreprocess

Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.)
Python
29
star
7

random-indexing-wordrepresentations

Induce word representations using random indexing (RI)
Python
29
star
8

save-my-browser-tabs

Extension for Mozilla Firefox and Google Chrome to save all of your open tabs to a text file (window/tab index, URL and title of each tab)
JavaScript
27
star
9

stanford-pos-tagger-service

XML-RPC version of the Stanford POS tagger
Python
21
star
10

common-scripts

Common scripts, mainly for text processing and experimental control
Python
20
star
11

pyrandomprojection

Random projection library for Python, converting a dictionary to low-dimensional numpy matrix
Python
18
star
12

donatefaces

Extract faces from video clips; generate training data for pose-invariant face features
Python
17
star
13

py80legsformat

In Python, read the .80 file format, for 80legs web crawl results.
Python
12
star
14

fatfreecrm-ec2

Deploy FatFree CRM on EC2
Shell
10
star
15

scikits.learn.recipes

Recipes for scikits.learn
Python
9
star
16

batchtrain

Find the best model, using random hyperparameter optimization, using scikit-learn
Python
9
star
17

parser-model

A neural network with a sparse input, for predicting decisions of a natural language syntax parser.
Python
8
star
18

django-instantmessage

IM-like application for Pinax social networks (Django), that allows you to see which friends are online and chat with them
8
star
19

simple-twitter-similarity

Didactic example of information retrieval, computing the similarity of two twitter users
6
star
20

pytc-example

Example code for pytc (Python TokyoCabinet API)
Python
6
star
21

osqa

OSQA branch, with some fixes
Python
6
star
22

flickorpus

flickorpus collects an image and tag corpus from flickr.
Python
6
star
23

biased-text-sample

Perform a biased sample of text data
Python
5
star
24

pycrowdflower

Python code for accessing the CrowdFlower API
5
star
25

wikiprep-postprocess

Postprocess XML output from wikiprep (Wikipedia preprocessor) into JSON
Python
5
star
26

query-classification-with-word-representations

KDDCup 2005 query classification with word representations
5
star
27

flann-1.2

Fork of FLANN 1.2, Fast Library for Approximate Nearest Neighbors
Python
5
star
28

osqa-install-webfaction

Install OSQA on webfaction
Python
5
star
29

wordrepresentations-hmm

HMM model for word representations, using the method of Huang + Yates (2009).
4
star
30

fabricrecipes

fabric recipes, primarily for deploying Ubuntu and EC2 instances.
Python
4
star
31

doubleblind

Django project to do blind testing and figure out which of your friends post things you actually like
Python
4
star
32

renderman-dexed-linux

Instructions for using the RenderMan Python API for controlling the Dexed FM synthesizer on Linux
Python
4
star
33

sounder

Tinder for discovering music
JavaScript
4
star
34

search-autocomplete

Javascript autocomplete, with MySQL/PHP backend
3
star
35

pyshortstringcompression

Compress short strings, using the Huffman algorithm.
3
star
36

audio-discrimination-crowdsource-batch

Batch processing for audio-discrimination-crowdsource
Python
3
star
37

inverse-audio-synthesis

Inverse audio synthesis
Python
3
star
38

language-model-linear

A neural language model, intended to produce embeddings for a linear classifier
3
star
39

pitch-detection-echonest

Pitch detection, for an audio file, using the Echonest remix API
Python
3
star
40

soundcloudsampler

A widget to help you quickly sample soundcloud tracks.
JavaScript
3
star
41

python-SimpleXMLRPCServer-permissive

A permissive version of the Python SimpleXMLRPCServer, which can correct errant XML input from the client.
Python
3
star
42

vworker-select-all-workers-firefox-extension

Firefox extension to select all workers in vWorker search results page
JavaScript
3
star
43

osqa-jsmath

jsMath support for OSQA
3
star
44

pycrunchbase

Python methods to interact with the Crunchbase API v1.
2
star
45

openl3_numpy_weights

OpenL3 audio model weights, in numpy format
2
star
46

transformer-fsd50k

HUBERT or wav2vec2 pretrained on FSD50K
2
star
47

lisadiary

A bliki (blog+wiki) compiler, inspired by ikiwiki
2
star
48

grab-wikipedia-abstracts

Grab all Wikipedia abstracts, in all languages
2
star
49

aucoder

Python
2
star
50

writing-collaboration

An article about scientific collaboration
2
star
51

audio-discrimination-crowdsource

Web service to crowd-source audio discrimination data
CSS
2
star
52

datasciencepatterns

1
star
53

audiojnd

Audio pair JND
Python
1
star
54

kinda-deep

Technical blog
JavaScript
1
star
55

sherlock-rest

A Django JSON REST API for Sherlock
Python
1
star
56

embeddingcache

Retrieve text embeddings, but cache them locally if we have already computed them.
Python
1
star
57

query-categorization-with-word-representations

KDDCup 2005 query classification with word representations
1
star
58

dx7render-docker

Render dx7 patches, dockerized
Dockerfile
1
star
59

archivebox-render

ArchiveBox blueprint for Render
1
star
60

batch-elki-cluster

1
star
61

grokmusic

Grok your music collection, and save it into a persistent format.
Python
1
star