  • Stars
    29
  • Rank 855,780 (Top 17%)
  • Language
    Python
  • Created over 14 years ago
  • Updated over 13 years ago

Repository Details

Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.)
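
The steps named in the description above (sentence splitting, tokenizing, lowercasing, stemming) can be sketched with the Python standard library alone. This is a minimal illustration, not this repository's code or API: the function names `split_sentences`, `tokenize`, `crude_stem`, and `preprocess` are hypothetical, the sentence splitter is a naive punctuation rule, and the stemmer is a toy suffix stripper standing in for a real algorithm such as Porter's.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Split into word tokens (keeping internal apostrophes) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


def crude_stem(token):
    """Toy stemmer (assumption: NOT Porter): strip one common English suffix,
    but only if at least three characters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return one list of lowercased, crudely stemmed tokens per sentence."""
    return [
        [crude_stem(tok.lower()) for tok in tokenize(sent)]
        for sent in split_sentences(text)
    ]


if __name__ == "__main__":
    for sentence_tokens in preprocess("The cats were running. Dogs barked!"):
        print(sentence_tokens)
```

Lowercasing happens before stemming so that the suffix check does not depend on case. A real pipeline would replace the naive splitter, which breaks on abbreviations such as "Dr.", and the toy stemmer with tested components.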

More Repositories

1

neural-language-model

Implementation of neural language models, in particular Collobert + Weston (2008) and a stochastic margin-based version of Mnih's LBL.
Python
178
star
2

textSNE

2-d visualization of high-dimensional input: Python code for rendering t-SNE code with text labels for each point
Python
107
star
3

topia.termextract

Updates to Zope's keyphrase extractor (forked from 1.1.0)
Python
67
star
4

crfchunking-with-wordrepresentations

Train a CRF for syntactic chunking (CoNLL2000), and use word representations
Python
43
star
5

common

Common Python library, especially for text processing and controlling experimental runs
Python
42
star
6

kea-service

KEA 5.0 (keyphrase extraction software), modified to be an XML-RPC service
Shell
42
star
7

random-indexing-wordrepresentations

Induce word representations using random indexing (RI)
Python
29
star
8

save-my-browser-tabs

Extension for Mozilla Firefox and Google Chrome to save all of your open tabs to a text file (window/tab index, URL and title of each tab)
JavaScript
27
star
9

stanford-pos-tagger-service

XML-RPC version of the Stanford POS tagger
Python
21
star
10

common-scripts

Common scripts, mainly for text processing and experimental control
Python
20
star
11

pyrandomprojection

Random projection library for Python, converting a dictionary to low-dimensional numpy matrix
Python
18
star
12

donatefaces

Extract faces from video clips; generate training data for pose-invariant face features
Python
17
star
13

py80legsformat

In Python, read the .80 file format, for 80legs web crawl results.
Python
12
star
14

fatfreecrm-ec2

Deploy FatFree CRM on EC2
Shell
10
star
15

scikits.learn.recipes

Recipes for scikits.learn
Python
9
star
16

batchtrain

Find the best model with scikit-learn, using random hyperparameter optimization
Python
9
star
17

parser-model

A neural network with a sparse input, for predicting decisions of a natural language syntax parser.
Python
8
star
18

django-instantmessage

IM-like application for Pinax social networks (Django) that allows you to see which friends are online and chat with them
8
star
19

simple-twitter-similarity

Didactic example of information retrieval, computing the similarity of two Twitter users
6
star
20

pytc-example

Example code for pytc (Python TokyoCabinet API)
Python
6
star
21

osqa

OSQA branch, with some fixes
Python
6
star
22

flickorpus

flickorpus collects an image and tag corpus from Flickr.
Python
6
star
23

biased-text-sample

Perform a biased sample of text data
Python
5
star
24

pycrowdflower

Python code for accessing the CrowdFlower API
5
star
25

wikiprep-postprocess

Postprocess XML output from wikiprep (Wikipedia preprocessor) into JSON
Python
5
star
26

query-classification-with-word-representations

KDDCup 2005 query classification with word representations
5
star
27

flann-1.2

Fork of FLANN 1.2, Fast Library for Approximate Nearest Neighbors
Python
5
star
28

osqa-install-webfaction

Install OSQA on WebFaction
Python
5
star
29

wordrepresentations-hmm

HMM model for word representations, using the method of Huang + Yates (2009).
4
star
30

fabricrecipes

Fabric recipes, primarily for deploying Ubuntu and EC2 instances.
Python
4
star
31

doubleblind

Django project to do blind testing and figure out which of your friends post things you actually like
Python
4
star
32

renderman-dexed-linux

Instructions for using the RenderMan Python API for controlling the Dexed FM synthesizer on Linux
Python
4
star
33

sounder

Tinder for discovering music
JavaScript
4
star
34

search-autocomplete

JavaScript autocomplete, with MySQL/PHP backend
3
star
35

pyshortstringcompression

Compress short strings, using the Huffman algorithm.
3
star
36

audio-discrimination-crowdsource-batch

Batch processing for audio-discrimination-crowdsource
Python
3
star
37

inverse-audio-synthesis

Inverse audio synthesis
Python
3
star
38

language-model-linear

A neural language model, intended to produce embeddings for a linear classifier
3
star
39

pitch-detection-echonest

Pitch detection, for an audio file, using the Echonest remix API
Python
3
star
40

soundcloudsampler

A widget to help you quickly sample SoundCloud tracks.
JavaScript
3
star
41

python-SimpleXMLRPCServer-permissive

A permissive version of the Python SimpleXMLRPCServer, which can correct errant XML input from the client.
Python
3
star
42

vworker-select-all-workers-firefox-extension

Firefox extension to select all workers in vWorker search results page
JavaScript
3
star
43

osqa-jsmath

jsMath support for OSQA
3
star
44

pycrunchbase

Python methods to interact with the Crunchbase API v1.
2
star
45

openl3_numpy_weights

OpenL3 audio model weights, in numpy format
2
star
46

transformer-fsd50k

HuBERT or wav2vec2 pretrained on FSD50K
2
star
47

lisadiary

A bliki (blog+wiki) compiler, inspired by ikiwiki
2
star
48

grab-wikipedia-abstracts

Grab all Wikipedia abstracts, in all languages
2
star
49

aucoder

Python
2
star
50

writing-collaboration

An article about scientific collaboration
2
star
51

audio-discrimination-crowdsource

Web service to crowd-source audio discrimination data
CSS
2
star
52

datasciencepatterns

1
star
53

audiojnd

Audio pair JND
Python
1
star
54

kinda-deep

Technical blog
JavaScript
1
star
55

sherlock-rest

A Django JSON REST API for Sherlock
Python
1
star
56

embeddingcache

Retrieve text embeddings, but cache them locally if we have already computed them.
Python
1
star
57

query-categorization-with-word-representations

KDDCup 2005 query classification with word representations
1
star
58

dx7render-docker

Render dx7 patches, dockerized
Dockerfile
1
star
59

archivebox-render

ArchiveBox blueprint for Render
1
star
60

batch-elki-cluster

1
star
61

grokmusic

Grok your music collection, and save it into a persistent format.
Python
1
star