Repository Details

Topical search for Twitter. See twokenize.py, emoticons.py for tokenization.

TweetMotif

TweetMotif is a faceted/topic/summarizing search system for Twitter, built on top of the search.twitter.com API. http://tweetmotif.com

Do you just want the tokenizer?

All you need is two files: twokenize.py and emoticons.py.

If you use it in research, please cite:

  • Brendan O'Connor, Michel Krieger, and David Ahn. TweetMotif: Exploratory Search and Topic Summarization for Twitter. ICWSM-2010.

Latest version (Java)

The latest version, with a number of improvements, is in Java; a new version was released in September 2012. See the explanation and links at: http://www.ark.cs.cmu.edu/TweetNLP

More on TweetMotif

By Brendan O'Connor, Michel Krieger, and David Ahn. Written over April-May 2009 and released April 2010.

The TweetMotif paper (inside EXAMPLES_AND_WRITING, or a copy at this link) gives an overview of the system.

Running TweetMotif

Prerequisites

  • Tokyo Cabinet
  • Tokyo Tyrant
  • mod_wsgi
  • Python: version 2.5 works

There are precompiled versions of the Tokyo infrastructure in platform/, for Mac OS X 10.5 and Ubuntu 8.04-ish. On the off chance that they work on your system, uncomment the code that enables them (grep platform *.py). On Linux, you may also have to muck around with ld.so.conf.d and ldconfig so that mod_wsgi, which runs inside Apache, can see them.

You also need to be running Tokyo Tyrant for the query cache. This is usually inconvenient when you are just getting started; in that case, disable the cache by commenting out these lines in query_cache.py:

# the_cache = ....
# @the_cache.wrap
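As an alternative to commenting lines out, a pass-through cache object can keep the decorator in place while caching nothing. The following is a minimal sketch, not code from the repository: it assumes `the_cache` is only ever used through a `wrap` decorator, and the names `NullCache` and `run_query` are hypothetical. It is written for a current Python 3 rather than the Python 2.5 the project targets.

```python
import functools


class NullCache:
    """Hypothetical stand-in for the Tyrant-backed query cache; stores nothing."""

    def wrap(self, func):
        # Return a wrapper that calls straight through, so a decorated
        # function behaves as if it had never been cached.
        @functools.wraps(func)
        def passthrough(*args, **kwargs):
            return func(*args, **kwargs)
        return passthrough


# Assumption: the real module binds a Tyrant-backed object to this name.
the_cache = NullCache()


@the_cache.wrap
def run_query(q):
    # Placeholder for a real search call (hypothetical function).
    return {"query": q, "results": []}
```

With this in place, `run_query("coffee")` runs the function body on every call, and no Tokyo Tyrant server is contacted.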

Architecture

There is a backend and a frontend. The backend talks to search.twitter.com and does all text processing, clustering, etc. The frontend is a Django web site with normal and iPhone versions.

The backend makes extensive use of Tokyo Cabinet and Tyrant databases, both for the language model and for the query cache.

Both the backend and frontend are WSGI apps. Everything is set up to run through mod_wsgi. They communicate via JSON-over-HTTP.
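To illustrate what a WSGI app that speaks JSON-over-HTTP looks like, here is a minimal sketch using only the Python 3 standard library. It is not TweetMotif's actual backend: the `q` parameter, the payload shape, and the names `backend_app` and `call_app` are assumptions made for the example.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def backend_app(environ, start_response):
    """Hypothetical WSGI app: reads ?q=... and answers with a JSON document."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    # Assumed payload shape; a real backend would fill in computed topics.
    payload = {"query": query, "topics": []}
    body = json.dumps(payload).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, query_string):
    """Call a WSGI app in-process, the way a server such as mod_wsgi would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["QUERY_STRING"] = query_string
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body.decode("utf-8"))
```

Calling `call_app(backend_app, "q=coffee")` exercises the app without a web server; a frontend would issue the same request over HTTP and decode the JSON response.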

Backend

The backend is run through, confusingly enough, frontend.py. That file also contains a primitive frontend for development purposes.

Frontend

The frontend is Django. See djfrontend/.

License

TweetMotif is licensed under the Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0.html

Copyright Brendan O'Connor, Michel Krieger, and David Ahn, 2009-2010.
