CMU ARK Twitter Part-of-Speech Tagger v0.3.2
http://www.ark.cs.cmu.edu/TweetNLP/

Basic usage for released version
================================

Requires Java 6.  To run the tagger on example data, try:

    java -Xmx500m -jar ark-tweet-nlp-0.3.2.jar examples/example_tweets.txt

where the jar file is the one included in the release download.
The tagger outputs tokens, predicted part-of-speech tags, and confidences.
Use the "--help" flag for more information.  On Unix systems, "./runTagger.sh"
invokes the tagger; e.g.

    ./runTagger.sh examples/example_tweets.txt
    ./runTagger.sh --help
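
To post-process the tagger's output programmatically, a short parser can help.
The Python sketch below assumes the default output has one tweet per line, as
tab-separated fields: the space-separated tokens, the space-separated tags, and
the space-separated confidences, followed by the original input. That layout is
an assumption here, so check "--help" for the formats your version supports.
The names `TaggedToken` and `parse_tagger_line` are hypothetical and are not
part of the tagger.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    """One token with its predicted tag and confidence (hypothetical helper type)."""
    token: str
    tag: str
    confidence: float


def parse_tagger_line(line: str) -> List[TaggedToken]:
    """Parse one output line under the ASSUMED layout:
    tokens <TAB> tags <TAB> confidences <TAB> original input...
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least 3 tab-separated fields")

    # Each of the first three fields is a space-separated sequence.
    tokens = fields[0].split()
    tags = fields[1].split()
    confidences = [float(c) for c in fields[2].split()]

    # The three sequences must align one-to-one.
    if not (len(tokens) == len(tags) == len(confidences)):
        raise ValueError("token, tag, and confidence counts differ")

    return [TaggedToken(t, g, c) for t, g, c in zip(tokens, tags, confidences)]
```

Feeding each line of the tagger's standard output to `parse_tagger_line` would
yield one list of (token, tag, confidence) triples per tweet.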

We also include a script that invokes just the tokenizer:

    ./twokenize.sh examples/example_tweets.txt

You may have to adjust the parameters to "java" depending on your system.
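
One way to make the "java" parameters adjustable is to build the command in a
small wrapper. The Python sketch below mirrors the `java -Xmx500m -jar ...`
invocation from the basic usage and exposes the maximum heap size as an
argument. The function names and the UTF-8 decoding of the output are
assumptions for illustration; they are not part of the release.

```python
import subprocess
from typing import List


def build_tagger_command(jar_path: str, input_path: str,
                         max_heap_mb: int = 500) -> List[str]:
    """Build the java command line, with the JVM max heap (-Xmx) in megabytes."""
    if max_heap_mb <= 0:
        raise ValueError("max_heap_mb must be positive")
    return ["java", f"-Xmx{max_heap_mb}m", "-jar", jar_path, input_path]


def run_tagger(jar_path: str, input_path: str, max_heap_mb: int = 500) -> str:
    """Run the tagger and return its standard output as text.

    Assumes `java` is on the PATH and that the output is UTF-8 encoded.
    """
    command = build_tagger_command(jar_path, input_path, max_heap_mb)
    result = subprocess.run(command, check=True, capture_output=True,
                            encoding="utf-8")
    return result.stdout
```

For instance, `run_tagger("ark-tweet-nlp-0.3.2.jar", "examples/example_tweets.txt", max_heap_mb=1000)`
would request a larger heap than the default shown above.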

If instead you are using a source checkout, see docs/hacking.txt for info.

Information
===========

Version 0.3 of the tagger is much faster and more accurate than earlier
versions.  Please see the tech report on the website for details.

For the Java API, see src/cmu/arktweetnlp; especially Tagger.java.
See also documentation in docs/ and src/cmu/arktweetnlp/package.html.

This tagger is described in the following two papers, available at the website.
Please cite these if you write a research paper using this software.

Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
Kevin Gimpel, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills,
  Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and 
  Noah A. Smith
In Proceedings of the Annual Meeting of the Association
  for Computational Linguistics, companion volume, Portland, OR, June 2011.
http://www.ark.cs.cmu.edu/TweetNLP/gimpel+etal.acl11.pdf

Part-of-Speech Tagging for Twitter: Word Clusters and Other Advances
Olutobi Owoputi, Brendan O'Connor, Chris Dyer, Kevin Gimpel, and
  Nathan Schneider.
Technical Report, Machine Learning Department. CMU-ML-12-107. September 2012.

Contact
=======

Please contact Brendan O'Connor ([email protected]) and Kevin Gimpel
([email protected]) if you encounter any problems.
