  • Stars: 110
  • Rank: 305,396 (Top 7%)
  • Language: R
  • Created: about 8 years ago
  • Updated: over 2 years ago


Repository Details

A data package containing lexicons and dictionaries for text analysis

lexicon

Project Status: Active - The project has reached a stable, usable state and is being actively developed.


Description

lexicon is a collection of lexical hash tables, dictionaries, and word lists. The data prefixes help to categorize the data types:

Prefix       Meaning
key_         A data.frame with a lookup and return value
hash_        A keyed data.table hash table
freq_        A data.table of terms with frequencies
profanity_   A vector of profane words
pos_         A part-of-speech vector
pos_df_      A part-of-speech data.frame
sw_          A stopword vector
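To illustrate how a hash_ table and an sw_ vector are typically combined, here is a minimal base-R sketch of dictionary-based scoring. It assumes the hash_sentiment_ tables store terms in a column named x and polarity scores in a column named y; the toy rows and stopwords are invented for illustration, and match() stands in for a keyed data.table join so the example runs without any packages. With lexicon installed, objects such as lexicon::hash_sentiment_jockers_rinker and lexicon::sw_fry_25 could be passed in place of the toy data, provided that column layout holds.

```r
# Toy stand-in for a hash_sentiment_ table: column x holds the term and
# column y the polarity score (assumed layout; the values are invented).
toy_hash <- data.frame(
  x = c("good", "great", "bad", "awful"),
  y = c(0.75, 1.0, -0.75, -1.0),
  stringsAsFactors = FALSE
)

# Toy stand-in for an sw_ stopword vector (invented, not a real list).
toy_sw <- c("the", "was", "a", "and")

score_text <- function(text, hash, stopwords) {
  # Lowercase and split on any run of characters that are not letters or apostrophes.
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nzchar(words)]
  # Drop stopwords, the way an sw_ vector would be applied.
  words <- words[!words %in% stopwords]
  # Look each remaining word up in column x; unmatched words give NA.
  scores <- hash$y[match(words, hash$x)]
  # Sum the matched scores, ignoring the NAs from unmatched words.
  sum(scores, na.rm = TRUE)
}

score_text("The food was good and the service was awful", toy_hash, toy_sw)
# [1] -0.25
```

This plain sum ignores negators, amplifiers, and other valence shifters of the kind collected in hash_valence_shifters.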

Data

Data                                 Description
cliches                              Common Cliches
common_names                         First Names (U.S.)
constraining_loughran_mcdonald       Loughran-McDonald Constraining Words
emojis_sentiment                     Emoji Sentiment Data
freq_first_names                     Frequent U.S. First Names
freq_last_names                      Frequent U.S. Last Names
function_words                       Function Words
grady_augmented                      Augmented List of Grady Ward’s English Words and Mark Kantrowitz’s Names List
hash_emojis                          Emoji Description Lookup Table
hash_emojis_identifier               Emoji Identifier Lookup Table
hash_emoticons                       Emoticons
hash_grady_pos                       Grady Ward’s Moby Parts of Speech
hash_internet_slang                  List of Internet Slang and Corresponding Meanings
hash_lemmas                          Lemmatization List
hash_nrc_emotions                    NRC Emotion Table
hash_sentiment_emojis                Emoji Sentiment Polarity Lookup Table
hash_sentiment_huliu                 Hu Liu Polarity Lookup Table
hash_sentiment_jockers               Jockers Sentiment Polarity Table
hash_sentiment_jockers_rinker        Combined Jockers & Rinker Polarity Lookup Table
hash_sentiment_loughran_mcdonald     Loughran-McDonald Polarity Table
hash_sentiment_nrc                   NRC Sentiment Polarity Table
hash_sentiment_senticnet             Augmented SenticNet Polarity Table
hash_sentiment_sentiword             Augmented Sentiword Polarity Table
hash_sentiment_slangsd               SlangSD Sentiment Polarity Table
hash_sentiment_socal_google          SO-CAL Google Polarity Table
hash_valence_shifters                Valence Shifters
key_contractions                     Contraction Conversions
key_corporate_social_responsibility  Nadra Pencle and Irina Malaescu’s Corporate Social Responsibility Dictionary
key_grade                            Grades Data Set
key_rating                           Ratings Data Set
key_regressive_imagery               Colin Martindale’s English Regressive Imagery Dictionary
key_sentiment_jockers                Jockers Sentiment Data Set
modal_loughran_mcdonald              Loughran-McDonald Modal List
nrc_emotions                         NRC Emotions
pos_action_verb                      Action Word List
pos_df_irregular_nouns               Irregular Nouns Word Data Frame
pos_df_pronouns                      Pronouns
pos_interjections                    Interjections
pos_preposition                      Preposition Words
profanity_alvarez                    Alejandro U. Alvarez’s List of Profane Words
profanity_arr_bad                    Stack Overflow user2592414’s List of Profane Words
profanity_banned                     bannedwordlist.com’s List of Profane Words
profanity_racist                     Titus Wormer’s List of Racist Words
profanity_zac_anger                  Zac Anger’s List of Profane Words
sw_dolch                             Leveled Dolch List of 220 Common Words
sw_fry_100                           Fry’s 100 Most Commonly Used English Words
sw_fry_1000                          Fry’s 1000 Most Commonly Used English Words
sw_fry_200                           Fry’s 200 Most Commonly Used English Words
sw_fry_25                            Fry’s 25 Most Commonly Used English Words
sw_jockers                           Matthew Jockers’s Expanded Topic Modeling Stopword List
sw_loughran_mcdonald_long            Loughran-McDonald Long Stopword List
sw_loughran_mcdonald_short           Loughran-McDonald Short Stopword List
sw_lucene                            Lucene Stopword List
sw_mallet                            MALLET Stopword List
sw_python                            Python Stopword List
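A key_ data set such as key_contractions pairs a lookup column with a return column. The base-R sketch below shows that lookup-and-replace pattern on a tiny invented table. The column names contraction and expanded are an assumption about the layout of key_contractions, and the single-space tokenizer is deliberately naive.

```r
# Toy stand-in for a key_ data.frame: a lookup column and a return column.
# Column names and rows are assumptions for illustration only.
toy_key <- data.frame(
  contraction = c("can't", "won't", "it's"),
  expanded = c("cannot", "will not", "it is"),
  stringsAsFactors = FALSE
)

expand_contractions <- function(text, key) {
  # Naive tokenization on single spaces.
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  # Find each lowercased word's row in the lookup column (exact match).
  idx <- match(tolower(words), key$contraction)
  # Replace matched words with the return value; keep the rest unchanged.
  words[!is.na(idx)] <- key$expanded[idx[!is.na(idx)]]
  paste(words, collapse = " ")
}

expand_contractions("it's late and we can't stay", toy_key)
# [1] "it is late and we cannot stay"
```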

Installation

To install the development version of lexicon, download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it, or use the pacman package:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/lexicon")

Contact

You are welcome to:
