Word counts and readability statistics in R markdown documents

wordcountaddin

This R package is an RStudio addin to count words and characters in text in an R markdown document. It also has a function to compute readability statistics so you can get an indication of how easy or difficult your document is to read.

You can count words in your Rmd file in three ways:

  • In a selection of text in your active Rmd, by selecting some text with your mouse in RStudio and using the Wordcount Addin
  • All the words in your active Rmd in RStudio, by using the Wordcount Addin with no text selected
  • All the words in an Rmd file, by calling the word_count function directly from the console or command line (RStudio not required) and specifying the filename as an argument to the function (e.g. wordcountaddin::word_count("my_file.Rmd")). This will give you a single integer result, rather than the Markdown table that the other functions return.

Independent of an Rmd file, you can also count words in a character vector from the console using the text_stats_chr function (and there is readability_chr for readability).
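As a rough sketch of the console workflow described above, the following writes a small Rmd file to a temporary path and then calls word_count and text_stats_chr on it. The file contents and the sample sentence are made up for illustration, and the calls are wrapped in a check so the script does nothing harmful if the package is not installed.

```r
# Write a tiny Rmd file to a temporary location so the example is self-contained.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(
  c("---", "title: Demo", "---", "", "# A heading", "", "Some plain text to count."),
  rmd_path
)

# A character vector of length one, for the *_chr functions.
sample_text <- "This is a short sentence. Here is another one."

if (requireNamespace("wordcountaddin", quietly = TRUE)) {
  # Word count for a whole file: returns a single integer.
  n_words <- wordcountaddin::word_count(rmd_path)
  print(n_words)

  # Word count statistics for text that is not in an Rmd file.
  print(wordcountaddin::text_stats_chr(sample_text))
} else {
  message("wordcountaddin is not installed; skipping the example calls.")
}
```

readability_chr can be called on a character vector in the same way as text_stats_chr.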

Word count

When counting words in the text of your Rmd document, these things will be ignored:

  • YAML front matter
  • code chunks and inline code
  • text in HTML comment tags: <!-- text -->
  • HTML tags in the text: <br>, </br>
  • inline URLs in this format: [text of link](url)
  • images with captions in this format: ![this is the caption](/path/to/image.png)
  • header level indicators such as # and ##, etc.

And because my regex is quite simple, the word count function may also ignore parts of your actual text that resemble these things.
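To make the list above concrete, here is a simplified base R sketch of regex-based stripping. It is an illustration only and not the package's own code: the helper names strip_rmd_markup and count_words are hypothetical, and the choice to keep link text while dropping the URL is an assumption. It also shows why simple patterns can swallow real text that happens to look like markup.

```r
# Hypothetical helper, for illustration only; it is NOT the package's own code.
strip_rmd_markup <- function(lines) {
  fence <- strrep("`", 3)  # three backticks, built programmatically
  text <- paste(lines, collapse = "\n")
  # YAML front matter: a leading block delimited by lines of ---
  text <- sub("(?s)^---\\n.*?\\n---\\n", "", text, perl = TRUE)
  # Fenced code chunks, then inline code
  text <- gsub(paste0("(?s)", fence, ".*?", fence), "", text, perl = TRUE)
  text <- gsub("`[^`]*`", "", text, perl = TRUE)
  # HTML comments, then HTML tags such as <br>
  text <- gsub("(?s)<!--.*?-->", "", text, perl = TRUE)
  text <- gsub("</?[A-Za-z][^>]*>", "", text, perl = TRUE)
  # Images with captions are dropped entirely
  text <- gsub("!\\[[^\\]]*\\]\\([^)]*\\)", "", text, perl = TRUE)
  # Inline links: assumption here is to keep the link text and drop the URL
  text <- gsub("\\[([^\\]]*)\\]\\([^)]*\\)", "\\1", text, perl = TRUE)
  # Header level indicators (#, ##, ...) at the start of a line
  text <- gsub("(?m)^#+[ \\t]*", "", text, perl = TRUE)
  text
}

# Hypothetical helper: count whitespace-separated tokens.
count_words <- function(text) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  sum(nzchar(words))
}

fence <- strrep("`", 3)
demo_lines <- c(
  "---", "title: Demo", "---", "",
  "# Heading one", "",
  "Some `inline` text <!-- hidden --> here.", "",
  paste0(fence, "{r}"), "x <- 1", fence, "",
  "See [the docs](http://example.com) now."
)

cleaned <- strip_rmd_markup(demo_lines)
count_words(cleaned)
```

Splitting on whitespace is only one definition of a word, so counts from a sketch like this will not necessarily match the stringi or koRpus results.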

The word count will include text in headers, block quotations, verbatim code blocks, tables, raw LaTeX and raw HTML.

In general, there are numerous ways to count words, with no widely accepted standard method. The variety of methods is due to differences in the definitions of a word and a sentence. Run ?stringi::stri_stats_latex and ?koRpus::describe to learn more about the word counting methods.

For this addin I’ve included two methods, mostly out of curiosity to see how they differ from each other. I use functions from the stringi and koRpus packages. If you’re curious, you can compare the results you get with this addin to an online tool such as http://wordcounttools.com/.

The output of the Word count function is a markdown table in your R console that might look like this:

|Method          |koRpus      |stringi       |
|:---------------|:-----------|:-------------|
|Word count      |107         |104           |
|Character count |604         |603           |
|Sentence count  |10          |Not available |
|Reading time    |0.5 minutes |0.5 minutes   |

If you want to reuse these results in other R functions, you can call the unexported function wordcountaddin:::text_stats_fn_(text), where text is a character vector of length one (i.e. all your text in a single character string). The output will be a list object, and will include several other items not shown in the markdown table.
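A minimal sketch of preparing the input for that function: text that is spread over several strings is first collapsed into a single string. The sample paragraphs are invented, and because text_stats_fn_ is unexported its return value may change between versions, so the example only inspects the top level of the list.

```r
# Text held as several strings (for example, one per paragraph).
paragraphs <- c(
  "First paragraph of the document.",
  "Second paragraph, with a few more words in it."
)

# Collapse to a character vector of length one, as the function expects.
text <- paste(paragraphs, collapse = " ")
stopifnot(is.character(text), length(text) == 1)

if (requireNamespace("wordcountaddin", quietly = TRUE)) {
  stats <- wordcountaddin:::text_stats_fn_(text)
  # Look at the names and types of the list elements before relying on them.
  str(stats, max.level = 1)
} else {
  message("wordcountaddin is not installed; skipping the example call.")
}
```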

Readability

The readability function ignores all the same parts of the text as the word count function, and then computes the values of a bunch of readability statistics.

Most of these readability measurements aim to approximate the years of education required to understand your text. They look at the number of characters and syllables per word, the number of words per sentence, and so on. They don’t analyse the meaning of the words. A score of around 10-12 is roughly the reading level on completion of high school in the US. These stats are computed by the koRpus package.

There are about 27 measurements that this readability function returns (depending on how long your text is), including the Automated Readability Index (ARI), Coleman-Liau, the Flesch-Kincaid Grade Level, and the Simple Measure of Gobbledygook (SMOG). For the full list of readability measurements that are returned by the readability function, run ?koRpus::readability. That help page also shows the formulae and citations for each statistic (and an additional 20-odd other readability statistics not used here).
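To show how such scores are built from simple counts, here is a base R sketch of two of the measures named above, using their commonly published formulas. The function names and the input counts are made up for illustration, and koRpus may apply different parameters or counting rules, so treat ?koRpus::readability as the authority.

```r
# Flesch-Kincaid Grade Level from word, sentence and syllable counts.
flesch_kincaid_grade <- function(n_words, n_sentences, n_syllables) {
  0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59
}

# Automated Readability Index from character, word and sentence counts.
automated_readability_index <- function(n_chars, n_words, n_sentences) {
  4.71 * (n_chars / n_words) + 0.5 * (n_words / n_sentences) - 21.43
}

# Illustrative counts: 100 words, 10 sentences, 150 syllables, 450 characters.
fk  <- flesch_kincaid_grade(n_words = 100, n_sentences = 10, n_syllables = 150)
ari <- automated_readability_index(n_chars = 450, n_words = 100, n_sentences = 10)
round(c(flesch_kincaid = fk, ari = ari), 2)
```

Both formulas reward shorter sentences and shorter words, which is why neither says anything about whether the words themselves are familiar or the argument is clear.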

Readability stats are, of course, no substitute for critical self-reflection on the effectiveness of your writing at communicating ideas and information. To help with that, read Style: Toward Clarity and Grace.

The output of the readability function is a markdown table in your R console that might look like this:

|index                 |flavour     |raw   |grade |age  |
|:---------------------|:-----------|:-----|:-----|:----|
|ARI                   |            |      |2.31  |     |
|Coleman-Liau          |            |66    |4.91  |     |
|Danielson-Bryan DB1   |            |6.46  |      |     |
|Danielson-Bryan DB2   |            |60.39 |6     |     |
|Dickes-Steiwer        |            |53.07 |      |     |
|ELF                   |            |1.83  |      |     |
|Farr-Jenkins-Paterson |            |66.81 |8-9   |     |
|Flesch                |en (Flesch) |69.57 |8-9   |     |
|Flesch-Kincaid        |            |      |4.85  |9.8  |
|FOG                   |            |      |7.84  |     |
|FORCAST               |            |      |10.28 |15.3 |
|Fucks                 |            |23.38 |4.83  |     |
|Linsear-Write         |            |      |2.35  |     |
|LIX                   |            |32.41 |< 5   |     |
|nWS1                  |            |      |4.19  |     |
|nWS2                  |            |      |4.72  |     |
|nWS3                  |            |      |4.14  |     |
|nWS4                  |            |      |3.64  |     |
|RIX                   |            |1.42  |5     |     |
|SMOG                  |            |      |8.08  |13.1 |
|Strain                |            |2.44  |      |     |
|TRI                   |            |-94   |      |     |
|Tuldava               |            |2.57  |      |     |
|Wheeler-Smith         |            |18.33 |2     |     |

Similar to the word count function, if you want to reuse these results in other R functions, you can call the unexported function wordcountaddin:::readability_fn_(text), where text is a character vector of length one (i.e. all your text in a single character string). The output will be a list object with slightly more detail than the summary table above.

Inspiration for this addin came from jadd and WrapRmd.

How to install

Install with devtools::install_github("benmarwick/wordcountaddin", type = "source", dependencies = TRUE)

Go to Tools > Addins in RStudio to select and configure addins.

How to use

  1. Open an Rmd file in RStudio.
  2. Select some text; it can include YAML, code chunks and inline code.
  3. Go to Tools > Addins in RStudio and click on Word count or Readability. Computing Readability may take a few moments on longer documents because it has to count syllables for some of the stats.
  4. Look in the console for the output.

Feedback, contributing, etc.

Please open an issue if you find something that doesn’t work as expected. Note that this project is released with a Guide to Contributing and a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
