stylo: R package for stylometric analyses

Authors: Maciej Eder*, Mike Kestemont, Jan Rybicki, Steffen Pielström
License: GPL-3


This package provides a number of functions, supplemented by a GUI, to perform various analyses in the field of computational stylistics, authorship attribution, etc.

Refer to the Computational Stylistics Group webpage, especially the subpage Projects, to get some ideas about possible applications of the package stylo.

Citation

If you find the package stylo useful and plan to publish your results, please consider citing the following paper:

Eder, M., Rybicki, J. and Kestemont, M. (2016). Stylometry with R: a package for computational text analysis. R Journal, 8(1): 107-21. https://journal.r-project.org/archive/2016/RJ-2016-007/index.html

Installation

There are four ways of installing stylo:

  1. from CRAN repository
  2. from the GitHub repository, via the package devtools
  3. from a locally downloaded file
  4. building the package directly from source files

1. Installing from CRAN repository

This is the simplest way to install stylo (as well as any other R package). Launch R, make sure you are connected to the internet, and type:

install.packages("stylo")

Then choose your favorite CRAN mirror (a window will usually pop up) and click OK.

If you are a macOS user, please see the Installation issues section below.
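For scripted setups, the same step can be wrapped so that the package is installed only when it is missing. This is a minimal sketch, not part of stylo: the helper name ensure_installed and the default mirror URL are assumptions, and any CRAN mirror will do.

```r
# Hypothetical helper (not part of stylo): install a package from CRAN
# only when it is missing, then report whether it can be loaded.
# The default mirror URL is an assumption; substitute your preferred mirror.
ensure_installed <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # TRUE if the package namespace can now be loaded, FALSE otherwise
  requireNamespace(pkg, quietly = TRUE)
}
```

Calling ensure_installed("stylo") and then library(stylo) mirrors the manual steps above, and is safe to run repeatedly.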

2. Installing from the GitHub repository

A convenient way to install R packages directly from the GitHub repository is to use the package devtools. Unless you have already installed it, you should do it now:

install.packages("devtools")

Then, install the package stylo:

library(devtools)
install_github("computationalstylistics/stylo")

The remarks about possible issues on macOS also apply in this case.

3. Installing from a local file

This is an option for more advanced users. You need to obtain a so-called tarball file, which is a compressed version of the package (you can grab it from CRAN). It might be named stylo_0.7.4.tar.gz, depending on the current version, of course. Then type in the R console:

setwd("I/hope/I/can/remember/where/I/have/put/the/zipfile/")
install.packages("stylo_0.7.4.tar.gz", repos = NULL, type = "source")
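If you would rather not change the working directory, the full path to the tarball can be passed directly. The sketch below is only an illustration: install_tarball is a hypothetical helper, not a stylo function, and it adds a file check so that a mistyped path fails with a readable message.

```r
# Hypothetical helper (not part of stylo): install a source tarball
# from an explicit path, without calling setwd().
install_tarball <- function(tarball) {
  tarball <- path.expand(tarball)  # resolve a leading "~" if present
  if (!file.exists(tarball)) {
    stop("Tarball not found: ", tarball)
  }
  # repos = NULL tells R to install from the local file, not from CRAN
  install.packages(tarball, repos = NULL, type = "source")
}
```

For instance, install_tarball("~/Downloads/stylo_0.7.4.tar.gz") would install the file from a Downloads folder; both the folder and the version number are placeholders.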

4. Building a package from source files

This is something for real geeks. Clone this very repository (or download and unpack an archive of it), then type the following lines at the command prompt, from the directory that contains the stylo folder:

R CMD build stylo
R CMD INSTALL stylo

Installation issues

NOTE (macOS users): the package stylo requires X11 support to be installed. To quote "R for Mac OS X FAQ" (http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html): “Each binary distribution of R available through CRAN is build to use the X11 implementation of Tcl/Tk. Of course a X windows server has to be started first: this should happen automatically on OS X, provided it has been installed (it needs a separate install on Mountain Lion or later). The first time things are done in the X server there can be a long delay whilst a font cache is constructed; starting the server can take several seconds.”

You might also run into encoding errors when you start up R (e.g. “WARNING: You're using a non-UTF8 locale” etc.). In that case, you should close R, open a new window in Applications > Terminal and execute the following line:

defaults write org.R-project.R force.LANG en_US.UTF-8

Next, close the Terminal and start up R again.

ANOTHER NOTE: A slightly different workaround for the above problem (Mac users again):

  • Install XQuartz, restart Mac
  • Open Terminal, type: sudo ln -s /opt/X11 /usr/X11
  • Run XQuartz
  • Run R, type: system('defaults write org.R-project.R force.LANG en_US.UTF-8')
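After applying either workaround and restarting R, you may want to confirm that the locale has actually changed. A minimal check using base R only (the variable names are just for illustration):

```r
# Show the character-type locale R is running under.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() reports, among other things, whether the locale is UTF-8.
is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])
if (!is_utf8) {
  message("R is not running in a UTF-8 locale; see the workarounds above.")
}
```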

YET ANOTHER NOTE: On macOS Mojave, tcltk support is often not recognized properly. Open your terminal and type the following command:

xcode-select --install

This will download and install the Xcode developer tools and fix the problem. The catch is that one needs to explicitly agree to the license agreement.
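To see whether your R build reports Tcl/Tk support at all, you can ask R directly. This is a small diagnostic sketch using the base function capabilities(); the wording of the message is an assumption about the likely consequence, not a quote from the package.

```r
# capabilities() reports which optional features this R build supports.
has_tcltk <- isTRUE(capabilities("tcltk")[["tcltk"]])
print(has_tcltk)

if (!has_tcltk) {
  message("Tcl/Tk support is not available; the stylo GUI will likely not start.")
}
```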

Usage

This section is meant to give the users a general outline of what the package can do, rather than providing a comprehensive description of designing a stylometric test using the R package stylo. Refer to the following documents:

  • for (real) beginners: a crash introduction in the form of a slideshow
  • for (sort of) beginners: a concise HOWTO
  • for advanced users: a paper in R Journal
  • full documentation at CRAN
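As a very rough first step before reading the documents listed above, the sketch below launches the main function of the package. It rests on stated assumptions: the current working directory contains a subdirectory named corpus with plain-text files to compare, and R runs in an interactive session with working GUI support. The guard variable can_run is hypothetical and only keeps the snippet from failing when these conditions are not met.

```r
# Run the analysis only if stylo is installed, the session is interactive,
# and a "corpus" subdirectory exists in the working directory (assumption).
can_run <- requireNamespace("stylo", quietly = TRUE) &&
  interactive() &&
  dir.exists("corpus")

if (can_run) {
  stylo::stylo()  # opens the GUI for choosing the analysis settings
} else {
  message("Skipping: install stylo, start an interactive R session, ",
          "and put your texts in a 'corpus' subdirectory first.")
}
```

See ?stylo for the available arguments and for ways of running the analysis without the GUI.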

Materials on Youtube

Docs on non-obvious functionalities

Other relevant resources

  • Kudos to David L. Wrisley for a super-useful post (a notebook with code snippets, to be precise) on analyzing the Gutenberg Project texts with stylo.

  • Despite a black legend, R and Python are not necessarily in a deadly clash: here is a great post by José Calvo Tello on invoking the package stylo directly from Python!

  • Using the package stylo with the TXM environment: see this post by Serge Heiden.

  • Probably not a bad idea to check a comprehensive Stylometry Bibliography curated by Christof Schöch, before starting an experiment in text analysis.

  • The package stylo has been created as a by-product of a few projects conducted by the Computational Stylistics Group. See this website for further details. An older version of the webpage is also there, even if it has not been updated for a while.
