tedunderwood/JDH-scripts

Stars: 2
Language: R
Created over 12 years ago; updated over 12 years ago

tedunderwood

DataMunging

Scripts that clean up OCR and munge Hathi metadata.

fictional-time-with-GPT4

An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4.

Jupyter Notebook

noveltmmeta

Code and data supporting "NovelTM Data Sets for English-Language Fiction."

Jupyter Notebook

paceofchange

Code and data to support the article, "How quickly do literary standards change?"

fiction

Project on the history of genre.

LIS590DSH

Jupyter Notebook

BrowseLDA

R scripts for browsing the results of LDA.

character

Data and code for analyzing language associated with fictional characters.

Jupyter Notebook

genredistance

Exploring textual and social measures of distance between genres.

Jupyter Notebook

LDA

A Java package that does basic LDA, without hyperparameter optimization. Folder paths are hard-coded for a local machine. YMMV.

plot

Initial exploratory research on patterns of change across narrative time.

Jupyter Notebook

genre

Code for "Understanding Genre in a Collection of a Million Volumes."

horizon

Data and code to support Distant Horizons (University of Chicago Press, 2019).

Jupyter Notebook

nehuncertainty

Code used in "Broadening Access to Text Analysis by Describing Uncertainty."

Jupyter Notebook

period-cohort

Code and data for an experiment on the relation between individual change and cohort succession in literary history.

bayes-bestsellers

Code and data to support "Bestsellers and Critical Favorites 1850-1949," a paper at CA2017.

Jupyter Notebook

reviews

Parsing periodical indexes and finding book reviews, 1800-2007.

is417

IS 417, Data Science in the Humanities.

Jupyter Notebook

changepoint

Measuring the scale and significance of changes *in the pace of change* in an autocorrelated multivariate time series.

Jupyter Notebook

ocreval

Python modules that evaluate OCR quality.

badpublicity

A presentation at MLA 2020 in Seattle, "No Such Thing as Bad Publicity: Toward a Distant Reading of Reception."

hathimetadata

Metadata for English-language fiction and poetry beyond 1923 in HathiTrust Digital Library.

Java-OCR-spellchecking.

Tokenizer

Python scripts for tokenizing text files.

measureperspective

Code and data to support "Machine Learning and Human Perspective."

Jupyter Notebook

Parallel-LDA

Java package that partitions a corpus and runs LDA on it in parallel.

riseandfall

Code and data supporting "The Rise and Fall of Genre Differentiation in English-Language Fiction."

moments

Data and code to support "Why Is Literary Time Measured in Minutes?"

Jupyter Notebook

meta2018

A temporary workspace for NovelTM metadata reviewed and analyzed in summer 2018.

Jupyter Notebook

GenreProject

Code and documentation associated with "Understanding Genre in a Collection of a Million Volumes."

collator

Python scripts for collating HathiTrust page files.

pmla-scripts

Data for a 1924-2006 PMLA model, plus scripts to turn it into a Gephi network.

noise

Data and code for measuring consequences of noise in digital libraries.

asymmetry

Research on information-theoretic asymmetries in literary history.

Jupyter Notebook

avant

Was the avant-garde really ahead of its time?

Jupyter Notebook

oralarg

Code and results related to oral argument in the Supreme Court. Work in progress: Tonja Jacobi, Matthew Sag, and Ted Underwood.

Jupyter Notebook

overlappingcategories

Python 3 code for training models in a multilabel environment where classes overlap. Based on code in the fiction repo, but with bug fixes and improvements.

Tokenize

Folder storing current rulesets, scripts, and metadata for tokenizing and collection building.

pages

Java code for mapping genres at the page level in a large collection. Originally based on pagelevelHMM.

20cgenres

Code and data used for page-level mapping of literary genres beyond 1923.

roles

Code for a topic-modeling variant that allows for character-level 'roles' as well as book-level 'themes.'

time

Further research on narrative pace.

Jupyter Notebook

metadatapredictor

Java code that uses existing metadata to train classifiers, which then make predictions for cases where metadata is missing or suspect.