Ted Underwood (@tedunderwood)

Top repositories

1

DataMunging

Scripts that clean up OCR and munge Hathi metadata.
Python
69
star
2

fictional-time-with-GPT4

An experiment replicating part of "Why Literary Time Is Measured in Minutes" with GPT-4.
Jupyter Notebook
30
star
3

paceofchange

Code and data to support the article, "How quickly do literary standards change?"
Python
21
star
4

fiction

Project on the history of genre.
Python
21
star
5

LIS590DSH

Jupyter Notebook
19
star
6

BrowseLDA

R scripts that browse the results of LDA.
R
19
star
7

noveltmmeta

Code and data supporting "NovelTM Data Sets for English-Language Fiction."
Jupyter Notebook
17
star
8

character

Data and code for analyzing language associated with fictional characters.
Jupyter Notebook
15
star
9

genredistance

Exploring textual and social measures of distance between genres.
Jupyter Notebook
14
star
10

LDA

A Java package that does basic LDA, without hyperparameter optimization. Folder settings are local. YMMV.
Java
13
star
11

plot

Initial exploratory research on patterns of change across narrative time.
Jupyter Notebook
10
star
12

genre

Code for "Understanding Genre in a Collection of a Million Volumes."
HTML
10
star
13

horizon

Data and code to support Distant Horizons (University of Chicago Press, 2019).
Jupyter Notebook
10
star
14

nehuncertainty

Code used in "Broadening Access to Text Analysis by Describing Uncertainty."
Jupyter Notebook
7
star
15

period-cohort

Code and data for an experiment on the relation between individual change and cohort succession in literary history.
HTML
6
star
16

bayes-bestsellers

Code and data to support "Bestsellers and Critical Favorites 1850-1949," a paper at CA2017.
Jupyter Notebook
6
star
17

badpublicity

A presentation at MLA 2020 in Seattle, "No Such Thing as Bad Publicity: Toward a Distant Reading of Reception."
Python
5
star
18

reviews

Parsing periodical indexes and finding book reviews, 1800-2007.
Python
5
star
19

is417

IS 417, Data Science in the Humanities.
Jupyter Notebook
5
star
20

changepoint

Measuring the scale and significance of changes *in the pace of change* in an auto-correlated multivariate time series.
Jupyter Notebook
5
star
21

ocreval

Python modules that evaluate OCR quality.
Python
5
star
22

Java-OCR-spellchecking.

Java
4
star
23

Tokenizer

Python scripts for tokenizing text files.
Python
4
star
24

hathimetadata

Metadata for English-language fiction and poetry beyond 1923 in HathiTrust Digital Library.
Python
4
star
25

measureperspective

Code and data to support "Machine Learning and Human Perspective."
Jupyter Notebook
4
star
26

Parallel-LDA

Java package that partitions a corpus and runs LDA on it in parallel.
Java
3
star
27

riseandfall

Code and data supporting The Rise and Fall of Genre Differentiation in English-Language Fiction.
Python
3
star
28

moments

Data and code to support "Why Is Literary Time Measured in Minutes?"
Jupyter Notebook
3
star
29

meta2018

A temporary workspace for novelTM metadata reviewed and analyzed in summer 2018.
Jupyter Notebook
3
star
30

GenreProject

Code and documentation associated with "Understanding Genre in a Collection of a Million Volumes"
Python
3
star
31

JDH-scripts

R
2
star
32

collator

Python scripts for collating HathiTrust page files.
Python
2
star
33

pmla-scripts

Data for the 1924-2006 PMLA model, plus scripts to turn it into a Gephi network.
R
2
star
34

noise

Data and code for measuring consequences of noise in digital libraries.
Python
2
star
35

asymmetry

Research on information-theoretic asymmetries in literary history.
Jupyter Notebook
2
star
36

avant

Was the avant-garde really ahead of its time?
Jupyter Notebook
2
star
37

oralarg

Code and results related to oral argument in the Supreme Court. Work in progress: Tonja Jacobi, Matthew Sag, and Ted Underwood.
Jupyter Notebook
1
star
38

overlappingcategories

Python 3 code for training models in a multilabel environment where classes overlap. Based on code in the fiction repo, but with bug fixes and improvements.
Python
1
star
39

Tokenize

Folder storing current rulesets, scripts, and metadata for tokenizing and collection building.
Python
1
star
40

pages

Java code for mapping genres at the page level in a large collection. Originally based on pagelevelHMM.
Java
1
star
41

20cgenres

Code and data used for page-level mapping of literary genres beyond 1923.
Python
1
star
42

roles

Code for a topic modeling variant that allows for character-level 'roles' as well as book-level 'themes.'
Python
1
star
43

time

Further research on narrative pace.
Jupyter Notebook
1
star
44

metadatapredictor

Java code that uses existing metadata to train classifiers that then make predictions for cases where metadata is missing / suspected.
Java
1
star