  • Stars: 2
  • Language: R
  • Created over 12 years ago
  • Updated over 12 years ago

More Repositories

1

DataMunging

Scripts that clean up OCR and munge Hathi metadata.
Python
69
star
2

fictional-time-with-GPT4

An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4.
Jupyter Notebook
30
star
3

noveltmmeta

Code and data supporting "NovelTM Data Sets for English-Language Fiction."
Jupyter Notebook
22
star
4

paceofchange

Code and data to support the article, "How quickly do literary standards change?"
Python
21
star
5

fiction

Project on the history of genre.
Python
21
star
6

LIS590DSH

Jupyter Notebook
19
star
7

BrowseLDA

R scripts that browse the results of LDA
R
19
star
8

character

Data and code for analyzing language associated with fictional characters.
Jupyter Notebook
15
star
9

genredistance

Exploring textual and social measures of distance between genres.
Jupyter Notebook
14
star
10

LDA

A Java package that does basic LDA, without hyperparameter optimization. Folder settings are local. Ymmv.
Java
13
star
11

plot

Initial exploratory research on patterns of change across narrative time.
Jupyter Notebook
10
star
12

genre

Code for "Understanding Genre in a Collection of a Million Volumes."
HTML
10
star
13

horizon

Data and code to support Distant Horizons (University of Chicago Press, 2019).
Jupyter Notebook
10
star
14

nehuncertainty

Code used in "Broadening Access to Text Analysis by Describing Uncertainty."
Jupyter Notebook
7
star
15

period-cohort

Code and data for an experiment on the relation between individual change and cohort succession in literary history.
HTML
6
star
16

bayes-bestsellers

Code and data to support "Bestsellers and Critical Favorites 1850-1949," a paper at CA2017.
Jupyter Notebook
6
star
17

reviews

Parsing periodical indexes and finding book reviews, 1800-2007.
Python
5
star
18

is417

IS 417, Data Science in the Humanities.
Jupyter Notebook
5
star
19

changepoint

Measuring the scale and significance of changes *in the pace of change* in an auto-correlated multivariate time series.
Jupyter Notebook
5
star
20

ocreval

Python modules that evaluate OCR quality.
Python
5
star
21

badpublicity

A presentation at MLA 2020 in Seattle, "No Such Thing as Bad Publicity: Toward a Distant Reading of Reception."
Python
5
star
22

hathimetadata

Metadata for English-language fiction and poetry beyond 1923 in HathiTrust Digital Library.
Python
4
star
23

Java-OCR-spellchecking.

Java
4
star
24

Tokenizer

Python scripts for tokenizing text files
Python
4
star
25

measureperspective

Code and data to support "Machine Learning and Human Perspective."
Jupyter Notebook
4
star
26

Parallel-LDA

Java package that partitions a corpus and runs LDA in parallel on it
Java
3
star
27

riseandfall

Code and data supporting The Rise and Fall of Genre Differentiation in English-Language Fiction.
Python
3
star
28

moments

Data and code to support "Why Is Literary Time Measured in Minutes?"
Jupyter Notebook
3
star
29

meta2018

A temporary workspace for novelTM metadata reviewed and analyzed in summer 2018.
Jupyter Notebook
3
star
30

GenreProject

Code and documentation associated with "Understanding Genre in a Collection of a Million Volumes"
Python
3
star
31

collator

Python scripts for collating HathiTrust page files.
Python
2
star
32

pmla-scripts

Data for a 1924-2006 PMLA model, plus scripts to turn it into a Gephi network.
R
2
star
33

noise

Data and code for measuring consequences of noise in digital libraries.
Python
2
star
34

asymmetry

Research on information-theoretic asymmetries in literary history.
Jupyter Notebook
2
star
35

avant

Was the avant-garde really ahead of its time?
Jupyter Notebook
2
star
36

oralarg

Code and results related to oral argument in the Supreme Court. Work in progress: Tonja Jacobi, Matthew Sag, and Ted Underwood.
Jupyter Notebook
1
star
37

overlappingcategories

Python 3 code for training models in a multilabel environment where classes overlap. Based on code in the fiction repo, but with bug fixes and improvements.
Python
1
star
38

Tokenize

Folder storing current rulesets, scripts, and metadata for tokenizing and collection building.
Python
1
star
39

pages

Java code for mapping genres at the page level in a large collection. Originally based on pagelevelHMM.
Java
1
star
40

20cgenres

Code and data used for page-level mapping of literary genres beyond 1923.
Python
1
star
41

roles

Code for a topic modeling variant that allows for character-level 'roles' as well as book-level 'themes.'
Python
1
star
42

time

Further research on narrative pace.
Jupyter Notebook
1
star
43

metadatapredictor

Java code that uses existing metadata to train classifiers that then make predictions for cases where metadata is missing / suspected.
Java
1
star