DataMunging
Scripts that clean up OCR and munge Hathi metadata.

fictional-time-with-GPT4
An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4.

noveltmmeta
Code and data supporting "NovelTM Data Sets for English-Language Fiction."

paceofchange
Code and data to support the article, "How quickly do literary standards change?"

fiction
Project on the history of genre.

LIS590DSH

BrowseLDA
R scripts that browse the results of LDA.

character
Data and code for analyzing language associated with fictional characters.

genredistance
Exploring textual and social measures of distance between genres.

LDA
A Java package that does basic LDA, without hyperparameter optimization. Folder settings are local. Ymmv.

plot
Initial exploratory research on patterns of change across narrative time.

horizon
Data and code to support Distant Horizons (University of Chicago Press, 2019).

nehuncertainty
Code used in "Broadening Access to Text Analysis by Describing Uncertainty."

period-cohort
Code and data for an experiment on the relation between individual change and cohort succession in literary history.

bayes-bestsellers
Code and data to support "Bestsellers and Critical Favorites 1850-1949," a paper at CA2017.

reviews
Parsing periodical indexes and finding book reviews, 1800-2007.

is417
IS 417, Data Science in the Humanities.

changepoint
Measuring the scale and significance of changes *in the pace of change* in an auto-correlated multivariate time series.

ocreval
Python modules that evaluate OCR quality.

badpublicity
A presentation at MLA 2020 in Seattle, "No Such Thing as Bad Publicity: Toward a Distant Reading of Reception."

hathimetadata
Metadata for English-language fiction and poetry beyond 1923 in HathiTrust Digital Library.

Java-OCR-spellchecking.

Tokenizer
Python scripts for tokenizing text files.

measureperspective
Code and data to support "Machine Learning and Human Perspective."

Parallel-LDA
Java package that partitions a corpus and runs LDA in parallel on it.

riseandfall
Code and data supporting The Rise and Fall of Genre Differentiation in English-Language Fiction.

moments
Data and code to support "Why Is Literary Time Measured in Minutes?"

meta2018
A temporary workspace for novelTM metadata reviewed and analyzed in summer 2018.

GenreProject
Code and documentation associated with "Understanding Genre in a Collection of a Million Volumes."

JDH-scripts

collator
Python scripts for collating HathiTrust page files.

pmla-scripts
Data for 1924-2006 pmla model, plus scripts to turn into Gephi network.

noise
Data and code for measuring consequences of noise in digital libraries.

asymmetry
Research on information-theoretic asymmetries in literary history.

avant
Was the avant-garde really ahead of its time?

oralarg
Code and results related to oral argument in the Supreme Court. Work in progress: Tonja Jacobi, Matthew Sag, and Ted Underwood.

overlappingcategories
Python 3 code for training models in a multilabel environment where classes overlap. Based on code in the fiction repo, but with bug fixes and improvements.

Tokenize
Folder storing current rulesets, scripts, and metadata for tokenizing / collection building.

pages
Java code for mapping genres at the page level in a large collection. Originally based on pagelevelHMM.

20cgenres
Code and data used for page-level mapping of literary genres beyond 1923.

roles
Code for a topic modeling variant that allows for character level 'roles' as well as book-level 'themes.'

time
Further research on narrative pace.

metadatapredictor
Java code that uses existing metadata to train classifiers that then make predictions for cases where metadata is missing / suspected.