DataMunging
Scripts that clean up OCR and munge Hathi metadata.
fictional-time-with-GPT4
An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4.
noveltmmeta
Code and data supporting "NovelTM Data Sets for English-Language Fiction."
paceofchange
Code and data to support the article, "How quickly do literary standards change?"
fiction
Project on the history of genre.
LIS590DSH
BrowseLDA
R scripts that browse the results of LDA.
character
Data and code for analyzing language associated with fictional characters.
genredistance
Exploring textual and social measures of distance between genres.
LDA
A Java package that does basic LDA, without hyperparameter optimization. Folder settings are local. YMMV.
plot
Initial exploratory research on patterns of change across narrative time.
genre
Code for Understanding Genre in a Collection of a Million Volumes.
horizon
Data and code to support Distant Horizons (University of Chicago Press, 2019).
nehuncertainty
Code used in "Broadening Access to Text Analysis by Describing Uncertainty."
period-cohort
Code and data for an experiment on the relation between individual change and cohort succession in literary history.
bayes-bestsellers
Code and data to support "Bestsellers and Critical Favorites 1850-1949," a paper at CA2017.
reviews
Parsing periodical indexes and finding book reviews, 1800-2007.
is417
IS 417, Data Science in the Humanities.
changepoint
Measuring the scale and significance of changes *in the pace of change* in an auto-correlated multivariate time series.
ocreval
Python modules that evaluate OCR quality.
badpublicity
A presentation at MLA 2020 in Seattle, "No Such Thing as Bad Publicity: Toward a Distant Reading of Reception."
hathimetadata
Metadata for English-language fiction and poetry beyond 1923 in HathiTrust Digital Library.
Java-OCR-spellchecking
Tokenizer
Python scripts for tokenizing text files.
measureperspective
Code and data to support "Machine Learning and Human Perspective."
Parallel-LDA
Java package that partitions a corpus and runs LDA in parallel on it.
riseandfall
Code and data supporting The Rise and Fall of Genre Differentiation in English-Language Fiction.
moments
Data and code to support "Why Is Literary Time Measured in Minutes?"
meta2018
A temporary workspace for novelTM metadata reviewed and analyzed in summer 2018.
GenreProject
Code and documentation associated with "Understanding Genre in a Collection of a Million Volumes."
collator
Python scripts for collating HathiTrust page files.
pmla-scripts
Data for 1924-2006 pmla model, plus scripts to turn into Gephi network.
noise
Data and code for measuring consequences of noise in digital libraries.
asymmetry
Research on information-theoretic asymmetries in literary history.
avant
Was the avant-garde really ahead of its time?
oralarg
Code and results related to oral argument in the Supreme Court. Work in progress: Tonja Jacobi, Matthew Sag, and Ted Underwood.
overlappingcategories
Python 3 code for training models in a multilabel environment where classes overlap. Based on code in the fiction repo, but with bug fixes and improvements.
Tokenize
Folder storing current rulesets, scripts, and metadata for tokenizing / collection building.
pages
Java code for mapping genres at the page level in a large collection. Originally based on pagelevelHMM.
20cgenres
Code and data used for page-level mapping of literary genres beyond 1923.
roles
Code for a topic modeling variant that allows for character-level 'roles' as well as book-level 'themes.'
time
Further research on narrative pace.
metadatapredictor
Java code that uses existing metadata to train classifiers that then make predictions for cases where metadata is missing / suspected.