Ted Underwood's Projects
Java code that uses existing metadata to train classifiers, which then make predictions for cases where metadata is missing or suspect.
Data and code to support "Why Is Literary Time Measured in Minutes?"
Code used in "Broadening Access to Text Analysis by Describing Uncertainty."
Code and data supporting the blog post "Can language models predict the next twist in a story?"
Data and code for measuring consequences of noise in digital libraries.
Code and data supporting "NovelTM Data Sets for English-Language Fiction."
Python modules that evaluate OCR quality.
Code and results related to oral argument in the Supreme Court. Work in progress: Tonja Jacobi, Matthew Sag, and Ted Underwood.
Python 3 code for training models in a multilabel environment where classes overlap. Based on code in the fiction repo, but with bug fixes and improvements.
Code and data to support the article, "How quickly do literary standards change?"
Java code I used to train hidden Markov models on top of page-level classification. Weka is a dependency. Needs refactoring.
Java code for mapping genres at the page level in a large collection. Originally based on pagelevelHMM.
Contains Java code for a page-tagging interface.
Java package that partitions a corpus and runs LDA on it in parallel.
Code and data for an experiment on the relation between individual change and cohort succession in literary history.
Initial exploratory research on patterns of change across narrative time.
Data for the 1924-2006 PMLA model, plus scripts to turn it into a Gephi network.
Python scripts used to wrangle a collection from Hathi, mostly on a cluster.
Parsing periodical indexes and finding book reviews, 1800-2007.
Code and data supporting "The Rise and Fall of Genre Differentiation in English-Language Fiction."
Code for a topic modeling variant that allows for character-level 'roles' as well as book-level 'themes.'
Further research on narrative pace.
Folder storing current rulesets, scripts, and metadata for tokenizing / collection building.
Python scripts for tokenizing text files.