Benjamin Schmidt's Projects
The most recent iteration of the Nara A_Files project
Scripts for working with the American Community Survey
Deposit of crowdsourced W3 annotations of computer advertisements
annonatate
Automatically insert archival photos into your notes, using Git for text and EXIF metadata for photos.
Query processing and transformation of array-backed data tables.
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
Interactive visualization of the most popular baby names by state, 1910-2015
Bookworm using the Naval Documents of the Barbary Wars
A bare-bones, extremely extensible blog using pandoc and svelte-kit
Development for baseball databank, an Open Data collection of historical baseball data
Ben's Ngrams parsing scripts
Connector to the Wordnik API
18 million books from the HathiTrust scrunched into 1280 bits apiece, for your nonconsumptive reading pleasure.
Documentation effort for the BookCorpus dataset
Test the size of random sample deviations in a Bookworm
A docker-compose stack for Bookworm
React Apps for interactive charts on top of the Bookworm-Vega package
Deploying gensim models on a Bookworm backend with metadata
Geolocation from geonames.org specifically adapted to the needs of library catalog metadata
Geotagging Bookworm extension based on Stanford NER
Bookworm Mallet integration
Scrolling narratives for the Bookworm D3 library
Create a Bookworm from the Open Library
Extension to Bookworm that splits it up into a bunch of (roughly) equal-sized chunks for testing
Sentiment analysis integration for Bookworm libraries
A Hakyll-powered static blog template for sharing Bookworm results.
BookwormD3