annals-of-cleveland's Introduction

Parses the index and digest structures for a volume of the 1930s WPA project Annals of Cleveland and presents them in a Hugo site. See https://www.wallandbinkley.com/projects/2019/annals-of-cleveland/ for the current output.

To run

  • Clone this repository
  • Install the Hugo theme "techdoc", which is in a Git submodule: git submodule update --init
  • Save the full text from HathiTrust into the source directory as 1864.html. You will need to be logged in to HathiTrust; otherwise this file will not contain the full text of the volume.
  • Install Hugo following the instructions at gohugo.io
  • Install Ruby dependencies: bundle install
  • Run ./process.rb ./source/1864/1864-corrected.html - this parses the source file and populates the hugo/content and hugo/data directories
  • Start Hugo: cd to the hugo directory and run hugo serve -D
  • Visit the local site at http://localhost:1313/projects/2019/annals-of-cleveland

To Do

  • improve the regex in lib/abstract.rb to handle more OCR variants
  • extend the regex and the Hugo output to handle multi-column references
  • learn more about Hugo and improve the implementation
  • etc. etc.

Notes

Annals of Cleveland 1864

vol. 47 pt. 1 (1937)

https://babel.hathitrust.org/cgi/pt?id=iau.31858046133199&view=1up&seq=7

  • TOC: image 11
  • Classification Lists: images 17-23
  • Abstracts: pp. 1-361
  • Chronological Index: pp. 363-376
  • Index: pp. 377-444

Newspapers:

L: Cleveland Leader https://chroniclingamerica.loc.gov/lccn/sn83035143/

Markup

Mark the sections of the volumes to show where they begin and end.

  • #START_CLASSIFICATION, #END_CLASSIFICATION
  • #START_ABSTRACTS, #END_ABSTRACTS
  • #START_CHRON, #END_CHRON
  • #START_TERMS, #END_TERMS
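
As an illustration of how these markers could be consumed, here is a minimal Ruby sketch that collects the lines between each pair of markers. It is not taken from process.rb: the method name extract_sections and the assumption that each marker sits alone on its own line are hypothetical.

```ruby
# Hypothetical helper, not part of the repository's process.rb.
# Assumes every marker occupies a line of its own.
SECTIONS = %w[CLASSIFICATION ABSTRACTS CHRON TERMS].freeze

# Returns a hash mapping each section name to the lines found between
# its #START_ and #END_ markers (the marker lines are excluded).
def extract_sections(text)
  sections = {}
  current = nil
  text.each_line(chomp: true) do |line|
    marker = line.strip
    if (m = marker.match(/\A#START_([A-Z]+)\z/)) && SECTIONS.include?(m[1])
      raise "#START_#{m[1]} found inside #{current}" if current
      current = m[1]
      sections[current] = []
    elsif (m = marker.match(/\A#END_([A-Z]+)\z/)) && SECTIONS.include?(m[1])
      raise "#END_#{m[1]} without a matching start marker" unless current == m[1]
      current = nil
    elsif current
      sections[current] << line
    end
  end
  raise "#START_#{current} has no matching end marker" if current
  sections
end

sample = <<~'TEXT'
  (front matter, ignored)
  #START_TERMS
  Oakley, H. T., 2294. See also Oakley, T. H.
  Oakley, T. H., 1619. See also Oakley, H. T.
  #END_TERMS
TEXT

sections = extract_sections(sample)
# sections.keys => ["TERMS"]
```

Raising on unbalanced or nested markers is a design choice in this sketch: a missing marker in a hand-corrected source file is easier to fix when it fails loudly than when a section silently comes out empty.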

annals-of-cleveland's People

Contributors

dependabot[bot], pbinkley

annals-of-cleveland's Issues

Handle See and See Also links

See

  • Heading level:

    "DESERTIONS, MILITARY. See Wars - Civil War"

    • note that "Wars - Civil War" has secondary subheading "(Desertions)"
  • Abstract level:

    • this is under heading "Welfare":

      L June 15:4/2 - See Schools and Seminaries

    • under "Schools and Seminaries" we find:

      1804 - L June 15:4/2 - Teachers and scholars on the west side will give
      an exhibition for the relief of soldiers' families, at the armory on the
      corner of Franklin and Pearl sts. tonight. Dramatic and musical perfor-
      mances offer a variety of talent that will please everyone. (3)
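
One possible way to resolve an abstract-level See reference is to match it to the abstract with the same newspaper citation under the target heading. The Ruby sketch below works from the two examples above. The regexes, the names parse_entry and resolve_see, and the assumption that each entry has already been joined into a single string are all hypothetical and are not the patterns in lib/abstract.rb.

```ruby
# Hypothetical patterns based only on the two examples above.
# A citation such as "L June 15:4/2" is treated as an opaque key.
CITATION = /[A-Z]+ [A-Z][a-z]+\.? \d+:\d+\/\d+/

ABSTRACT_RE = /\A(?<number>\d+) - (?<citation>#{CITATION}) - (?<text>.+)\z/m
SEE_RE = /\A(?<citation>#{CITATION}) - See (?!also\b)(?<heading>.+)\z/

# Classifies one entry as an abstract or a See reference.
# Returns nil when the entry matches neither pattern.
def parse_entry(entry)
  if (m = ABSTRACT_RE.match(entry))
    { type: :abstract, number: m[:number].to_i,
      citation: m[:citation], text: m[:text] }
  elsif (m = SEE_RE.match(entry))
    { type: :see, citation: m[:citation], heading: m[:heading].strip }
  end
end

# Finds the abstract under the referenced heading that carries the
# same citation as the See reference, or nil if there is none.
def resolve_see(see, entries_by_heading)
  entries_by_heading.fetch(see[:heading], []).find do |entry|
    entry[:type] == :abstract && entry[:citation] == see[:citation]
  end
end

see = parse_entry('L June 15:4/2 - See Schools and Seminaries')
abstract = parse_entry('1804 - L June 15:4/2 - Teachers and scholars on ' \
                       'the west side will give an exhibition.')
target = resolve_see(see, 'Schools and Seminaries' => [abstract])
# target[:number] => 1804
```

Keying on the citation string avoids interpreting its parts, so OCR variants in the citation would still need to be normalized before matching.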

See also

  • Heading level

    • at the end of a heading:

      See also Iron & Steel - Labor; Labor Unions; Newspaper - Labor

    • need to handle subheadings

    • note generic references, which aren't linkable: "See also names of animals". Can we identify these by the lower-case letter? Or simply don't link terms that aren't in the terms list?

  • Term level

    • e.g. abstract 2416: "John Connolly, alias John Campbell of Michigan st. ...". The index has "Connolly, John, See also Campbell, John, 2416" and "Campbell, John, 2416", so following the see also doesn't add any new abstracts

    • case where new abstracts are added by following the see also:

      Oakley, H. T., 2294. See also Oakley, T. H.
      Oakley, T. H., 1619. See also Oakley, H. T.
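
The Oakley case can be sketched in Ruby as a traversal of see also links that collects abstract numbers from every reachable term. This is only an illustration: the index hash layout and the method name abstracts_for are hypothetical, and the sketch assumes the index lines have already been parsed. Terms that are not in the index (such as the generic "names of animals") are simply skipped, which is one of the options raised above.

```ruby
require 'set'

# Hypothetical structure: term => { abstracts: [...], see_also: [...] }.
# Collects the abstract numbers for a term plus those of every term
# reachable through see also links. The seen set guards against the
# circular references shown in the Oakley example.
def abstracts_for(term, index, seen = Set.new)
  return [] unless index.key?(term) && seen.add?(term)

  entry = index[term]
  linked = entry[:see_also].flat_map { |t| abstracts_for(t, index, seen) }
  (entry[:abstracts] + linked).uniq.sort
end

index = {
  'Oakley, H. T.' => { abstracts: [2294], see_also: ['Oakley, T. H.'] },
  'Oakley, T. H.' => { abstracts: [1619], see_also: ['Oakley, H. T.'] }
}

oakley = abstracts_for('Oakley, H. T.', index)
# oakley => [1619, 2294]
```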
