Giter VIP home page Giter VIP logo

corpus-joyce-portrait-tei's Introduction

corpus-joyce-portrait-TEI

Join the chat at https://gitter.im/corpus-joyce-portrait-TEI/Lobby

This is the open collaborative scholarly edition of James Joyce's novel A Portrait of the Artist as a Young Man–corrected, revised, and richly annotated.

See a short presentation about it here, and test the XSL-transformed prototype here on github.io.

About this Project

We intend to aggregate much of the existing knowledge about the book in one place--textual notes, critical commentary, interpretations, and more. Also, we want to be able to answer questions like these:

  • What proportion of each chapter is a song or a verse stanza? What does this tell us about Joyce's associations of song and youth?
  • What proportion of the novel is prayer? What books of the bible are cited or quoted most often, and why?
  • What other authors are represented the most commons literary allusions?
  • How much Latin is in each chapter, and why? How much French?
  • Which character speaks the most lines of dialog? What are the characteristic words of each speaker?

All of these questions are easily answerable with very simple computational analyses, but these analyses need a machine-readable edition of the novel.

This is possible with TEI XML, a markup format that allows for semantic description of the text. Most existing TEI editions, however, don't use the full semantic features of the language. TEI features an epigraph tag, for instance, but epigraphs are often marked up as quotes, verse lines, or, even worse, italicized text. Those descriptions are all true, but not as useful to computational literary analysis as the epigraph tag.

When we see an epigraph, we mark it up as such. When we see dialog, we mark it as such and indicate who is speaking. When we correct an OCR or typographical error in the text, we document the reasons why. One error that appears in most print editions is the word "green"--Joyce actually meant for it to read "geen," indicating infantile pronunciation of the color word. We correct errors like this by referring to manuscripts.

We also mark up:

  • languages other than English
  • literary allusions
  • songs and prayers
  • locations and personal names

Apart from markup features, one of the distinguishing features of this project is its infrastructure. We leverage distributed version control using git and github, which allows anyone to edit the edition in the browser and submit a request for the change.

##(Planned) Features:

  • Dialogue is marked by speaker, using <said who=""> tags, facilitating speaker-specific dialogue extraction.
  • Verse is indicated by <lg> and <l> tags.
  • Features such as epigraphs marked with their respective tags, e.g. <epigraph>.
  • Editorial decisions such as "green" -> "geen" are noted in <note resp="">
  • Prayers are marked up as <seg type="prayer">, with <lg> tags if it is in verse.
  • Songs are marked up as <seg type="song">
  • Non-English languages are marked up with <seg xml:lang="">.

See the issue tracker for a more complete list.

##An Example Epigraph

<epigraph>
    <cit> 
        <quote xml:lang="lat"><hi rend="italic">Et ignotas animum dimittit in artes.</hi></quote> 
        <bibl>- Ovid, <hi rend="italic">Metamorphoses</hi>, VIII, 188</bibl> 
    </cit> 
</epigraph>

##An Example of language markup:

<p>The word now shone in his brain, clearer and brighter 
than any ivory sawn from the mottled tusks of elephants. 
<hi rend="italic">Ivory, <seg xml:lang="fra">ivoire</seg>, 
<seg xml:lang="ita">avorio</seg>, 
<seg xml:lang="lat">ebur</seg>.</hi> One of the first 
examples that he had learnt in Latin had run: 
<seg xml:lang="lat"><hi rend="italic">India mittit ebur</hi></seg>; 

##How to contribute:

See CONTRIBUTING.md.

corpus-joyce-portrait-tei's People

Contributors

jonathanreeve avatar alanburnett avatar goldieshen avatar gitter-badger avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.