Comments (26)

ebeshero commented on August 25, 2024

Okay--we can get started now! As a special note of interest, Gabi Kirilloff was a student in a digital humanities course at Pitt like the one you are taking, and she originally wrote <Traversing_the_Tree/> for a seminar paper assignment in another class!

from dhclass-hub.

blawrence719 commented on August 25, 2024

Kirilloff explains that the XML we are learning in class grew out of SGML (Standard Generalized Markup Language), which was originally used in the 1980s as a way to "facilitate the sharing and storage of large-project documents in law, government, and industry." She explains that it was made to be easy to navigate and to allow for more freedom than something like HTML, which has pre-set markup tags that do not allow for any interpretation. As we've seen in class, we can create an element for a body of text and name it anything that we see fit. The hierarchical pattern that we've described in class as nesting dolls, Kirilloff describes as a family tree, and she provides a diagram. She says that this is convenient for marking up something like a book because its parts consistently fit into this tree or nest: "chapters that contain paragraphs that contain sentences etc."
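To make the "family tree" idea concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element names (`book`, `chapter`, `p`, `s`) and the sample sentences are hypothetical, not Kirilloff's; the point is only that every element sits wholly inside its parent, so the document can be walked as a tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: chapters contain paragraphs, which contain sentences.
# Every element closes before its parent does, like nesting dolls.
BOOK = """<book>
  <chapter n="1">
    <p><s>It was a dark night.</s><s>The wind howled.</s></p>
    <p><s>Morning came.</s></p>
  </chapter>
</book>"""

def outline(elem, depth=0):
    """Walk the tree depth-first, returning one indented tag name per element."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(BOOK)
print("\n".join(outline(root)))
```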

However, Kirilloff explains that there is an issue with this once a text has overlapping hierarchies. She explains that SGML was originally used to categorize text into one genre, and once XML started to be used for literature, it became obvious that "hierarchy came to be dependent on interpretation." Kirilloff says that overlapping hierarchies generally become an issue when dealing with poetry and lines with enjambment. She gives the example of a word that breaks at the end of one line and continues onto the next (eng-/ineers), so that a word element would have to start inside one line element and end inside the next. As you can see, the hierarchy is tangled. Kirilloff explains a way to solve this issue using self-closing tags, which we learned about in class, so that boundaries are marked with empty elements instead of with elements that cross one another. However, she points out that there are other ways to solve this issue and that some scholars even ignore it.
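A minimal Python sketch of the problem, assuming TEI-style element names (`lg` for a line group, `l` for a line, `w` for a word, `lb` for a line beginning) rather than Kirilloff's exact markup: an XML parser rejects the overlapping version outright, while the version that marks line boundaries with empty elements parses and still lets us recover the whole word.

```python
import xml.etree.ElementTree as ET

# Overlapping version: <w> (word) opens inside the first <l> (line) and closes
# inside the second, so the line hierarchy and the word hierarchy cross.
OVERLAPPING = "<lg><l>The four <w>eng-</l><l>ineers</w></l></lg>"

# Milestone version: the word stays one element, and empty <lb/> elements
# mark where each line begins, including the break inside "eng-ineers".
MILESTONE = ("<lg><lb/>The four <w>eng-<lb/>ineers</w>"
             "<lb/>Wore orange<lb/>brassieres.</lg>")

def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(OVERLAPPING))  # False
print(is_well_formed(MILESTONE))    # True

# itertext() joins the word's text across the empty <lb/> inside it.
word = ET.fromstring(MILESTONE).find("w")
print("".join(word.itertext()))     # eng-ineers
```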

nlottig94 commented on August 25, 2024

To begin with, by reading Kirilloff's <Traversing_the_Tree/>, I learned that overlapping hierarchies are words or any material in a written work that defy the one logically assumed hierarchy when the work is being coded. I found Kirilloff's examples and explanations of overlapping hierarchies very interesting, and her ideas of how to deal with these overlapping hierarchies helpful.

First, my favorite example Kirilloff uses to describe overlapping hierarchies is Mark Z. Danielewski's House of Leaves. I liked how Kirilloff described how the whole book is an overlapping hierarchy because the different people who speak in the book are represented by different fonts within the text. When using this example, Kirilloff states that some believe you could just put all of the same-font texts together in order to fit everything into the hierarchy. However, Kirilloff herself says that this would take away from the book and how it was meant to be read. It seems to me that she was saying what we learned in class: we as coders are not to change the arrangement or context of a work, whether fiction or nonfiction; we are there to help describe the work the way the author wanted it portrayed.

Next, her explanation of how the poem "The Unrhymable Word: Orange" creates overlapping hierarchies helped me to realize exactly what overlapping hierarchies are in a basic sense. Also, even though the example of a word being broken up into different lines of a poem (e.g. eng-/ineers) is not a common example, it helped me realize that in many poems, it is common that a sentence is broken up into two lines, which can also cause overlapping hierarchies. Kirilloff's advice to fix this problem is to use empty elements. On one hand, Kirilloff says that this will help to get the idea across that these two lines are a part of the same idea. However, she also says that this just tricks the system; it does not really help describe the text, which is one of the key parts of writing XML code.

ebeshero commented on August 25, 2024

Great work, Brooke and Nicole, with launching the discussion and bringing out some key details in the Kirilloff piece! Let's think about this a little: Kirilloff says that using empty elements "tricks the system" as Nicole puts it--and doesn't quite accurately describe text with overlapping hierarchies. Do you agree with this assessment?

I posted the presentation material from Wendell Piez as an example of markup that can actually handle overlapping hierarchies. Piez produces some fascinating visualizations of overlapping hierarchies in the novel Frankenstein based on a form of code called LMNL that he helped to invent. LMNL isn't as standard as XML, and Piez actually converts documents back and forth between XML and LMNL because he needs the XML format for things like preparing for web presentation. But LMNL helps to visualize the points of intersection and overlap of conflicting hierarchies. From what you're able to see of Piez's LMNL, how does it seem to work?

These are just a couple of places our conversation could go next...but where is everyone else?

alexthattalks commented on August 25, 2024

Discussion of Articles

I can definitely see either ignoring the issue or writing XML that does not worry about those sorts of issues, as described in the section about “The four eng-/ineers/Wore orange/brassieres.” I may be totally off the wall in saying this, but at what point does the entire thing become too complicated and bogged down with XML? I understand that the goal is to preserve the structure and integrity of the original; however, isn't that what the original scan or image of the page is for? If I am encoding the text in XML, it should be useful and helpful, not such a hierarchical nightmare that it is difficult to get anywhere. At what point does the effort put in exceed the helpfulness of the actual code? For example, what is the point in identifying each word within a line? If I am marking up each individual word, the time I spend on (in some instances) a meaningless effort far exceeds the usefulness of doing the encoding in the first place.

Ideally, I would see the XML as a helpful hierarchy for discerning information in a text. Once it becomes affected by the visual appearance of the text, I would attach the scanned image of the letter, page, chapter, etc., and let the reader view the original for full formatting and historical accuracy.

I think it is very possible to write XML code that avoids some of the overlapping hierarchies they describe. For example, when a chapter is split between two pages, I would organize it by chapter first, rather than by page first, and then create chapter segments/paragraphs with a page number attribute. That would allow the chapters to be broken up across pages while still capturing the page numbers.
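The chapter-first approach can be sketched in Python with `xml.etree.ElementTree`. The element and attribute names (`novel`, `chapter`, `p`, `page`) and the sample text are hypothetical. Chapters stay the primary hierarchy, and the `page` attribute lets us regroup the same paragraphs by page afterward, even when one page is shared by two chapters.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical chapter-first encoding: chapter 2 starts partway down page 12,
# so page 12 holds paragraphs from two different chapters.
DOC = """<novel>
  <chapter n="1">
    <p page="11">First paragraph.</p>
    <p page="12">End of chapter one.</p>
  </chapter>
  <chapter n="2">
    <p page="12">Start of chapter two.</p>
    <p page="13">More of chapter two.</p>
  </chapter>
</novel>"""

def paragraphs_by_page(root):
    """Regroup paragraphs by their page attribute without touching the chapter tree."""
    pages = defaultdict(list)
    for chapter in root.iter("chapter"):
        for p in chapter.iter("p"):
            pages[p.get("page")].append((chapter.get("n"), p.text))
    return dict(pages)

root = ET.fromstring(DOC)
for page, paras in sorted(paragraphs_by_page(root).items()):
    print(page, paras)
```

A paragraph that itself runs across a page break would still need to be split into segments or marked with an empty page-break element.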

brookestewart commented on August 25, 2024

As @amielnicki mentioned, I too wonder how much of the actual text needs to be encoded. I'm not sure where the point is at which you stop categorizing things.

Piez touches on this topic, saying that "larger elements lose hierarchy when smaller elements become plentiful." This poses a problem when coding something as large as a novel. The smaller elements are sure to be abundant within even a page of a novel, let alone a chapter or the novel as a whole. There would also be instances of overlapping hierarchies, which Kirilloff describes as being a "fundamental problem with the way that hierarchy based methods define a text."

Kirilloff offers a way of "tricking the system" by using empty elements to avoid the problem of overlapping hierarchies. She does mention, though, that it's not the best solution for encoding text-based documents because it doesn't acknowledge the overlapping hierarchy.

spadafour commented on August 25, 2024

I've been trying to figure out how LMNL differs from XML. When texts that play with the nature of the genre or the arrangement of stanzas, lines, etc. are encoded strictly in XML, they lose some of their meaning in the rigidity of XML's hierarchy. LMNL, conversely, is able to capture structural features of a text that are buried or not evident in the traditional OHCO (ordered hierarchy of content objects) model. By contrasting the Goethe and Rilke poems, Piez highlights a key structural difference between them: whereas Goethe's poem shows a more classical example of "perfect alignment between meter and grammatical structure," Rilke's poem is unaligned and plays more with the form. The way each piece is formed plays a part in the nature of the text and says something important about it.

On the issue of overlapping hierarchy in XML: from what I've gathered, I can't quite understand the hurdles that a language like LMNL must overcome to become a more widely accepted form of coding. Piez seems to have made a solution using the best of both worlds, combining XML to categorize and distribute and LMNL to do the heavy lifting of the author's intended organization.

Lastly, getting lost in the minutiae of markup would, I think, only be more beneficial, offering more and more possibilities for referencing and cross-referencing. The objective is not to make a more readable format, but to create a more relatable one. As long as the coder is able to recognize and maintain his or her cohesive strategy for markup, the text remains infinitely open to more definition (as long as changes remain consistent and relevant).

nlottig94 commented on August 25, 2024

I agree with @amielnicki about how confusing coding could get if every single little idea were to be coded, such as every word or every sentence. However, I think that Kirilloff just uses the example of "The Unrhymable Word: Orange" to show where these overlapping hierarchies, or coding for that matter, can become a major problem. For example, in class Dr. B. has never told us to surround sentences, or even smaller units like words, in elements with attributes. I think a poem is a different, harder genre to code because poems are so full of interpretation.

I also agree with @spadafour because coding in XML only really helps the coder help others read different ideas within a text. However, I do think that XML could get bogged down with code if someone is not doing it correctly, which means that this class is really going to help us!!! Also, I disagree with @spadafour because coding does matter to other coders in XML, which is why the coding has to be consistent. If the coding in XML is not consistent, it could really confuse other coders looking for examples of how to code or just trying to get the inside scoop on the original coder's ideas.

rmz14 commented on August 25, 2024

The intro starts by explaining what digital humanities takes care of. The study of digital humanities promotes opportunities for people to ask new questions and to integrate computing tools into literary research. This is all necessary because computers cannot analyze or interpret data the way people do. Investigating how digital texts are created is critical to understanding their relationships. SGML was first used in the 1980s and ’90s, before XML was introduced. XML does not have a predefined set of tags, so the writer determines each element's “name.” The computer does not read and understand names the way humans do; it processes the text in machine code, so it handles the information in a structured form with rules.

Next, hierarchical markup is illustrated by a diagram and expressed as a family tree of people. As I previously described, computers cannot interpret information the way we can, so it is crucial to have an organized, structured format. HTML has pre-set tags, so there isn't as much independence in your programs. This means XML can give writers easier navigation and more liberty in their programs; however, this can cause trouble with overlapping hierarchies. Overlapping hierarchies can easily occur in poetry and when a paragraph covers multiple pages. These issues should be resolved by using empty elements (self-closing tags) and by organizing the text by page and/or chapter. I found these examples intriguing because it is imperative to have a plan to resolve some of these issues, since they are so easy to create.

blawrence719 commented on August 25, 2024

Before having class on Wednesday, I was confused as to why overlapping hierarchies were such a big issue, mostly because I didn't understand why the line and the word both needed to be tagged when it seems so obvious to us what these things are. However, our class discussion clarified that when converting our texts, the computer won't know to leave white space in something like a poem, and this is why lines need to be tagged. This helped me get a better understanding of why we tag all of the things that we do and how this will all be used later in the process. It helped me put some of the pieces together. As Kirilloff points out, though, the computer doesn't understand the names we use for elements, so I wonder how we tell the computer how to read our code. This is probably for a later class discussion, though; it just makes me wonder.

CodyKarch commented on August 25, 2024

I really enjoyed the text by Kirilloff because it was pitched at a basic level of knowledge on this subject and it was substantially explicative. She provides good background on the history of coding in SGML and XML, then goes on to explain their practicality. Furthermore, she raises the questions of the literary world in this digital setting: can texts be encoded correctly, or is there even a correct manner of encoding literature? She gives examples of how text can be tricky to display for a computer at times (the four eng-ineers) and of how, syntactically, text encoding is hard to assign in a black/white manner, let alone a grey one like XML. She finishes with thoughts on the OHCO model for coding, giving its benefits, its drawbacks, and alternatives to it. I liked this part most:

----Another helpful way of thinking about the allure of OHCO might be as follows. For a long time people thought CD players were wonderful. Although CD players could not play videos, this was not recognized as a “problem” or inadequacy because a video playing CD player was unimaginable. Most owners of portable CD players did not stop to question the fundamental design properties of the tool they were using to listen to music. The people who did stop to consider the fundamental design principles of CD players were the engineers making them: people who were intimately familiar with the basic principles behind the creation process. Eventually, the iPod came out and changed the way people viewed their music players. Engineers were able to conceive of a future technological advance because they understood the basic design principles behind it and could determine what was feasible.
Most digital humanists do not fully understand the tools they are using or the principles behind them. Nor are they aware of the history of SGML and XML. They know that, for the most part, the tools work. Humanities scholars, the people who care about the ontological aspects of the markup model, cannot understand how another model might exist without better understanding the current system. Thus, many scholars fall into the trap of unquestionably accepting the OHCO model. Scholars need to be aware of the fact that the tools they are using were not designed for them, and that other options are available. This does not mean that scholars should disavow the current system. However, in order to improve, there needs to be more room for inquiry.----

She is most definitely on the side of advancement and understanding of the basic foundations of the digital scope.

alexthattalks commented on August 25, 2024

I agree with @CodyKarch in that the selection from the text is very helpful. Being more technologically inclined myself, I am very much the opposite of what she describes: I am the engineer/software person who understands the systems but doesn't have a full grasp of their use in the real world. Before encountering this class, I knew what XML was and some of its usefulness in driving the web. I had no idea that it was used for things like the humanities projects we have discussed. To me it is fascinating, because now that I see it, I understand why people would want to use XML to encode literature, letters, etc., but I would not have thought of it on my own as a good use of XML.

In a way, I disagree with @spadafour when they say "As long as the coder is able to recognize and maintain his or her cohesive strategy for markup." This is all well and good, and the coder should be able to follow along with their code, but at the same time, we as coders are encoding text that is old or significant to history in some way. We are interpreting the works to encode them so they are easily catalogued and preserved for further research, etc. If only the original coder can follow what is going on in their code, what happens when that coder is no longer around? We are building things in the digital world that will far outlast us and future generations to come. So why make more work for the people who will be reading our markup in the future by making it so dependent on our being the only ones able to discern our code?

spadafour commented on August 25, 2024

Both @nlottig94 and @amielnicki made understandable criticisms about the nature of complex and overwhelming amounts of coding. When I spoke about increasing complexity of markup, I did not mean to refer to the coder as one working under his or her own individual ruleset; that coder, I assume, is working with a team of coders. As a whole, the team makes decisions about the necessity for additional categories and labels for elements. Staying organized and on point with your intentions for markup is of course good practice. If additional labeling makes sense and builds on the overall goal of your project, why not add it? Consider the work that Piez and his team have done. By combining XML with the team's own language (LMNL), they were able to identify an important and intentional feature of Shelley's Frankenstein (placing the voice of the monster at the center of the piece). This process, while complex and unorthodox, was able to reveal a crucial element of the novel. While their coding cannot be reproduced by digital means (which is what I think Dr. B relayed in class), it is (to me) still valuable and serves a purpose. Would someone else consider that needless, focusing only on more concrete references for text? I have a feeling that this might be a fundamental issue at the heart of digital humanities: what, really, is the inherent purpose of our work?

KariWomack commented on August 25, 2024

I think, @spadafour, that one of many possible answers to your question is that the purpose of our work is, in a way, to grant immortality to information that would otherwise die out or disappear within a few years, much like what @amielnicki commented, whether through encoding overwhelming amounts of old text, storing personal/business information online, or creating new ways of digitizing our own history through multimedia or the like. I am very curious about some other thoughts as well: WHY ARE WE DOING THIS?
I think as well that this is very much a practical learning exercise, whether we are actually discussing encoding as a profession or as a way of better understanding information, since it requires the ability to read an amount of text and make connections by understanding the relationships between the words. Know what I mean?

ghbondar commented on August 25, 2024

DH markup occurs at different scales for different purposes. While some projects are intended to produce definitive editions of texts, and hence be around "forever" (more than 5 years in internet lives), other projects might be intended only to answer a specific and immediate question about a text or texts: once the answer is known, the marked-up text, itself, might no longer be of importance. However, when managing a team of coders, as on the Mitford project (http://digitalmitford.org), having a standardized and intuitive mark-up system is essential, both for the mark-up process, and for later peer review. The use of schemas, such as Relax NG (in the reading for tomorrow's class), and larger-scale systems of mark-up (such as what we'll see from the TEI - Text Encoding Initiative http://tei.org) are ways of controlling and systematizing the xml mark-up we apply to our texts.

ebeshero commented on August 25, 2024

Aha! @KariWomack, @amielnicki, and @spadafour are pulling this discussion into a very interesting territory: the question of how to write code that might outlast the short term, self-contained goals of a project. Wendell Piez worked out LMNL to help model overlapping hierarchies, and as @spadafour rightly notes, that was an inventive move to try something that he is well aware Most Coders won't try. He was right to make that code and build from it, but he also translates his code into the better known, widely shared forms of XML and TEI. In a way, Piez invents "outside the box" (or nested dolls or tree), but then shows that he can make XML out of that code for when he needs to work within the hierarchies. @spadafour, I should clarify that Piez's code is something that does work to describe and model a text as code in its own right (readable to us AS a text). LMNL is a kind of scholarly editing: We can read his code AS a model of Frankenstein's weird overlapping structure! But he transforms LMNL into XML when he needs the tree structure: and usually that is for when he needs tools designed for XML, including representation on the web.

Piez understands, I think, the benefit of working in XML (and TEI) to connect with a community, AND to pioneer new methods on his own. (It's a balance I really admire in his work!)

ebeshero commented on August 25, 2024

Some of you may find Wendell's homepage of interest: Check out what he does here: http://www.wendellpiez.com/

KariWomack commented on August 25, 2024

I know that in class we discussed the use of self-closing tags and attributes as a way of avoiding or solving the problem of overlapping hierarchies, as Kirilloff mentions, and I think that this may be one of the simpler and less complex methods of dealing with this problem. But I propose another way of doing this, if you are especially comfortable with tags that are not self-closing: try to puzzle your way through rearranging attributes so that the hierarchies no longer overlap, although I recognize that this method does not always work and can be more stressful.

ghbondar commented on August 25, 2024

Hmmm... as attributes are contained within elements, some clarification would help @KariWomack . Can you please sketch out an example, perhaps? Two consecutive elements containing the same attributes might do it, probably with some indication of sequence as well.
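One possible reading of the suggestion above, sketched in Python: the split word becomes two consecutive `w` elements that share an attribute value, and a second attribute records their order, so each fragment nests cleanly inside its own line and the whole word can be reassembled later. The attribute names `word` and `part` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical attributes: "word" names the whole word a fragment belongs to,
# and "part" gives the fragment's position in the sequence.
POEM = """<poem>
  <l n="1">The four <w word="w1" part="1">eng-</w></l>
  <l n="2"><w word="w1" part="2">ineers</w></l>
  <l n="3">Wore orange</l>
</poem>"""

def rejoin_words(root):
    """Collect fragments by their shared "word" value and join them in order."""
    fragments = defaultdict(list)
    for w in root.iter("w"):
        fragments[w.get("word")].append((int(w.get("part")), w.text))
    return {word: "".join(text for _, text in sorted(parts))
            for word, parts in fragments.items()}

root = ET.fromstring(POEM)
print(rejoin_words(root))  # {'w1': 'eng-ineers'}
```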

brookestewart commented on August 25, 2024

Before class on Wednesday, I wouldn't have considered making the line element the self-closing tag instead of the word itself. To me, this makes much more sense than using the self-closing tag around the split word (or any other similar situation). Using the self-closing tag at the end of the line looks a lot less complicated and I believe it would leave less room for error. If you use the self-closing tag before or after the split word, you have to be consistent with where you place it. I feel that, with everything else going on, it would just add another thing to remember and get jumbled in with all the other elements. If you use a self-closing element at the end of the line, it's more organized and concise without being a burden. This is just my opinion, but it definitely helped to see another way of doing it in class.
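The end-of-line approach can be sketched in Python, assuming a hypothetical encoding in which the word stays an ordinary `w` element and an empty `lineEnd` element (a made-up name) marks the end of each line. In ElementTree, text that follows an empty element is stored as that element's `tail`, so the lines can be rebuilt by walking the document in order and starting a new line at each milestone.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: <w> wraps the whole word, and an empty <lineEnd/>
# closes every line, including the break inside "eng-ineers".
POEM = ("<lg>The four <w>eng-<lineEnd/>ineers</w><lineEnd/>"
        "Wore orange<lineEnd/>brassieres.<lineEnd/></lg>")

def lines_from_milestones(elem):
    """Rebuild lines by walking text in document order and
    starting a new line whenever a <lineEnd/> milestone is met."""
    lines, current = [], []

    def walk(node):
        if node.tag == "lineEnd":
            lines.append("".join(current))
            current.clear()
        elif node.text:
            current.append(node.text)
        for child in node:
            walk(child)
            if child.tail:  # text sitting after the child element
                current.append(child.tail)

    walk(elem)
    if "".join(current).strip():  # keep any trailing text without a milestone
        lines.append("".join(current))
    return lines

root = ET.fromstring(POEM)
print(lines_from_milestones(root))
# ['The four eng-', 'ineers', 'Wore orange', 'brassieres.']
```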

alexthattalks commented on August 25, 2024

@ebeshero, don't get me wrong, I too think Piez's LMNL code is great out-of-the-box thinking, and it is backed up by his extensive documentation and use of it. I am merely talking about people doing projects that may not be documented properly, or whose coding structure is very cryptic for their project's purposes, when that work would otherwise be of great help to another project working on the same material for different purposes, if only they could figure the code out. Projects, whether of small or large scope, should transcend the immediate purpose and work toward a larger goal. Even if you are only doing something for a simple purpose, you are taking the time to do something, so why not go the extra mile and make it so someone else could adapt your work for further development?

ebeshero commented on August 25, 2024

@amielnicki Point well taken: Documentation may be key to posterity for a coding project!

nlottig94 commented on August 25, 2024

I don't know about anyone else, but after reading the instructions for how to make a Relax NG schema, I found the use of self-closing tags much more useful. I was a little confused during class about where I could put these self-closing elements, but having everything almost spelled out in the Relax NG reading helped me figure out that you could pretty much use these elements anywhere you wanted to signal a certain idea that didn't particularly need a containing element or an attribute. I see self-closing elements almost as an emphasis on where certain words, lines, etc. begin and end.

ghbondar commented on August 25, 2024

As we get into other applications of the xml mark-up, @brookestewart @nlottig94, we'll see how the presence of self-closing tags can be used to extract text as if there were a text node defined (which, of course, there isn't with a self-closing tag).

wendellpiez commented on August 25, 2024

Many thanks to everyone for the kind words and to Elisa for having me on. :-)

The only thing I'd hasten to add to what has been said is that LMNL is an experiment, while XML and its associated technologies (which go well beyond what has ever been attempted for LMNL) are an entire standards ecosystem. Among other things, what this means is that there is no toolset to handle LMNL the way there is to handle XML. This is one reason I use XML -- I couldn't replace it if I tried and it does everything I need. (Especially since I also know workarounds to its limitations.) The only toolset there is for LMNL is what I have cobbled together. This doesn't mean it's not fun: it's fun in much the sort of way an automobile you built in your garage could be fun -- not as good as the one in the showroom (and maybe not even really safe to drive in), but fun.

But we didn't create LMNL because we wanted to be "merely practical"; in many ways LMNL is designed specifically as a radically different approach to markup and modeling than XML, where you have to make early commitments to markup structures, or things start to get painful. (XML is great because you get to name things yourself. XML is terrible, since having defined the rules of naming and combining things, you have to live with the rules you have made.) This difference is most acutely felt when one is messing with early concepts for a design. As long as we're just playing around with the tagging, permitting overlap is really helpful. Start tags and end tags can go more or less anywhere, and we can process early and then fix things. It's strange, making for a much more relaxed approach to development. Instead of performing a comprehensive document analysis up front, with LMNL one can improvise as one proceeds.
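To illustrate what it means for start and end tags to "go more or less anywhere," here is a Python sketch of a range-based (standoff) model: the text is stored once, and each named range is just a pair of character offsets, so ranges are free to overlap. This is not LMNL syntax or Piez's tooling, only a simplified, assumed model in the same spirit; the offsets are for the string "The four eng-ineers".

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Range:
    """A named span of the text, as [start, end) character offsets."""
    name: str
    start: int
    end: int

def overlaps(a: Range, b: Range) -> bool:
    """True when two ranges share some text but neither contains the other,
    which is exactly the case a single XML tree cannot express."""
    shared = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return shared and not (a_in_b or b_in_a)

# The text is stored once; all of the "markup" lives in the ranges.
TEXT = "The four eng-ineers"
RANGES = [
    Range("line 1", 0, 13),   # "The four eng-"
    Range("line 2", 13, 19),  # "ineers"
    Range("word", 9, 19),     # "eng-ineers"
]

for a, b in combinations(RANGES, 2):
    if overlaps(a, b):
        print(f"{a.name} {TEXT[a.start:a.end]!r} overlaps {b.name} {TEXT[b.start:b.end]!r}")
# line 1 'The four eng-' overlaps word 'eng-ineers'
```

Here "line 2" sits wholly inside the word range, so only the first line is reported as overlapping; an XML encoding would have to pick one of the two hierarchies, while the range model simply records both.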

It's probably true that in practice, an XML practitioner can almost always make a good judgement regarding the most "natural" structures of a text (that is, what should be the primary structural hierarchy), at which point you get more power for the level of effort with XML. LMNL remains interesting when the ranges of interest are known to overlap. But most projects don't have these problems, or can afford to ignore or postpone them. For them, I recommend XML.

So the differences between XML and LMNL are not so much in what they make possible, but what sorts of processes are easy and straightforward to conceptualize and execute (and to do so efficiently of time and resources, if a data set is large). The technology becomes a kind of "lens" through which we see the text or the problem of the text (whatever we wish to represent about the text for whatever purpose). Yet the purpose of LMNL was not so much to offer an alternative to XML as test out an alternative way of using markup (tags or code embedded into texts) to explore issues that happened to be "humanistic" -- including issues reflecting my own interest in literary and narrative forms. It then turned out that the tooling I made for LMNL (specifically, the document-visualization stuff) actually revealed something interesting about Frankenstein, the novel, well, so much the better ...

ebeshero commented on August 25, 2024

@wendellpiez Thanks for responding, Wendell! I like your idea that we might think of our code as a "lens": we see a text differently based on what we decide to mark. We're experimenting this weekend with TEI encoding, after we've been coding and rolling our own XML and schemas for the first few weeks. Working in the big TEI consortium's rule set and consulting the guidelines is its own experience of constraint and hierarchy, but also offers some new possibilities we might not have come up with on our own--one can learn a lot by peering through the TEI's collection of lenses. Now that we're working within a constrained (if pretty enormous) system, it's interesting to think of things we could model differently by making our own lenses. We didn't get much of a chance at the start of our class to look closely at LMNL markup to see how it works, but it would be neat to take a look at how it helps us to look at overlaps, and how it helps visualize documents.
