scribe's Introduction

This tool was created by The Baki Project, a Newbook Digital Texts in the Humanities project. Researchers, primarily from the University of Washington (UW), Turkish transcriptionists, Ottomanist scholars, and a UW iSchool Informatics capstone team worked together to produce Scribe, a tool designed to mitigate the faith gap between Arabic and Latin-based Turkish manuscripts.

Information about the tool and its purpose can be found on the Wiki: https://github.com/Baki-Projesi/Baki-RTT/wiki

Below you will find a Project Overview, a List of Contents, a summary of the major technology decisions made related to the Technology Stack, and Contact Information.

Project Overview

The story behind The Baki Project begins with the fall of the Ottoman Empire. In the years that followed, written language in Turkey shifted from Ottoman Arabic script to a Latin-based Turkish alphabet. This transformation occurred rapidly, and consequently some texts were transcribed incorrectly or called for subjective editorial decisions. These discrepancies resulted in inaccurate and inconsistent transcriptions, which have been detrimental to the preservation of Ottoman history and culture.

Transcription is a primitive and tedious task, as most transcribers perform their work manually, using pen and paper or simple text editing programs. No digital tools exist exclusively for Turkish manuscript transcription, so researchers do a considerable amount of work by hand just to verify the integrity of these documents. Digital tools created to facilitate this process could dramatically reduce the overhead needed to study Turkish literature.

Scribe, a digital tool created by The Baki Project, mitigates the faith gap between Arabic and Latin-based Turkish manuscripts by standardizing the transcription workflow. Transcribers can dynamically verify the integrity of these manuscripts and transcriptions because Scribe facilitates the comparison of original Ottoman text with Latin Turkish. Scribe maintains the authenticity of primary source texts in order to preserve Ottoman tradition and literature.

The core feature of this tool is that every transcription decision is captured. Each decision is recorded by a 'disambiguation rule'. Every Latin character typed has an entity attached to it, so there is no ambiguity: each transcription decision is conscious and purposeful, which reduces the subjectivity that is otherwise introduced into these texts during transcription.
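As a rough illustration of the idea, the sketch below models a disambiguation rule as a plain object and records a transcriber's choice as metadata on the typed character. The rule shape, the `exampleRules` list, and the function names are assumptions made for this example; the project's actual ruleset lives in src/assets/disambiguationRules.js and may look quite different. The Arabic letters shown are common Ottoman readings of the Latin letter "s".

```javascript
// Illustrative rules only. The project's actual ruleset lives in
// src/assets/disambiguationRules.js; its real shape is not shown here.
const exampleRules = [
  { latin: 's', arabic: 'س', name: 'sin' },
  { latin: 's', arabic: 'ص', name: 'sad' },
  { latin: 's', arabic: 'ث', name: 'se' },
  { latin: 'b', arabic: 'ب', name: 'be' },
];

// All rules that could apply to one typed Latin character.
function optionsFor(latin, rules) {
  return rules.filter((rule) => rule.latin === latin);
}

// A character is ambiguous when more than one rule matches it.
function isAmbiguous(latin, rules) {
  return optionsFor(latin, rules).length > 1;
}

// Record the transcriber's choice as metadata attached to the character.
function recordDecision(latin, chosenName, rules) {
  const rule = optionsFor(latin, rules).find((r) => r.name === chosenName);
  if (!rule) {
    throw new Error(`No rule "${chosenName}" for "${latin}"`);
  }
  return { latin, entity: { rule: rule.name, arabic: rule.arabic } };
}
```

In this sketch, `isAmbiguous('s', exampleRules)` is true because three rules match, so the transcriber would be asked to choose one, and `recordDecision('s', 'sad', exampleRules)` returns the character together with the chosen rule.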

From left to right, the tool currently has a Latin-based Turkish input box, an image viewer, and an Arabic output box. Below the Latin input box is a dropdown menu where the user can select whether they are typing on an English or a Turkish keyboard. The dropdown options change based on the characters the keyboard has available. The transcriber can upload a manuscript image of their choosing in the middle panel so they can reference the original manuscript without turning away from the screen. As the user types Latin-based Turkish, the corresponding Ottoman Arabic updates dynamically. The Arabic output is populated with the option selected in the dropdown.
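The following sketch shows one way the two behaviours described above could fit together: filtering the available rules by keyboard, and building the Arabic output only from characters whose option has been selected. The `keyboards` field, the `typed` array, and both function names are hypothetical and chosen for illustration; they are not taken from the project's source.

```javascript
// Hypothetical rules: each lists the keyboards on which its Latin key exists.
const keyboardRules = [
  { latin: 's', arabic: 'س', keyboards: ['english', 'turkish'] },
  { latin: 'ş', arabic: 'ش', keyboards: ['turkish'] },
];

// Keep only the rules whose Latin character can be typed on this keyboard.
function rulesForKeyboard(rules, keyboard) {
  return rules.filter((rule) => rule.keyboards.includes(keyboard));
}

// Build the Arabic output from characters that already have a chosen form.
// Characters still waiting for a decision (arabic === null) are skipped.
function toArabicOutput(characters) {
  return characters
    .filter((ch) => ch.arabic != null)
    .map((ch) => ch.arabic)
    .join('');
}

// Hypothetical input state: the middle character has no decision yet.
const typed = [
  { latin: 'ş', arabic: 'ش' },
  { latin: 's', arabic: null },
  { latin: 'b', arabic: 'ب' },
];
```

With this data, `rulesForKeyboard(keyboardRules, 'english')` keeps only the "s" rule, and `toArabicOutput(typed)` joins the two decided characters while leaving out the undecided one.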

This tool is a work in progress. Future iterations will focus on building out its collaborative features, such as authentication, a project management system, and integration with The Baki Project's manuscript database.

Next Steps:

  • implement comment functionality
  • continue usability testing
  • add the ability to save transcriptions to a database
  • refine UI
  • build project management structure
  • capture transcription metadata
  • display a read-only view of The Baki Project's manuscript database for users who are not part of the project

List of Contents

  • Baki-RTT/
    • build/
      • static/
      • asset-manifest.json
      • favicon.ico
      • index.html
    • node_modules/
    • public/
      • index.html
      • favicon.ico
    • src/
      • index.css
      • index.js
      • routes.js
      • assets/
        • disambiguationRules.js
          • This outlines the unique transcription rules for both English and Turkish keyboards
        • logo.svg
      • client/
        • components/
          • AmbiguousCharacter.js
            • This renders to the DOM a character that the user has not yet assigned metadata to. When a character is displayed in this way, it will be red. The user must choose an option from the dropdown menu to indicate exactly which character they want to type.
          • Comment.js
            • An individual comment. It will contain the text that has been associated with a particular selection of text
          • CommentPopup.js
            • This will be the comment input/display that will show when the user clicks to add or view a comment
          • DisambiguatedCharacter.js
            • This is a character for which a metadata decision has been made
          • DisplayComment.js
            • This is an alternative method for displaying the comment if a user clicks to view
          • Draft-js-gutter.js
            • This component is a Draftjs plugin. It exports a component which takes in options from the user and adds line numbers to the editor. It will wrap the generic Draftjs Editor component in an "EditorGutter" which maps all of the content blocks (each line in this case) to a number.
          • DropDown.js
            • An individual dropdown. This contains the disambiguation options the user is prompted with as they type.
          • ImageUpload.js
            • A component that lets the user temporarily upload an image. It is meant to be used to compare original manuscripts with the generated transcription. It also stores the image temporarily in the component's state so that we can include it in our database in future iterations.
          • InputBox.js
            • This component creates the transcription input box as a whole. It calls the draft-js-gutter component and passes style preferences as props so that the plugin can generate the appropriate draft Editor component.
          • OutputBox.js
            • This displays each content block in Arabic for the user. They can compare our generated output to original manuscripts.
          • Transcribe.js
            • This acts as a parent component. It renders (adds to the DOM) the InputBox, ImageUpload, and OutputBox components. Taken together, these make up the functionality of our tool.
          • TranscribeFooterTools.js
            • This is a component for next steps. It will generate a footer with options for the user to save, download, publish, and so on.
        • styles/
          • App.css
          • COmmentPopup.css
          • Draft-js-gutter.css
          • Dropdown.css
          • ImageUpload.css
          • InputBox.css
          • OutputBox.css
          • Transcribe.css
          • TranscribeFooterTools.css
        • utils/
          • bufferComboSearch.js -Searches the current character buffer for string matches of multi-character combinations in the ruleset
          • findWithRegex.js -Uses regular expressions to find ambiguous characters in the input area using the ruleset
          • generateDraftStateObject.js -Helper function that returns an object containing information related to the current EditorState, including current selection, content blocks, etc.
          • getCurrentWordBuffer.js -Returns the current buffer of characters that follow the nearest space before the caret
          • getRelativeParentElement.js -Given an element on the DOM, this function returns the nearest relative parent element
          • groupByKey.js -Set of helper functions that create groupings of disambiguation rules that eventually go to the Dropdown component
          • selectionStateHelpers.js -Utility functions that adjust the current selection using given parameters
      • routes/
        • About.js
        • App.js
        • NotFound.js
      • server/
        • app.js
        • index.js
      • tests/
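
To make two of the utilities listed under utils/ more concrete, here is a minimal sketch of what helpers like getCurrentWordBuffer.js and groupByKey.js might do. It works on plain strings and arrays rather than the Draft.js EditorState that the real helpers presumably use, and the signatures are assumptions made for this example, not the project's actual API.

```javascript
// Return the characters between the nearest preceding space and the caret.
// `text` is the line content and `caretOffset` the caret position within it.
function getCurrentWordBuffer(text, caretOffset) {
  const beforeCaret = text.slice(0, caretOffset);
  const lastSpace = beforeCaret.lastIndexOf(' ');
  // lastIndexOf returns -1 when there is no space, so slice(0) keeps it all.
  return beforeCaret.slice(lastSpace + 1);
}

// Group a list of objects by the value of one property, for example to
// collect all rules that share the same Latin character.
function groupByKey(items, key) {
  return items.reduce((groups, item) => {
    const value = item[key];
    if (!groups[value]) {
      groups[value] = [];
    }
    groups[value].push(item);
    return groups;
  }, {});
}
```

For example, `getCurrentWordBuffer('gel eyi', 7)` yields the word being typed at the end of the line, and grouping a rule list by a `latin` property gives one dropdown-sized bucket per Latin character.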

Technology Stack

The technologies involved in Scribe can be broken up into three layers: client-facing, server-side, and database. Our client-facing layer uses React, a JavaScript library for building user interfaces on web pages. The server-side layer is written with Express, which handles data retrieval and, together with Auth0, user authentication. We store our application data in MongoDB, a non-relational document database. We chose these technologies because they are widely adopted, actively maintained, and likely to remain supported for years to come.

  • Application Hosting / Compute instances: Heroku Hobby/Standard Tier: $7-25/month
  • Domain hosting: UW Servers, hosted by the Near Eastern Languages and Civilization (NELC) department
  • Client View layer: React JS - React makes writing and re-using UI components for the browser easy. There are plenty of resources at developers’ disposal.
  • Server Middleware: Express JS - Express is a fast, unopinionated server solution that is one of the most popular Node-server solutions today.
  • Authentication: Auth0 - Auth0 is an industry leader when it comes to authentication frameworks, and they have simple plugins for using their auth on ExpressJS.
  • Database for Document Storage: MongoDB - MongoDB’s NoSQL, JSON document-based storage matches the structure of our working application data. Using MongoDB allows us to work with native JSON.
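
To illustrate the point about JSON document storage, the sketch below shows how nested transcription state could be represented as a single JSON document. The field names and values are hypothetical and chosen only for illustration; they are not the project's schema, and no database call is made here.

```javascript
// Hypothetical document shape; field names are illustrative only.
const transcriptionDoc = {
  title: 'Example manuscript page',
  keyboard: 'turkish',
  lines: [
    {
      number: 1,
      characters: [
        { latin: 's', arabic: 'ص', rule: 'sad' },
        { latin: 'b', arabic: 'ب', rule: 'be' },
      ],
    },
  ],
};

// The nested state survives a JSON round trip unchanged, so it could be
// stored as one document instead of being split across relational tables.
const stored = JSON.stringify(transcriptionDoc);
const restored = JSON.parse(stored);
```

Because lines, characters, and their chosen rules nest naturally, a document store can hold the whole transcription in the same shape the application works with.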

Learn more about the technology stack here: https://github.com/Baki-Projesi/Baki-RTT/wiki

Contact Information

Sarah Ketchley - [email protected]

scribe's People

Contributors

bradholland84, ketchley, labkey-bradh, ndietzler, rutvimpatel

Forkers

nickv23

scribe's Issues

Red letter/missing from Arabic version

It doesn't happen every time, but when we change a letter, it turns red and does not appear in the Arabic output (AO). If we press Enter after this letter, the next one disappears. If we press Shift after this letter, it appears in the AO. What is the reason for this?

Combination entered is not recognized

The combinations "eyi" and "ī-yi" are not recognized by the tool because "yi" is also a valid combination.

If the combination of "yi" is removed, "eyi" and "ī-yi" show up in the dropdown.

Solution: If "eyi" is typed, show the "yi" and "eyi" combinations in the dropdown.
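
A minimal sketch of the proposed fix, assuming combinations are matched against the end of the current word buffer: instead of stopping at the first match, collect every combination the buffer ends with and offer all of them, longest first. The `combos` list and the `matchingCombos` name are hypothetical; the real combinations come from the ruleset.

```javascript
// Hypothetical combination list; the real one comes from the ruleset.
const combos = ['yi', 'eyi', 'ī-yi'];

// Return every combination that the buffer ends with, longest first,
// so that a short match such as "yi" no longer hides a longer one.
function matchingCombos(buffer, combinations) {
  return combinations
    .filter((combo) => buffer.endsWith(combo))
    .sort((a, b) => b.length - a.length);
}
```

With this approach, typing "eyi" would surface both the "eyi" and the "yi" combinations in the dropdown, while typing only "yi" would surface just "yi".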

Marginalia

Capture marginal notes:

  • numbers
  • comments
  • leading words
  • poems
  • related texts

How should we capture these? Develop categories, or keep the comment box general for now?

Marginalia in other TEI projects: zonal relationships.

Define classes of marginalia.

Next Steps Questions - 12/9/18 (in progress)

Will we be having another conference anytime soon? Maybe we could present some prototypes and have transcribers run through how they might use things?

Search

  • What is the ideal (and non-ideal) workflow?
  • Do we want to search by Latin or Arabic? Both?
  • Do we want to search by date?
  • Do we want the database to be a separate application from the tool?
  • How should a user establish relationships between poems? (Centrally-managed tags, user-defined groupings/families, etc.)
  • Do we want user defined families? What might some families be?
  • Can the user manipulate families?

Research

  • How does a user publish / share their findings?
  • What is the 'grouping' or 'tagging' workflow?
  • Who is allowed to do research, and at what level?

Notes/Marginalia

  • We'll need a list of tags/things we want to sort by
  • Will that list need to be flexible?

General

  • Are users starting from scratch each time, or loading something up and editing it?
  • Do we need the ability to work completely offline (For database purposes)?
  • What is a "Collection" exactly and do we want to identify those in our tool?
  • Will we want to expand this to other authors? Will search/metadata criteria be different for other poets?
