digitue-gt's Introduction

DigiTue Ground Truth

This repository contains transcriptions for digitized books and journals of the University Library of Tübingen (http://idb.ub.uni-tuebingen.de/digitue/).

The transcriptions were done with eScriptorium, a transcription platform developed as part of the Scripta and RESILIENCE projects (https://gitlab.com/scripta/escriptorium/).

Get the related images in JPEG format using this script:

for xml in $(find Theo Tue VD18 -name "*.xml"); do
  (
    cd "$(dirname "$xml")"
    page=$(basename "$xml" .xml)
    # Strip the trailing "_<digits>" to get the volume identifier.
    base=$(echo "$page" | sed 's/_[0-9]*$//')
    # Skip pages whose image was already downloaded.
    test -f "$page.jpg" || (
      echo "$page"
      curl --silent -Lo "$page.jpg" "https://opendigi.ub.uni-tuebingen.de/opendigi/image/$base/$page.jp2/full/full/0/default.jpg"
    )
  )
done
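The script derives each image URL from the XML file name alone. As a minimal sketch of that mapping, the helper below builds the URL for a single page name without downloading anything. The function name `page_image_url` and the page name in the usage line are hypothetical; only the URL pattern and the `sed` expression come from the script above.

```shell
# Build the image URL for one page name, following the pattern used
# in the download script. The function name is an illustrative assumption.
page_image_url() {
    page=$1
    # Drop the trailing "_<digits>" page counter to get the volume identifier.
    base=$(printf '%s\n' "$page" | sed 's/_[0-9]*$//')
    printf 'https://opendigi.ub.uni-tuebingen.de/opendigi/image/%s/%s.jp2/full/full/0/default.jpg\n' \
        "$base" "$page"
}

# Usage with a made-up page name:
page_image_url "example_0001"
```

Printing the URL first can help to check a single page by hand before running the full loop.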

digitue-gt's People

Contributors

stweil

Forkers

tboenig

digitue-gt's Issues

Page Error in Allgemeine_kirchliche_Zeitung

Hello,
I have parsed/validated the folder 'Allgemeine_kirchliche_Zeitung' and found the following errors:

System-ID: \DTGT\Data\data_line\Allgemeine_kirchliche_Zeitung\1860\page\16_a23d4_default.xml
Schema: http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
Program name: Xerces
Error level: error
Description: Value '718,113 707,-2 765,-4 776,108 776,117 718,126' is not facet-valid with respect to pattern '([0-9]+,[0-9]+ )+([0-9]+,[0-9]+)' for type 'PointsType'.
Start: 12:73
URL: http://www.w3.org/TR/xmlschema-2/#cvc-pattern-valid

System-ID: \DTGT\Data\data_line\Allgemeine_kirchliche_Zeitung\1860\page\16_a23d4_default.xml
Schema: http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
Program name: Xerces
Error level: error
Description: The value '718,113 707,-2 765,-4 776,108 776,117 718,126' of attribute 'points' on element 'Coords' is not valid with respect to its type, 'PointsType'.
Start: 12:24
End: 12:71
URL: http://www.w3.org/TR/xmlschema-1/#cvc-attribute

System-ID: \DTGT\Data\data_line\Allgemeine_kirchliche_Zeitung\1860\page\18_1f153_default.xml
Schema: http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
Program name: Xerces
Error level: error
Description: Invalid content was found starting with element '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":TextLine}'. One of '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":AlternativeImage, "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":Coords}' is expected.
Start: 317:8
End: 317:16
URL: http://www.w3.org/TR/xmlschema-1/#cvc-complex-type

System-ID: \DTGT\Data\data_line\Allgemeine_kirchliche_Zeitung\1860\page\20_4e68f_default.xml
Schema: http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
Program name: Xerces
Error level: error
Description: Invalid content was found starting with element '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":TextLine}'. One of '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":AlternativeImage, "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":Coords}' is expected.
Start: 309:8
End: 309:16
URL: http://www.w3.org/TR/xmlschema-1/#cvc-complex-type

System-ID: \DTGT\Data\data_line\Allgemeine_kirchliche_Zeitung\1860\page\21_da7a2_default.xml
Schema: http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
Program name: Xerces
Error level: error
Description: Invalid content was found starting with element '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":TextLine}'. One of '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":AlternativeImage, "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":Coords}' is expected.
Start: 310:8
End: 310:16
URL: http://www.w3.org/TR/xmlschema-1/#cvc-complex-type

System-ID: \DTGT\Data\data_line\Allgemeine_kirchliche_Zeitung\1860\page\26_63d1f_default.xml
Schema: http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
Program name: Xerces
Error level: error
Description: Invalid content was found starting with element '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":TextLine}'. One of '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":AlternativeImage, "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":Coords}' is expected.
Start: 310:8
End: 310:16
URL: http://www.w3.org/TR/xmlschema-1/#cvc-complex-type

System-ID: \DTGT\Data\data_line\Allgemeine_kirchliche_Zeitung\1860\page\29_557de_default.xml
Schema: http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
Program name: Xerces
Error level: error
Description: Invalid content was found starting with element '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":TextLine}'. One of '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":AlternativeImage, "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":Coords}' is expected.
Start: 317:8
End: 317:16
URL: http://www.w3.org/TR/xmlschema-1/#cvc-complex-type

System-ID: \DTGT\Data\data_line\Allgemeine_kirchliche_Zeitung\1860\page\2_aa780_default.xml
Schema: http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
Program name: Xerces
Error level: error
Description: Value '628,132 628,-4 889,-4 889,132 889,150 746,160 744,160 744,160 626,145' is not facet-valid with respect to pattern '([0-9]+,[0-9]+ )+([0-9]+,[0-9]+)' for type 'PointsType'.
Start: 12:97
URL: http://www.w3.org/TR/xmlschema-2/#cvc-pattern-valid

System-ID: \DTGT\Data\data_line\Allgemeine_kirchliche_Zeitung\1860\page\2_aa780_default.xml
Schema: http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
Program name: Xerces
Error level: error
Description: The value '628,132 628,-4 889,-4 889,132 889,150 746,160 744,160 744,160 626,145' of attribute 'points' on element 'Coords' is not valid with respect to its type, 'PointsType'.
Start: 12:24
End: 12:95
URL: http://www.w3.org/TR/xmlschema-1/#cvc-attribute

System-ID: \DTGT\Data\data_line\Allgemeine_kirchliche_Zeitung\1860\page\3_b28d4_default.xml
Schema: http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd
Program name: Xerces
Error level: error
Description: Invalid content was found starting with element '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":TextLine}'. One of '{"http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":AlternativeImage, "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15":Coords}' is expected.
Start: 310:8
End: 310:16
URL: http://www.w3.org/TR/xmlschema-1/#cvc-complex-type
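
The PointsType errors reported above are triggered by negative coordinates such as '707,-2', which the quoted pattern does not allow. The following is a minimal sketch for checking and repairing a single points string, assuming a shell with `grep -E` and `sed -E`. The function names `points_valid` and `clamp_points` are hypothetical, and clamping negative values to 0 is only one possible repair, not necessarily the one the maintainers would choose.

```shell
# Succeed if a points string matches the PointsType pattern quoted in the
# error messages. XSD patterns match the whole value, so the grep
# expression is anchored with ^ and $.
points_valid() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]+,[0-9]+ )+([0-9]+,[0-9]+)$'
}

# Replace every negative coordinate with 0 (assumed repair strategy).
clamp_points() {
    printf '%s\n' "$1" | sed -E 's/-[0-9]+/0/g'
}

# Usage with the first value from the report:
pts='718,113 707,-2 765,-4 776,108 776,117 718,126'
if ! points_valid "$pts"; then
    echo "invalid: $pts"
    echo "clamped: $(clamp_points "$pts")"
fi
```

This only covers the points values. The errors about a TextLine element appearing where Coords is expected concern element structure and need a separate fix.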
