License: MIT

ASPYRE GT

A converter that helps make your data compatible for import into eScriptorium.


SUMMARY

  1. How to use Aspyre
  2. Configuring the export from Transkribus
  3. Reporting Errors
  4. Wiki

How to use Aspyre

As a library

Aspyre is a library. To install it, download the aspyrelib/ directory and install its dependencies (cf. requirements.txt). Import it in your program with: from aspyrelib import aspyre

Parsing parameters with aspyre.AspyreArgs()

Start your project by collecting all the required information in an AspyreArgs() object.

Process essential information to run Aspyre
          :param scenario: keyword describing the scenario (string)
          :param source: path to source file (string)
    [opt] :param destination: path to output (string)
    [opt] :param talkative: activate a few print commands (bool)
    [opt] :param vpadding: value to add to VPOS attr. in String nodes (int)

supported values for scenario: "tkb", "pdfalto", "limb"

vpadding is only used in PDFALTO and LIMB scenarios
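
To make the vpadding parameter more concrete, here is a minimal sketch of what "adding a value to the VPOS attribute in String nodes" could look like. It is not Aspyre's own code: the function name pad_vpos and the sample ALTO fragment are assumptions made for illustration, and the sketch truncates any fractional VPOS before adding the padding.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment, used only to demonstrate the idea.
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Annuaire" HPOS="10" VPOS="100" WIDTH="80" HEIGHT="12"/>
    <String CONTENT="1903" HPOS="95" VPOS="101" WIDTH="40" HEIGHT="12"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def pad_vpos(root, vpadding):
    """Add vpadding to the VPOS of every String element and return how many changed."""
    changed = 0
    for element in root.iter():
        # Compare on the local name so the sketch works with any ALTO namespace.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "String" and "VPOS" in element.attrib:
            # int(float(...)) drops any fractional part before the padding is added.
            element.set("VPOS", str(int(float(element.get("VPOS"))) + vpadding))
            changed += 1
    return changed


root = ET.fromstring(ALTO_SAMPLE)
padded = pad_vpos(root, 5)
vpos_values = [el.get("VPOS") for el in root.iter() if el.tag.endswith("String")]
```

After this runs, both String nodes have been shifted down by 5 units; other elements are left untouched.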

Transkribus to eScriptorium scenario with aspyre.TkbToEs()

⚠️ This is really not the best way to transfer data between these two applications.

Runs the Transkribus to eScriptorium scenario (mainly resolves the schema declaration and the source image information).

Handle a Transkribus to eScriptorium transformation scenario
        :param args: essential information to run transformation scenario (AspyreArgs)

PDFALTO to eScriptorium scenario with aspyre.PdfaltoToEs()

Runs the PDFALTO to eScriptorium scenario (mainly resolves the schema declaration, the source image information and the homothety, i.e. the scaling of coordinates).

Handle a PDFALTO to eScriptorium transformation scenario
        :param args: essential information to run transformation scenario (AspyreArgs)
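
Putting the pieces above together, a library call might look like the sketch below. It assumes that AspyreArgs accepts the documented parameters as keyword arguments and that instantiating TkbToEs or PdfaltoToEs with an AspyreArgs object runs the scenario; neither point is confirmed beyond the docstrings quoted above. The helpers build_aspyre_kwargs and run_scenario are hypothetical names and are not part of aspyrelib.

```python
SUPPORTED_SCENARIOS = ("tkb", "pdfalto", "limb")


def build_aspyre_kwargs(scenario, source, destination=None, talkative=False, vpadding=None):
    """Validate the documented parameters and return them as keyword arguments."""
    if scenario not in SUPPORTED_SCENARIOS:
        raise ValueError(f"unsupported scenario: {scenario!r}")
    kwargs = {"scenario": scenario, "source": source, "talkative": talkative}
    if destination is not None:
        kwargs["destination"] = destination
    # vpadding is only used in the PDFALTO and LIMB scenarios, so drop it otherwise.
    if vpadding is not None and scenario in ("pdfalto", "limb"):
        kwargs["vpadding"] = vpadding
    return kwargs


def run_scenario(**kwargs):
    """Run the matching scenario (assumption: aspyrelib is importable)."""
    # Imported lazily so that build_aspyre_kwargs can be used without aspyrelib.
    from aspyrelib import aspyre

    args = aspyre.AspyreArgs(**kwargs)
    if kwargs["scenario"] == "tkb":
        return aspyre.TkbToEs(args)
    if kwargs["scenario"] == "pdfalto":
        return aspyre.PdfaltoToEs(args)
    raise NotImplementedError("only the two scenarios documented above are sketched here")


# vpadding is ignored here because the scenario is "tkb".
tkb_kwargs = build_aspyre_kwargs("tkb", "/path/to/export.zip", vpadding=10)
```

With aspyrelib on the import path, run_scenario(**tkb_kwargs) would then hand the arguments to the Transkribus scenario.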

As a CLI

A legacy script (run.py) from an earlier stage of the project lets you use Aspyre as a CLI fairly easily.

Step by step (Transkribus scenario)

  • Export the transcriptions and the images from Transkribus; you now have a zip file
  • Create a virtual environment based on Python 3 and install dependencies (cf. requirements.txt)
  • Run aspyre/run.py (python3 aspyre/run.py) with the fitting options
  • See the CLI's options with --help (python3 aspyre/run.py --help)
  • Aspyre will create a new ZIP that can be loaded onto eScriptorium

Example

$ virtualenv venv -p python3
$ source venv/bin/activate
(venv)$ pip install -r requirements.txt 
(venv)$ python3 aspyre/run.py -i /path/to/exported/documents

As a service online

This is no longer an option, following Heroku's decision in 2022 to stop supporting free hosting services.

Aspyre was previously available as an online service through Aspyre GUI. The steps below describe that former workflow and are kept for reference.

Step by step (Transkribus scenario)

  • Export the transcriptions and the images from Transkribus; you now have a zip file
  • If your archive is larger than 500 MB, remove the images from the zip file (unzip the archive and rezip it, keeping only the alto/ directory and the 'mets.xml' file)
  • Load the zip file onto the application and download the returned zip file
  • You can now directly load this new ZIP onto eScriptorium
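
The image-removal step above can also be scripted. The following is a minimal sketch, assuming that the alto/ directory and mets.xml sit at the root of the exported archive; the function name strip_images and the sample file names are made up for illustration.

```python
import io
import zipfile


def strip_images(source_zip, target_zip):
    """Copy only the alto/ entries and mets.xml from source_zip into target_zip."""
    kept = []
    with zipfile.ZipFile(source_zip) as src, \
            zipfile.ZipFile(target_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            name = info.filename
            if name == "mets.xml" or (name.startswith("alto/") and not info.is_dir()):
                dst.writestr(name, src.read(name))
                kept.append(name)
    return kept


# Demonstration on an in-memory archive standing in for a Transkribus export.
source = io.BytesIO()
with zipfile.ZipFile(source, "w") as archive:
    archive.writestr("mets.xml", "<mets/>")
    archive.writestr("alto/0001.xml", "<alto/>")
    archive.writestr("0001.jpg", b"fake image bytes")

target = io.BytesIO()
kept = strip_images(source, target)
```

Both arguments may also be file paths, for example strip_images("export.zip", "export_light.zip").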

Configuring the export from Transkribus

When exporting your data, select the “Transkribus Document” format option and check the “Export ALTO” and “Export Image” sub-options.


Which input from PDFALTO?

Minimum content:

folder(.zip)/
    - out/
        - identifier.xml_data/
            - image-1.png
        - identifier.xml

For now, tar.gz archives are not supported; only zip archives are.
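
Before running the PDFALTO scenario, it can help to check that an archive follows the layout above. The sketch below is one possible check, assuming that out/ sits at the root of the zip file; check_pdfalto_archive is a hypothetical helper and is not part of Aspyre.

```python
import io
import zipfile
from pathlib import PurePosixPath


def check_pdfalto_archive(archive):
    """Return the identifiers whose out/<id>.xml has a matching out/<id>.xml_data/ folder."""
    with zipfile.ZipFile(archive) as zf:
        names = [PurePosixPath(name) for name in zf.namelist()]
    # ALTO files directly under out/
    xml_files = {p.name for p in names
                 if len(p.parts) == 2 and p.parts[0] == "out" and p.suffix == ".xml"}
    # Folders named <id>.xml_data under out/ that contain at least one file
    data_dirs = {p.parts[1] for p in names
                 if len(p.parts) >= 3 and p.parts[0] == "out"
                 and p.parts[1].endswith(".xml_data")}
    return sorted(name[:-len(".xml")] for name in xml_files if name + "_data" in data_dirs)


# Demonstration: "doc" is complete, "orphan" has no image folder.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("out/doc.xml", "<alto/>")
    zf.writestr("out/doc.xml_data/image-1.png", b"")
    zf.writestr("out/orphan.xml", "<alto/>")

complete = check_pdfalto_archive(sample)
```

An empty result suggests the archive does not match the expected structure.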


Reporting Errors

If you notice unexpected errors or bugs or if you wish to add more complexity to the way Aspyre transforms the ALTO XML files, please create an issue and contribute!


Wiki

aspyre-gt's Issues

Create a scenario to integrate pdfalto

The idea is to use Aspyre to build a conversion system for the ALTO XML (v3) files produced by the pdfalto script, applying the necessary modifications (schema, filename, ...) and handling the question of homothety (which also arises for the ALTO files from Limb, #15).

<?xml version="1.0" encoding="UTF-8"?>
<!-- added manually for compatibility with eScriptorium -->
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://www.loc.gov/standards/alto/ns-v2#"
      xmlns:page="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"
      xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v2# http://www.loc.gov/standards/alto/alto.xsd">
<!-- end of added -->
<!--<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">-->
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
	<!-- <fileName>test_aspyre/1903 159_258 3.pdf</fileName> -->
	<!-- added manually for compatibility with eScriptorium -->
	<fileName>Annuaire_1903 161.tif</fileName>
	<!-- end of added -->
</sourceImageInformation>
<OCRProcessing ID="IdOcr">
...

Another modification: pdfalto writes floats in attributes such as "HEIGHT", "WIDTH", "HPOS", etc., instead of ints.
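
As an illustration of the float issue, here is a minimal sketch that rewrites coordinate attributes as integers. It is not the fix adopted in Aspyre: the attribute list (including VPOS) and the choice of rounding rather than truncating are assumptions, and round_coordinates is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed set of attributes; the issue only names HEIGHT, WIDTH and HPOS explicitly.
COORDINATE_ATTRS = ("HEIGHT", "WIDTH", "HPOS", "VPOS")


def round_coordinates(root):
    """Rewrite float-valued coordinate attributes as rounded integers, in place."""
    for element in root.iter():
        for attr in COORDINATE_ATTRS:
            value = element.get(attr)
            if value is not None:
                element.set(attr, str(round(float(value))))
    return root


root = round_coordinates(ET.fromstring(
    '<TextLine HPOS="12.4" VPOS="30.6" WIDTH="200.0" HEIGHT="14.49"/>'))
rounded = dict(root.attrib)
```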


This feature is a solution to https://gitlab.inria.fr/scripta/escriptorium/-/issues/331

Rework the workflow...

By integrating objects to parse the essential information and handle the different transformation scenarios.

In anticipation of #17 and #15

Update README

Remarks from JB Camps

  • aspyre/main.py does not exist; it should be aspyre/run.py, I believe;

  • (so, in the README, shouldn't everything be written keeping the
    path relative to the root of the repository?)

  • provide a --help with the command? -> --help works with run.py; help() works on aspyre.main()

  • TRP Export = Transkribus export? -> fixed (+ made explicit)

  • Add the zip command to the README? Important, because you must not
    zip the folder but the files directly (otherwise, it does not work),
    which requires the -j option -> to do in v.0.2.4


  • Also update the doc now that Aspyre is a library
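
The remark about zip -j above (archive the files directly, without their parent folder) can be reproduced in Python. This is a sketch only: zip_flat is a hypothetical helper, and the file names are placeholders.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def zip_flat(paths, target):
    """Write each file at the root of the archive, dropping its directory (like zip -j)."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            path = Path(path)
            archive.write(path, arcname=path.name)
    return target


# Demonstration with two placeholder files stored in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "export"
    folder.mkdir()
    (folder / "0001.xml").write_text("<alto/>")
    (folder / "mets.xml").write_text("<mets/>")
    buffer = io.BytesIO()
    zip_flat(sorted(folder.iterdir()), buffer)
    flat_names = zipfile.ZipFile(buffer).namelist()
```

The resulting archive lists the files by their bare names, with no export/ prefix.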

fileName contains None instead of the file name

ex:
<sourceImageInformation><fileName>None</fileName></sourceImageInformation>

This later causes the import of the ALTO files into eScriptorium to fail because the ALTO files cannot be matched with their corresponding image files.
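
A quick way to spot affected files before importing them is sketched below. The helper has_missing_filename is hypothetical, and it treats an absent or empty fileName element the same way as the literal None reported above, which is an assumption.

```python
import xml.etree.ElementTree as ET


def has_missing_filename(alto_xml):
    """Return True if the first fileName element is absent, empty, or the literal 'None'."""
    root = ET.fromstring(alto_xml)
    for element in root.iter():
        # Compare on the local name so any ALTO namespace is accepted.
        if element.tag.rsplit("}", 1)[-1] == "fileName":
            text = (element.text or "").strip()
            return text in ("", "None")
    return True  # no fileName element at all


broken = ("<alto><Description><sourceImageInformation><fileName>None</fileName>"
          "</sourceImageInformation></Description></alto>")
valid = broken.replace("None", "Annuaire_1903 161.tif")

broken_flag = has_missing_filename(broken)
valid_flag = has_missing_filename(valid)
```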

English Edit Suggestions

  1. Add a period at the end of "Aspyre GUI is a simple application to make Aspyre GT available as a service online"
  2. Change "compatible for import" to "compatible with importation" in the following sentence. "Aspyre GT is a pipeline to make Ground Truth exported from Transkribus compatible for import into eScriptorium using ALTO XML as a pivotal format."
  3. Add "the" in the following sentence: "Use the Transkribus export option to download a ZIP file containing image files, ALTO XML files and a METS XML file"
  4. To remain consistent with the previous step, add an "s" to the following: "you may need to remove the image files from the ZIP file in..."
  5. Add punctuation: "initially checking the "Image" option on Transkribus**,** however**,** is necessary in order to get a properly documented METS XML file..."
