imagehub's People

Contributors

dependabot[bot], hero-solutions, hobbesball, kitania, nvanderperren, robwyse

imagehub's Issues

app:fill-resourcespace command crashes if it detects a subdirectory

Problem:

When a directory is present in the "images" folder, the app:fill-resourcespace command will crash with an error:

ImagickException: The path is a directory: /iiif-data/lost+found in /srv/imagehub/imagehub/src/AppBundle/ResourceSpace/Command/FillResourceSpaceCommand.php:82

Logic would dictate that the images directory doesn't contain any subdirectories. However, on the testbed VPS our images directory is actually /iiif-data, which is a mount point for a different disk volume. This disk volume contains a standard UNIX lost+found directory that we can't delete.

Solution

Use the is_file() or is_dir() functions to check whether each entry returned by scandir() is a file or a directory. Skip directories and process only files.
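A minimal sketch of that filter, assuming the command reads a single flat images directory. The function name `listImageFiles` is hypothetical and not part of the existing FillResourceSpaceCommand.

```php
<?php

/**
 * Return the full paths of the plain files in $directory.
 * '.', '..' and subdirectories such as a mount point's lost+found
 * are skipped. Hypothetical helper, not the existing command code.
 */
function listImageFiles(string $directory): array
{
    $files = [];
    $entries = scandir($directory);
    if ($entries === false) {
        return $files; // unreadable directory: nothing to process
    }
    foreach ($entries as $entry) {
        $path = $directory . DIRECTORY_SEPARATOR . $entry;
        // is_file() is false for directories, including '.' and '..'
        if (is_file($path)) {
            $files[] = $path;
        }
    }
    return $files;
}
```

The command would then loop over `listImageFiles($folder)` instead of the raw scandir() result, so Imagick never receives a directory path.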

incorporate some phpleague/skeleton files into Symfony

Detailed description of the issue.
To mirror the Symfony structure of the Datahub, the Imagehub should include some 'default' files from the phpleague/skeleton. Cf. the Datahub for its implementation.

Possible implementation
Important files to integrate are:

  • editorconfig
  • travis
  • templates
    • CHANGELOG
    • CONTRIBUTING
    • LICENSE
    • PULL-REQUEST TEMPLATE

Files that shouldn't be integrated are:

  • composer.json
  • phpunit
  • scrutinizer
  • styleci

add dutch localisation to UV 2?

Detailed description of the issue.
Our metadata is translated, but Universal Viewer itself has no Dutch localisation. We could explore adding nl-BE localisation through a PR on the main Universal Viewer GitHub repository, following the way the Swedish translations were added.

Additionally, we could look into localising the metadata fields within the JSON manifests themselves. Currently, the metadata array looks like this:

"metadata": [
        {
            "label": "Title",
            "value": [
                {
                    "@language": "nl",
                    "@value": "Vechtende mannen"
                },
                {
                    "@language": "en",
                    "@value": "Men Fighting"
                }
            ]
        },
...

Metadata labels can be localised within the manifests like this:

"metadata": [
        {
            "label": [
                {
                    "@language": "nl-BE",
                    "@value": "Titel"
                },
                {
                    "@language": "en-GB",
                    "@value": "Title"
                }
            ],
            "value": [
                {
                    "@language": "nl-BE",
                    "@value": "Vechtende mannen"
                },
                {
                    "@language": "en-GB",
                    "@value": "Men Fighting"
                }
            ]
        },
...

I personally am not sure how to tackle this exactly. Do we do both the pull request and in-manifest localisation, or just one of them, or a different solution altogether? @netsensei @nvanderperren , thoughts?

bug: fix array to string conversion when trying to output manifest count

$this->logger->info('Done, created and stored ' . $manifests . ' manifests.');

The line above tries to log how many manifests were generated, but it concatenates the array that holds all the manifests instead of their number. This causes Symfony to throw an "Array to string conversion" error.

Solution: initialise a new variable, e.g. $manifestCount, set it to the number of elements in the $manifests array, and use $manifestCount in the log message.
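A minimal sketch of the fix, pulled out into a hypothetical helper so it can be tested on its own; inside the command the same two lines can be written inline before the `$this->logger->info()` call.

```php
<?php

/**
 * Build the summary log line from the array of generated manifests.
 * count() returns an int, which concatenates cleanly into a string,
 * unlike the array itself ("Array to string conversion").
 */
function manifestSummary(array $manifests): string
{
    $manifestCount = count($manifests);
    return 'Done, created and stored ' . $manifestCount . ' manifests.';
}
```

In the command this becomes `$this->logger->info(manifestSummary($manifests));`.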

Documentation for ETL2/3

Detailed description of the issue.
Since our ETLs will run periodically on a remote server, we need the capability to monitor their performance and catch any issues that might occur in them. Detailed logging information is necessary for the VKC to maintain the ETLs after they have been delivered by Kitania.

Additional context
The main goals of logging are to:

  • give insights into the volume and performance of the ETLs for monitoring and reporting
  • raise errors and warnings that give a clear cause and context so that they can be resolved easily by VKC staff

Possible implementation

  • We recommend using monolog's Symfony integration, monolog-bundle. You can find documentation on how to initialise it here.
  • Logs should be stored in a logfile on the server and rotated out every two months.
  • Logs should raise errors and warnings at every important step of the ETL process and give the IDs of the objects that failed.
  • Logs should count the number of objects taken as input, how many were successfully put through the ETL, and how many were output.
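The counting requirement can be kept separate from monolog itself. Below is a sketch of a hypothetical `EtlRunStats` class that an ETL command could update as it goes and then pass to a logger; the class name and summary format are assumptions, not existing code.

```php
<?php

/**
 * Hypothetical counter for one ETL run: tracks input, successes and
 * output, and keeps the IDs of objects that failed so they can be logged.
 */
final class EtlRunStats
{
    private $input = 0;
    private $succeeded = 0;
    private $output = 0;
    /** @var string[] */
    private $failedIds = [];

    public function recordInput(): void { $this->input++; }
    public function recordSuccess(): void { $this->succeeded++; }
    public function recordOutput(): void { $this->output++; }

    public function recordFailure(string $objectId): void
    {
        $this->failedIds[] = $objectId;
    }

    public function failedIds(): array
    {
        return $this->failedIds;
    }

    /** One-line summary suitable for a logger's info() call. */
    public function summary(): string
    {
        return sprintf(
            'Input: %d, processed: %d, output: %d, failed: %d',
            $this->input,
            $this->succeeded,
            $this->output,
            count($this->failedIds)
        );
    }
}
```

At the end of a run, the command would log `$stats->summary()` at info level and each of `$stats->failedIds()` at error level.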

More frontend info on Imagehub

Detailed description of the issue.
Just like on the Datahub, the Symfony front page of imagehub.vlaamsekunstcollectie needs more contextual information.

Possible implementation

  • a count of the number of manifests is needed on the front page, formatted as 'This ImageHub currently contains X IIIF manifests.'
  • more information on what the Imagehub is should be put in a div under the count, reading:
    " The Imagehub is an implementation of the IIIF Image API and the IIIF Presentation API by the Vlaamse Kunstcollectie VZW. Its goal is to serve high-quality images of the collections of Flemish Museums using the IIIF framework. It can be used to populate IIIF viewers in web applications, possibly in conjunction with serving metadata about those images using the Datahub."
  • The Imagehub should have a navigation bar like the Datahub's, with the navbar color
    #00AC7B and a white version of the Imagehub logo, attached here.
    logo-imagehub-white
  • the front page should have contact information and CTA as follows: "This datahub is managed by Vlaamse Kunstcollectie. Are you interested in publishing your collections online using IIIF? Contact us via e-mail at [email protected]."

Possible alternatives

  • If the green navbar color proves too gaudy, you can always make it gray and use the green Imagehub logo instead. I'll leave it to your best judgment to see what works.
    logo-imagehub-rgb

create thumbnails from the startCanvas image URL

Detailed description of the issue.
Thumbnails need to be generated to populate the search results page of the Arthub. For this we need links to Cantaloupe image URLs that correspond to the startCanvas image of a work.

Additional context
Currently, thumbnails are generated from the work ID of an object, which is used to form a URL pointing to an image on the old VKC website. Images now use a new VKC image ID, which can't be derived from the work PID, so there needs to be a way to link to these VKC image IDs from LIDO XML.

Possible implementation
At the end of the generate-manifests command, when writing the manifest.json reference into the LIDO XML, take the startCanvas image URI from manifest.json and write it into a resourceSet of the LIDO file. Follow this structure:

<lido:resourceWrap>
   <lido:resourceSet>
      <lido:resourceID lido:type="purl" lido:source="Imagehub">https://imagehub.vlaamsekunstcollectie.be/iiif/2/100000022</lido:resourceID>
      <lido:resourceType>
         <lido:term lido:pref="preferred">thumbnail</lido:term>
      </lido:resourceType>
   </lido:resourceSet>
</lido:resourceWrap>
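A sketch of building that resourceSet with PHP's DOM extension. It assumes the standard LIDO namespace URI `http://www.lido-schema.org`; the helper name `buildResourceSet` is hypothetical, and appending the result to the right resourceWrap of the record is left to the caller.

```php
<?php

const LIDO_NS = 'http://www.lido-schema.org';

/**
 * Build a lido:resourceSet pointing at $url with the given resource
 * type term (e.g. "thumbnail"), following the structure above.
 * Hypothetical helper; the caller appends it to a lido:resourceWrap.
 */
function buildResourceSet(DOMDocument $doc, string $url, string $typeTerm): DOMElement
{
    $set = $doc->createElementNS(LIDO_NS, 'lido:resourceSet');

    $id = $doc->createElementNS(LIDO_NS, 'lido:resourceID');
    $id->setAttributeNS(LIDO_NS, 'lido:type', 'purl');
    $id->setAttributeNS(LIDO_NS, 'lido:source', 'Imagehub');
    // createTextNode escapes characters such as & in the URL
    $id->appendChild($doc->createTextNode($url));
    $set->appendChild($id);

    $type = $doc->createElementNS(LIDO_NS, 'lido:resourceType');
    $term = $doc->createElementNS(LIDO_NS, 'lido:term');
    $term->setAttributeNS(LIDO_NS, 'lido:pref', 'preferred');
    $term->appendChild($doc->createTextNode($typeTerm));
    $type->appendChild($term);
    $set->appendChild($type);

    return $set;
}
```

Writing the URL inline in lido:resourceID, rather than on its own line, keeps surrounding whitespace out of the value.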

doesn't detect TIFF files without extension

foreach ($supportedExtensions as $supportedExtension) {

The files in the images/ folder will not have a file extension, because we don't want the extension to show up in the manifest URL. The URL of an image with a file extension would be imagehub.vlaamsekunstcollectie.be/iiif/2/testimage.TIFF/manifest, while we want our URLs to look like imagehub.vlaamsekunstcollectie.be/iiif/2/testimage/manifest.

When checking for supported file formats, don't check for file extensions as this loop does; use the ImageMagick GetFormat function or something similar from packages.org instead.
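As a dependency-free alternative to asking ImageMagick (for example via Imagick::getImageFormat(), which needs the imagick extension), a TIFF can be recognised by its first four bytes. This is a swapped-in technique, sketched under the assumption that only TIFF/PTIF input matters; `isTiff` is a hypothetical name.

```php
<?php

/**
 * Detect a TIFF (including pyramid TIFF) by its header bytes rather
 * than its file extension. Classic TIFF starts with "II*\0"
 * (little-endian) or "MM\0*" (big-endian); BigTIFF uses 43 ('+')
 * instead of 42 ('*').
 */
function isTiff(string $path): bool
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $header = fread($handle, 4);
    fclose($handle);

    return in_array($header, ["II*\0", "MM\0*", "II+\0", "MM\0+"], true);
}
```

This only confirms the container format; checking tiling or compression still requires ImageMagick.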

Improve performance for Fill Resourcespace command

Detailed description of the issue.
Uploading images to ResourceSpace with the app:fill-resourcespace command is currently rather slow and won't scale to thousands of images.

Additional context
If we want this ETL to run periodically, it's important that it is optimised as much as possible. The biggest bottleneck will always be the speed at which the Resourcespace API handles requests and processes images, so there is a hard limit to how optimised this can be without changing the resourcespace code itself.

Possible implementation

  • refactor the code to use fewer nested loops to cut down on processing time
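One common way to remove a nested loop is to replace an inner scan with a lookup table built once. The sketch below assumes, hypothetically, that the command compares files in the drop folder against existing ResourceSpace resources by filename, and that each resource row has an `originalfilename` key; the real loop structure and field names may differ.

```php
<?php

/**
 * Return the files that are not yet present in ResourceSpace.
 * Instead of scanning every existing resource for every file
 * (files x resources comparisons), build a lookup table once and
 * test membership with isset(), which is a hash lookup.
 * The 'originalfilename' key is an assumption for illustration.
 */
function filesToUpload(array $files, array $existingResources): array
{
    $known = [];
    foreach ($existingResources as $resource) {
        $known[$resource['originalfilename']] = true;
    }

    $toUpload = [];
    foreach ($files as $file) {
        if (!isset($known[basename($file)])) {
            $toUpload[] = $file;
        }
    }
    return $toUpload;
}
```

This changes the comparison cost from files × resources to roughly files + resources; the API round-trips remain the hard limit mentioned above.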

Current environment
This issue was encountered while testing the app:fill-resourcespace command, filling ResourceSpace on the VPS with the 185 images used in the test setup.

  • Installed setup using the imagehub-box viaa ansible playbook
  • Ubuntu 18.04.2
  • Symfony 3.4
  • PHP 7

enable SSL in Cantaloupe so info.json metadata is served over https

Detailed description of the issue.
We want to be able to use Cantaloupe images in the Storiiies editor by Cogapp: https://storiiies-editor.cogapp.com. The errors we get when trying to load in an info.json file from our Cantaloupe server are these:

(index):1 Access to image at 'http://imagehub.vlaamsekunstcollectie.be/iiif/2/100000150/full/529,/0/default.jpg' from origin 'https://storiiies-editor.cogapp.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.

We don't think the issue is CORS itself, but rather that the Storiiies editor requests the images from an HTTP endpoint instead of HTTPS. We believe this is because the info.json file that Cantaloupe generates contains an HTTP URL in its 'id' field, cf. https://imagehub.vlaamsekunstcollectie.be/iiif/2/100000150/info.json. Changing the Cantaloupe configuration so it creates IDs with HTTPS instead of HTTP should fix the issues the Storiiies editor shows.

Possible implementation
The first step would be to enable HTTPS in the Cantaloupe config and see whether that solves the issue. If it doesn't, changes to Tomcat and/or Nginx may be needed to serve HTTPS correctly.

#serve info.json over https
#cantaloupe_https_enabled : "true"
#cantaloupe_https_host : imagehub.vlaamsekunstcollectie.be
#cantaloupe_https_port : 443
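To check whether a configuration change had the intended effect, the base URI advertised in info.json can be inspected. A sketch, assuming Image API 2.x puts it in "@id" (Image API 3.0 uses "id", so both are checked); `infoJsonUsesHttps` is a hypothetical helper.

```php
<?php

/**
 * Return true if a Cantaloupe info.json advertises an https base URI.
 * Anything that is not valid JSON, or has no id, counts as a failure.
 */
function infoJsonUsesHttps(string $infoJson): bool
{
    $info = json_decode($infoJson, true);
    if (!is_array($info)) {
        return false;
    }
    $id = $info['@id'] ?? $info['id'] ?? '';
    return is_string($id) && strpos($id, 'https://') === 0;
}
```

Usage: `infoJsonUsesHttps(file_get_contents('https://imagehub.vlaamsekunstcollectie.be/iiif/2/100000150/info.json'))`.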

Validate files in drop folder before uploading to resourcespace

Detailed description of the issue.
Images need to be in the correct file format before being uploaded to resourcespace, to ensure a predictable upload process to resourcespace. This validation also ensures that the images in the drop folder are suitable for ingestion by cantaloupe image server.

Additional context
Images need to be :

  • in PTIF format
  • without extension
  • JPEG compressed

If an image is not in the desired format, do one of two things:

  • either skip uploading the image and output a warning that the image is not in the right file format
  • or queue the image to be processed after the rest of the fill-resourcespace command has run, and use ImageMagick to convert it to the correct format so it will be uploaded in the next run. Output to the logfile that the image was not in the correct format and has been converted. The generic ImageMagick command to put images in the correct format is:

convert inputImage -define tiff:tile-geometry=256x256 -compress jpeg ptif:outputimage

Instead of using convert in your program, use mogrify so images are edited in place and keep their filename.

Possible implementation
Images can be identified with the ImageMagick identify command. Whether a file has no extension and whether it is a PTIF is standard information. How it was compressed can be discovered with ImageMagick using the following command:

magick imagetitle -format "%[compression]" info:
more info here
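A sketch of the pre-upload checks, with hypothetical helper names. It assumes the compression string comes from running the generated command (e.g. via shell_exec()); the `[0]` suffix is ImageMagick's frame selector, used here so a multi-page pyramid TIFF reports the compression of its first page only instead of one value per page.

```php
<?php

/** True when the file name has no extension, e.g. "/images/100000022". */
function hasNoExtension(string $path): bool
{
    return pathinfo($path, PATHINFO_EXTENSION) === '';
}

/** Build the ImageMagick command that prints the compression type. */
function compressionCommand(string $path): string
{
    // escapeshellarg() guards against spaces and shell metacharacters
    return 'magick ' . escapeshellarg($path . '[0]')
        . ' -format ' . escapeshellarg('%[compression]') . ' info:';
}

/** Interpret the command's output; JPEG-compressed files report "JPEG". */
function isJpegCompressed(string $identifyOutput): bool
{
    return trim($identifyOutput) === 'JPEG';
}
```

A file that fails either check would then be skipped with a warning or queued for conversion, as described above.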

bug: "related" field in manifest.json are not formatted properly

Detailed description of the issue.
The command that creates IIIF Presentation manifests adds a link to the Arthub representation of the work in the top-level "related" field, using the work PID of the object. The URL structure of the Arthub omits the ".be" part of a work PID, turning the work PID "mskgent.be:2008-D-1" into "mskgent:2008-D-1". The command logic should remove the ".be" from work PIDs before adding them to the "related" field.

Additional context
In this manifest, the "related" field has the value "https://arthub.vlaamsekunstcollectie.be/nl/catalog/mskgent.be:2008-D-1". That URL does not resolve, because it should point to "https://arthub.vlaamsekunstcollectie.be/nl/catalog/mskgent:2008-D-1".
This is an artefact of how the Datahub was built, and it's the only place where the ".be" is omitted from work PIDs. In every other field where the work PID is present, the ".be" should be kept.
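A sketch of the transformation, applied only when building the "related" URL. The helper name and the `$lang` parameter are hypothetical; the Arthub base URL is taken from the example above.

```php
<?php

/**
 * Build the Arthub URL for a manifest's "related" field.
 * Only here is ".be" dropped from the work PID's prefix
 * ("mskgent.be:2008-D-1" -> "mskgent:2008-D-1"); everywhere else the
 * PID is used unchanged.
 */
function arthubRelatedUrl(string $workPid, string $lang = 'nl'): string
{
    // remove the first ".be" that sits directly before a colon
    $arthubPid = preg_replace('/\.be(?=:)/', '', $workPid, 1);
    return 'https://arthub.vlaamsekunstcollectie.be/' . $lang . '/catalog/' . $arthubPid;
}
```

The lookahead keeps the colon and leaves any other occurrence of ".be" in the identifier untouched.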

manifests.json are not IIIF Presentation API 2.1 valid

The manifest.json files created by the generatemanifests command are currently not valid according to the IIIF Presentation API 2.1 spec.

You can validate manifests by entering a manifest URL here: https://iiif.io/api/presentation/validator/service/ , or by downloading the validator locally here: https://github.com/IIIF/presentation-validator .

URL Tested: https://imagehub.vlaamsekunstcollectie.be/iiif/2/kmska.be:410/manifest.json
Validation Error: Every resource must have @type
Warnings:
URL does not have correct access-control-allow-origin header: got "", expected *
The remote server did not use the requested gzip transfer compression, which will slow access. (Content-Encoding: )

Every resource must have an @type. For the Imagehub, this type should always be:
"@type": "dctypes:Image"
For more info, check https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/vocabulary/DCTypes.html
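A sketch of an image resource array that includes the required "@type", before it is JSON-encoded into a canvas. The service context is the standard Image API 2 context URI; the level 2 profile and the `/full/full/0/default.jpg` @id pattern are assumptions about how Cantaloupe is set up, and `imageResource` is a hypothetical helper.

```php
<?php

/**
 * Build the image resource for one canvas, with the "@type" that the
 * validator requires. $imageBase is the Cantaloupe base URI of one image.
 */
function imageResource(string $imageBase, int $width, int $height): array
{
    return [
        '@id' => $imageBase . '/full/full/0/default.jpg',
        '@type' => 'dctypes:Image',
        'format' => 'image/jpeg',
        'width' => $width,
        'height' => $height,
        'service' => [
            '@context' => 'http://iiif.io/api/image/2/context.json',
            '@id' => $imageBase,
            // assumed compliance level; match what Cantaloupe advertises
            'profile' => 'http://iiif.io/api/image/2/level2.json',
        ],
    ];
}
```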

Furthermore, CORS isn't configured correctly; this should be changed in the Imagehub to conform to how CORS has been configured in the Datahub. You should use nelmio/cors-bundle; you can find how it is configured in the config.yml of the Datahub.

Create a README

Possible implementation

  • Create a README based on the README of the Datahub.
  • Crucial subheadings to fill out are:
    • requirements
    • install instructions
    • front-end development

change language labels to RFC-5646 codes

Detailed description of the issue.
Universal Viewer expects language codes extended with a region subtag instead of base ISO 639 language codes for localisation. The standard codes should be extended to specify the regional locale following RFC 5646.

Additional context
In practice, this means the following tags should be replaced:
nl -> nl-BE
en -> en-GB
de -> de-DE
fr -> fr-BE

Possible implementation
The generate-manifests command should hardcode a switch from the basic ISO 639 codes used in LIDO to the extended RFC 5646 tags for manifest.json files.
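A minimal sketch of that hardcoded switch, using exactly the four mappings listed above. The helper name is hypothetical, and passing unknown codes through unchanged is a design assumption.

```php
<?php

/**
 * Map a base ISO 639-1 code from LIDO to the region-qualified
 * RFC 5646 tag used in manifest.json. Unknown codes pass through.
 */
function toManifestLanguage(string $lidoLanguage): string
{
    $map = [
        'nl' => 'nl-BE',
        'en' => 'en-GB',
        'de' => 'de-DE',
        'fr' => 'fr-BE',
    ];
    return $map[strtolower($lidoLanguage)] ?? $lidoLanguage;
}
```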

Fix CORS issue in Imagehub

Detailed description of the issue.
When IIIF viewers want to request a manifest file, CORS needs to be enabled and set to *.

Additional context

  • See the spec for enabling CORS here
  • currently, CORS is enabled but set to ""

fix resourcespace not showing thumbnails of images

Detailed description of the issue.
After we changed the filestore location of ResourceSpace's images to save space on our system disk, thumbnails didn't show up for any resource. After experimenting with ResourceSpace for far too long, we changed the filestore location back to the default; uploads then worked fine and most images loaded thumbnails. The failure was likely caused by permission issues for ResourceSpace.

An additional issue after switching the filestore back was that still not all images loaded correct thumbnails. We think this might be due to the JPG compression applied to these TIFF files, which takes a significant amount of RAM to decompress. This could be fixed either by uploading a file format other than JPG-compressed TIFF, or by giving ResourceSpace more RAM.

Validation of images with missing exif data values

$value = $exifData[$field['exif']];

We can't rule out that images without proper EXIF data will be uploaded, so we need validation here: check whether an EXIF tag is present in the extracted EXIF data before reading it.

Currently, images without EXIF data cause an error when running the command:

$ ./bin/console app:fill-resourcespace

In FillResourceSpaceCommand.php line 109:

  Notice: Undefined index: DocumentName

app:fill-resourcespace [-h|--help] [-q|--quiet] [-v|vv|vvv|--verbose] [-V|--version] [--ansi] [--no-ansi] [-n|--no-interaction] [-e|--env ENV] [--no-debug] [--] <command> [<folder>] [<url>]

When an image doesn't contain EXIF data and can't be validated, just skip to the next image. Logging of validation errors can be added at a later stage.
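A sketch of the missing-tag check. It mirrors the `$field['exif']` shape from the offending line; the `name` key used for the result is a hypothetical stand-in for whatever ResourceSpace field identifier the command uses.

```php
<?php

/**
 * Collect field values from an image's EXIF data. Returns null when
 * any required tag is missing, so the caller can skip the image
 * instead of hitting "Undefined index".
 */
function exifValues(array $exifData, array $fields): ?array
{
    $values = [];
    foreach ($fields as $field) {
        $tag = $field['exif'];
        if (!array_key_exists($tag, $exifData)) {
            return null;
        }
        $values[$field['name']] = $exifData[$tag];
    }
    return $values;
}
```

In the image loop, `if ($values === null) { continue; }` implements "skip to the next image".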

validate manifest.json files after their creation

Detailed description of the issue.
manifest.json files created by the generatemanifests command need to be valid according to the IIIF Presentation API 2.1 specification. Manifests should be validated before being written to the Imagehub database.

Additional context
Since only valid manifests can be consumed by IIIF viewers like Universal Viewer, the manifests we publish through the Imagehub need to be valid so as not to create problems on the Arthub frontend. Not every manifest needs to be validated individually, but validating a sample of manifests should ensure that no fatal errors have occurred anywhere in the pipeline. This way, we can catch metadata errors caused by metadata creators or by software changes before invalid manifests are exposed. The manifest generation command is the last step before manifests are exposed to users, so a validation step is necessary here.

Possible implementation
As a test to see whether the whole ETL has run correctly, all 165 manifests should be validated. Be aware that this isn't scalable, since every validation requires a GET request. In future development we'll need to either add more detailed validation to the different steps of the pipeline, so that manifest validation is no longer needed, or integrate a validation schema into our pipeline, the way the current Python implementation validates manifests.

The GET request should use the following endpoint: http://iiif.io/api/presentation/validator/service/validate?format=json&version=2.0&url=manifest-url-here

If a manifest is not valid, it should not be published, and the logfile should record the response from the validator endpoint, including which manifest the errors belong to. If the manifest is valid, it can safely be published.
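A sketch of the two pieces around that GET request: building the endpoint URL from the parameters given above, and interpreting the reply. It assumes the validator's JSON reply carries an "okay" flag set to 1 for valid manifests; the helper names are hypothetical.

```php
<?php

const VALIDATOR_ENDPOINT = 'http://iiif.io/api/presentation/validator/service/validate';

/** Build the validator GET URL; http_build_query() URL-encodes the manifest URL. */
function validatorUrl(string $manifestUrl): string
{
    return VALIDATOR_ENDPOINT . '?' . http_build_query([
        'format' => 'json',
        'version' => '2.0',
        'url' => $manifestUrl,
    ]);
}

/**
 * Interpret the validator's JSON reply, assuming an "okay" flag.
 * Anything unparsable is treated as invalid.
 */
function isValidManifest(string $validatorJson): bool
{
    $result = json_decode($validatorJson, true);
    return is_array($result) && (int) ($result['okay'] ?? 0) === 1;
}
```

Usage: `$reply = file_get_contents(validatorUrl($manifestUrl));`, then publish only if `isValidManifest($reply)`, and otherwise write `$reply` and `$manifestUrl` to the logfile.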

Documentation for ETL5

Detailed description of the issue.
Since our ETLs will run periodically on a remote server, we need the capability to monitor their performance and catch any issues that might occur in them. Detailed logging information is necessary for the VKC to maintain the ETLs after they have been delivered by Kitania.

Additional context
The main goals of logging are to:

  • give insights into the volume and performance of the ETLs for monitoring and reporting
  • raise errors and warnings that give a clear cause and context so that they can be resolved easily by VKC staff

Possible implementation

  • We recommend using monolog's Symfony integration, monolog-bundle. You can find documentation on how to initialise it here.
  • Logs should be stored in a logfile on the server and rotated out every two months.
  • Logs should raise errors and warnings at every important step of the ETL process and give the IDs of the objects that failed.
  • Logs should count the number of objects taken as input, how many were successfully put through the ETL, and how many were output.

issue: fix filled up VIAA server

Detailed description of the issue.
With the last run of fill-resourcespace, the VIAA VPS main disk filled up to its capacity of 15GB. This has been fixed by:

  • removing system log files from the /var folder
  • removing unused TIFF files from the system disk
  • removing a stray vagrant installation with full Perl installed
  • moving the ResourceSpace filestore from the system folder to the iiif-data disk with the following line in ResourceSpace's config.php:
$storagedir="/path/to/filestore"; 
# Where to put the media files. Can be absolute (/var/www/blah/blah) or relative to the installation. Note: no trailing slash 

Link back to IIIF Manifests in LIDO XML at the end of ETL5

Detailed description of the issue.
Given that:

  • a single manifest.json file should always correspond to a single LIDO XML File and a single Work PID
  • there already is a link from the IIIF manifest to the LIDO XML file in the "related" field

It makes sense to also link back to the IIIF manifest in the LIDO XML file, so that both metadata files are pointing to each other and you can get from one metadata file to the other and vice versa.

Additional context
Referencing other resources within LIDO XML happens in the resourceWrap. You'll have to create a new resourceSet (in administrativeMetadata/resourceWrap) and link to the IIIF manifest there. That resourceSet should then be copied into the other language versions of administrativeMetadata, without changing the content of the set.

You will have to authenticate yourself to the Datahub you are going to write this information to. This authentication should be configurable in the ETL5 config settings.

Possible implementation
Once you have created a complete manifest and validated it, but before uploading it to the database, take the Datahub URL corresponding to that LIDO XML file and write the manifest.json PURL to it.

Follow this structure when implementing IIIF manifests into LIDO:

<lido:resourceWrap>
   <lido:resourceSet>
      <lido:resourceID lido:type="purl" lido:source="Imagehub">https://imagehub.vlaamsekunstcollectie.be/iiif/2/manifesttitle/manifest.json</lido:resourceID>
      <lido:resourceType>
         <lido:term lido:pref="preferred">IIIF Manifest</lido:term>
      </lido:resourceType>
      <lido:resourceSource>
         <lido:legalBodyName>
            <lido:appellationValue>Vlaamse Kunstcollectie VZW</lido:appellationValue>
         </lido:legalBodyName>
      </lido:resourceSource>
   </lido:resourceSet>
</lido:resourceWrap>
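A sketch of the copying step: appending an identical copy of the resourceSet to the resourceWrap of every lido:administrativeMetadata element (one per language), creating the resourceWrap where it is missing. It assumes the LIDO namespace URI `http://www.lido-schema.org` and a resourceSet already created in the same DOMDocument; `addToAllLanguages` is a hypothetical name. Appending at the end works because resourceWrap is the last child allowed in administrativeMetadata.

```php
<?php

const LIDO_NS = 'http://www.lido-schema.org';

/**
 * Append a copy of $resourceSet to every administrativeMetadata's
 * resourceWrap. Returns the number of copies added.
 */
function addToAllLanguages(DOMDocument $doc, DOMElement $resourceSet): int
{
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('lido', LIDO_NS);

    $added = 0;
    foreach ($xpath->query('//lido:administrativeMetadata') as $adminMeta) {
        $wrap = $xpath->query('lido:resourceWrap', $adminMeta)->item(0);
        if ($wrap === null) {
            $wrap = $doc->createElementNS(LIDO_NS, 'lido:resourceWrap');
            $adminMeta->appendChild($wrap);
        }
        // deep clone so each language gets an identical, independent copy
        $wrap->appendChild($resourceSet->cloneNode(true));
        $added++;
    }
    return $added;
}
```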

create multilingual label, description and attribution in manifest.json files

Detailed description of the issue.
Currently, only the metadata in the 'metadata' array of a manifest.json file is served multilingually, while the label, description and attribution values are only served in Dutch. As a result, Universal Viewer displays the Dutch title, description and attribution on English-language pages in the Arthub.

Possible implementation
Implement the multilingual creation of values in the generate-manifests command. Generating multilingual values for top-level fields can be done following this subchapter in the IIIF spec.

An example manifest would look like this:

{
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@type": "sc:Manifest",
    "@id": "https://imagehub.vlaamsekunstcollectie.be/iiif/2/mskgent.be:1982-N/manifest.json",
    "label": [
        {"@value": "Vechtende Mannen", "@language": "nl-BE"},
        {"@value": "Fighting Men", "@language": "en-GB"}
    ],
    "attribution": "Museum voor Schone Kunsten Gent",
    "related": "https://arthub.vlaamsekunstcollectie.be/nl/catalog/mskgent.be:1982-N",
    "description": [
        {"@value": "beschrijving van Vechtende Mannen", "@language": "nl-BE"},
        {"@value": "Description of fighting men", "@language": "en-GB"}
    ],
    "metadata": [
...
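A sketch of producing those multilingual values from a language-to-text map, so label, description and attribution can all share one helper. The function name and the input shape are assumptions.

```php
<?php

/**
 * Turn a [languageTag => text] map into the IIIF Presentation 2.x
 * multilingual form: a list of {"@value", "@language"} objects.
 */
function languageValues(array $byLanguage): array
{
    $values = [];
    foreach ($byLanguage as $language => $text) {
        $values[] = ['@value' => $text, '@language' => $language];
    }
    return $values;
}
```

For example, `$manifest['label'] = languageValues(['nl-BE' => 'Vechtende Mannen', 'en-GB' => 'Fighting Men']);` encodes to the "label" array shown above.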

manifest height and width aren't always created correctly

In a test manifest generated on the VPS a resource and canvas had height and width = 0, resulting in an invalid canvas that didn't show the image.

manifest in question: https://imagehub.vlaamsekunstcollectie.be/iiif/2/kmska.be:410/manifest.json

The manifest generator command should always be able to derive the height and width of a canvas and a resource from the Image API info.json file. If this isn't available, it should skip the creation of the canvas/resource, output an error, and continue with generating the rest of the manifest.
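A sketch of the guard, operating on an info.json that has already been decoded with `json_decode($json, true)`; the helper name is hypothetical.

```php
<?php

/**
 * Read width and height from a decoded Image API info.json.
 * Returns null when either is missing or not a positive integer, so
 * the caller can skip this canvas, log an error and continue.
 */
function canvasDimensions(array $info): ?array
{
    $width = $info['width'] ?? null;
    $height = $info['height'] ?? null;
    if (!is_int($width) || !is_int($height) || $width <= 0 || $height <= 0) {
        return null;
    }
    return ['width' => $width, 'height' => $height];
}
```

Treating 0 as missing covers the case seen in the kmska.be:410 manifest, where a canvas with height and width 0 was written instead of being skipped.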

Create a top-level IIIF Collection for discoverability

Detailed description of the issue.
Having a page that lists all the available IIIF manifests in the imagehub enhances the discoverability of the imagehub manifests. In the same way that datahub has the ListRecords command to show all available LIDO records, a similar function for imagehub is needed.

Possible implementation
This can be implemented in several ways, but following the IIIF spec seems the most elegant solution. A top-level IIIF Collection is a JSON file that lists manifests, cf. the IIIF v.1 spec about Collections.

  • Create a Symfony command, UpdateCollection, that is meant to run after the GenerateManifests command but is separate from it.
  • This command checks if there is a top-level collection at the URL https://imagehub.vlaamsekunstcollectie.be/iiif/2/collection/top .
  • If there is none, it will make a JSON list of manifests, listing all the manifests in the imagehub database.
  • If a top-level collection exists, it should update that collection with new manifests that might have been added to the database, and delete manifests that have been removed from the database.
  • the Collection should be available at the url https://imagehub.vlaamsekunstcollectie.be/iiif/2/collection/top

An example Collection JSON file would look like this:

{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "https://imagehub.vlaamsekunstcollectie.be/iiif/2/collection/top",
  "@type": "sc:Collection",
  "label": "Top Level Collection for Imagehub",
  "viewingHint": "top",
  "description": "This collection lists all the IIIF manifests available in this Imagehub instance",
  "manifests": [
    {
      "@id": "https://imagehub.vlaamsekunstcollectie.be/iiif/2/kmska.be:410/manifest.json",
      "@type": "sc:Manifest",
      "label": "Heilige Barbara van Nicomedië"
    },
    {
      "@id": "https://imagehub.vlaamsekunstcollectie.be/iiif/2/mskgent.be:1998-B-112/manifest.json",
      "@type": "sc:Manifest",
      "label": "De wraak van Hop-Frog"
    }
  ]
}
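A sketch of the core of such an UpdateCollection command: building the Collection array from the manifests currently in the database. It assumes, hypothetically, that the database rows have been fetched as a list of `['id' => ..., 'label' => ...]` arrays; the fetching and storage code is left out.

```php
<?php

/**
 * Build the top-level Collection from the manifests currently in the
 * database. $manifests is a list of ['id' => ..., 'label' => ...] rows.
 */
function topCollection(array $manifests): array
{
    $entries = [];
    foreach ($manifests as $manifest) {
        $entries[] = [
            '@id' => $manifest['id'],
            '@type' => 'sc:Manifest',
            'label' => $manifest['label'],
        ];
    }

    return [
        '@context' => 'http://iiif.io/api/presentation/2/context.json',
        '@id' => 'https://imagehub.vlaamsekunstcollectie.be/iiif/2/collection/top',
        '@type' => 'sc:Collection',
        'label' => 'Top Level Collection for Imagehub',
        'viewingHint' => 'top',
        'description' => 'This collection lists all the IIIF manifests available in this Imagehub instance',
        'manifests' => $entries,
    ];
}
```

Rebuilding the list from the database on every run is one way to cover both the "add new manifests" and "drop removed manifests" cases without diffing against the previous Collection; encode the result with `json_encode($collection, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE)`.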

Remove resourcespace.yml and parameters.yml

The Github repository contains app/config/resourcespace.yml and app/config/parameters.yml files. These files are environment specific => the API key for resourcespace will differ per person, per installation, per environment,...

The goal of the resourcespace.yml.dist file is to provide a generic template that can be copied and modified. The YAML files themselves shouldn't be committed to the main vanilla GitHub repository; they stay on the local machines.

We need to do these steps:

  • Remove resourcespace.yml and parameters.yml from the Github repo
  • Add them to the .gitignore file to avoid them getting committed
