Giter VIP home page Giter VIP logo

Comments (6)

mbjones avatar mbjones commented on August 22, 2024

I can verify this for both EML 2.1.1 that @mobb sent, as well as for EML 2.2.0 documents. The issue seems to arise from customUnit using a different XPath selector for its references than other key/keyref pairs in the document, and so unit validation was skipped in the new EMLValidator class.

The original EMLParser class (now Deprecated) configuration included the following key/keyref pair check for units:

  <key name="unitIdentifierKey">
    <selector xpath="//unitList/unit"/>
    <field xpath="@id"/>
  </key>
  <keyref name="unitReference" refer="unitIdentifierKey">
    <selector xpath="//unit/customUnit"/>
    <field xpath="."/>
  </keyref>

This was not implemented in EMLValidator and needs to be added. I committed two test files to the develop branch (for 2.1.1 and 2.2.0) that illustrate the problem. The travis checks for EML on the develop branch will start failing until we fix this.

from eml.

mbjones avatar mbjones commented on August 22, 2024

I commited a fix for this to the develop branch in SHA 4cd1f9d.

In the process, I found several erroneous test documents in the test suite, which are now also fixed in the branch. I updated the validation rules documentation to reflect this additional rule which got lost in translation from the previous release. The validation algorithm for units is now:

        // If a customUnit is referenced in a document, it must have a
        // corresponding STMML unit definition in the document with a matching
        // @id
        ArrayList<String> unitIdentifiers = getXPathValues("//*[local-name() = 'unitList']/*[local-name() = 'unit']/@id");
        ArrayList<String> unitReferences = getXPathValues("//unit/customUnit");
        for (String s : unitReferences) {
            if (!unitIdentifiers.contains(s)) {
                errors.add("Invalid: Custom unit definition missing: " + s);
                isValid = false;
            }
        }

@cboettig and @amoeba and @mobb Please review and let me know if that looks correct to you. If so, we'll need to be sure to add a corresponding validation check to emld in R.

This set of changes does not change the EML schema, but it does affect any system that might have done a validation check before accepting content, such as Metacat. @taojing2002 will need to incorporate a new built version of the eml.jar into Metacat, and we'll need to redeploy the online parser on the KNB. We will also need to redeploy the EML site documentation to reflect the updated validation documentation.

Comments appreciated.

from eml.

mobb avatar mobb commented on August 22, 2024

This fix looks like it will catch docs with missing unit definitions.

HOWEVER: I am pretty sure that the previous parse behavior was to match the <customUnit> to the XPath unitList/unit/@name, and not to unitList/unit/@id. I only know this as a user, having been bit by it several times. Is there a way to confirm that?

IMO, id is a better choice. But if this is a change in behavior, we should be prepared for some docs that passed the parser previously to now fail.

from eml.

mbjones avatar mbjones commented on August 22, 2024

Hey @mobb thanks for the feedback. I accidentally started with @name, and found that produces validation errors in EMLParser. The XPath that I showed above from the configuration file shows that we previously used @id, so I stuck with that. The one thing I did do was make the XPath agnostic with respect to namespace, as I found live examples that were both explicit about the STMML namespace and others that omitted it. So, now both are allowed in my revised XPath via use of local-name().

from eml.

mbjones avatar mbjones commented on August 22, 2024

@taojing2002 When this upgrade is applied to Metacat on our repositories and for DataONE, it may find that documents that previously passed the EMLValidator no longer do so.... which concerns me, and we need a strategy. It would be interesting to find out which documents fail before we deploy it widely (which we might be able to do with a modified quality check with @gothub that runs the new validator on all DataONE content).

from eml.

amoeba avatar amoeba commented on August 22, 2024

Sam Csik found this example of a doc missing a few custom unit defs that passed Metacat's validation so I'll add it here https://search.dataone.org/view/doi:10.18739/A29882N5H.

from eml.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.