
Comments (2)

phackstock commented on June 6, 2024

Some thoughts and/or open questions on the implementation:

  • In principle we don't need to implement anything new: we already have DataStructureDefinition.validate(df), which we could run after all renaming and/or region aggregation to check whether all the native and common regions are allowed. However, that would mean running a potentially invalid region-aggregation mapping and only finding out about the problem after the fact. Moreover, we don't need to run any renaming or aggregation at all to detect such a problem; it can be determined solely from the mapping for the current model and the DataStructureDefinition. That leads to my first question:
    • At what point and how do we want to check the validity of the mappings? In my view there are several options:
      1. Package it in a GitHub Action that runs upon any change to the mapping directory or to any of the region definitions.
      2. Run a check of all mapping files at the beginning of each workflow. Advantage: easy to implement; disadvantage: a faulty mapping could also block the upload of an entirely different model.
      3. Run through the models in the given data frame one by one and check that the mapping for each model is correct.
    • My preference would probably be option 3.
  • If we go with option 3, what should the interface for this validation function look like? We could follow the pattern of DataStructureDefinition.validate and create DataStructureDefinition.validate_mapping(df, mapping), where mapping might simply be the directory where all mappings live, a dictionary with all the mappings loaded as RegionAggregationMapping, or some custom holding object as mentioned in #27. From an interface point of view I would prefer the first option: we would only provide the mapping directory and everything else is handled inside DataStructureDefinition.validate_mapping. A rough sketch of that option follows below.
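A minimal sketch of what that first option could look like, written as a standalone function for illustration (RegionAggregationMapping.from_file, the attribute all_regions, and the membership check against dsd.region are assumptions for this sketch, not the confirmed API):

    from pathlib import Path

    from nomenclature import DataStructureDefinition, RegionAggregationMapping


    def validate_mapping_dir(dsd: DataStructureDefinition, mapping_dir) -> None:
        """Raise if any mapping references a region that is not defined
        in the DataStructureDefinition."""
        for file in sorted(Path(mapping_dir).glob("**/*.yaml")):
            # assumed parser for a single region-aggregation mapping file
            mapping = RegionAggregationMapping.from_file(file)
            # assumed: `all_regions` collects native(-renamed) and common regions
            unknown = [r for r in mapping.all_regions if r not in dsd.region]
            if unknown:
                raise ValueError(f"Unknown regions {unknown} in {file}")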

Would love to hear your thoughts on all that @danielhuppmann.


danielhuppmann commented on June 6, 2024

Happy to share my thoughts...

  1. There is a misunderstanding about the aim of the DataStructureDefinition.validate method - it validates an IamDataFrame instance (e.g., an upload by a modelling team) against an instance of the DataStructureDefinition initialized from a project-specific directory (i.e., correct variables and regions). It is not a validation of the nomenclature-yaml files. This (implicitly) already occurs when initializing a DataStructureDefinition instance from a directory.

  2. The guiding question should be: how to execute the region-aggregation? One option that I see is the following:

    import nomenclature
    import pyam

    # initialize the project definitions and the region processor
    dsd = nomenclature.DataStructureDefinition("<path/to/definitions>")
    reg = nomenclature.RegionProcessor("<path/to/definitions>", dsd)

    df = pyam.IamDataFrame(df)  # cast the scenario data to an IamDataFrame
    dsd.validate(df)            # validate against the definitions
    reg.apply(df)               # then perform the region aggregation

    The validation that all native[-renamed] and common regions are defined in the DataStructureDefinition instance should happen as part of the initialization of the RegionProcessor instance.
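    A minimal sketch of that initialization-time check (the helper _load_mappings and the attribute all_regions are illustrative assumptions, not the existing API):

    class RegionProcessor:
        def __init__(self, path, dsd):
            # parse all region-aggregation mappings found in `path`
            # (assumed helper returning {model_name: mapping})
            self.mappings = self._load_mappings(path)
            # fail fast: every native(-renamed) and common region used by
            # a mapping must be defined in the DataStructureDefinition
            for model, mapping in self.mappings.items():
                unknown = [r for r in mapping.all_regions if r not in dsd.region]
                if unknown:
                    raise ValueError(
                        f"Mapping for '{model}' uses regions {unknown} "
                        "not defined in the DataStructureDefinition"
                    )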

    Alternatively, the collection of region-aggregation-mappings could be implemented as a module of the DataStructureDefinition.

  3. When do we check the validity of configuration files and mappings? This should happen in every (project-specific) repository that has such files, executed via GitHub Actions upon a PR or push, so that invalid or inconsistent files are unlikely to end up on the main branch of any repo.

    The nomenclature package already has a testing module (see here), which is currently used as part of the automated testing of the irp-internal-workflow repository (see here). This should be developed further; I'll start a separate issue.
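    As an illustration, a project repository could run a check along these lines in its test suite (a minimal sketch; the directory names and RegionAggregationMapping.from_file are assumptions about the project layout and API):

    # test_definitions.py - executed by pytest in the repository's GitHub Action
    from pathlib import Path

    import nomenclature


    def test_definitions_and_mappings_are_valid():
        # initializing the definitions already validates the yaml files
        nomenclature.DataStructureDefinition(Path("definitions"))
        # parsing every mapping catches malformed or inconsistent files
        for file in Path("mappings").glob("*.yaml"):
            nomenclature.RegionAggregationMapping.from_file(file)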

    Furthermore, the way the code is structured now, all region-mapping files have to be parsed anyway to find the "correct" mapping file when importing the package.

    In short, we want to do option 1 (a GitHub Action) and, unless we restructure the code and have a central "directory" with the mapping of model-to-mapping-file, we have to do option 2 (checking all mapping files at the beginning of each workflow) anyway.
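    Such a central directory could be as little as one index parsed at import time, so that only the relevant mapping file is loaded per model (a hypothetical sketch; model names and paths are placeholders):

    # hypothetical index, e.g. read once from a single yaml file
    MODEL_MAPPING_INDEX = {
        "model_a": "mappings/model_a.yaml",
        "model_b": "mappings/model_b.yaml",
    }

    def mapping_file_for(model: str) -> str:
        """Look up a model's mapping file without parsing all mapping files."""
        return MODEL_MAPPING_INDEX[model]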

