
Comments (2)

phackstock commented on June 6, 2024

Some thoughts and/or open questions on the implementation:

  • In principle we don't need to implement anything new: we already have DataStructureDefinition.validate(df), which we could run after all renaming and/or region aggregation to check whether all the native and common regions are allowed. However, that would mean running a potentially invalid region-aggregation mapping and only finding out about the problem after the fact. Moreover, we don't need to run any renaming or aggregation at all to detect such a problem; it can be determined solely from the mapping for the current model and the DataStructureDefinition. That leads to my first question:
    • At what point and how do we want to check the validity of the mappings? In my view there are several options:
      1. Package it in a GitHub Action that runs upon any change to the mapping directory or to any of the region definitions.
      2. Run a check of all mapping files at the beginning of each workflow. Advantage: easy to implement; disadvantage: a faulty mapping could also block the upload of an entirely different model.
      3. Run through the models in the given data frame one by one and check that the mapping for each model is correct.
    • My preference would probably be option 3.
  • If we go with option 3, what should the interface for this validation function look like? We could follow the pattern of DataStructureDefinition.validate and create DataStructureDefinition.validate_mapping(df, mapping), where mapping might simply be the directory where all mappings live, a dictionary with all the mappings loaded as RegionAggregationMapping, or some custom holding object as mentioned in #27. From an interface point of view I would prefer the first option: we would only provide the mapping directory and everything else is handled inside DataStructureDefinition.validate_mapping. A rough sketch of that option follows below.
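A minimal sketch of what that first option could look like, written as a standalone function for illustration (RegionAggregationMapping.from_file, the attribute all_regions, and the membership check against dsd.region are assumptions for this sketch, not the confirmed API):

    from pathlib import Path

    from nomenclature import DataStructureDefinition, RegionAggregationMapping


    def validate_mapping_dir(dsd: DataStructureDefinition, mapping_dir) -> None:
        """Raise if any mapping references a region that is not defined
        in the DataStructureDefinition."""
        for file in sorted(Path(mapping_dir).glob("**/*.yaml")):
            # assumed parser for a single region-aggregation mapping file
            mapping = RegionAggregationMapping.from_file(file)
            # assumed: `all_regions` collects native(-renamed) and common regions
            unknown = [r for r in mapping.all_regions if r not in dsd.region]
            if unknown:
                raise ValueError(f"Unknown regions {unknown} in {file}")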

Would love to hear your thoughts on all that @danielhuppmann.


danielhuppmann commented on June 6, 2024

Happy to share my thoughts...

  1. There is a misunderstanding about the aim of the DataStructureDefinition.validate method - it validates an IamDataFrame instance (e.g., an upload by a modelling team) against an instance of the DataStructureDefinition initialized from a project-specific directory (i.e., correct variables and regions). It is not a validation of the nomenclature-yaml files. This (implicitly) already occurs when initializing a DataStructureDefinition instance from a directory.

  2. The guiding question should be: how to execute the region-aggregation? One option that I see is the following:

    import nomenclature
    import pyam

    # initialize the project definitions and the region processor
    dsd = nomenclature.DataStructureDefinition("<path/to/definitions>")
    reg = nomenclature.RegionProcessor("<path/to/definitions>", dsd)

    df = pyam.IamDataFrame(df)  # cast the scenario data to an IamDataFrame
    dsd.validate(df)            # validate against the definitions
    reg.apply(df)               # then perform the region aggregation

    The validation that all native[-renamed] and common regions are defined in the DataStructureDefinition instance should happen as part of the initialization of the RegionProcessor instance.
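    A minimal sketch of that initialization-time check (the helper _load_mappings and the attribute all_regions are illustrative assumptions, not the existing API):

    class RegionProcessor:
        def __init__(self, path, dsd):
            # parse all region-aggregation mappings found in `path`
            # (assumed helper returning {model_name: mapping})
            self.mappings = self._load_mappings(path)
            # fail fast: every native(-renamed) and common region used by
            # a mapping must be defined in the DataStructureDefinition
            for model, mapping in self.mappings.items():
                unknown = [r for r in mapping.all_regions if r not in dsd.region]
                if unknown:
                    raise ValueError(
                        f"Mapping for '{model}' uses regions {unknown} "
                        "not defined in the DataStructureDefinition"
                    )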

    Alternatively, the collection of region-aggregation-mappings could be implemented as a module of the DataStructureDefinition.

  3. When do we check the validity of configuration files and mappings? This should happen in every (project-specific) repository that has such files, executed via GitHub Actions upon a PR or push, so that invalid or inconsistent files are unlikely to end up on the main branch of any repo.

    The nomenclature package already has a testing module (see here), which is currently used as part of the automated testing of the irp-internal-workflow repository (see here). This should be developed further; I'll start a separate issue.
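    As an illustration, a project repository could run a check along these lines in its test suite (a minimal sketch; the directory names and RegionAggregationMapping.from_file are assumptions about the project layout and API):

    # test_definitions.py - executed by pytest in the repository's GitHub Action
    from pathlib import Path

    import nomenclature


    def test_definitions_and_mappings_are_valid():
        # initializing the definitions already validates the yaml files
        nomenclature.DataStructureDefinition(Path("definitions"))
        # parsing every mapping catches malformed or inconsistent files
        for file in Path("mappings").glob("*.yaml"):
            nomenclature.RegionAggregationMapping.from_file(file)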

    Furthermore, the way the code is structured now, all region-mapping files have to be parsed anyway to find the "correct" mapping file when importing the package.

    In short, we want to do option 1 (a GitHub Action) and, unless we restructure the code and have a central "directory" with the mapping of model-to-mapping-file, we have to do option 2 (checking all mapping files at the beginning of each workflow) anyway.
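    Such a central directory could be as little as one index parsed at import time, so that only the relevant mapping file is loaded per model (a hypothetical sketch; model names and paths are placeholders):

    # hypothetical index, e.g. read once from a single yaml file
    MODEL_MAPPING_INDEX = {
        "model_a": "mappings/model_a.yaml",
        "model_b": "mappings/model_b.yaml",
    }

    def mapping_file_for(model: str) -> str:
        """Look up a model's mapping file without parsing all mapping files."""
        return MODEL_MAPPING_INDEX[model]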

