
Comments (11)

danielhuppmann commented on May 26, 2024

For 1, if there is a "require-all" feature, the list of variables (and units) and regions would have to be an exact match.

The way that the nomenclature API is structured, you can easily use the codelists to downselect an IamDataFrame. If you look closely at the example above, you'll see that only the filtered IamDataFrame is used for the validation.

magicc.validate(df.filter(variable=magicc.variable), require_all=True)

from nomenclature.

danielhuppmann commented on May 26, 2024

Suggestion: we create a new class WorkflowProcessor, which has its own folder "workflows" (similar to mappings for the RegionProcessor) with a specific yaml structure.

Attributes:

  • name (instead of "required_for")
  • required_timeseries

The WorkflowProcessor can be called via nomenclature.process(df, dsd, processor=a_workflow_processor).
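A minimal sketch of what such a processor could look like, assuming the attribute names suggested above (name, required_timeseries); this is illustrative only, not the actual nomenclature API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the proposed WorkflowProcessor; attribute names
# mirror the suggestion above, the check logic is purely illustrative.
@dataclass
class WorkflowProcessor:
    name: str
    required_timeseries: list = field(default_factory=list)

    def missing(self, provided):
        # Return the required timeseries not present in the provided data
        return [r for r in self.required_timeseries if r not in provided]

proc = WorkflowProcessor(
    name="magicc",
    required_timeseries=["Emissions|CO2", "Emissions|CH4"],
)
missing = proc.missing({"Emissions|CO2"})  # -> ["Emissions|CH4"]
```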


phackstock commented on May 26, 2024

Do I understand option 1 correctly, that each upload would have to contain every variable?
If that is the case, I would say options 2 and 3 are the better ones.
We could also take a different approach and create something like a minimum specification for input data. This could cover every dimension: model, scenario, variable, year(s), etc.
Something like this:

variable:
  - Final Energy
  - Final Energy|Electricity
year: [2020, 2025, ...]
region: World
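The check behind such a minimum specification could be sketched as expanding the spec into all required (variable, year, region) combinations and reporting which ones the upload is missing. A minimal sketch with illustrative data, not the actual nomenclature implementation:

```python
from itertools import product

# Minimum specification, mirroring the YAML above (values illustrative).
spec = {
    "variable": ["Final Energy", "Final Energy|Electricity"],
    "year": [2020, 2025],
    "region": ["World"],
}

# Hypothetical uploaded data as (variable, year, region) tuples.
provided = {
    ("Final Energy", 2020, "World"),
    ("Final Energy", 2025, "World"),
    ("Final Energy|Electricity", 2020, "World"),
}

# Every combination of the spec dimensions must be present.
required = set(product(spec["variable"], spec["year"], spec["region"]))
missing = sorted(required - provided)
# -> [("Final Energy|Electricity", 2025, "World")]
```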


phackstock commented on May 26, 2024

Now that I think about it, such a list of required variables would be super useful whenever a model comparison study involves post-processing steps. This way you can ensure that the colleagues doing the post-processing have everything they need to work with, which would save a lot of time and frustration.


danielhuppmann commented on May 26, 2024

For clarification, I meant that we would need all three options, not as an either-or.

And yes, use case 1 is for post-processing, basically re-using the DataStructureDefinition as the "minimum specification".

Say, for the openENTRANCE project using MAGICC climate-postprocessing, there would be two DataStructureDefinitions. And the code could look something like

df = pyam.IamDataFrame("<file>")

oe = nomenclature.DataStructureDefinition("openentrance")
oe.validate(df)

magicc = nomenclature.DataStructureDefinition("magicc")
magicc.validate(df.filter(variable=magicc.variable), require_all=True)

where the MAGICC variables are a subset of the openentrance variables...


phackstock commented on May 26, 2024

Ah right. Just so that I understand this correctly: this means there would be two folders in the openENTRANCE project then, one for the list of all allowed variables and one for the list of variables required for, in this case, MAGICC?
In that case I think it might make sense to start talking about centralizing these post-processing-specific variable requirement lists. They should be largely static, right?


danielhuppmann commented on May 26, 2024

Yes, but that is not a discussion to be had in this repository or this issue... First, we need to be able to "require" variables in the nomenclature package.


lewisjared commented on May 26, 2024

For case 1), is the test that at least the required set of variables/regions is provided, or that exactly that set is provided, i.e., fail if additional information is present?

Ideally downstream models are indifferent to additional data, but some may not be. Is that something that you would want to support?


phackstock commented on May 26, 2024

Coming back to this issue as the MAGICC use case is high on the list right now.
I really like the idea of re-using the codelists, either by adding a required attribute to the original codelist or by keeping a separate one.
The one potential limitation I see is that we would essentially combine "everything with everything". An example: requiring a timeseries variable to be present for a number of years. If we created a list of allowed years and enforced it, we would probably run into trouble because different models feature different time resolutions. This could again be solved with a required attribute for the required years, but some variables might need data until 2050 while others need it until 2100.

Maybe the cleaner approach would be to use a different structure with more fine grained control. Something like this:

required_for: MAGICC
required_timeseries:
  - variable: Emissions|CO2
    region: World
    years: [2020, 2025, 2050, 2075, 2100]
    required: True
  - variable: Emissions|CH4
    region: World
    years: [2020, 2025, 2050, 2075, 2100]
    optional: True
  - variable: [Final Energy, Final Energy|Electricity, Final Energy...] 
    region: [R5 Asia, R5 Middle East & Africa, ...]
    scenario: [Current policies, NDCs, ...] 
    years: [2020, 2025, 2030, 2035, 2040, 2045, 2050, 2060, ...]
    required: True

The first attribute, required_for, is mainly for user communication. In a situation with multiple post-processing workflows and/or general project requirements, it is important to know what is required for what; it should also help with error messages.
required_timeseries contains a list of timeseries definitions. In the example above, Emissions|CO2 is a required timeseries and Emissions|CH4 an optional one (if I recall correctly, that is how the AR6 climate assessment pipeline runs, requiring some variables to be model-native while allowing infilling for others).
Within each timeseries definition we still require "everything with everything", as the third requirement demonstrates: a dataframe only passes if it provides every listed variable for every scenario, every region for each variable-scenario combination, and every year for each variable-scenario-region combination.

This way we should get the ease of being able to define a large amount of timeseries in a single chunk while maintaining fine grained control.
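The per-block "everything with everything" rule described above could be sketched as follows: each block expands to the full cross-product of its dimensions, and missing combinations become errors for required blocks and warnings for optional ones. All names and data here are hypothetical, mirroring the YAML structure sketched above rather than any actual nomenclature code:

```python
from itertools import product

# Hypothetical requirement blocks, mirroring the proposed YAML structure.
requirements = [
    {"variable": ["Emissions|CO2"], "region": ["World"],
     "years": [2020, 2050], "required": True},
    {"variable": ["Emissions|CH4"], "region": ["World"],
     "years": [2020, 2050], "required": False},
]

# Illustrative uploaded data as (variable, region, year) tuples.
provided = {
    ("Emissions|CO2", "World", 2020),
    ("Emissions|CO2", "World", 2050),
}

errors, warnings = [], []
for block in requirements:
    # Every combination within a block must be present.
    combos = set(product(block["variable"], block["region"], block["years"]))
    missing = sorted(combos - provided)
    # Required blocks fail validation; optional blocks only warn.
    (errors if block["required"] else warnings).extend(missing)
# errors is empty, warnings lists the missing Emissions|CH4 datapoints
```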

@lewisjared and @danielhuppmann would love to hear your thoughts on the matter. Do you think this would be a good way to go?


phackstock commented on May 26, 2024

Sounds good to me. I'll get on that.


phackstock commented on May 26, 2024

@danielhuppmann I've started implementing this feature and I would have a few design questions. Should we discuss them here or is it better if I open a draft PR and we tackle them over the actual code?

