Giter VIP home page Giter VIP logo

Comments (14)

matthew-mizielinski avatar matthew-mizielinski commented on July 19, 2024

We've worked out what is wrong in the minimal example being used here -- credit to @piotr-florek-mohc for debugging this

The CMIP6Plus_CV.json contains null entries for some parent experiment ids -- I think this is the cause of the segfaults as CMOR may be attempting to do string operations on None which will almost certainly fail.

Once this is corrected then CMOR behaves itself again. One option might be to prescan every JSON file read and fail if a null or None value is found anywhere.

The other fixes to get the above to work are; correction to mip table id in the table_id section of the CVs file and an update to the table_id in the header of the mip table itself (to include the MIP_ prefix.

from cmor.

durack1 avatar durack1 commented on July 19, 2024

Just noting the associated issue PCMDI/mip-cmor-tables#27

from cmor.

mauzey1 avatar mauzey1 commented on July 19, 2024

@matthew-mizielinski @durack1 Since the issues occurring with the example can be fixed within the MIP tables, should we close this issue?

from cmor.

durack1 avatar durack1 commented on July 19, 2024

@mauzey1 thanks for circling. As it's a segfault, it would be great to catch this issue and ensure that such a poorly defined use case doesn't segfault.

We also wanted to make sure that the structure of the json input files made sense, and then CMOR could read this more sensible format - in the case of a list type vs space-separated string type.

So it would be good to make some tweaks to CMOR to deal with the segfault, and separately deal with the json type formats as well

from cmor.

mauzey1 avatar mauzey1 commented on July 19, 2024

How should we handle null in the tables? Throw an error? Ignore them? Treat as empty strings?

from cmor.

durack1 avatar durack1 commented on July 19, 2024

In a situation where poorly constructed tables/jsons inputs are provided, it would be great to note the problem and throw a traceback, rather than a segfault - if CMOR doesn't know about it, then we shouldn't have to deal with it.

@matthew-mizielinski may have a subtly different take which I would like to hear

from cmor.

matthew-mizielinski avatar matthew-mizielinski commented on July 19, 2024

@mauzey1, @durack1

the output from CMOR when it segfaults is pretty unhelpful, and it is pretty easy to sink a lot of time trying to work out where the issue is. For example, I think the example I attached implies that the tables are at fault, in that the failing call is load_table(), so I spent a fair amount of time with the tables before Piotr identified the issue.

I don't think there is a valid situation where a null value should appear in any of the input files used by CMOR (feel free to correct me if I've missed something), so we could have a sanity checking function that walks the dictionary immediately after loading the tables/CVs and raises an error that a null has been found.

Note that I don't think that this is a critical thing to fix and release immediately. We'll get something into any CVs/MIP tables code to prevent null's appearing, but given we've identified this issue it would be good to cover it off.

from cmor.

mauzey1 avatar mauzey1 commented on July 19, 2024

I was looking at where the segfault was happening in CMOR and found it within the cmor_CV_set_att function in cmor_CV.c. It happens when it tries to read a null value from an array as a string.

cmor/Src/cmor_CV.c

Lines 74 to 89 in 047fd2c

} else if (json_object_is_type(joValue, json_type_array)) {
int i;
pArray = json_object_get_array(joValue);
length = array_list_length(pArray);
CV->aszValue = (char **) malloc(length * sizeof(char**));
for(i=0; i < length; i++) {
CV->aszValue[i] = (char *) malloc(sizeof(char) * CMOR_MAX_STRING);
}
json_object *joItem;
CV->anElements = length;
for (k = 0; k < length; k++) {
joItem = (json_object *) array_list_get_idx(pArray, k);
strcpy(CV->aszValue[k], json_object_get_string(joItem));
}
CV->type = CV_stringarray;

However, I noticed in this function that there is a case where it will ignore a null value found inside a JSON object.

cmor/Src/cmor_CV.c

Lines 33 to 35 in 047fd2c

if (json_object_is_type(joValue, json_type_null)) {
printf("Will not save NULL JSON type from CV.json\n");

I'm guessing there have been cases where null was found in tables, but just wasn't anticipated to be found inside arrays.

Either way, I agree that there shouldn't be a valid case for having null values in tables. I will go ahead with the idea of having a check for every JSON table that throws an error at the first instance of a null value.

from cmor.

durack1 avatar durack1 commented on July 19, 2024

@mauzey1 perfect, I think such a check would make the software more robust, particularly as we're anticipating increased use over time with people text editing json files in the worst cases - if it has a problem reading the file, it should point out the issue to whatever granularity is relatively easily possible, and exit, throwing the error and alerting a user to where to look to fix it

from cmor.

durack1 avatar durack1 commented on July 19, 2024

The additional tweak is to the format of the dimensions, modeling_realm attributes, from a string to a json list type

CMIP6

        "tas": {
            "frequency": "mon", 
            "modeling_realm": "atmos", 
            "standard_name": "air_temperature", 
            "units": "K", 
            "cell_methods": "area: time: mean", 
            "cell_measures": "area: areacella", 
            "long_name": "Near-Surface Air Temperature", 
            "comment": "near-surface (usually, 2 meter) air temperature", 
            "dimensions": "longitude latitude time height2m", 
            "out_name": "tas", 
            "type": "real", 
            "positive": "", 
            "valid_min": "", 
            "valid_max": "", 
            "ok_min_mean_abs": "", 
            "ok_max_mean_abs": ""
        }, 

https://github.com/PCMDI/cmip6-cmor-tables/blob/main/Tables/CMIP6_Amon.json#L1151-L1168

CMIP6Plus

    "tas": {
      "cell_measures": "area: areacella",
      "cell_methods": "area: time: mean",
      "comment": "near-surface (usually, 2 meter) air temperature",
      "dimensions": [
        "longitude",
        "latitude",
        "time",
        "height2m"
      ],
      "frequency": "mon",
      "long_name": "Near-Surface Air Temperature",
      "modeling_realm": [
        "atmos"
      ],
      "ok_max_mean_abs": 295.0,
      "ok_min_mean_abs": 255.0,
      "out_name": "tas",
      "positive": "",
      "standard_name": "air_temperature",
      "type": "real",
      "units": "K",
      "valid_max": 350.0,
      "valid_min": 170.0
    },

https://github.com/PCMDI/mip-cmor-tables/blob/c46c240e448ad17c2cfec31fcf639aea1f51d1f0/Tables/MIP_APmon.json#L3077-L3101

@wolfiex ping

from cmor.

taylor13 avatar taylor13 commented on July 19, 2024

I vaguely recall that a json list type doesn't care about order, but for dimensions and modeling_realm we definitely do care. Is this a problem?

from cmor.

mauzey1 avatar mauzey1 commented on July 19, 2024

Getting back to this issue, I have so far made some changes for finding null values in JSON tables and throwing an error message when found. Are there other invalid values that we should look for in tables?

When I run CMOR with the mip-cmor-tables, I get a lot of warning messages for the attributes provenance, validation, and branded_variable_name. We could extract ok_max_mean_abs, ok_min_mean_abs, valid_max, and valid_min out of validation, however I'm not sure how to useprovenance and branded_variable_name. Should CMOR just ignore them?

from cmor.

durack1 avatar durack1 commented on July 19, 2024

Thanks @mauzey1 we need to get onto finalizing this format early this year, so @wolfiex did you want to update us on 2024 status so we can plot?

from cmor.

matthew-mizielinski avatar matthew-mizielinski commented on July 19, 2024

Some of these might be best as their own issues, but the small tweaks we've discussed are;

  • remove product field from output - currently defaults to model-output with the value at the top of a table overwriting this
  • avoid further_info_url automatic creation as this isn't going to be used beyond CMIP6 for now
  • (possibly) remove short_description from output -- it relies on mip_era from tables (now removed) and this field will do something strange when the CORDEX tables are picked up.

from cmor.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.