Comments (14)
We've worked out what is wrong in the minimal example being used here -- credit to @piotr-florek-mohc for debugging this
The CMIP6Plus_CV.json contains null
entries for some parent experiment ids -- I think this is the cause of the segfaults as CMOR may be attempting to do string operations on None
which will almost certainly fail.
Once this is corrected then CMOR behaves itself again. One option might be to prescan every JSON file read and fail if a null
or None
value is found anywhere.
The other fixes to get the above to work are; correction to mip table id in the table_id
section of the CVs file and an update to the table_id
in the header of the mip table itself (to include the MIP_
prefix.
from cmor.
Just noting the associated issue PCMDI/mip-cmor-tables#27
from cmor.
@matthew-mizielinski @durack1 Since the issues occurring with the example can be fixed within the MIP tables, should we close this issue?
from cmor.
@mauzey1 thanks for circling. As it's a segfault, it would be great to catch this issue and ensure that such a poorly defined use case doesn't segfault.
We also wanted to make sure that the structure of the json input files made sense, and then CMOR could read this more sensible format - in the case of a list type vs space-separated string type.
So it would be good to make some tweaks to CMOR to deal with the segfault, and separately deal with the json type formats as well
from cmor.
How should we handle null
in the tables? Throw an error? Ignore them? Treat as empty strings?
from cmor.
In a situation where poorly constructed tables/jsons inputs are provided, it would be great to note the problem and throw a traceback, rather than a segfault - if CMOR doesn't know about it, then we shouldn't have to deal with it.
@matthew-mizielinski may have a subtly different take which I would like to hear
from cmor.
the output from CMOR when it segfaults is pretty unhelpful, and it is pretty easy to sink a lot of time trying to work out where the issue is. For example, I think the example I attached implies that the tables are at fault, in that the failing call is load_table()
, so I spent a fair amount of time with the tables before Piotr identified the issue.
I don't think there is a valid situation where a null
value should appear in any of the input files used by CMOR (feel free to correct me if I've missed something), so we could have a sanity checking function that walks the dictionary immediately after loading the tables/CVs and raises an error that a null has been found.
Note that I don't think that this is a critical thing to fix and release immediately. We'll get something into any CVs/MIP tables code to prevent null's appearing, but given we've identified this issue it would be good to cover it off.
from cmor.
I was looking at where the segfault was happening in CMOR and found it within the cmor_CV_set_att
function in cmor_CV.c. It happens when it tries to read a null
value from an array as a string.
Lines 74 to 89 in 047fd2c
However, I noticed in this function that there is a case where it will ignore a
null
value found inside a JSON object.Lines 33 to 35 in 047fd2c
I'm guessing there have been cases where
null
was found in tables, but just wasn't anticipated to be found inside arrays.
Either way, I agree that there shouldn't be a valid case for having null
values in tables. I will go ahead with the idea of having a check for every JSON table that throws an error at the first instance of a null
value.
from cmor.
@mauzey1 perfect, I think such a check would make the software more robust, particularly as we're anticipating increased use over time with people text editing json files in the worst cases - if it has a problem reading the file, it should point out the issue to whatever granularity is relatively easily possible, and exit, throwing the error and alerting a user to where to look to fix it
from cmor.
The additional tweak is to the format of the dimensions
, modeling_realm
attributes, from a string to a json list type
CMIP6
"tas": {
"frequency": "mon",
"modeling_realm": "atmos",
"standard_name": "air_temperature",
"units": "K",
"cell_methods": "area: time: mean",
"cell_measures": "area: areacella",
"long_name": "Near-Surface Air Temperature",
"comment": "near-surface (usually, 2 meter) air temperature",
"dimensions": "longitude latitude time height2m",
"out_name": "tas",
"type": "real",
"positive": "",
"valid_min": "",
"valid_max": "",
"ok_min_mean_abs": "",
"ok_max_mean_abs": ""
},
https://github.com/PCMDI/cmip6-cmor-tables/blob/main/Tables/CMIP6_Amon.json#L1151-L1168
CMIP6Plus
"tas": {
"cell_measures": "area: areacella",
"cell_methods": "area: time: mean",
"comment": "near-surface (usually, 2 meter) air temperature",
"dimensions": [
"longitude",
"latitude",
"time",
"height2m"
],
"frequency": "mon",
"long_name": "Near-Surface Air Temperature",
"modeling_realm": [
"atmos"
],
"ok_max_mean_abs": 295.0,
"ok_min_mean_abs": 255.0,
"out_name": "tas",
"positive": "",
"standard_name": "air_temperature",
"type": "real",
"units": "K",
"valid_max": 350.0,
"valid_min": 170.0
},
@wolfiex ping
from cmor.
I vaguely recall that a json list type doesn't care about order, but for dimensions and modeling_realm we definitely do care. Is this a problem?
from cmor.
Getting back to this issue, I have so far made some changes for finding null values in JSON tables and throwing an error message when found. Are there other invalid values that we should look for in tables?
When I run CMOR with the mip-cmor-tables, I get a lot of warning messages for the attributes provenance
, validation
, and branded_variable_name
. We could extract ok_max_mean_abs
, ok_min_mean_abs
, valid_max
, and valid_min
out of validation
, however I'm not sure how to useprovenance
and branded_variable_name
. Should CMOR just ignore them?
from cmor.
Thanks @mauzey1 we need to get onto finalizing this format early this year, so @wolfiex did you want to update us on 2024 status so we can plot?
from cmor.
Some of these might be best as their own issues, but the small tweaks we've discussed are;
- remove
product
field from output - currently defaults tomodel-output
with the value at the top of a table overwriting this - avoid
further_info_url
automatic creation as this isn't going to be used beyond CMIP6 for now - (possibly) remove
short_description
from output -- it relies on mip_era from tables (now removed) and this field will do something strange when the CORDEX tables are picked up.
from cmor.
Related Issues (20)
- Exposing latest netcdf 4.9.x library functionality: quantize, zstandard HOT 13
- Remove unused attributes when processing CMIP6Plus datasets HOT 14
- Exposing latest netcdf 4.9.x library functionality: quantize, zstandard HOT 48
- unclear warning... HOT 5
- bounds required on singleton lon and lat? HOT 5
- avoid attributes of bounds of auxilliary coordinates (`vertices_latitudes` / `vertices_longitude`) HOT 5
- Calibrating CMOR3 & 4 forward development plans HOT 7
- CMOR 3.8.0 Release HOT 4
- Update README.md to remove v3.7 reference
- default `realm = "REALM"` is always written although not required by CV HOT 2
- order in `required_global_attributes` matters HOT 1
- input time type as INT HOT 3
- CircleCI current image deprecated HOT 1
- Numpy 2.0 compatibility issue HOT 2
- File output size and chunking HOT 10
- Message Logging generates duplicated output lines
- DESTDIR support HOT 2
- Failure invoking set_deflate HOT 1
- user defined grid mapping
- Add CITATION.cff file HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cmor.