ec-earth / ece2cmor3
Post-processing and cmorization of ec-earth output
License: Apache License 2.0
Various land/landice variables need to be set to the correct missing value outside their domain of applicability. Build a generic masking functionality to automatically perform this.
When processing NEMO output I get the error message
! Error: Output file ( ./cmor/CMIP6/institute_idEC-Earth-3-HR/CMIP/historical/r1i1p1f1/Omon/tos/gr/v20170508/tos_Omon_EC-Earth-3-HR_historical_r1i1p1f1_gr_199001-199012.nc ) already exists,
despite the fact that the output directory cmor didn't even exist before I started processing.
The directory structure should follow the template on p15 in https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk/edit
Directory structure = <mip_era>/
<activity_id>/
<institution_id>/
<source_id>/
<experiment_id>/
<member_id>/
<table_id>/
<variable_id>/
<grid_label>/
<version>
where mip_era=CMIP6
and activity_id=HighResMIP
in the case of PRIMAVERA.
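A minimal sketch of how the directory template above could be assembled. The helper name `drs_directory` and the sample values (in particular the institution_id) are illustrative assumptions, not ece2cmor3 code.

```python
import posixpath

# Order of the directory components, copied from the template above.
DRS_COMPONENTS = (
    "mip_era", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "table_id", "variable_id",
    "grid_label", "version",
)


def drs_directory(attrs):
    """Join the components in template order (hypothetical helper)."""
    missing = [key for key in DRS_COMPONENTS if key not in attrs]
    if missing:
        raise KeyError("missing DRS components: " + ", ".join(missing))
    return posixpath.join(*(attrs[key] for key in DRS_COMPONENTS))


# Illustrative values only; the institution_id is an assumption.
example_attrs = {
    "mip_era": "CMIP6",
    "activity_id": "HighResMIP",
    "institution_id": "EC-Earth-Consortium",
    "source_id": "EC-Earth-3-HR",
    "experiment_id": "historical",
    "member_id": "r1i1p1f1",
    "table_id": "Omon",
    "variable_id": "tos",
    "grid_label": "gr",
    "version": "v20170508",
}

print(drs_directory(example_attrs))
```

Keeping the component order in one tuple makes it easy to compare the generated path against the template when the specification changes.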
Most metadata referring to a parent experiment are set to no parent
in the cmorised files from PRIMAVERA, apart from
:parent_activity_id = "CMIP" ; :parent_experiment_id = "piControl" ;
These 2 entries should also be set to no parent
according to https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk/edit
Alternatively, all parent_*
entries could be omitted when no parent exists (as in HighResMIP).
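A small sketch of the first option, resetting every parent_* attribute to "no parent" before the metadata is handed to cmor. The function name and the sample dictionary are hypothetical; whether every parent_* attribute accepts this value should be checked against the CMIP6 specification linked above.

```python
NO_PARENT = "no parent"


def clear_parent_attributes(metadata):
    """Return a copy in which every parent_* attribute is 'no parent'."""
    return {
        key: (NO_PARENT if key.startswith("parent_") else value)
        for key, value in metadata.items()
    }


# Illustrative metadata, using the two entries quoted above.
example_metadata = {
    "experiment_id": "historical",
    "parent_activity_id": "CMIP",
    "parent_experiment_id": "piControl",
}

print(clear_parent_attributes(example_metadata))
```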
environment.yml still says
- pcmdi::cmor=3.2.2=np111py27_0
yet ece2cmor3 does not work properly with 3.2.2; it needs 3.2.3.
Should be HighResMIP for the PRIMAVERA experiments (not CMIP).
We should attempt to remove this branch and merge all its features to the main branch. These features mainly include the 6h-filtering and the primavera tables.
I have made a run with the Primavera version and Primavera output, then split the atmosphere output in 3-hourly and 6-hourly files with filter6hr.py. After that I tried to run ece2cmor.py with the -a
flag:
./ece2cmor.py --exp pr03 -a --freq 3 tmp-pr03/3hr 199001
I get errors about missing sea-ice and ocean variables:
ERROR:jsonloader:Could not find parameter table entry for masscello in table Omon...skipping variable.
ERROR:jsonloader:Could not find parameter table entry for hfbasin in table Omon...skipping variable.
ERROR:jsonloader:Could not find parameter table entry for hfx in table Omon...skipping variable.
...
Some atmosphere variables are also missing:
ERROR:jsonloader:Could not find cmor target for variable snw in table day
ERROR:jsonloader:Could not find cmor target for variable mrro in table day
ERROR:jsonloader:Could not find cmor target for variable orog in table AERmon
ERROR:jsonloader:Could not find cmor target for variable ps in table AERmon
And then it crashes with
Traceback (most recent call last):
File "./ece2cmor.py", line 89, in <module>
main(sys.argv[1:])
File "./ece2cmor.py", line 72, in main
startdate = dateutil.parser.parse(args.date)
File "/home/sm_wyser/.conda/envs/CMOR/lib/python2.7/site-packages/dateutil/parser.py", line 1168, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/home/sm_wyser/.conda/envs/CMOR/lib/python2.7/site-packages/dateutil/parser.py", line 581, in parse
ret = default.replace(**repl)
ValueError: month must be in 1..12
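The traceback suggests that the generic date parser does not interpret 199001 as year plus month. A minimal sketch of a stricter parser that picks the format from the string length; the function name and the set of accepted formats are assumptions, not what ece2cmor.py does.

```python
from datetime import datetime

# Accepted layouts, keyed by string length (an assumption for this sketch).
DATE_FORMATS = {6: "%Y%m", 8: "%Y%m%d", 10: "%Y-%m-%d"}


def parse_start_date(text):
    """Parse YYYYMM, YYYYMMDD or YYYY-MM-DD into a datetime."""
    fmt = DATE_FORMATS.get(len(text))
    if fmt is None:
        raise ValueError("unrecognized date: %r" % text)
    return datetime.strptime(text, fmt)


print(parse_start_date("199001"))
```

With an explicit format, a string such as 199001 resolves to the first day of January 1990 instead of being split into ambiguous fields.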
Once IFS test data is checked in, we need to write post-processing + cmorization tests for failing tasks. Prioritize according to the Primavera output request: write a test for each failing variable, to ensure stability upon upgrading tables or the cmor version.
Hi,
After discussion with @goord, we wanted to make a point about the use of filter6h in the primavera branch. For the first month of the leg/chunk, the "previous file" must be prescribed (filter6h.py -p ICMGG$expid+000000 ICMGG$expid+195001); otherwise the CMORization works but does not produce the correct timestamps (they are shifted forward by half a time step). This must be taken into account when cmorizing the files.
Thanks.
primavera.json
contains information that describes where the output will be stored, and information that goes in the header of the cmorised netCDF files. Some of this information is hardcoded (e.g. outpath=/tmp/cmor3
) and some information refers explicitly to KNMI (e.g. further_info_url=http://furtherinfo.es-doc.org/CMIP6.KNMI.EC-Earth.historical
). This kind of information needs to be handled more flexibly so that other institutes can process their simulations.
What reference time should be set in the cmorized output? The reference time enters the time (and time_bnds) in the units, e.g. days since 1990-01-01 00:00:00
For CMIP5 the reference time was 1850-01-01T00:00:00, do we already know what it should be for CMIP6? If it is not specified (yet) I suggest to set 1950-1-1 as the reference date for all PRIMAVERA experiments.
The 0h output of the previous month for instantaneous fields should be included in time averages
We need very coarse-resolution (T21 or so) test data in the repository for ifs2cmor integration tests. The files however do need to contain many output variables (preferably the PRIMAVERA output configuration) so we can use them to test the real-world post-processing and cmorization pipeline.
Here is my take for a metadata file for the PRIMAVERA AMIP experiment: https://drive.google.com/open?id=0B-E865OyTp51MC1jeEQ3T0ZLMjQ
It works with cmor-3.2.3
The filename (after cmorisation) should follow the template on p13 of https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk/edit
file name = <variable_id>_<table_id>_<source_id>_<experiment_id>_<member_id>_<grid_label>[_<time_range>].nc
Currently, the order of the filename components is slightly different, i.e. <source_id> and <experiment_id> have switched position.
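For reference, a short sketch that builds a file name in the order given by the template. The function name `cmor_filename` is hypothetical; the sample values are taken from the file name quoted in the NEMO issue above.

```python
def cmor_filename(variable_id, table_id, source_id, experiment_id,
                  member_id, grid_label, time_range=None):
    """Assemble the file name in template order; time_range is optional."""
    parts = [variable_id, table_id, source_id, experiment_id,
             member_id, grid_label]
    if time_range:
        parts.append(time_range)
    return "_".join(parts) + ".nc"


print(cmor_filename("tos", "Omon", "EC-Earth-3-HR", "historical",
                    "r1i1p1f1", "gr", "199001-199012"))
```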
Upon startup this message is logged for rlu,rsu,rld,rsd,masscello,mrfso,mrlsl,lai,vortmean.
This error occurs when a variable is requested but cannot be produced by ece2cmor. Figure out whether the variables can be produced (is there a grib code or postprocessing combination of existing data) and extend the ifspar.json, nemopar.json if this is the case. Otherwise, remove them from the varlist.
Probably has to do with the surface pressure being in both GG and SH files.
We lack a robust master script that performs postprocessing+cmorization. It needs to be highly configurable via command-line options so it can be directly called by platform-dependent job submission scripts.
Hi
I started using ece2cmor3. I compiled and built cmor without problems. When I run python setup.py install --user I get
gcc: error: unrecognized command line option ‘-ifort’
json_util.o -ifort" failed with exit status 1
I am using Intel ifort. I changed -Dgfortran in setup.py to ifort and to intel, but I still get the error.
Do I need to specify the compiler elsewhere too?
When I then run ece2cmor.py in ece2cmor3 I get
File "/data/jmcgovern/p-process-ECEarth/other/ece2cmor3-master/ece2cmorlib.py", line 1, in <module>
import cmor
ImportError: No module named cmor
I suppose this is due to the compilation error in cmor. Do I need to specify the cmor path in ece2m
Thanks
I try to run examples/primavera/prim_atm_6hr.py
with a very simple variablelist:
{ "Amon": [ "tas", "ts" ] }
Everything seems to go well, no error message, but nothing happens - I don't get any cmorized output. Here is the log:
INFO:ifs2cmor:Executing 0 IFS tasks...
INFO:ifs2cmor:Post-processing surface pressures...
INFO:ifs2cmor:Post-processing 0 IFS tasks...
INFO:ifs2cmor:Post-processed batch of 0 tasks.
INFO:ifs2cmor:Cmorizing 0 IFS tasks...
What am I missing?
Just like we have a dependency on the pcmdi cmip6 tables, we want a link to the primavera cmorization tables in our resources.
Following up on #3:
I just tested the cmorization of the output from a run made with the PRIMAVERA branch and get
ERROR:nemo2cmor:Could not find depth axis variable depthu in NEMO output file u; skipping depth axis creation.
ERROR:nemo2cmor:Could not find depth axis variable deptht in NEMO output file t; skipping depth axis creation.
ERROR:nemo2cmor:Could not find depth axis variable depthv in NEMO output file v; skipping depth axis creation.
Do we have to worry about missing depth axis?
where x is tas, uas or vas. This error occurs when running ece2cmor3 on 3-hourly atmosphere, at the point when depth axes are created for table 3hr.
Hi,
Does anyone know these errors I get when I try a test run (./ece2cmor.py).
(I am using resources/varlist.json, resources/metadata-template.json files, CMIP6 table in cmor/TestTables/ CMIP_xxx.json files, input file created with fixmonths.py)
Thanks
Documentation is a prerequisite for using, developing and reviewing the program. Here are some things to check for.
```
Returns bla
Description:
blabla
Input variables:
varname, vartype; description
Output:
varname, vartype, description
```
Figure out whether the CMOR library allows chunking of netcdf files into yearly files, and if so, expose this functionality in ece2cmor3. Just like in the original ece2cmor script it would be useful to be able to control the chunk size as a function of output frequency. In this way it will be easy to have equal chunking for both ocean and atmosphere data.
The time bounds are somewhat off.
Here an example of a file with monthly means (just 1 month processed):
time_bnds = 0, 30.875
The month should extend to the end of the first month, i.e. 31 days. Otherwise we will have gaps in the timeline when stitching together two files.
And here an example of time_bnds from daily output:
time_bnds = -0.0625, 0.9375, 0.9375, 1.9375, 1.9375, 2.9375, 2.9375, 3.9375,
The time bounds don't start at 00Z and end at 24Z of each day but are off by 1.5 hours (half of a 3-hour timestep).
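A sketch of the bounds one would expect, expressed in days since a reference date. It assumes a standard Gregorian calendar and uses hypothetical helper names; it does not reproduce how ifs2cmor derives its time axis.

```python
from datetime import datetime


def days_since(reference, moment):
    """Elapsed time between reference and moment, in days."""
    return (moment - reference).total_seconds() / 86400.0


def month_bounds(reference, year, month):
    """Bounds from 00Z on the first of the month to 00Z on the first of the next."""
    start = datetime(year, month, 1)
    end = datetime(year + month // 12, month % 12 + 1, 1)
    return days_since(reference, start), days_since(reference, end)


def daily_bounds(reference, year, month):
    """One (00Z, 24Z) pair per day of the month."""
    lower, upper = month_bounds(reference, year, month)
    ndays = int(round(upper - lower))
    return [(lower + day, lower + day + 1.0) for day in range(ndays)]


reference = datetime(1990, 1, 1)
print(month_bounds(reference, 1990, 1))     # (0.0, 31.0)
print(daily_bounds(reference, 1990, 1)[:2])  # [(0.0, 1.0), (1.0, 2.0)]
```

With bounds built this way, the upper bound of one month equals the lower bound of the next, so files can be stitched together without gaps.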
When running prim_atm_6hr.py
I get:
Traceback (most recent call last):
File "./prim_atm_6hr.py", line 62, in <module>
main(sys.argv[1:])
File "./prim_atm_6hr.py", line 59, in main
ece2cmor.perform_ifs_tasks(outputfreq = 6,tempdir = opt.temp,maxsizegb = 128,taskthreads = 16)
File "/home/sm_wyser/ece2cmor3/testing/ece2cmor.py", line 85, in perform_ifs_tasks
ifs2cmor.cleanup(ifs_tasks)
File "/home/sm_wyser/ece2cmor3/testing/ifs2cmor.py", line 120, in cleanup
if(cleanupdir and tempdir_created_ and len(os.listdir(temp_dir_)) == 0):
UnboundLocalError: local variable 'temp_dir_' referenced before assignment
Instantaneous time axis tables (e.g. 6hrPlevPt) contain variables for which the time method is not averaging but sampling (time: point). The time axis for these tables should not be given any cell_bounds array since cmor will not accept it.
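One way to decide whether a time axis needs bounds is to look at the time operator in the variable's cell_methods string. The sketch below is an assumption about how such a check could look; the function names are hypothetical and the regular expression only covers the simple "time: <operator>" form.

```python
import re


def time_operators(cell_methods):
    """Extract the operators applied to the time axis from cell_methods."""
    return re.findall(r"\btime:\s*(\w+)", cell_methods)


def needs_time_bounds(cell_methods):
    """Sampled fields (time: point) get no bounds; everything else does."""
    return time_operators(cell_methods) != ["point"]


print(needs_time_bounds("area: mean time: point"))  # False
print(needs_time_bounds("area: time: mean"))        # True
```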
Following up on #3:
Processing the output from a run with the PRIMAVERA configuration results in dozens of missing variables errors:
...
ERROR:nemo2cmor:The source variable qt could not be found in the input NEMO data
ERROR:nemo2cmor:The source variable scsaltot could not be found in the input NEMO data
ERROR:nemo2cmor:The source variable sctemtot could not be found in the input NEMO data
...
I have used the default varlist and metadata file that comes with ece2cmor.
I noticed that the cdo command for processing IFS output always contains -shifttime,-${output_freq}hours
, e.g.
Post-processing target tas [...] with cdo command -setgridtype,regular -monmean -shifttime,-6hours -selcode,167
This is not correct for instantaneous fields such as temperature. I explain this with daily averages, but it works equally for monthly means.
The daily mean of an instantaneous field (e.g. temperature) consists of the output at 00, 03, 06, ...18, 21. On the other hand, the daily mean of an accumulated field (e.g. radiation) consists of the output at 03, 06, 09,..., 18, 21, 00+1day. The reason is that the accumulated field is saved at the end of the accumulation period (and with the corresponding timestamp), so the radiation at 03 is the accumulated radiation between 00 and 03. Therefore, the 00 timestep of the day is not needed for the daily average (it would contain hours 21-24 from the day before), but the 00 timestep of the next day is needed.
In the old ece2cmor tool I did this by shifting the time axis for accumulated fields before computing daily or monthly means, but only for accumulated, not instantaneous fields. This can be achieved by first adding the 00 timestep of the first day of a month to the monthly IFS output file (the last timestep from the previous month), and then applying -selmon,$month
for instantaneous fields, and -selmon,$month -shifttime,-${output_freq}hours
for accumulated fields.
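The selection rule described above can be sketched as follows. The function names are hypothetical; `time_selection_operators` only builds the operator string, and `selected_steps` imitates the effect in pure Python on a list of timestamps. In a cdo operator chain the rightmost operator is applied first, so the shift happens before the month selection.

```python
from datetime import datetime, timedelta


def time_selection_operators(month, output_freq, accumulated):
    """Build the cdo operators for one month of output (hypothetical helper)."""
    ops = "-selmon,%d" % month
    if accumulated:
        ops += " -shifttime,-%dhours" % output_freq
    return ops


def selected_steps(timestamps, month, output_freq, accumulated):
    """Return the original timestamps that end up in the selected month."""
    shift = timedelta(hours=output_freq) if accumulated else timedelta(0)
    return [t for t in timestamps if (t - shift).month == month]


# 3-hourly output for January 1990, including 00Z of 1 February.
steps = [datetime(1990, 1, 1) + timedelta(hours=3 * i) for i in range(249)]

inst = selected_steps(steps, 1, 3, accumulated=False)
accu = selected_steps(steps, 1, 3, accumulated=True)
print(time_selection_operators(1, 3, accumulated=True))
print(inst[0], inst[-1])  # 00Z on 1 Jan to 21Z on 31 Jan
print(accu[0], accu[-1])  # 03Z on 1 Jan to 00Z on 1 Feb
```

Both selections contain the same number of steps, but the accumulated one drops 00Z of the first day and picks up 00Z of the following month.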
Note: the new grib_filter file that I presented in #12 (comment) adds the first timestep of a month from the IFS output of the previous month when splitting 3-hourly and 6-hourly output.
Upon startup, many log errors are produced indicating that certain variables cannot be found in certain tables. This can occur when a variable in a new version of the tables is renamed or removed. In the first case, we should rename the corresponding variable in varlist.json, in the second case we should remove it.
Create some Primavera-specific tests for both IFS and Nemo.
This prevents the cmorization and needs to be fixed ASAP.
I can process NEMO output with the master branch, but when trying to process IFS output I get
Traceback (most recent call last):
File "./ece2cmor.py", line 68, in <module>
main(sys.argv[1:])
File "./ece2cmor.py", line 63, in main
maxsizegb = args.tmpsize)
File "/home/sm_wyser/ece2cmor3/ece2cmor3/ece2cmorlib.py", line 131, in perform_ifs_tasks
outputfreq = outputfreq,tempdir = tempdir,maxsizegb = maxsizegb)):
File "/home/sm_wyser/ece2cmor3/ece2cmor3/ifs2cmor.py", line 88, in initialize
ifs_grid_descr_ = cdoapi.cdo_command().get_griddes(ifs_gridpoint_file_) if os.path.exists(ifs_gridpoint_file_) else {}
File "/home/sm_wyser/ece2cmor3/ece2cmor3/cdoapi.py", line 59, in __init__
self.app = cdo.Cdo()
File "/home/sm_wyser/.conda/envs/CMOR/lib/python2.7/site-packages/cdo.py", line 87, in __init__
self.operators = self.getOperators()
File "/home/sm_wyser/.conda/envs/CMOR/lib/python2.7/site-packages/cdo.py", line 261, in getOperators
if (parse_version(getCdoVersion(self.CDO)) > parse_version('1.7.0')):
File "/home/sm_wyser/.conda/envs/CMOR/lib/python2.7/site-packages/cdo.py", line 40, in getCdoVersion
return match.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
Interestingly enough, the IFS output processing worked fine with the PRIMAVERA branch but not the NEMO output processing (see #31)
The netCDF files at the end of the cmorisation process include information about the grid. The fields from IFS are saved on a regular Gaussian grid, and the netCDF file contains information about grid centers as well as grid corners (vertices).
A few issues with the grid need more attention:
octave:32> squeeze(vertices_longitude(1,1:3,1:3))
ans =
358.24 356.84 355.43
358.24 356.84 355.43
358.24 356.84 355.43
octave:33> squeeze(vertices_longitude(2,1:3,1:3))
ans =
358.95 357.54 356.13
358.95 357.54 356.13
358.95 357.54 356.13
The first index loops over the vertices. We would expect the 1st corner of the first gridpoint to be identical to the 2nd corner of the next gridpoint, but apparently there is a gap.
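The gap can be checked numerically. The sketch below assumes equally spaced longitudes and ignores the wrap-around at 0/360 degrees; the helper names are hypothetical. The corner values are the first three longitudes of vertex 1 and vertex 2 from the Octave output above.

```python
def regular_lon_bounds(centers):
    """Cell edges halfway between neighbouring, equally spaced centers."""
    step = centers[1] - centers[0]
    return [(c - 0.5 * step, c + 0.5 * step) for c in centers]


def is_contiguous(bounds, tol=1e-6):
    """True if every cell shares a corner with the next cell."""
    for cell, next_cell in zip(bounds, bounds[1:]):
        gap = min(abs(x - y) for x in cell for y in next_cell)
        if gap > tol:
            return False
    return True


# Corners (vertex 1, vertex 2) of three neighbouring cells from the file.
file_bounds = [(358.24, 358.95), (356.84, 357.54), (355.43, 356.13)]
print(is_contiguous(file_bounds))  # False

# Bounds derived from centers on a regular grid are contiguous.
print(is_contiguous(regular_lon_bounds([0.0, 1.40625, 2.8125])))  # True
```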
Hi @goord ,
CMORizing our files from the latest EC-Earth version, it fails because it is looking for the surface pressure in the SH files to calculate the levels. It is due to the commit 3842 (https://dev.ec-earth.org/projects/ecearth3/repository/revisions/3842) that you made in the primavera branch saying to output the pressure level in the GG files and not in the SH as before (keeping only lnsp in the SH).
I can update the code to tell ece2cmor to compute the levels from exp(ln(sp)) but I wanted to check if you hadn't already had the same problem. I think we would lose less information than using ps from the GG files.
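A sketch of the proposed conversion. It assumes lnsp has already been transformed from spectral to grid-point space (for example with cdo's sp2gp operator), because the exponential is not a linear operation and cannot be applied to spectral coefficients. The function name is hypothetical.

```python
import math


def surface_pressure(lnsp_values):
    """Convert grid-point values of ln(surface pressure) to pressure."""
    return [math.exp(value) for value in lnsp_values]


# Illustrative input: the logarithm of 101325 Pa and of 95000 Pa.
lnsp = [math.log(101325.0), math.log(95000.0)]
print(surface_pressure(lnsp))
```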
Thanks,
Cheers,
Remove the large test-data files from the github repo, there are better solutions like github lfs for this.
when running the tool on recent primavera output the following problems appear
Hi,
ece2cmor is failing (and exiting) when it can't find a variable that was asked in the varlist.
The ifspostproc step just shows a warning, but then the cmorization itself fails when trying to open the nc file instead of just going to the next variable. This is an issue with my varlist, which didn't match my IFS output, but it makes the whole process fail.
Error in calling operator setgridtype with:
/usr/local/apps/cdo/1.7.2/bin/cdo -O -f nc -P 4 setgridtype,regular -daymean -selmon,2 -selcode,9 /scratch/ms/spesiccf/c3b/a0oe/19500101/fc0/runtime/Output_1/out-195002/3hr/ICMGGa0oe+195002 /scratch/ms/spesiccf/c3b/a0oe/auto-ecearth3/ece2cmor/ece2cmor3/atm-CMIP6-3h-195002/mrrob_Eday.nc<<<
STDOUT:
STDERR:cdo setgridtype: Started child process "daymean -selmon,2 -selcode,9 /scratch/ms/spesiccf/c3b/a0oe/19500101/fc0/runtime/Output_1/out-195002/3hr/ICMGGa0oe+195002 (pipe1.1)".
cdo(2) daymean: Started child process "selmon,2 -selcode,9 /scratch/ms/spesiccf/c3b/a0oe/19500101/fc0/runtime/Output_1/out-195002/3hr/ICMGGa0oe+195002 (pipe2.1)".
cdo(3) selmonth: Started child process "selcode,9 /scratch/ms/spesiccf/c3b/a0oe/19500101/fc0/runtime/Output_1/out-195002/3hr/ICMGGa0oe+195002 (pipe3.1)".
cdo(4) selcode (Warning): Code number 9 not found!
cdo(4) selcode (Abort): No variables selected!
ERROR:cdoapi:(returncode:1) cdo setgridtype: Started child process "daymean -selmon,2 -selcode,9 /scratch/ms/spesiccf/c3b/a0oe/19500101/fc0/runtime/Output_1/out-195002/3hr/ICMGGa0oe+195002 (pipe1.1)".
cdo(2) daymean: Started child process "selmon,2 -selcode,9 /scratch/ms/spesiccf/c3b/a0oe/19500101/fc0/runtime/Output_1/out-195002/3hr/ICMGGa0oe+195002 (pipe2.1)".
cdo(3) selmonth: Started child process "selcode,9 /scratch/ms/spesiccf/c3b/a0oe/19500101/fc0/runtime/Output_1/out-195002/3hr/ICMGGa0oe+195002 (pipe3.1)".
cdo(4) selcode (Warning): Code number 9 not found!
cdo(4) selcode (Abort): No variables selected!
INFO:postproc:Post-processing target tasmin in table Amon from file /scratch/ms/spesiccf/c3b/a0oe/19500101/fc0/runtime/Output_1/out-195002/3hr/ICMGGa0oe+195002 with cdo command -monmean -daymin -setgridtype,regular -selmon,2 -selcode,202
INFO:postproc:Post-processing target tasmax in table Amon from file /scratch/ms/spesiccf/c3b/a0oe/19500101/fc0/runtime/Output_1/out-195002/3hr/ICMGGa0oe+195002 with cdo command -monmean -daymax -setgridtype,regular -selmon,2 -selcode,201
(...)
INFO:ifs2cmor:Creating time axis using variable mrrob...
Error in calling operator showtimestamp with:
/usr/local/apps/cdo/1.7.2/bin/cdo -O showtimestamp /scratch/ms/spesiccf/c3b/a0oe/auto-ecearth3/ece2cmor/ece2cmor3/atm-CMIP6-3h-195002/mrrob_Eday.nc<<<
STDOUT:
STDERR:cdo showtimestamp: Open failed on >/scratch/ms/spesiccf/c3b/a0oe/auto-ecearth3/ece2cmor/ece2cmor3/atm-CMIP6-3h-195002/mrrob_Eday.nc<
No such file or directory
Traceback (most recent call last):
File "./ece2cmor.py", line 90, in <module>
main(sys.argv[1:])
File "./ece2cmor.py", line 85, in main
maxsizegb = args.tmpsize)
File "/lus/snx11062/scratch/ms/spesiccf/c3b/a0oe/auto-ecearth3/ece2cmor/ece2cmor3/ece2cmorlib.py", line 137, in perform_ifs_tasks
ifs2cmor.execute(ifs_tasks,cleanup = cleanup)
File "/lus/snx11062/scratch/ms/spesiccf/c3b/a0oe/auto-ecearth3/ece2cmor/ece2cmor3/ifs2cmor.py", line 141, in execute
cmorize([t for t in processedtasks if getattr(t,"path",None) != None])
File "/lus/snx11062/scratch/ms/spesiccf/c3b/a0oe/auto-ecearth3/ece2cmor/ece2cmor3/ifs2cmor.py", line 345, in cmorize
create_time_axes(tskgroup)
File "/lus/snx11062/scratch/ms/spesiccf/c3b/a0oe/auto-ecearth3/ece2cmor/ece2cmor3/ifs2cmor.py", line 458, in create_time_axes
tid = create_time_axis(freq = task.target.frequency,path = getattr(task,"path"),name = tdim,hasbnds = (timop != ["point"]))
File "/lus/snx11062/scratch/ms/spesiccf/c3b/a0oe/auto-ecearth3/ece2cmor/ece2cmor3/ifs2cmor.py", line 602, in create_time_axis
times = command.showtimestamp(input = path)[0].split()
File "/home/ms/spesiccf/c3b/.local/lib/python2.7/site-packages/cdo.py", line 206, in get
raise CDOException(**retvals)
cdo.CDOException: (returncode:1) cdo showtimestamp: Open failed on >/scratch/ms/spesiccf/c3b/a0oe/auto-ecearth3/ece2cmor/ece2cmor3/atm-CMIP6-3h-195002/mrrob_Eday.nc<
No such file or directory
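The traceback shows that tasks are passed on to cmorize as long as their path attribute is set, even if post-processing never wrote the file. A possible guard, sketched with hypothetical names and stand-in task objects, also checks that the file exists and logs a warning for every skipped task.

```python
import logging
import os
from types import SimpleNamespace

log = logging.getLogger("sketch")


def tasks_with_output(tasks):
    """Keep only tasks whose post-processed file is actually on disk."""
    kept = []
    for task in tasks:
        path = getattr(task, "path", None)
        if path is not None and os.path.exists(path):
            kept.append(task)
        else:
            log.warning("Skipping task without post-processed file: %s", path)
    return kept


# Stand-in tasks; the current directory plays the role of an existing file.
example_tasks = [
    SimpleNamespace(name="tas", path=os.curdir),
    SimpleNamespace(name="mrrob",
                    path=os.path.join(os.curdir, "no-such-dir", "mrrob_Eday.nc")),
    SimpleNamespace(name="orphan"),
]

print([task.name for task in tasks_with_output(example_tasks)])
```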
Create a work queue for the cmorization of tasks (within each table to achieve good performance).
The ocean cmorization needs to be tested again with the most recent version of the CMIP6 tables and the most recent nemo output. Report problems as github issues.
Hi,
Did anyone try to CMORize the variables siitdconc (sea_ice_area_fraction_over_categories) and siitdsnthick (snow_thickness_over_categories) from the SImon CMIP6 table (1m__icemod nemo file)?
They are priority 3 variables but we tried to produce them anyway and as they have the dimensions time,iceband,latitude,longitude (iceband being the category, called ncatice in nemo) and no bounds associated nor a name starting with "depth", cmor fails.
I can work something out but I just wonder if someone already faced a similar issue before starting rewriting all the "depth management".
Thank you,
Where does the global attribute min_number_yrs_per_sim
come from? I cannot find it in the CV.
This will allow us to use regular grid re-interpolated ec-earth output as test data.
If we decide not to have missing values over land in NEMO for performance reasons, we should at least provide ece2cmor3 with a mask so that it can apply it during cmorization. This functionality will need to be implemented in the nemo2cmor module.
To avoid crashes in the middle of the workflow, it would be extremely useful to have a piece of code that inspects the input files and decides whether or not an output variable can be constructed. If not, we may choose to abort the process or proceed while blacklisting the task.
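A minimal sketch of such a pre-flight check, assuming the set of grib codes present in the input files is already known and that each task lists the codes it needs. The function name and data layout are hypothetical; the codes in the example are the ones that appear in the cdo commands quoted above.

```python
def split_feasible(required_codes, available_codes):
    """Split targets into those that can be produced and those to blacklist.

    required_codes maps a target name to the grib codes it needs.
    """
    available = set(available_codes)
    feasible, blacklisted = [], []
    for name, codes in sorted(required_codes.items()):
        if set(codes) <= available:
            feasible.append(name)
        else:
            blacklisted.append(name)
    return feasible, blacklisted


required = {"tas": {167}, "tasmin": {202}, "tasmax": {201}, "mrrob": {9}}
available = {167, 201, 202}

print(split_feasible(required, available))
```

The caller can then either abort when the blacklist is not empty or continue with the feasible targets only.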
I forked this project to create a distribution of ece2cmor3 that can be installed through pip. It is available for testing.
To create a source distribution:
python setup.py sdist
To install the resulting file:
pip install dist/ece2cmor3-1.0.0.tar.gz
As Klaus explained during the meeting, these are not simply derived from 164.128, but they need to be produced by the COSP simulators.