
libaccessom2's Introduction


libaccessom2

libaccessom2 is a library that is linked into all of the ACCESS-OM2 component models, including YATM, CICE and MOM. libaccessom2 provides functionality used by all models, as well as an interface for inter-model communication and synchronisation tasks. Using a common library reduces code duplication and provides a uniform way for all models to be integrated into ACCESS-OM2.

libaccessom2 functionality includes:

  • a simplified interface to the OASIS3-MCT coupler API
  • date handling, logging and simple performance timers
  • configuration synchronisation between models
  • a single configuration file for common configs (accessom2.nml)

Further information about ACCESS-OM2 can be found in the ACCESS-OM2 wiki

Downloading

This repository contains submodules, so you will need to clone it with the --recursive flag:

git clone --recursive https://github.com/COSIMA/libaccessom2.git

To update a previous clone of this repository to the latest version, you will need to do

git pull

followed by

git submodule update --init --recursive

to update all the submodules.

Configuration

libaccessom2 has a single configuration file called accessom2.nml, usually found in the top level of the model configuration directory (also known as the control directory). It contains model-wide configuration. Presently the most important options are:

  • forcing_start_date: the date (and time) when forcing begins.
  • forcing_end_date: the date (and time) at which the forcing ends. The time between the forcing_start_date and forcing_end_date is called the forcing period. The model will be forced by a continuous repetition of this period.
  • restart_period: interval of time between successive model restarts. This is provided as a tuple: years, months, seconds. This breaks the entire experiment into a collection of runs or segments.

There is no configuration option that controls when an experiment ends; it will simply continue until it is stopped.
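For illustration, a minimal accessom2.nml might contain something like the following (the date_manager_nml group mirrors the namelist excerpt shown in the issues further down; the dates and the five-year restart period are just examples):

&date_manager_nml
    forcing_start_date = '1958-01-01T00:00:00'
    forcing_end_date = '2018-01-01T00:00:00'

    ! Runtime for a single segment/job/submit, format is years, months, seconds,
    ! two of which must be zero. Here: five-year segments.
    restart_period = 5, 0, 0
&end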

YATM

This repository also includes YATM (Yet another data-driven atmosphere model). The purpose of YATM is to keep track of the current model time, then read current atmospheric forcing data and deliver it to the rest of the model via the coupler (OASIS3-MCT).

YATM uses two configuration files: atm.nml and forcing.json. The former is only used to configure the river runoff remapping (discussed below), while the latter defines the details of the model coupling fields.

Date handling

A unique feature of YATM is that it does not read forcing data by iterating over records. That is, the code does not explicitly read and deliver the 1st forcing record followed by the 2nd, etc. The reason for this is that, when accounting for complications such as different calendar types, fields with different periods, restarts, etc., this approach can quickly become complex and error prone. Instead YATM iterates over datetime objects. At the current date (and time) YATM finds all matching forcing fields, reads them from disk, delivers them to the coupler and then increments the current date (and time). This simplification has led to much more concise and easy to understand code.

To further simplify things, YATM gathers a lot of its configuration automatically from the forcing dataset metadata, for example the calendar type and timestep information.
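The idea can be sketched in a few lines of Python (YATM itself is Fortran; all names here are hypothetical, and the disk read and OASIS put are reduced to a stub):

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ForcingField:
    name: str
    period: timedelta        # time between successive records of this field
    next_update: datetime    # when the next record falls due

def coupler_put(name: str, when: datetime) -> None:
    # Stand-in for reading the matching record and handing it to OASIS3-MCT.
    print(f"put {name} at {when.isoformat()}")

def run(fields, start: datetime, end: datetime, step: timedelta) -> None:
    # Iterate over datetimes, not file records: at each time, deliver every
    # field whose record falls due, then increment the current date and time.
    current = start
    while current < end:
        for f in fields:
            if f.next_update <= current:
                coupler_put(f.name, current)
                f.next_update = current + f.period
        current += step

start = datetime(1958, 1, 1)
run([ForcingField("rain", timedelta(hours=3), start),
     ForcingField("runoff", timedelta(days=1), start)],
    start, start + timedelta(days=1), timedelta(hours=1))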

River runoff remapping

It is difficult to regrid river runoff in a distributed memory system because moving runoff from a land point to the nearest ocean point may involve interprocess communication. It makes more sense to regrid the river runoff within YATM since it is a single-process application.

YATM regrids runoff in a two step process:

  1. Apply a conservative regridding operation to move the runoff from the source grid to the ACCESS-OM2 ocean/ice grid. The remapping interpolation weights are calculated using ESMF_RegridWeightGen from ESMF.
  2. Find any runoff that the previous step has distributed to land points and move it to the nearest ocean neighbour. This is done using an efficient nearest neighbour data structure called a k-dimensional tree. The kdtree2 Fortran package is used for this.
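Step 2 can be illustrated with a short Python sketch (YATM uses the Fortran kdtree2 package; scipy's cKDTree and plain 2-D coordinates are used here purely for illustration, whereas a real grid would need coordinates that respect the sphere):

import numpy as np
from scipy.spatial import cKDTree

def move_runoff_to_ocean(runoff, is_ocean, x, y):
    # runoff, is_ocean, x, y: flattened 1-D arrays over the model grid.
    ocean_idx = np.flatnonzero(is_ocean)
    tree = cKDTree(np.column_stack((x[ocean_idx], y[ocean_idx])))
    # Runoff that the conservative regridding left stranded on land points.
    bad = np.flatnonzero(~is_ocean & (runoff > 0))
    _, nearest = tree.query(np.column_stack((x[bad], y[bad])))
    out = runoff.copy()
    np.add.at(out, ocean_idx[nearest], runoff[bad])  # conserve the total
    out[bad] = 0.0
    return out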

Ice and ocean stubs

This repository also includes ice and ocean stubs. These are stand-ins for the ice and ocean models. They demonstrate how libaccessom2 can be used and are also very useful for testing.

Build

How to build libaccessom2, YATM, ice_stub and ocean_stub on gadi (NCI):

git clone https://github.com/COSIMA/libaccessom2.git
cd libaccessom2
./build_on_gadi.sh

Run tests on Gadi (NCI)

First build as above. Then get some compute resources:

qsub -I -P x77 -q normal -lncpus=4 -lmem=16Gb -lwalltime=3:00:00 -lstorage=gdata/ua8+gdata/qv56+gdata/hh5+gdata/ik11


The tests JRA55_IAF, JRA55_IAF_SINGLE_FIELD, JRA55_RYF, JRA55_RYF_MINIMAL and JRA55_v1p4_IAF can all be run manually as follows. Replace JRA55_IAF with the test to be run.

export LIBACCESSOM2_ROOT=$(pwd)
module load openmpi
cd tests/
./copy_test_data.sh
cd JRA55_IAF
rm -rf log ; mkdir log ; rm -f accessom2_restart.nml ; cp ../test_data/i2o.nc ./ ; cp ../test_data/o2i.nc ./
export UCX_LOG_LEVEL=error; mpirun -np 1 $LIBACCESSOM2_ROOT/build/bin/yatm.exe : -np 1 $LIBACCESSOM2_ROOT/build/bin/ice_stub.exe : -np 1 $LIBACCESSOM2_ROOT/build/bin/ocean_stub.exe

If Python3 and pytest are installed then all of the above and some additional tests can be run with:

module load openmpi
python -m pytest tests/

Any individual test can be run with pytest as follows:

module load openmpi
python -m pytest test_stubs.py::TestStubs::test_field_scaling

The above is the only way to run the FORCING_SCALING test case because it relies on the Python test code to create one of the inputs.


libaccessom2's Issues

Yatm crashes on 20/1/1968

Hi @nic, good morning. I am getting the same error you fixed for me before:

WARNING from PE 183: set_date_c: Year zero is invalid. Resetting year to 1

forrtl: error (65): floating invalid
Image PC Routine Line Source
yatm_c2868e5b.exe 00000000006F8001 Unknown Unknown Unknown
yatm_c2868e5b.exe 00000000006F613B Unknown Unknown Unknown

Abhishek [12:20 PM]
Here are the last few lines; it seems like we have a problem in 1968:

field_update_data: index 0000000153
{ "checksum-matmxx-snow_ai-0001641600": 0.7015145634E+000 }
forcing_update_field at 1968-01-20T00:00:00.000
field_update_data: file /g/data1/ua8/JRA55-do/v1-3/slp.1968.18Aug2017.nc
field_update_data: index 0000000153
{ "checksum-matmxx-press_ai-0001641600": 0.2070275826E+011 }
forcing_update_field at 1968-01-20T00:00:00.000

Early abort if models are out of sync

It would be good to check model sync at both the start and end of runs.

Currently the out-of-sync check and abort is done at the end of runs, but sometimes (e.g. #48) the models are out of sync from the start. In such cases it would save a lot of walltime and SUs to check model sync at the start of runs and abort early.

Release incompatible with GitHub Actions CI for MOM5

MOM5 is being updated to GitHub Actions CI (mom-ocean/MOM5#343) but needs to run on ubuntu-18.04 due to incompatibilities between glibc on later ubuntu runners and FMS.

The MOM5 CI uses a pre-built library built using CI, but currently this is being built using ubuntu-latest (ubuntu-20.04) which uses an incompatible version of gfortran. Upgrading the version of gfortran on the MOM5 CI doesn't work as there is no compatible netCDF library available.

The proposed solution is to add the ubuntu-18.04 runner to the CI and make releases for both system versions.

libaccessom2 doesn't deal with netcdf unpacking properly (we think...) in ERA-5 runs

As described on the ERA-5 forcing issue, I think libaccessom2 may have an issue dealing with netcdf unpacking across file boundaries. I'll summarize the problem here.

The problem occurs when transitioning between two months (the ERA-5 forcing is stored in monthly files), best demonstrated by plotting daily minimum wind stress at 92W, 0N from a 1deg_era5_iaf run spanning 1980-01-01 to 1980-05-01:

[Figure: daily minimum wind stress at 92W, 0N, showing a large negative burst at the start of April in the "raw" run]

There is a large burst of negative wind stress in the first day of April in the "raw" run (this causes all sorts of crazy stuff...). The add_offset netcdf packing value in the ERA-5 10m zonal winds file is particularly anomalous for March of this year (listed below per month of the files in /g/data/rt52/era5/single-levels/reanalysis/10u/1980/):

                u10:add_offset = -3.54318567240813 ;
                u10:add_offset = 0.856332909292688 ;
                u10:add_offset = -32.1081480318141 ;
                u10:add_offset = -0.761652898754254 ;
                u10:add_offset = -0.10650583633675 ;
                u10:add_offset = -2.55211599669929 ;
...

If I change the netcdf packing values in the single March 1980 10m winds file (using the Python below) and rerun, then I remove the burst of wind stress ("Altered packing" run above). This confirms to me that it is a packing issue.

import xarray as xr

# Rewrite the March 1980 file using the (non-anomalous) April 1980 packing values.
file_in = '/g/data/rt52/era5/single-levels/reanalysis/10u/1980/10u_era5_oper_sfc_19800301-19800331.nc'
file_out = '/g/data/e14/rmh561/access-om2/input/ERA-5/IAF/10u/1980/10u_era5_oper_sfc_19800301-19800331.nc'
DS = xr.open_dataset(file_in)
encoding = {}
scale = 0.000966930321007164  # Apr 1980 value
offset = -0.761652898754254   # Apr 1980 value
encoding['u10'] = {'scale_factor': scale, 'add_offset': offset, 'dtype': 'int16'}
DS.to_netcdf(file_out, encoding=encoding)

Yes, the packing in the ERA-5 files is weird. But in any case, libaccessom2 should be able to deal with the variable packing. Xarray in python can, as shown by this plot of the time series of 10m zonal wind at the same point from the original file:

[Figure: time series of 10m zonal wind at 92W, 0N from the original file, plotted with xarray, showing no anomalous burst]

I've had a quick look through the code and am none the wiser. As @aekiss said, the netcdf unpacking seems to be handled by the netcdf library, so I don't understand how there can be a problem. Clearly it only affects the times between months when an interpolation has to be done. The rest of the month is fine.
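For reference, xarray applies each file's own scale_factor and add_offset while decoding (mask_and_scale defaults to True), which is presumably why the same data plots cleanly there:

import xarray as xr

# Packed int16 values are decoded to float using the file's own attributes,
# so decoded values are directly comparable across the monthly files.
ds = xr.open_dataset(
    '/g/data/rt52/era5/single-levels/reanalysis/10u/1980/'
    '10u_era5_oper_sfc_19800301-19800331.nc',
    mask_and_scale=True)  # the default, shown here for emphasis
# 92W, 0N; the 0-360 longitude convention is an assumption about this file.
u10 = ds['u10'].sel(longitude=268.0, latitude=0.0, method='nearest')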

Add timers to YATM

I’m going to put three new timers into YATM: 1) field read from disk 2) wait for ice 3) remap runoff

OK, I think we need a 4th timer: time to build k-d tree. That is happening once for every run

These are needed to help understand the variance in model timestep timing; e.g. @marshallward reported that in the 0.1 deg config:

{ timer-ice_step time(  0.000):   1.442 }
 { timer-ice_step time(  0.000):   1.443 }
 { timer-ice_step time(  0.000):   1.509 }
 { timer-ice_step time(  0.000):   1.441 }
 { timer-ice_step time(  0.000):  48.172 }
 { timer-ice_step time(  0.000):   0.023 }
 { timer-ice_step time(  0.000):  18.663 }
 { timer-ice_step time(  0.000):   1.440 }
 { timer-ice_step time(  0.000):   1.458 }
 { timer-ice_step time(  0.000):   1.440 }
 { timer-ice_step time(  0.000):   1.453 }

Using the latest executables in an older simulation

We (Ryan and I) would like to run some perturbation experiments in ACCESS-OM2-025 using the new perturbations code in libaccessom2, via atmosphere/forcing.json. We would like to branch these simulations off the 650-year 025deg_jra55_ryf spin-up at /g/data/ik11/outputs/access-om2-025/025deg_jra55_ryf9091_gadi.

However, this spin-up was performed with old executables (see https://github.com/rmholmes/025deg_jra55_ryf/blob/ryf9091_gadi/config.yaml) that do not contain the new libaccessom2 perturbations code. Unfortunately it looks like the new executables (with libaccessom2 hash _a227a61.exe) aren't backwards compatible with the config files from the old spin-up. Specifically, we get the error:
assertion failed: accessom2_sync_config incompatible config between atm and ice: num_atm_to_ice_fields
which seems to be linked to ice/input_ice.nml, which now requires exchanged fields to be specified (e.g. through the fields_from_atm input), so the number of fields no longer matches.

@nichannah @aekiss do you have any suggestions on the best approach to pursue in order to get this working? We would really like to avoid doing another spin-up given the cost and time involved.

One approach might be to create new executables based on those used for the spin-up that only include the new libaccessom2 code involving the perturbations. Another might be to update the config files as much as possible (still using JRA55 v1.3), but still use the old restarts, and hope/evaluate that nothing material to the solution has changed?
Any suggestions would be really helpful.

Support input4MIPs file naming convention

We need to support the input4MIPs file naming convention as the JRA55-do dataset is being distributed through this channel.

A full description is here:

https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk/edit

But a representative example is the river runoff:

ocean/day/friver/gn/v20180412/friver/friver_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-3_gn_19580101-19590101.nc

So the forcing.json file will need (at a minimum) to support some arithmetic, such as {{year+1}}, with the month and day hard-coded, or also support {{month}} and {{day}}.
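A hypothetical forcing.json entry for the file above, assuming the proposed {{year+1}} arithmetic were supported:

    {
      "filename": "ocean/day/friver/gn/v20180412/friver/friver_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-3_gn_{{year}}0101-{{year+1}}0101.nc",
      "fieldname": "friver",
      "cname": "runof_ai"
    },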

accessom2_restart.nml not being written to restart dir

The latest libaccessom2 (d7c3c28) writes accessom2_restart.nml to outputnnn but not to restartnnn, causing an abort with Error in accessom2_deinit: atm and ice models are out of sync. at the end of the subsequent run.
(Also, accessom2_restart.nml is written to outputnnn as a regular file, rather than a symlink to restartnnn-1/accessom2_restart.nml).

4198e15 worked fine, so the problem was introduced somewhere in these changes: 4198e15...d7c3c28

I suspect this change is the culprit:

inquire(file=trim(self%config_dir)//'/RESTART', exist=dir_exists)

which had been
inquire(directory=trim(self%config_dir)//'/RESTART', exist=dir_exists)

I'll revert this line and see if it fixes this problem.

Mismatch in calendar when using a temporal perturbation on the "experiment" calendar.

I am experiencing a slow-down in my ACCESS-OM2-1 simulations when applying a time-dependent, spatially constant perturbation.

However, the slow-down appears only to arise when I start from a spun-up control simulation. When I start from a cold start, my simulation runs fine (finishing 1 year in ~14 minutes wall time), whereas when I start from a spun-up control simulation (starting at model year 2100) it runs fine until a dramatic slow-down at around 2100-11-10 (finishing 1 year in ~1 hr wall time).

The cold-start run can be found at /home/561/mp2135/access-om2/1deg_jra55_ryf_wcwc_test1 and the run experiencing the slow-down can be found at /home/561/mp2135/access-om2/1deg_jra55_ryf_wcwc_test2. The only difference between the two is that test2 is branched from a spin-up.

There has been some previous discussion in this thread on Slack https://arccss.slack.com/archives/C08KM5KS6/p1625456199042400.

support additive forcing perturbations

Branching off from #25 as a separate issue.

Implement this via a new offset_filename key, so that we can have both additive and multiplicative perturbations to any forcing field, which would be a useful extension to our current scaling-only perturbation capability discussed in the wiki.

An example forcing.json entry:

    {
      "filename": "/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/land/day/friver/gr/v20190429/friver/friver_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-4-0_gr_{{year}}0101-{{year}}1231.nc",
      "fieldname": "friver",
      "offset_filename": "river_offset_{{year}}0101-{{year}}1231.nc",
      "cname": "runof_ai"
    },

Build test failing due to missing data file

Build test is failing due to missing test_data file:

https://accessdev.nci.org.au/jenkins/job/ACCESS-OM2/job/libaccessom2/lastBuild/console

Error - from NetCDF library
ncvar opening ../test_data/scaling.RYF.rsds.1990_1991.nc
No such file or directory

Seems this data is copied over in this script:

https://github.com/COSIMA/libaccessom2/blob/master/tests/copy_test_data_from_raijin.sh

But that file doesn't exist in the directory the data is copied from:

$ find  /short/public/access-om2/input_rc -name scaling.RYF.rsds.1990_1991.nc
$

@nichannah Looks like the last time this test worked was the end of January

https://accessdev.nci.org.au/jenkins/job/ACCESS-OM2/job/libaccessom2/12/

Is there a copy of this file somewhere else?

In any case these paths will need to be changed for gadi as /short is disappearing.

Check for comment in perturbation definition earlier

There was confusion from someone defining a perturbation. They got this error:

assertion failed: forcing_parse_field: wrong number of fields in perturbation definition, should be 5.

even though the perturbation definition seemed fine:

    {
      "filename": "INPUT/RYF.rlds.1990_1991.nc",
      "fieldname": "rlds",
      "cname": "lwfld_ai",
      "perturbations": [
        {
          "type": "offset",
          "dimension": "temporal",
          "value": "INPUT/RYF.rlds.1990_1991_wcwc10.nc",
          "calendar": "experiment"
        }
      ]
    },

The issue was a missing comment field, which is now mandatory, but the check for the comment comes after this section:

call self%core%get(perturbation_jv_ptr, "comment", &
                   comment, found)
call assert(found, 'forcing_parse_field: perturbation missing "comment" field')

so the missing comment never gets flagged, and the user does not know why their perturbation definition is incorrect.

I believe moving the assert block which checks for a comment to the beginning of the loop (say line 194) makes sense, and the rest of the logic can remain as-is.
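For reference, the same definition parses cleanly once the mandatory comment field is added (the comment text here is just an example):

    {
      "filename": "INPUT/RYF.rlds.1990_1991.nc",
      "fieldname": "rlds",
      "cname": "lwfld_ai",
      "perturbations": [
        {
          "type": "offset",
          "dimension": "temporal",
          "value": "INPUT/RYF.rlds.1990_1991_wcwc10.nc",
          "calendar": "experiment",
          "comment": "additive longwave perturbation for the wcwc experiment"
        }
      ]
    },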

simple_timers have a bug: optional arguments in init() are the wrong way around

OK, I think there is a confluence of bugs … simple_timers_enabled() is returning .false. but it is being used to control include_first_call. And the timers are working overall because that’s the default
that would explain everything you are seeing

marshall [3:50 PM]
lmao....
also explains the low numbers youre getting too i suppose
https://arccss.slack.com/archives/C9Q7Y1400/p1544157553168800
nic
{ timer-ice_step min: 2.021 }
{ timer-ice_step max: 48.916 }
{ timer-ice_step mean: 2.212 }
{ timer-ice_step variance: 7.627 }
Posted in #om2devToday at 3:39 PM
i mean high number.. i guess thats the anomalous one

nic [3:51 PM]
yeah
OK, great.
thanks
for finding all those bugs!
I’m gonna fix things up
I think I’ll just swap the last two arguments of the init()

marshall [3:52 PM]
yes, i can confirm that explicitly setting enabled with the value as the others removed my timer!

Forcing with daily coupled model output

Rishav is trying to force ACCESS-OM2 with daily output from the ACCESS-CM2 coupled model. At first he tried monthly but it gave this unhelpful error:

assertion failed: accessom2_sync_config: total runtime in seconds not integer multiple of atm_ice_timestep
1

which suggests an error in the time steps, but this is a red herring.

He switched to daily forcing and it worked, but it crashes at the end of the year with this error:

 MCT::m_ExchangeMaps::ExGSMapGSMap_:: MCTERROR, Grid Size mismatch 
 LocalMap Gsize =        27648  RemoteMap Gsize =       204800
MCT::m_ExchangeMaps::ExGSMapGSMap_: Map Grid Size mismatch error, stat =3
 MCT::m_ExchangeMaps::ExGSMapGSMap_:: MCTERROR, Grid Size mismatch 
 LocalMap Gsize =       204800  RemoteMap Gsize =        27648
MCT::m_ExchangeMaps::ExGSMapGSMap_: Map Grid Size mismatch error, stat =3

Image              PC                Routine            Line        Source             
yatm_4198e150.exe  00000000007F57CA  Unknown               Unknown  Unknown
yatm_4198e150.exe  0000000000708E4C  Unknown               Unknown  Unknown
yatm_4198e150.exe  00000000007076E4  Unknown               Unknown  Unknown
yatm_4198e150.exe  0000000000709E39  Unknown               Unknown  Unknown
yatm_4198e150.exe  0000000000673538  Unknown               Unknown  Unknown
yatm_4198e150.exe  00000000005E958A  mod_oasis_coupler        1055  mod_oasis_coupler.F90
yatm_4198e150.exe  000000000049F454  mod_oasis_method_         741  mod_oasis_method.F90
yatm_4198e150.exe  00000000004461A1  coupler_mod_mp_co         149  coupler.F90
yatm_4198e150.exe  0000000000412A6B  MAIN__.V                  133  atm.F90
yatm_4198e150.exe  0000000000410D62  Unknown               Unknown  Unknown
libc-2.28.so       00001524A67696A3  __libc_start_main     Unknown  Unknown
yatm_4198e150.exe  0000000000410C6E  Unknown               Unknown  Unknown

which is eerily similar to an error encountered in the tests

#36 (comment)

His control directory is here /scratch/e14/rg1653/access-om2/control/1deg_50years

Any ideas @nichannah ?

build_on_raijin.sh fails: can't find NetCDF

Here's what happens when I try to compile:

$ module list
Currently Loaded Modulefiles:
  1) pbs            2) dot            3) ncview/2.1.2   4) git/2.9.5
$ cd /short/v45/aek156/sources/new/libaccessom2/
$ ./build_on_raijin.sh
mpifort executable found:
Will assume system MPI implementation is sound. Remove mpifort from PATH to automatically configure MPI
-- Failed to find NetCDF interface for F90
CMake Error at /apps/CMake/3.6.2/share/cmake-3.6/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find NetCDF (missing: NETCDF_LIBRARIES NETCDF_INCLUDE_DIRS
  NETCDF_HAS_INTERFACES)
Call Stack (most recent call first):
  /apps/CMake/3.6.2/share/cmake-3.6/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
  cmake/FindNetCDF.cmake:119 (find_package_handle_standard_args)
  CMakeLists.txt:38 (find_package)


-- Configuring incomplete, errors occurred!
See also "/short/v45/aek156/sources/new/libaccessom2/build/CMakeFiles/CMakeOutput.log".
/short/v45/aek156/sources/new/libaccessom2

YATM crashing when trying to determine forcing input

@aekiss and @marshallward have both reported an error like this:

forrtl: severe (124): Invalid command supplied to EXECUTE_COMMAND_LINE
Image PC Routine Line Source
yatm_1044c48e.exe 00000000006326D1 Unknown Unknown Unknown
yatm_1044c48e.exe 00000000004E07F6 util_mod_mp_first 53 util.F90
yatm_1044c48e.exe 000000000040E54B forcing_mod_mp_fo 131 forcing.F90
yatm_1044c48e.exe 000000000040D864 MAIN__ 98 atm.F90
yatm_1044c48e.exe 000000000040C31E Unknown Unknown Unknown
libc-2.12.so 00002AF70A83AD1D __libc_start_main Unknown Unknown
yatm_1044c48e.exe 000000000040C229 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)

The fix will be to remove the offending function altogether and instead use a more deterministic source for the forcing, which has been set up by @aidanheerdegen. The atmosphere/forcing.json file will then include the names of the forcing files in a format like:

/g/data1/ua8/JRA55-do/latest/rain.{{year}}.nc

Previously this was:

/g/data1/ua8/JRA55-do/v1-3/rain.{{year}}.*.nc

back-compatibility with old forcing.json format

The current libaccessom2 master has breaking changes to the forcing.json format, in particular a451a7f and 467e3e2, which were made to support ERA5.

Consequently the standard ACCESS-OM2 configurations are incompatible with the latest libaccessom2 - see COSIMA/access-om2#262.

It would be much nicer to support both the old and new formats. This backwards compatibility would probably require a version number in forcing.json (which would signify the original format if absent).
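For example, the top of a new-format file might carry an explicit version (the key names and layout here are purely hypothetical; files without a version would be treated as the original format):

{
  "version": 2,
  "fields": [
    {
      "filename": "INPUT/RYF.rlds.1990_1991.nc",
      "fieldname": "rlds",
      "cname": "lwfld_ai"
    }
  ]
}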

use submodules for included libraries (oasis, json-fortran, datetime-fortran)?

I'm wondering why we don't incorporate these into the libaccessom2 repo as submodules?
https://github.com/COSIMA/oasis3-mct
https://github.com/jacobwilliams/json-fortran
https://github.com/nicjhan/datetime-fortran

At present these are git cloned by cmake (see ExternalProject_Add entries in CMakeLists.txt) which seems unnecessarily obscure, as it means we don't see the full codebase until cmake is run.

I'd prefer these to be submodules, so a git clone --recursive (either of libaccessom2 or access-om2) will deliver all the source code, with no cmake dependency.

The present arrangement makes it too easy to overlook these libraries when searching the codebase (e.g. #24), especially if using an offline clone of the code (e.g. on my laptop, where I wouldn't normally run cmake).

Building outside a git repo

libaccessom2 has to build inside a git repository to create a git hash for versioning:

libaccessom2/CMakeLists.txt, lines 120 to 125 in d750b4b:

execute_process(
    COMMAND git rev-parse HEAD
    WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
    OUTPUT_VARIABLE GIT_COMMIT_HASH
    OUTPUT_STRIP_TRAILING_WHITESPACE
)

This is the same for MOM5 (mom-ocean/MOM5#308), which this PR is changing: mom-ocean/MOM5#384.

For the same reasons as MOM5, it would be good to have the option of building libaccessom2 without needing to be in a git repo.
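One way to do this (a sketch, not necessarily the approach taken in the MOM5 PR) is to let the git call fail gracefully and fall back to a placeholder hash:

execute_process(
    COMMAND git rev-parse HEAD
    WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
    OUTPUT_VARIABLE GIT_COMMIT_HASH
    OUTPUT_STRIP_TRAILING_WHITESPACE
    RESULT_VARIABLE GIT_STATUS
    ERROR_QUIET
)
if(NOT GIT_STATUS EQUAL 0)
    # Not inside a git repo (e.g. building from a release tarball).
    set(GIT_COMMIT_HASH "unknown")
endif()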

Add 'days' as a unit for the model restart_period

Presently the control namelist looks like this:

&date_manager_nml
    forcing_start_date = '1958-01-01T00:00:00'
    forcing_end_date = '2018-01-01T00:00:00'

    ! Runtime for a single segment/job/submit, format is years, months, seconds,
    ! two of which must be zero.
    restart_period = 0, 0, 0
&end

We want it to be:

&date_manager_nml
    forcing_start_date = '1958-01-01T00:00:00'
    forcing_end_date = '2018-01-01T00:00:00'

    ! Runtime for a single segment/job/submit, format is years, months, days, seconds,
    ! three of which must be zero.
    restart_period = 0, 0, 0, 0
&end

support time-invariant forcing perturbations

It would be useful to support time-invariant additive or multiplicative perturbations, so we don't necessarily need to specify the perturbation at every forcing timestep. This would be particularly handy for the IAF runs.

This could be implemented by applying the perturbation at every time if the perturbation file is missing a time axis.
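For example, under the proposed behaviour a file like the following, written without a time axis, would be applied at every forcing timestep (the grid shape and values here are purely illustrative):

import numpy as np
import xarray as xr

ny, nx = 145, 288   # hypothetical source-grid shape
scaling = xr.Dataset(
    {'rsds': (('lat', 'lon'), np.full((ny, nx), 0.9))},  # uniform 10% reduction
    coords={'lat': np.linspace(-90.0, 90.0, ny),
            'lon': np.linspace(0.0, 358.75, nx)})
# No time dimension, so the perturbation would be treated as time-invariant.
scaling.to_netcdf('scaling.rsds.time_invariant.nc')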

Unclean MPI termination leading to crash

The JRA55_IAF test case in libaccessom2 is crashing on termination on the new machine (Gadi) with new openmpi libraries.

The error message (pasted below) makes it look like not all MPI resources are being cleaned up properly. Given that this is a very self-contained test case hopefully it's possible to find the problem from code review.

[1579482456.666229] [gadi-cpu-clx-0455:61857:0] mpool.c:38 UCX WARN object 0x11bf980 was not returned to mpool ucp_requests
[1579482456.666232] [gadi-cpu-clx-0455:61857:0] mpool.c:38 UCX WARN object 0x11bfb40 was not returned to mpool ucp_requests
[1579482456.680676] [gadi-cpu-clx-0455:61859:0] rcache.c:360 UCX WARN knem rcache device: destroying inuse region 0x2c7ac10 [0x2d4bc40..0x2e1eb40] g- rw ref 1 cookie 716100775085951232 addr 0x2d4bc40
[gadi-cpu-clx-0455:61859:0:61859] rcache.c:200 Assertion `region->refcount == 0' failed
==== backtrace (tid: 61859) ====
0 0x0000000000051959 ucs_fatal_error_message() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucs/../../../src/ucs/debug/assert.c:36
1 0x0000000000051a36 ucs_fatal_error_format() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucs/../../../src/ucs/debug/assert.c:52
2 0x00000000000562f0 ucs_mem_region_destroy_internal() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucs/../../../src/ucs/memory/rcache.c:200
3 0x000000000005c6c6 ucs_class_call_cleanup_chain() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucs/../../../src/ucs/type/class.c:52
4 0x0000000000056f38 ucs_rcache_destroy() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucs/../../../src/ucs/memory/rcache.c:729
5 0x00000000000030f2 uct_knem_md_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/uct/sm/knem/../../../../../src/uct/sm/knem/knem_md.c:91
6 0x000000000000f1c9 ucp_free_resources() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucp/../../../src/ucp/core/ucp_context.c:710
7 0x000000000000f1c9 ucp_cleanup() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucp/../../../src/ucp/core/ucp_context.c:1266
8 0x0000000000005bcc mca_pml_ucx_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx.c:247
9 0x0000000000007909 mca_pml_ucx_component_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx_component.c:82
10 0x00000000000582b9 mca_base_component_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/opal/mca/base/../../../../opal/mca/base/mca_base_components_close.c:53
11 0x0000000000058345 mca_base_components_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/opal/mca/base/../../../../opal/mca/base/mca_base_components_close.c:85
12 0x0000000000058345 mca_base_components_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/opal/mca/base/../../../../opal/mca/base/mca_base_components_close.c:86
13 0x00000000000621da mca_base_framework_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/opal/mca/base/../../../../opal/mca/base/mca_base_framework.c:216
14 0x000000000004f479 ompi_mpi_finalize() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/ompi/../../ompi/runtime/ompi_mpi_finalize.c:363
15 0x000000000004ac29 ompi_finalize_f() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/intel-opt/ompi/mpi/fortran/mpif-h/profile/pfinalize_f.c:71
16 0x000000000041a5b8 accessom2_mod_mp_accessom2_deinit_() ???:0
17 0x000000000040e768 MAIN__.a() ocean.F90:0
18 0x000000000040c9e2 main() ???:0
19 0x0000000000023813 __libc_start_main() ???:0
20 0x000000000040c8ee _start() ???:0

Improve YATM performance

There are a couple of ways that YATM performance could be improved:

  1. The runoff remapping checks conservation after applying weights to move from one grid to another, and then again after moving any runoff that landed on land points to the ocean. Although checking twice helps isolate which step went wrong, the first check is redundant and it would make sense to remove it.

  2. Russ Fiedler [12:53 PM]
    @nic I think the remapping is very slow because you are checking all the ocean cells in the world. This makes the tree much larger than it needs to be too. Given that the runoff is coming from a 1.25 degree grid (?) you can mask the ocean points to be near the coast via if (any(land_sea_mask(i-halo:i+halo, j-halo:j+halo) < 0.5) .and. land_sea_mask(i,j) > 0.5) then ..., where halo = 6 or so and is adjusted for boundaries.

  3. Russ Fiedler [1:59 PM]
    @nic How about hoisting the runoff remapping out of the loop starting at line 113 and doing it before entering? If you calculate the runoff ahead of time you don't need to synchronise with the fields that are passed earlier.

  4. Read all the forcing fields before doing any OASIS puts, keeping in mind that the puts are asynchronous, so this should not slow things down as long as the ice is not waiting.
