
pop2-cesm's Introduction

Parallel Ocean Program (POP2) in CESM

The ocean component of CESM2 is the Parallel Ocean Program version 2 (POP2), which is based on POP v2.1 from Los Alamos National Laboratory. The version used in CESM includes many physical and software developments incorporated by members of the Ocean Model Working Group, as detailed at: http://www.cesm.ucar.edu/models/cesm2/ocean/

pop2-cesm's People

Contributors

alperaltuntas, apcraig, bandre-ucar, billsacks, feiliuesmf, fischer-ncar, jedwards4b, jtruesdal, klindsay28, marcost2, mathewvrothstein, mnlevy1981, njn01, qingli411, quantheory

pop2-cesm's Issues

Add GIAFECO_JRA_HR compset

Description of the issue:

A group of us are putting together a run that is based on the GIAF_JRA_HR compset, but turns on the ecosys tracer module. Rather than treat this as a one-off experiment, I plan to bring the updated compset / namelist defaults to master... ideally in time for the CESM 2.2 release.

Clean up default setting for OCN_CHL_TYPE

Description of the issue:

For production runs, we want OCN_CHL_TYPE=prognostic if POP is running with the ecosystem tracer. For testing purposes, it would be useful to have a compset modifier to switch back to diagnostic in https://github.com/ESCOMP/POP2-CESM/blob/master/cime_config/config_component.xml#L37..L54

So something like

  <entry id="OCN_CHL_TYPE">
    <type>char</type>
    <valid_values>diagnostic,prognostic</valid_values>
    <default_value>diagnostic</default_value>
    <values>
      <value compset="_POP2%[^_]*ECO">prognostic</value>
      <value compset="_POP2%[^_]*DIAG_CHL">diagnostic</value>
    </values>
    <group>run_component_pop</group>
    <file>env_run.xml</file>
    <desc>
      Determines provenance of surface Chl for radiative penetration,
      computations. This option is used in the POP ecosystem model.
      The default is diagnostic.
    </desc>
  </entry>

and then update G compsets we use in the test suite to be POP2%ECO%DIAG_CHL if we want to keep the tests bit-for-bit.

Version:

  • CESM: latest
  • POP2: latest (master as of cesm_pop_2_1_20200709)

inputnamelists for files that appear with 'unknown_' or 'same_as' should not appear in the pop

Description of the issue:

The pop.input_data_list should not have entries for filenames that start with 'unknown_' or 'same_as', since these are not on disk and the latest version of cime will not recognize them. Special logic should not be in cime to ignore these (as was the case before). These fixes need to be made in pop.

Version:

  • CESM: [version]
  • POP2: [version]

Machine/Environment Description:

Any xml/namelist changes or SourceMods:

Generate OMIP-consistent initial data files

Description of the issue:

Moving issue from marbl-ecosys/MARBL#218; note that @matt-long was also assigned to the original ticket but isn't currently available as an assignee (also editing this to point to correct original issue).

Version:

  • CESM: 2.1 (and 2.2)
  • POP2: 2.1 release branch (and master)

Machine/Environment Description:

Machine independent

Any xml/namelist changes or SourceMods:

Problem exists out of the box

Clean up logic in config_pes.xml

Description of the issue:

We typically define pe layouts in cime_config/config_pes.xml to match the regex POP2[^%] for non-ecosystem compsets and POP2%ECO for compsets with the ecosystem enabled. If we introduce a compset to use POP2%ABIO-DIC or POP2%ABIO-DIC%ECO, neither of the two compset attributes will match.
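The mismatch can be checked directly with Python's re module. This is only a minimal sketch: it assumes the patterns are applied as regular-expression searches against the compset long name, and the long names below are shortened, hypothetical examples rather than actual compset definitions.

```python
import re

# Patterns quoted in the issue text.
PE_PATTERNS = {
    "non-ecosystem": r"POP2[^%]",
    "ecosystem": r"POP2%ECO",
}

# Hypothetical, shortened compset long names (illustration only).
COMPSETS = [
    "2000_DATM%NYF_SLND_CICE_POP2_DROF%NYF_SGLC_SWAV",
    "2000_DATM%NYF_SLND_CICE_POP2%ECO_DROF%NYF_SGLC_SWAV",
    "2000_DATM%NYF_SLND_CICE_POP2%ABIO-DIC_DROF%NYF_SGLC_SWAV",
    "2000_DATM%NYF_SLND_CICE_POP2%ABIO-DIC%ECO_DROF%NYF_SGLC_SWAV",
]


def matching_layouts(compset):
    """Return the labels of all patterns found in the compset long name."""
    return [label for label, pat in PE_PATTERNS.items() if re.search(pat, compset)]


for name in COMPSETS:
    print(name, "->", matching_layouts(name) or "no match")
```

The last two names fall through both patterns, which is the gap described above.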

Version:

  • CESM: latest
  • POP2: latest

Machine/Environment Description:

cheyenne and hobart are currently the only machines specified in config_pes.xml

Any xml/namelist changes or SourceMods:

no

print_MARBL_log sometimes hangs on error instead of cleanly exiting

Description of the issue:

@kristenkrumhardt pointed this out -- under some situations, she was seeing

ERROR reported from MARBL library

in cesm.log but then MPI_Abort() was never triggered. It looks like something funny is happening in exit_POP()?

Version:

  • CESM: 2.2.0
  • POP2: cesm_pop_2_1_20200730

Machine/Environment Description:

cheyenne

Any xml/namelist changes or SourceMods:

I was able to recreate this error with the following steps:

  1. create a G1850ECOIAF compset with the T62_g17 resolution

  2. copy /glade/work/kristenk/cesm_work/cesm2.2_4P4Z_tuningcases/g.e22a06.G1850ECOIAF.T62_g17.4p4z.001/user_nl_marbl3.old into the case directory as user_nl_marbl (to enable 4 autotrophs / 4 zooplankton)

  3. add

    init_ecosys_init_file='/glade/u/home/kristenk/adding_zooplankton/IC_files/ecosys_jan_IC_4p4z_200906.nc'
    

    to user_nl_pop.

See the case in /glade/work/mlevy/codes/CESM/cesm2.2.0/cases/g.e22.G1850ECOIAF.T62_g17.hang_not_abort (rundir: /glade/scratch/mlevy/g.e22.G1850ECOIAF.T62_g17.hang_not_abort/run)

dt_count is being set to the correct value for the wrong reasons

Description of the issue:

While putting together #27, I was surprised that

<dt_count ocn_grid="gx1v7"  ocn_transient="ssp585ext">48</dt_count>

Took precedence over

<dt_count ocn_ncpl="24" time_mix="robert"  ocn_grid="gx1v7">24</dt_count>

In a run where OCN_NCPL=24 and time_mix_opt = 'robert'. Adding a debug print statement to build-namelist, I realized that time_mix isn't robert, it's 'robert' (with the single quotes); changing to

<dt_count ocn_ncpl="24" time_mix="'robert'"  ocn_grid="gx1v7">24</dt_count>

Produces the expected (though undesired) behavior that 3 matches > 2 matches so dt_count=24. We were getting the correct dt_count in out-of-the-box simulations because

<dt_count ocn_grid="gx1v7"  > 24</dt_count>

I was confused about how this hasn't been noticed with

<dt_count ocn_ncpl="12" time_mix="avg_mix" ocn_grid="gx1v7">23</dt_count>

but apparently we don't test with this option because avg_mix is not a valid value for time_mix_opt (perhaps we should have a testcase for gx1v7 where we set time_mix_opt = "avgfit"?)

Anyway, I think the proper fixes are

  1. Remove the quotes around 'robert' (or 'avgfit') in build-namelist (rather than adding single quotes to namelist defaults, which I just did as a quick / dirty check)
  2. change avg_mix -> avgfit in dt_count definition attributes

And possibly add a leapfrog test to make sure dt_count=23?
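To make the quoting problem concrete, here is a rough Python sketch of the selection rule as described in this ticket. It is not the build-namelist code itself: the helper names are hypothetical, the entry order is made up, and the "first entry wins on ties" rule is an assumption.

```python
def strip_fortran_quotes(value):
    """Drop one pair of matching single or double quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def pick_default(entries, settings):
    """Pick the value whose attributes all match; most attributes wins.

    Assumption: on a tie, the first matching entry is kept.
    """
    best_value, best_count = None, -1
    for attrs, value in entries:
        if all(settings.get(key) == want for key, want in attrs.items()):
            if len(attrs) > best_count:
                best_value, best_count = value, len(attrs)
    return best_value


# dt_count entries quoted in the issue, as (attributes, value) pairs.
DT_COUNT = [
    ({"ocn_grid": "gx1v7"}, 24),
    ({"ocn_grid": "gx1v7", "ocn_transient": "ssp585ext"}, 48),
    ({"ocn_ncpl": "24", "time_mix": "robert", "ocn_grid": "gx1v7"}, 24),
]

# time_mix arrives with its Fortran quotes still attached.
raw = {"ocn_grid": "gx1v7", "ocn_transient": "ssp585ext",
       "ocn_ncpl": "24", "time_mix": "'robert'"}
cleaned = dict(raw, time_mix=strip_fortran_quotes(raw["time_mix"]))

print(pick_default(DT_COUNT, raw))      # three-attribute entry cannot match
print(pick_default(DT_COUNT, cleaned))  # three-attribute entry wins
```

With the quotes left in place the three-attribute entry never matches, so the two-attribute ssp585ext entry is selected. Once the quotes are stripped, the three-attribute entry takes over.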

Version:

  • CESM: 2.1.x (and probably 2.2, though I haven't verified)
  • POP2: cesm2_1_x_rel (and probably master, though I haven't verified)

Machine/Environment Description:

I was working on cheyenne when this cropped up

Any xml/namelist changes or SourceMods:

Nope

cesm2.2.0 fails test ERS.T62_g16.G1850ECO.cheyenne_pgi.pop-cice_ecosys

Description of the issue:

The test fails on cheyenne with a threading error at line 1069 of passive_tracers.F90

Version:

  • CESM: cesm2.2.0
  • POP2: cesm_pop_2_1_20200730

Machine/Environment Description:

Currently Loaded Modules

  1) ncarenv/1.3 2) cmake/3.14.4 3) pgi/19.3 4) openmpi/3.1.4 5) netcdf-mpi/4.7.3 6) pnetcdf/1.12.1 7) ncarcompilers/0.5.0

Any xml/namelist changes or SourceMods:

As defined by the test.

GIAF_JRA_HR compset should specify CICE%CICE4

Description of the issue:

As of cice5_20200430, you can specify -phys cice4 in CICE_CONFIG_OPTS by using CICE%CICE4 in the compset long name. All the experiments I know of using GIAF_JRA_HR have used xmlchange to set this value, and my tests with CICE5 died in the first timestep of CICE, so I think it makes sense to make this the compset default.

restore_year_* not in build_namelist infrastructure

Description of the issue:

restore_year_first, restore_year_last, and restore_year_align are in the ecosys_forcing_nml namelist, but they are not in the namelist definitions file so users are stuck with the default values.

Version:

Machine/Environment Description:

Machine independent

Any xml/namelist changes or SourceMods:

Problem exists out of the box

Bad logic statement in strdata_interface_mod.F90

Description of the issue:

@matt-long tried to add a new forcing field to MARBL and have POP read it in through shared stream, but logic in strdata_interface_mod::POP_strdata_type_match() caused POP to ignore the fact that the field was in a different file and instead was appending the new field to the list of variables read from the stream providing NDEP forcing (which is bad, because the new field doesn't exist there).

Basically,

    POP_strdata_type_match = &
      strdata_input_var1%file_name     == strdata_input_var2%file_name  .and. &
      strdata_input_var1%year_first    == strdata_input_var2%year_first .and. &
      strdata_input_var1%year_last     == strdata_input_var2%year_last  .and. &
      strdata_input_var1%year_align    == strdata_input_var2%year_align .and. &
      strdata_input_var1%depth_flag .eqv. strdata_input_var2%depth_flag .and. &
      strdata_input_var1%tintalgo      == strdata_input_var2%tintalgo   .and. &
      strdata_input_var1%taxMode       == strdata_input_var2%taxMode

is returning True even when strdata_input_var1%file_name and strdata_input_var2%file_name differ. It looks like the solution is to wrap each individual logical statement in parentheses:

    POP_strdata_type_match = &
      (strdata_input_var1%file_name     == strdata_input_var2%file_name)  .and. &
      (strdata_input_var1%year_first    == strdata_input_var2%year_first) .and. &
      (strdata_input_var1%year_last     == strdata_input_var2%year_last)  .and. &
      (strdata_input_var1%year_align    == strdata_input_var2%year_align) .and. &
      (strdata_input_var1%depth_flag .eqv. strdata_input_var2%depth_flag) .and. &
      (strdata_input_var1%tintalgo      == strdata_input_var2%tintalgo)   .and. &
      (strdata_input_var1%taxMode       == strdata_input_var2%taxMode)

behaves as expected.
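The reason the parentheses matter: in Fortran, .eqv. has lower precedence than .and., so the unparenthesized expression is grouped as two .and. chains joined by a single .eqv. rather than as seven independent comparisons. The Python sketch below mimics that grouping. The dictionaries stand in for the derived type, and all field values are hypothetical.

```python
def match_as_parsed(v1, v2):
    """Mimic how the unparenthesized Fortran expression groups its operands."""
    left = (v1["file_name"] == v2["file_name"]
            and v1["year_first"] == v2["year_first"]
            and v1["year_last"] == v2["year_last"]
            and v1["year_align"] == v2["year_align"]
            and v1["depth_flag"])
    right = (v2["depth_flag"]
             and v1["tintalgo"] == v2["tintalgo"]
             and v1["taxMode"] == v2["taxMode"])
    return bool(left) == bool(right)  # .eqv. is applied last


def match_as_intended(v1, v2):
    """Every field must agree, which is what the parenthesized version computes."""
    return all(v1[key] == v2[key] for key in v1)


# Hypothetical inputs: identical settings except for the file name.
ndep = {"file_name": "ndep.nc", "year_first": 1850, "year_last": 2000,
        "year_align": 1850, "depth_flag": False,
        "tintalgo": "linear", "taxMode": "cycle"}
new_field = dict(ndep, file_name="new_field.nc")

print(match_as_parsed(ndep, new_field))    # both sides False, so "equivalent"
print(match_as_intended(ndep, new_field))
```

When depth_flag is false on both inputs, both sides of the .eqv. are false regardless of the file names, so the grouped expression reports a match.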

Version:

  • CESM: this is probably in every version of POP containing strdata_interface_mod.F90; I can track down exactly when this was introduced, but I think it was in the CESM 1.5 or CESM 2.0 development cycle.
  • POP2: We should definitely update main and the POP branch for CESM 2.1; it probably makes sense to patch CESM 2.2 as well, since that's the latest release.

Machine/Environment Description:

This was discovered on cheyenne in CESM 2.2, building with intel

Any xml/namelist changes or SourceMods:

Long output filename being truncated

Description of the issue:

Ran into an issue during testing where long output filenames for pop.dd, pop.do, pop.dt, and pop.dv are being truncated.
ERS_Vmct.f09_g17.1850_CAM60_CLM50%BGC-CROP_CICE5_POP2%ECO_MOSART_CISM2%GRIS-NOEVOLVE_WW3_BGC%BDRD.cheyenne_intel.allactive-defaultio.GC_D_c2_3_a09c_intel.pop.dd.0001-01-0
ERS_Vmct.f09_g17.1850_CAM60_CLM50%BGC-CROP_CICE5_POP2%ECO_MOSART_CISM2%GRIS-NOEVOLVE_WW3_BGC%BDRD.cheyenne_intel.allactive-defaultio.GC_D_c2_3_a09c_intel.pop.do.0001-01-0
ERS_Vmct.f09_g17.1850_CAM60_CLM50%BGC-CROP_CICE5_POP2%ECO_MOSART_CISM2%GRIS-NOEVOLVE_WW3_BGC%BDRD.cheyenne_intel.allactive-defaultio.GC_D_c2_3_a09c_intel.pop.dt.0001-01-0
ERS_Vmct.f09_g17.1850_CAM60_CLM50%BGC-CROP_CICE5_POP2%ECO_MOSART_CISM2%GRIS-NOEVOLVE_WW3_BGC%BDRD.cheyenne_intel.allactive-defaultio.GC_D_c2_3_a09c_intel.pop.dv.0001-01-0

There should be 0001-01-0000 at the end of the files.

These files are in

/glade/scratch/fischer/t/cesm2_3_alpha09c_alpha_intel/ERS_Vmct.f09_g17.1850_CAM60_CLM50%BGC-CROP_CICE5_POP2%ECO_MOSART_CISM2%GRIS-NOEVOLVE_WW3_BGC%BDRD.cheyenne_intel.allactive-defaultio.GC_D_c2_3_a09c_intel/run

Combining the filename and the directory gives a string of 383 characters.

The command used to create the test is

./create_test ERS_Vmct.f09_g17.1850_CAM60_CLM50%BGC-CROP_CICE5_POP2%ECO_MOSART_CISM2%GRIS-NOEVOLVE_WW3_BGC%BDRD.cheyenne_intel.allactive-defaultio --test-root /glade/scratch/fischer/t/cesm2_3_alpha09c_alpha_intel/ --output-root /glade/scratch/fischer/t/cesm2_3_alpha09c_alpha_intel/ --test-id GC_D_c2_3_a09c_intel

Which is very similar to a prealpha test.

Version:

  • CESM: cesm2_3_alpha09c
  • POP2: cesm_pop_2_1_20220322

Machine/Environment Description:

cheyenne_intel

Any xml/namelist changes or SourceMods:

allactive-defaultio test mods.

Need a user-mod directory for the SMYLE simulations for ndep...

I need a user-mod directory for the CESM2.1.x series for POP for the SMYLE simulations. I think the only thing I need for POP is to point to an ndep file that includes the historical period as well as the future scenario so you can run from 1850-2025.

The changes needed are documented here:

https://docs.google.com/document/d/1rV3fWCIt-D1fAbzQ6uaQeZyLagzzKnopofr88eXegzY

The CESM issue for this is here...

ESCOMP/CESM#178

Who can I work with for this?

ocn_import_export using unassigned variable

Description of the issue:

Code in the nuopc ocn_import_export.F90 is using my_task and master_task from communications.F90 before they have been assigned.

Version:

  • CESM: 2.3.x
  • POP2: cesm_pop_2_1_20210930

Machine/Environment Description:

Any

Any xml/namelist changes or SourceMods:

none

using ladjust_bury_coeff in MARBL requires specific properties from PE layout

Description of the issue:

A user was trying to run with ladjust_bury_coeff in user_nl_marbl (which is not a very common configuration); he was also trying to get 100+ SYPD out of the gx3v7 grid (which is not a very common requirement), so he was running with 288 ocean tasks. gen_pop_decomp was giving a layout that created 290 blocks, and he reported the model crashing in ecosys_driver.F90:513 at

    508     allocate(rmean_vals(size(marbl_instances(1)%glo_avg_rmean_interior_tendency)))
    509     lscalar = .false.
    510     call ecosys_running_mean_saved_state_get_var_vals('interior_tendency', lscalar, rmean_vals(:))
    511     do n = 1, size(rmean_vals)
    512        do iblock = 1, size(marbl_instances)
    513           marbl_instances(iblock)%glo_avg_rmean_interior_tendency(n)%rmean = rmean_vals(n)
    514        end do
    515     end do
    516     deallocate(rmean_vals)

It turns out the issue is that marbl_instances is size max_blocks_clinic (2, in his configuration) and we only want these loops running through nblocks_clinic (1 on most tasks), so ladjust_bury_coeff currently can't be true if any task has nblocks_clinic < max_blocks_clinic. Fixing that moved the error to ecosys_driver:640:

    637     if ((size(glo_avg_fields_interior, dim=4) /= 0) .or. (size(glo_avg_fields_surface, dim=4) /= 0)) then
    638        allocate(glo_avg_area_masked(nx_block, ny_block, nblocks_clinic))
    639        where (land_mask(:,:,:))
    640           glo_avg_area_masked(:,:,:) = TAREA(:,:,:)
    641        else where
    642           glo_avg_area_masked(:,:,:) = c0
    643        end where

(I think the third dimension of land_mask and TAREA are both max_blocks_clinic while the allocate() statement for glo_avg_area_masked in line 638 shows it uses nblocks_clinic instead.)

As you can tell, I've started working on a fix for this... I think I changed the above block to explicitly use 1:nblocks_clinic for the third dimension of land_mask in 639 and TAREA in 640, but got yet another error elsewhere.
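For anyone picking this up later, the first failure can be pictured with a toy Python model. It assumes the unused slots of marbl_instances are simply never set up, which is a guess about the mechanism and has not been confirmed. The class and function names are hypothetical stand-ins, not MARBL or POP types.

```python
class MarblInstance:
    """Stand-in for one element of marbl_instances (hypothetical, not the MARBL type)."""

    def __init__(self):
        self.rmean = None  # assumed: not set up until the block is initialized

    def init(self, n_vals):
        self.rmean = [0.0] * n_vals


def set_running_means(instances, rmean_vals, n_active):
    """Copy running means into the first n_active instances only."""
    for iblock in range(n_active):
        for n, val in enumerate(rmean_vals):
            instances[iblock].rmean[n] = val


max_blocks_clinic = 2   # array size, the same on every task
nblocks_clinic = 1      # blocks this task actually owns
instances = [MarblInstance() for _ in range(max_blocks_clinic)]
for iblock in range(nblocks_clinic):
    instances[iblock].init(n_vals=3)

set_running_means(instances, [1.0, 2.0, 3.0], nblocks_clinic)      # works
try:
    set_running_means(instances, [1.0, 2.0, 3.0], len(instances))  # mimics the bug
except TypeError as err:
    print("looping over every slot fails:", err)
```

Bounding the loop by the number of blocks the task owns, rather than by the array size, avoids touching the unused slot.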

The original user who reported the problem was happy to be given a 252 task layout that keeps max_blocks_clinic=1, so fixing this is not urgent. I'm putting all this detail in the issue ticket because I'm going to set it aside for a few weeks while I focus on more pressing issues, but it would probably be good to eventually come back and fix the bug.

I also think it would be useful to update the test suite to try to explicitly test cases where ladjust_bury_coeff = .true. and either some tasks have more blocks than others, or some tasks have no blocks. I expect both of those tests would fail currently.

Version:

  • CESM: 2_3_beta09; I believe the first user was running CESM 2.1.x
  • POP2: cesm_pop_2_1_20220322

Machine/Environment Description:

error was reported on cheyenne and that's also where I reproduced the issue in the latest codebase

Any xml/namelist changes or SourceMods:

POP namelist variables that are controlled by XML

Description of the issue:

There are several variables in POP that the user should not be able to change via user_nl_pop, and many of them are ecosystem-related.

Low-hanging fruit:

 abio_dic_dic14_on
 cfc_on
 ciso_on
 ecosys_on
 iage_on
 irf_on
 sf6_on

are all set according to what's in OCN_TRACER_MODULES

@klindsay28 expanded the list with

Here are some (probably more to come later):
chl_option depends upon OCN_CHL_TYPE
lactive_ice depends upon OCN_ICE_FORCING
atm_co2_opt depends upon OCN_CO2_TYPE
lvariable_PtoC will depend upon OCN_TRACER_MODULES_OPT

This is not just a MARBL driver issue, so I'm not applying any labels

Version:

  • CESM: 2.2
  • POP2: cesm_pop_2_1_20190410
  • Migrating from marbl-ecosys/MARBL#130, originally noted in January 2017 and assigned to @mvertens because at the time she was working on a python-based build-namelist script. I'm not assigning her this ticket because I don't believe that work has continued.

Machine/Environment Description:

Machine independent

Any xml/namelist changes or SourceMods:

Problem exists out of the box

namelist variable chl_file_fmt not broadcast

The namelist variable chl_file_fmt is not being broadcast after the sw_absorption_nml namelist is read by master task.

If a user specifies a non-default valid value, e.g. 'nc', then the code hangs during file open, because master task calls open_read_netcdf and non-master tasks call open_read_binary.

This affects all versions of POP.

The patch is straightforward:

--- /gpfs/u/home/cmip6/cesm_tags/cesm2_omip_n02/components/pop/source/sw_absorption.F90	2019-06-15 15:12:39.421423340 -0600
+++ ./sw_absorption.F90	2020-03-11 09:46:49.052803398 -0600
@@ -327,6 +327,7 @@
    call broadcast_scalar(jerlov_water_type,     master_task)
    call broadcast_scalar(chl_option,            master_task)
    call broadcast_scalar(chl_filename,          master_task)
+   call broadcast_scalar(chl_file_fmt,          master_task)
 
    if (sw_absorption_type .ne. 'top-layer'.and.  &
           sw_absorption_type .ne. 'jerlov'.and.  &
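The hang is easy to picture with a toy model of the tasks. The Python below is only a sketch: toy_broadcast is a hypothetical list-based stand-in for a real broadcast, and the 'bin' default is an assumption.

```python
def read_namelist_on_master(n_tasks, master_task, user_value, default="bin"):
    """Each task starts from the default; only the master sees the user's value.

    Assumption: 'bin' is the default chl_file_fmt.
    """
    values = [default] * n_tasks
    values[master_task] = user_value
    return values


def toy_broadcast(values, master_task):
    """Toy stand-in for a broadcast: every task receives the master's value."""
    return [values[master_task]] * len(values)


def open_calls(values):
    """Which open routine each task would call for its local chl_file_fmt."""
    return ["open_read_netcdf" if fmt == "nc" else "open_read_binary" for fmt in values]


before = read_namelist_on_master(n_tasks=4, master_task=0, user_value="nc")
after = toy_broadcast(before, master_task=0)
print(open_calls(before))  # tasks disagree, so the collective open never completes
print(open_calls(after))   # all tasks take the same branch
```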

Bad XML entry in config_component.xml

Description of the issue:

The following block of code in cime_config/config_component.xml doesn't behave as expected:

  <entry id="OCN_COUPLING">
    <type>char</type>
    <valid_values>full,partial</valid_values>
    <default_value>full</default_value>
    <values>
      <value compset="_DATM%CPLHIST.*_POP2">full</value>
      <value compset="_DATM.*_POP2">partial</value>
    </values>
    <group>build_pop</group>
    <file>env_build.xml</file>
    <desc>Determine surface freshwater and heat forcing settings.
      The full option yields settings that are appropriate for coupling to an
      active atmospheric model (e.g., a B-type compset). The partial option yields
      settings that are appropriate for coupling to a data atmospheric model
      (e.g., a C or G-type compset). The create_newcase command selects the
      appropriate setting for this variable based on the specified compset.
      Users should NOT EDIT this setting.</desc>
  </entry>

The default behavior of CIME is to take the last value option that matches the maximum number of attributes, so a compset with DATM%CPLHIST.*_POP2 in the long name will end up using OCN_COUPLING=partial rather than full. To fix this, either start the values block with <values match="first"> OR swap the order of the <value> items (doing both will continue to provide the wrong value for CPLHIST compsets!). I think swapping the <value> items is the better solution as we don't use the match attribute anywhere else in cime_config/.
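The four combinations (current order or swapped, last-match or first-match) can be walked through with a small Python sketch. It is not CIME code: pick_value is a hypothetical helper that applies the rule as described above, and the compset long name is a made-up example. Since every candidate here carries a single compset attribute, "maximum number of attributes" reduces to "any match".

```python
import re


def pick_value(compset, candidates, match="last"):
    """Return the value of the first or last candidate whose regex matches."""
    hits = [value for pattern, value in candidates if re.search(pattern, compset)]
    if not hits:
        return None
    return hits[0] if match == "first" else hits[-1]


# <value> items in the order they appear in the entry above.
CURRENT_ORDER = [
    (r"_DATM%CPLHIST.*_POP2", "full"),
    (r"_DATM.*_POP2", "partial"),
]
SWAPPED_ORDER = list(reversed(CURRENT_ORDER))

# Hypothetical CPLHIST compset long name (illustration only).
COMPSET = "1850_DATM%CPLHIST_SLND_CICE_POP2_DROF%CPLHIST_SGLC_WW3"

print(pick_value(COMPSET, CURRENT_ORDER))                  # current behavior
print(pick_value(COMPSET, SWAPPED_ORDER))                  # swap the items
print(pick_value(COMPSET, CURRENT_ORDER, match="first"))   # match="first" only
print(pick_value(COMPSET, SWAPPED_ORDER, match="first"))   # both changes at once
```

Either change on its own yields full for the CPLHIST name. Applying both together brings back partial.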

Version:

  • CESM: 2.X (2.0, 2.1, and the 2.2 development branches)
  • POP2: cesm2_0_x_rel, cesm2_1_x_rel, and master

Machine/Environment Description:

Discovered on cheyenne during a code review with @klindsay28 but problem should be evident on any machine

Any xml/namelist changes or SourceMods:

No

Bad XML entry in namelist_defaults_pop.xml

Description of the issue:

The following block of code in bld/namelist_files/namelist_defaults_pop.xml doesn't behave as expected:

<sfwf_weak_restore ocn_grid="gx1v7"  >0.0115</sfwf_weak_restore>
<sfwf_weak_restore ocn_grid="gx1v7"  datm_mode="CORE_IAF_JRA">0.046</sfwf_weak_restore>
<sfwf_weak_restore ocn_grid="gx1v7"  ocn_onedim="TRUE">0.0</sfwf_weak_restore>

build-namelist looks at all entries where every attribute matches, and then takes the value corresponding to the entry with the most matches. Unlike in #15, if multiple entries match the same number of attributes, the namelist generation tool defaults to the FIRST item in the list. So in this case, if you run with the gx1v7 grid, CORE_IAF_JRA forcing, and POP in 1D mode, sfwf_weak_restore = 0.046. We want sfwf_weak_restore = 0.0 whenever running in 1D mode, regardless of forcing... so the last two lines in the block above should be swapped.
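A quick Python sketch of the tie-break, under the rule stated above (most matching attributes, first entry on ties). first_best is a hypothetical helper, not the build-namelist implementation.

```python
# sfwf_weak_restore entries from the issue, in file order.
CURRENT = [
    ({"ocn_grid": "gx1v7"}, 0.0115),
    ({"ocn_grid": "gx1v7", "datm_mode": "CORE_IAF_JRA"}, 0.046),
    ({"ocn_grid": "gx1v7", "ocn_onedim": "TRUE"}, 0.0),
]
# Proposed order: the last two entries swapped.
SWAPPED = [CURRENT[0], CURRENT[2], CURRENT[1]]


def first_best(entries, settings):
    """Among fully matching entries, return the value with the most attributes.

    max() keeps the first maximal item, which mirrors the assumed tie-break.
    """
    matching = [(attrs, value) for attrs, value in entries
                if all(settings.get(k) == v for k, v in attrs.items())]
    return max(matching, key=lambda item: len(item[0]), default=({}, None))[1]


case = {"ocn_grid": "gx1v7", "datm_mode": "CORE_IAF_JRA", "ocn_onedim": "TRUE"}
print(first_best(CURRENT, case))  # JRA entry is listed first and wins the tie
print(first_best(SWAPPED, case))  # 1D entry is listed first and wins the tie
```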

Version:

  • CESM: 2.1 and the 2.2 development branches
  • POP2: cesm2_1_x_rel and master
    Note that JRA forcing is not available in CESM 2.0, so that version of the code is not affected.

Machine/Environment Description:

Discovered on cheyenne during a code review with @klindsay28 but problem should be evident on any machine

Any xml/namelist changes or SourceMods:

None

POP is not consistent in how it applies scale factor to forcing fields

Description of the issue:

POP has three different ways it applies a scale factor to ecosys forcing data read from a file, depending on the field_source argument to forcing_fields_add():

  1. For shr_stream, we pass a unit_conv_factor argument
  2. For POP monthly calendar, the scale factor is pulled out of the forcing_calendar_name argument (forcing_calendar_name%input%scale_factor)
  3. For file_time_invariant, the scale factor must be applied separate from the file read (in forcing_init_post_processing)

For the time invariant files, we should definitely use the unit_conv_factor argument and apply the scale factor when reading the data. Note that this may not be bit-for-bit because currently the subsurface sediment flux is summed prior to applying the scale factor => this will change the order of operations.
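The bit-for-bit caveat is just floating-point rounding: scaling a sum and summing scaled values are the same in exact arithmetic but can differ in the last bit. A minimal Python illustration, with numbers chosen only to expose the effect (they are not forcing data):

```python
def scale_after_sum(values, scale):
    """Sum first, then apply the scale factor once."""
    total = 0.0
    for v in values:
        total += v
    return total * scale


def scale_before_sum(values, scale):
    """Apply the scale factor to each value as it is read, then sum."""
    total = 0.0
    for v in values:
        total += v * scale
    return total


# Hypothetical numbers chosen only to expose the rounding difference.
values = [0.1, 0.2]
scale = 10.0
print(scale_after_sum(values, scale) == scale_before_sum(values, scale))
```

The two results agree to roundoff but are not identical, which is enough to break a bit-for-bit comparison.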

For the POP monthly calendar files, there seem to be two options:

  1. Pass unit_conv_factor=file_details%input%scale_factor in the forcing_fields_add() calls
  2. Add a comment in the forcing_fields_metadata_type definition that unit_conv_factor is not used for monthly calendar files

This is not a high priority and should wait until after @klindsay28 has merged some of his science changes back to the marbl_dev branch of POP

Version:

  • CESM: 2.2
  • POP2: cesm_pop_2_1_20190410
  • Migrating from marbl-ecosys/MARBL#123, originally noted in January 2017

Machine/Environment Description:

Machine independent

Any xml/namelist changes or SourceMods:

Problem exists out of the box

Update testlist to use izumi instead of hobart

Description of the issue:

Some CESM components define tests on izumi for testing on the CGD system, while POP is still defining tests on hobart. From today's CSEG meeting, it would be good if all components could update the testlists to define tests on izumi (it sounds like the current CESM version doesn't run on hobart at the moment, and there is no interest in updating the machine configuration).

moving VVEL or WVEL to daily stream leads to model crash when monthly tavg files written

Description of the issue:

Reported on DiscussCESM Forum by user ZhangZJ.

Moving VVEL or WVEL to daily stream leads to model crash in tavg_write_vars_nstd_ccsm. The problem is hard-coded indices in 2nd dimension of io_dims_nstd_ccsm.

Version:

  • CESM: 2.2, 2.1, 2.0, 1.2, 1.1, 1.0

Machine/Environment Description:
cheyenne

Any xml/namelist changes or SourceMods:
Copy gx1v7_tavg_contents into SourceMods/src.pop and change stream of VVEL or WVEL from 1 to 2.

Fix is to replace hardcoded indices in 2nd dimension of io_dims_nstd_ccsm with tavg_MOC, tavg_N_HEAT, or tavg_N_SALT, depending on which variable is being written. User confirms that this fixes problem on their platform.

Better loop order in ecosys_driver

Description of the issue:

When copying data in and out of the MARBL instance prior to calling surface_flux_compute(), the loops are not ordered in an efficient manner for memory management. The existing code is

    !-----------------------------------------------------------------------
    ! Copy data from slab data structure to column input for marbl
    !-----------------------------------------------------------------------

    do index_marbl = 1, marbl_col_cnt(iblock)
       i = marbl_col_to_pop_i(index_marbl,iblock)
       j = marbl_col_to_pop_j(index_marbl,iblock)

       do n = 1,size(surface_flux_forcings)
          marbl_instances(iblock)%surface_flux_forcings(n)%field_0d(index_marbl) = &
               surface_flux_forcings(n)%field_0d(i,j,iblock)
       end do

       do n = 1,ecosys_tracer_cnt
          marbl_instances(iblock)%tracers_at_surface(index_marbl,n) = &
               p5*(tracers_at_surface_old(i,j,n) + tracers_at_surface_cur(i,j,n))
       end do

       do n=1,size(surface_flux_saved_state)
         marbl_instances(iblock)%surface_flux_saved_state%state(n)%field_2d(index_marbl) = &
           surface_flux_saved_state(n)%field_2d(i,j,iblock)
       end do

    end do

but we would be better served having the n loops outside of the (i,j) loop. This came up because @klindsay28 and I were looking at the tracers_at_surface loop

       do n = 1,ecosys_tracer_cnt
          marbl_instances(iblock)%tracers_at_surface(index_marbl,n) = &
               p5*(tracers_at_surface_old(i,j,n) + tracers_at_surface_cur(i,j,n))
       end do

and talking about how we don't actually need the surface values of every tracer... so if MARBL could tell POP which tracer indices it actually cares about, we could have an if (nth tracer not required for surface flux computation) cycle line (and it would be much better to have that if statement outside the (i,j) loop)
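As a rough picture of the proposed structure (tracer loop outermost, with the skip test done once per tracer), here is a Python sketch. It only illustrates the control flow and says nothing about Fortran memory layout. The required flags are hypothetical, since MARBL does not currently provide them, and the nested-list indexing is an arbitrary choice for the example.

```python
def copy_surface_tracers(old, cur, columns, required):
    """Average old/current surface tracers into per-column storage.

    old, cur: nested lists indexed as old[n][j][i] (tracer index outermost here).
    columns:  list of (i, j) pairs, one per MARBL column.
    required: hypothetical per-tracer flags; False means the tracer is skipped.
    """
    out = [[0.0] * len(required) for _ in columns]
    for n, needed in enumerate(required):      # tracer loop hoisted outside
        if not needed:
            continue                           # one test per tracer, not per column
        for index_marbl, (i, j) in enumerate(columns):
            out[index_marbl][n] = 0.5 * (old[n][j][i] + cur[n][j][i])
    return out


# Two tracers on a 1 x 2 grid; only the first tracer is flagged as required.
old = [[[1.0, 2.0]], [[3.0, 4.0]]]
cur = [[[3.0, 4.0]], [[5.0, 6.0]]]
columns = [(0, 0), (1, 0)]
result = copy_surface_tracers(old, cur, columns, required=[True, False])
print(result)
```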

Note that this is not applicable to interior_tendency_compute() because MARBL needs to make that call column-by-column (and also there is a transpose happening when we copy data into the MARBL structure), but it's something to keep in mind when we introduce marbl_instance%reset()

Version:

  • CESM: latest (2_2_beta02)
  • POP2: latest (f943f01)

Machine/Environment Description:

N/A

Any xml/namelist changes or SourceMods:

N/A

Current CESM test failures

(given the impending move from POP -> MOM6, I don't expect to fix these; opening an issue ticket in case I get asked about testing in the future)

Description of the issue:

Some tests are failing on cheyenne with gfortran and DEBUG=TRUE (but not all tests in that configuration). With cesm2_3_beta12 the only test that fails is

SMS_Ld2_P80_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ecosys_81blocks_100x116_spacecurve

I updated MARBL from marbl0.40.3 to marbl0.41.0 (which required small POP changes as well) and two tests failed:

ERS_Ld5_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ecosys_box_atm_co2
SMS_Ld2_P80_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ecosys_81blocks_100x116_spacecurve

Moving to marbl0.42.0 (also making minor changes to POP) had a slightly different pair of failed tests

SMS_Ld2_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ciso_daily_r4_tavg
SMS_Ld2_P80_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ecosys_81blocks_100x116_spacecurve

And moving to the version of MARBL in marbl-ecosys/MARBL#423 was the same

SMS_Ld2_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ciso_daily_r4_tavg
SMS_Ld2_P80_D.T62_g37.C1850ECO.cheyenne_gnu.pop-ecosys_81blocks_100x116_spacecurve

The traceback for each failed test is the same, pointing at something in the tidal mixing module:

51:
51:Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
51:
51:Backtrace for this error:
51:#0  0x2ad3855b7bff in ???
51:#1  0x67c993 in __tidal_mixing_MOD_init_tidal_mixing1
51:     at $EXEROOT/ocn/source/tidal_mixing.F90:919
51:#2  0x8e1191 in __initial_MOD_pop_init_phase1
51:     at $EXEROOT/ocn/source/initial.F90:386
51:#3  0x586e09 in initializerealize
51:     at $EXEROOT/ocn/source/ocn_comp_nuopc.F90:389
51:#4  0x2ad38001705b in _ZN5ESMCI6FTable12callVFuncPtrEPKcPNS_2VMEPi
51:     at /glade/p/cesmdata/cseg/PROGS/build/63684/esmf-8.5.0b19/src/Superstructure/Component/src/ESMCI_FTable.C:2167
51:#5  0x2ad380014198 in ESMCI_FTableCallEntryPointVMHop
51:     at /glade/p/cesmdata/cseg/PROGS/build/63684/esmf-8.5.0b19/src/Superstructure/Component/src/ESMCI_FTable.C:824
51:#6  0x2ad3803e7250 in _ZN5ESMCI3VMK5enterEPNS_7VMKPlanEPvS3_
51:     at /glade/p/cesmdata/cseg/PROGS/build/63684/esmf-8.5.0b19/src/Infrastructure/VM/src/ESMCI_VMKernel.C:2320
51:#7  0x2ad38040150c in _ZN5ESMCI2VM5enterEPNS_6VMPlanEPvS3_
51:     at /glade/p/cesmdata/cseg/PROGS/build/63684/esmf-8.5.0b19/src/Infrastructure/VM/src/ESMCI_VM.C:1216

Running the same test on izumi, however, tells a different story

Runtime Error: *** Arithmetic exception: Floating divide by zero Runtime Error: - aborting
$SRCROOT/components/cmeps/cime_config/../cesm/flux_atmocn/shr_flux_mod.F90, line 331: Error occurred in SHR_FLUX_MOD:FLUX_ATMOCN
$SRCROOT/components/cmeps/cime_config/../mediator/med_phases_aofluxes_mod.F90, line 1047: Called by MED_PHASES_AOFLUXES_MOD:MED_AOFLUXES_UPDATE
$SRCROOT/components/cmeps/cime_config/../mediator/med_phases_aofluxes_mod.F90, line 315: Called by MED_PHASES_AOFLUXES_MOD:MED_PHASES_AOFLUXES_RUN
$SRCROOT/components/cmeps/cime_config/../cesm/driver/esmApp.F90, line 141: Called by ESMAPP
[i039.cgd.ucar.edu:mpi_rank_39][error_sighandler] Caught error: Aborted (signal 6)

Version:

  • CESM: cesm2_3_beta12
  • POP2: cesm_pop_2_1_20230209

Machine/Environment Description:

cheyenne (gfortran) and izumi (nag)

Any xml/namelist changes or SourceMods:

no

Allow more MARBL-generated scripts to be copied from SourceMods

Description of the issue:

The process for calling define_tavg() for the MARBL-generated diagnostics script is complicated, and involves generating several different files in Buildconf/popconf/ for different purposes. ecosys_diagnostics contains a list of variables to include in POP's tavg_contents file, and a modified copy can be placed in SourceMods/. marbl_diagnostics_operators is read by the Fortran code and used to call define_tavg(), but it can not be copied from SourceMods/.

The crux of the issue is that one-off experiments where a new diagnostic is added to MARBL can lead to an inconsistency between those two files and, if the new variable is in ecosys_diagnostics but not in marbl_diagnostics_operators, then POP will abort during init with a message about an undefined diagnostic being requested in tavg_contents. (There's also an intermediate file marbl_diagnostics_list that is used in conjunction with ecosys_diagnostics to generate marbl_diagnostics_operators, but I need to track down how that file is generated and if it can be placed in SourceMods or not).

I've opened a similar issue in the MARBL repo to update the documentation for this process, but some POP script changes will be necessary to allow users to have more control over the diagnostic lists.

Version:

  • CESM: this issue has existed since MARBL was added to POP
  • POP2: latest, and probably the CESM 2.1 release branch as well?

Machine/Environment Description:

Any xml/namelist changes or SourceMods:

bug in iron flux computation

There is a bug in the iron flux computation in ecosys_forcing.F90. The intended implementation was for black carbon (BC) to have a constant iron fraction and constant bioavailable fraction (of 100%).

The flux computation is here.

We have

forcing_field%field_0d(:,:,iblock) = atm_fe_bioavail_frac(:,:) * &
     (iron_frac_in_atm_fine_dust * atm_fine_dust_flux(:,:,iblock) + &
      iron_frac_in_atm_coarse_dust * atm_coarse_dust_flux(:,:,iblock) + &
      iron_frac_in_atm_bc * atm_black_carbon_flux(:,:,iblock))

! add component from seaice

seaice_fe_bioavail_frac(:,:) = atm_fe_bioavail_frac(:,:)

forcing_field%field_0d(:,:,iblock) = forcing_field%field_0d(:,:,iblock) + seaice_fe_bioavail_frac(:,:) * &
     (iron_frac_in_seaice_dust * seaice_dust_flux(:,:,iblock) + &
      iron_frac_in_seaice_bc * seaice_black_carbon_flux(:,:,iblock))

We should have

atm_bc_fe_bioavail_frac = 1.0_r8
seaice_bc_fe_bioavail_frac = 1.0_r8
...

forcing_field%field_0d(:,:,iblock) = atm_fe_bioavail_frac(:,:) * &
     (iron_frac_in_atm_fine_dust * atm_fine_dust_flux(:,:,iblock) + &
      iron_frac_in_atm_coarse_dust * atm_coarse_dust_flux(:,:,iblock)) + &
      atm_bc_fe_bioavail_frac  * iron_frac_in_atm_bc * atm_black_carbon_flux(:,:,iblock)

! add component from seaice

seaice_fe_bioavail_frac(:,:) = atm_fe_bioavail_frac(:,:)

forcing_field%field_0d(:,:,iblock) = forcing_field%field_0d(:,:,iblock) + seaice_fe_bioavail_frac(:,:) * &
     (iron_frac_in_seaice_dust * seaice_dust_flux(:,:,iblock)) + &
      seaice_bc_fe_bioavail_frac * iron_frac_in_seaice_bc * seaice_black_carbon_flux(:,:,iblock)

The introduction of atm_bc_fe_bioavail_frac and seaice_bc_fe_bioavail_frac parameters is for clarity; this makes it explicit that we are assuming Fe in BC is 100% bioavailable.
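To make the size of the difference concrete, here is a minimal single-point Python sketch of the atmospheric part of the two formulas. It collapses fine and coarse dust into one dust term for brevity, the function names are hypothetical, and all numbers are made up for illustration, not model values.

```python
def iron_flux_current(bioavail_frac, fe_frac_dust, dust_flux, fe_frac_bc, bc_flux):
    """Current code: the dust bioavailable fraction also scales the BC term."""
    return bioavail_frac * (fe_frac_dust * dust_flux + fe_frac_bc * bc_flux)


def iron_flux_intended(bioavail_frac, fe_frac_dust, dust_flux, fe_frac_bc, bc_flux,
                       bc_bioavail_frac=1.0):
    """Intended code: BC iron has its own bioavailable fraction (100% by default)."""
    return (bioavail_frac * fe_frac_dust * dust_flux
            + bc_bioavail_frac * fe_frac_bc * bc_flux)


# Made-up inputs in arbitrary units.
args = dict(bioavail_frac=0.02, fe_frac_dust=0.035, dust_flux=1.0,
            fe_frac_bc=0.06, bc_flux=0.1)
current = iron_flux_current(**args)
intended = iron_flux_intended(**args)

# The two differ by (1 - bioavail_frac) * fe_frac_bc * bc_flux.
print(current, intended, intended - current)
```

Whenever the dust bioavailable fraction is small, the current code removes most of the BC iron contribution, which is why the fix changes the forcing so much.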

I don't think this should be fixed on the CMIP branch, as it will dramatically change the forcing. I don't see any reason, however, that it should not be fixed moving forward with new runs, including ocean-ice runs.

cc @mnlevy1981, @klindsay28, @kristenkrumhardt

Memory leak in NUOPC cap

Description of the issue:

There is a memory leak in the NUOPC cap that crashes the run after about 4 years of a continuous C case run. The leak can be confirmed from the memory usage reported in the cesm log file, without running for the full 4 years. I haven't been able to pinpoint the source yet, but I am reporting it here in case @mvertens or others would also like to tackle it.
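One way to quantify the leak from a short run is to fit a straight line to the reported memory usage. The Python sketch below assumes the (model day, memory in MB) pairs have already been pulled out of the log by hand; it does not parse the log, the function name is hypothetical, and the sample numbers are made up.

```python
def memory_growth_rate(days, mem_mb):
    """Least-squares slope of memory usage versus model day (MB per day)."""
    n = len(days)
    if n < 2 or n != len(mem_mb):
        raise ValueError("need at least two matching (day, memory) samples")
    mean_x = sum(days) / n
    mean_y = sum(mem_mb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all samples are from the same model day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, mem_mb))
    return sxy / sxx


# Made-up samples, for illustration only.
days = [0, 30, 60, 90]
mem_mb = [1000.0, 1015.0, 1030.0, 1045.0]
rate = memory_growth_rate(days, mem_mb)
print(f"memory grows by about {rate:.2f} MB per model day")
```

A slope that stays positive after the first few model months would point to a leak rather than start-up allocation.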

Version:

  • CESM: cesm2_3_alpha03a
  • POP2: cesm2_3_alpha03a

Machine/Environment Description: cheyenne_intel

Any xml/namelist changes or SourceMods: none

Add 120 task (and maybe 160 task) decomposition for gx3v7

Description of the issue:

The UCI supercomputer has 40 cores per node, and I set up an 80 task decomposition to spread POP across 2 nodes when running gx3v7. The users there have tried running with 120 and 160 tasks, and have run into some issues (see the bottom of this section). The auto-decomp tool doesn't provide a very good distribution for those task counts at this resolution, so I'd like to add

<decomp nproc="120" res="gx3v7" >
  <maxblocks >1</maxblocks>
  <bsize_x   >10</bsize_x>
  <bsize_y   >10</bsize_y>
  <nx_blocks >10</nx_blocks>
  <ny_blocks >12</ny_blocks>
  <decomptype>cartesian</decomptype>
</decomp>

<decomp nproc="160" res="gx3v7" >
  <maxblocks >1</maxblocks>
  <bsize_x   >10</bsize_x>
  <bsize_y   >8</bsize_y>
  <nx_blocks >10</nx_blocks>
  <ny_blocks >16</ny_blocks>
  <decomptype>cartesian</decomptype>
</decomp>

to bld/generate_pop_decomp.xml at some point.
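As a quick sanity check on the two entries above, here is a small Python sketch that tests whether a cartesian decomposition covers the grid and fits in the available tasks. It assumes gx3v7 is 100 x 116 points, which should be verified against the grid definition. The function name is hypothetical, and the conditions are simple sufficient checks rather than POP's own logic.

```python
def check_decomp(nproc, bsize_x, bsize_y, nx_blocks, ny_blocks,
                 nx_global, ny_global, maxblocks=1):
    """Return a list of problems found with a cartesian decomposition."""
    problems = []
    if nx_blocks * ny_blocks > nproc * maxblocks:
        problems.append("more blocks than nproc * maxblocks can hold")
    if bsize_x * nx_blocks < nx_global:
        problems.append("blocks do not cover the grid in x")
    if bsize_y * ny_blocks < ny_global:
        problems.append("blocks do not cover the grid in y")
    return problems


# Assumed gx3v7 dimensions; verify before relying on this.
NX_GLOBAL, NY_GLOBAL = 100, 116

problems_120 = check_decomp(120, 10, 10, 10, 12, NX_GLOBAL, NY_GLOBAL)
problems_160 = check_decomp(160, 10, 8, 10, 16, NX_GLOBAL, NY_GLOBAL)
print("120 tasks:", problems_120 or "ok")
print("160 tasks:", problems_160 or "ok")
```

Under that grid-size assumption, both proposed entries pass these checks with one block per task.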

The "issues" I've alluded to only seem to be present on their machine, which is using intel 2018.0.3 - there is a crash in running_means_mod.F90 when ladjust_bury_coeff = .true., but I can't reproduce it on any other machine I have access to. I'm hopeful that it is a compiler bug and updating to a more recent compiler will make it go away, but if it persists I'll open a new issue ticket (I have some ideas on how to investigate if I need to).

Version:

  • CESM: 2.2.0, but working on moving to the latest 2.3 beta tag (waiting on ESMF library on their machine)
  • POP2: this just needs to go in the latest, no need to add to 2.2 release tags

Machine/Environment Description:

GreenPlanet (UCI supercomputer)

Any xml/namelist changes or SourceMods:

n/a

scale factor for CISO tracers

Description of the issue:

@klindsay28 emailed me the following:

I suspect that the values of ciso_tracer_init_ext(:)%scale_factor that are not 1.0 are not correct, now that we've changed the ciso IC file.

For instance, I think the 1.025 value for DI13C, reading from DIC, was replicating the value for DIC from an old ecosys IC file. Now that the ecosys IC file has the 1.025 value applied in the file, and not in the code, it isn't appropriate to apply 1.025 for DI13C. If possible, I think we should change the scale_factor to be the same as the scale factor for DIC. This might be tricky to pull off in build-namelist. If it isn't straightforward, perhaps we should just use 1 for ciso's scale_factor. We might consider removing ciso's scale_factor from namelist defaults and build-namelist.

Additionally, I think the 0.9225 value for DI14C, reading from DIC, which is intended to account for decay of DI14C, takes into account the old 1.025 factor being applied to DIC. I think we want 0.9 times the factor being used for DIC. For simplicity, we might want to just use 0.9.
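The arithmetic behind the 0.9225 value can be checked in a few lines of Python. The variable names are hypothetical, and the "new" DIC factor of 1.0 reflects the statement above that the 1.025 factor is now applied in the IC file rather than in the code.

```python
import math

decay_factor = 0.9       # intended to account for decay of DI14C
old_dic_scale = 1.025    # factor formerly applied to DIC in the code

# The old DI14C scale factor folds the DIC factor into the decay factor.
old_di14c_scale = decay_factor * old_dic_scale
print(math.isclose(old_di14c_scale, 0.9225))

# With the 1.025 factor now applied in the IC file, DIC's scale factor is 1.0.
new_dic_scale = 1.0
new_di14c_scale = decay_factor * new_dic_scale
print(new_di14c_scale)
```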

Further discussion with Alex Jahn confirmed that the scale factor for DI13C should be 1.0, but she offered other suggestions instead of using 0.9 for DI14C to represent decay of 14C. At a MARBL meeting, we decided on the following solution:

add GLODAP PI DI14C to BEC IC file and add fallback to use DIC with default scalefactor (=1)

For abiotic tracers, we want scale factors of 1.0 for ABIO_DIC and 0.9 for ABIO_DIC14

Version:

Machine/Environment Description:

Machine independent

Any xml/namelist changes or SourceMods:

Problem exists out of the box
