
cesm2-marbl's People

Contributors

hackmd-deploy, matt-long, mnlevy1981


cesm2-marbl's Issues

time series plots

Globally integrated NPP, POC flux, air-sea CO2 flux time series -- full historical and RCPs; use forcing_iron_flux notebook as sample. Maybe single RCP per variable? Alternative would be ensemble mean / spread with all RCPs in same plot.
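The "ensemble mean / spread" variant could look roughly like the sketch below. This is a minimal sketch only: the dsets mapping (scenario name to a DataArray with a member_id dimension), the NPP label, and a plottable time coordinate are all assumptions.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for scenario, da in dsets.items():
    # ensemble statistics across members
    mean = da.mean('member_id')
    std = da.std('member_id')
    mean.plot(ax=ax, label=scenario)
    ax.fill_between(mean['time'].values, (mean - std).values, (mean + std).values, alpha=0.2)
ax.set_ylabel('Globally integrated NPP (PgC/yr)')
ax.legend()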

What should go into the global fluxes table for publication?

Two related questions:

  1. What rows do we want in the table that eventually gets published? (i.e. what variables should we include, and in what order)

  2. What columns do we want in the table that eventually gets published? (i.e. what model(s) / configurations should we include?)

    • Should I add any of the CESM1 columns?
    • Should I remove any of the CESM2 SSP columns?

Additionally, I want to make sure that we are using the same time periods for the historical and preindustrial averages as @klindsay28 uses in his paper -- I think we've settled on 1990 - 2014 for the historical, with CESM1 using 1990 - 2004 as well as 2005 - 2014 from RCP8.5. @klindsay28 is going to do a little analysis to determine the best preindustrial slice(s), and then I'll update this notebook.

intake-esm can't concatenate files

notebooks/forcing_iron_flux.ipynb hangs up on this command:

dq = cesm2.search(experiment=['historical'], variable='IRON_FLUX').to_xarray(chunks={'time': 48})

I get the following error. It's possible that one ensemble member has a different file structure, but I haven't diagnosed it yet. We first need to pin down the problem; we may then need a mechanism in intake_esm to work around it.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-67f3a48652c5> in <module>
----> 1 dq = cesm2.search(experiment=['historical'], variable='IRON_FLUX').to_xarray(chunks={'time': 48})
      2 _, ds2 = dq.popitem()
      3 ds2 = ds2.drop([v for v in ds2.variables if v not in keep_vars])
      4 ds2

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/source.py in to_xarray(self, **kwargs)
    157         _kwargs.update(kwargs)
    158         self.kwargs = _kwargs
--> 159         return self.to_dask()
    160 
    161     def _get_schema(self):

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake/source/base.py in _load_metadata(self)
    115         """load metadata only if needed"""
    116         if self._schema is None:
--> 117             self._schema = self._get_schema()
    118             self.datashape = self._schema.datashape
    119             self.dtype = self._schema.dtype

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/source.py in _get_schema(self)
    166 
    167         if self._ds is None:
--> 168             self._open_dataset()
    169             metadata = {}
    170 

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/cesm.py in _open_dataset(self)
    151             member_column_name='member_id',
    152             variable_column_name='variable',
--> 153             file_fullpath_column_name='file_fullpath',
    154         )

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/source.py in _open_dataset_groups(self, dataset_fields, member_column_name, variable_column_name, file_fullpath_column_name, file_basename_column_name)
    134                         dsets,
    135                         time_coord_name_default=kwargs['time_coord_name'],
--> 136                         override_coords=kwargs['override_coords'],
    137                     )
    138                     var_dsets.append(var_dset_i)

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/aggregate.py in concat_time_levels(dsets, time_coord_name_default, restore_non_dim_coords, override_coords)
    181     objs_to_concat = [first] + rest
    182 
--> 183     ds = xr.concat(objs_to_concat, dim=time_coord_name, coords='minimal')
    184 
    185     new_history = f"\n{datetime.now()} xarray.concat(<ALL_TIMESTEPS>, dim='{time_coord_name}', coords='minimal')"

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    131             "objects, got %s" % type(first_obj)
    132         )
--> 133     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    134 
    135 

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    321                 raise ValueError(
    322                     "variables %r are present in some datasets but not others. "
--> 323                     % absent_merge_vars
    324                 )
    325 

ValueError: variables {'DXU', 'latent_heat_fusion_mks', 'nsurface_u', 'HU', 'rho_sw', 'UAREA', 'radius', 'TAREA', 'HUS', 'vonkar', 'latent_heat_fusion', 'salinity_factor', 'dzw', 'DXT', 'hflux_factor', 'moc_components', 'salt_to_ppt', 'T0_Kelvin', 'HUW', 'sound', 'nsurface_t', 'HTN', 'ANGLE', 'DYU', 'rho_fw', 'salt_to_mmday', 'KMT', 'omega', 'TLAT', 'sea_ice_salinity', 'ULONG', 'KMU', 'DYT', 'momentum_factor', 'stefan_boltzmann', 'HT', 'dz', 'latent_heat_vapor', 'ULAT', 'cp_sw', 'rho_air', 'ocn_ref_salinity', 'transport_regions', 'ppt_to_salt', 'fwflux_factor', 'heat_to_PW', 'cp_air', 'transport_components', 'mass_to_Sv', 'REGION_MASK', 'grav', 'HTE', 'TLONG', 'ANGLET', 'salt_to_Svppt', 'days_in_norm_year', 'sflux_factor'} are present in some datasets but not others. 

cc @andersy005, @mnlevy1981
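One possible workaround while the root cause is being diagnosed (a sketch only, not an intake-esm feature): open the files directly with xarray and drop the time-invariant grid/constant variables listed in the error before concatenating, so every dataset exposes the same variable set. The path pattern and keep_vars list below are placeholders.

import glob
import xarray as xr

files = sorted(glob.glob('/path/to/historical/*IRON_FLUX*.nc'))  # placeholder pattern
keep_vars = {'IRON_FLUX', 'time', 'time_bound', 'TLONG', 'TLAT'}

def drop_extras(ds):
    # keep only the variables common to every file
    return ds.drop([v for v in ds.variables if v not in keep_vars])

ds = xr.open_mfdataset(files, preprocess=drop_extras, combine='nested',
                       concat_dim='time', data_vars='minimal', coords='minimal',
                       chunks={'time': 48})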

PO4 contourf map is wrong

I added code to plot surface nutrients for the paper, but for some reason, the PO4 map from the model is messed up (panel D).

This code (no projection):

cf = plt.contourf(ds_surf_plot.TLONG, ds_surf_plot.TLAT, ds_surf_plot['PO4'], **contour_spec['PO4'])
plt.colorbar(cf)

yields this (the right answer):

[image]

However, adding the projection seems to screw things up:

ax = plt.axes(projection=prj)
cf = ax.contourf(ds_surf_plot.TLONG, ds_surf_plot.TLAT, ds_surf_plot['PO4'],
                 **contour_spec['PO4'],
                 transform=ccrs.PlateCarree())
plt.colorbar(cf)

[image]

The other nutrients seem to plot fine; something about the PO4 field is screwy...but the obs and bias plots seem ok. I am at a loss.
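One debugging step that might help isolate the problem (a sketch only, not a confirmed fix): plot the same field with pcolormesh under the projection. pcolormesh does no interpolation between cells, so if it looks right, the issue is in how contourf handles the curvilinear grid (e.g., the longitude seam or NaN fill values) rather than in the PO4 data itself.

ax = plt.axes(projection=prj)
pc = ax.pcolormesh(ds_surf_plot.TLONG, ds_surf_plot.TLAT, ds_surf_plot['PO4'],
                   transform=ccrs.PlateCarree())
plt.colorbar(pc)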

JJA Chl data not mapping correctly?

On Zulip, @matt-long pointed out:

I would have expected greater coverage at high northern latitudes in JJA. At some point, would be good to confirm that we're not missing something there.

Plots of N cycle terms

Two-panel plot: (a) N fixation, (b) column integral of denitrification. Include global integrals in the titles or caption.
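A rough layout sketch for this figure is below. The 2D fields (nfix_map, denit_map), the precomputed global integrals in TgN/yr (nfix_total, denit_total), and the projection choice are all placeholders.

import matplotlib.pyplot as plt
import cartopy.crs as ccrs

fig, axs = plt.subplots(1, 2, figsize=(12, 4),
                        subplot_kw={'projection': ccrs.Robinson(central_longitude=200)})

panels = [('(a) N fixation', nfix_map, nfix_total),
          ('(b) Column-integrated denitrification', denit_map, denit_total)]

for ax, (label, da, total) in zip(axs, panels):
    pc = ax.pcolormesh(da.TLONG, da.TLAT, da, transform=ccrs.PlateCarree())
    ax.set_title(f'{label}: global integral = {total:.0f} TgN/yr')
    fig.colorbar(pc, ax=ax, orientation='horizontal', shrink=0.8)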

abiotic radiocarbon sections

I would like to include zonal section plots of ABIO_DIC14, comparable to nutrient sections. I think the best observational dataset is GLODAPv1.

These data are available here,
https://www.nodc.noaa.gov/ocads/oceans/glodap/GlopDV.html
though my recollection is there used to be netCDF files available...but those seem to have disappeared.

Fortunately, the data are available on disk at /glade/p/cesm/bgcwg/GLODAP

Proper depth for NPP in tables?

@klindsay28 pointed out there's a discrepancy in how NPP is computed for CESM1 and CESM2 data. CESM1 contains 3D fields for productivity by each autotroph, but those fields are only 150m deep. CESM2 contains pre-computed integrals for 100m and full-depth.

Right now I'm computing the 150m integral of the CESM1 data and reading in the full-depth integral from CESM2. Keith's recommendation would be to compute the 100m integral from CESM1 data and read in the 100m integral from CESM2 as that's the only available apples-to-apples comparison.
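For reference, the 0-100 m integral from the CESM1 3D fields might look like the sketch below, assuming the usual POP conventions (z_t and dz in cm, photoC_* per-autotroph productivity on the 150 m grid). The variable and file names are assumptions to check against the actual output, and the partial layer straddling 100 m is ignored here.

import xarray as xr

ds = xr.open_dataset('cesm1_photoC.nc')  # placeholder file

dz_m = ds['dz'] / 100.0                  # layer thickness, cm -> m
upper_100m = ds['z_t'] <= 100.0e2        # z_t is in cm; 100 m = 1.0e4 cm

photoC_total = ds['photoC_sp'] + ds['photoC_diat'] + ds['photoC_diaz']
photoC_int_100m = (photoC_total * dz_m).where(upper_100m).sum(dim='z_t')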

O2 distributions

  • Maps of 400-600m mean O2 from model and WOA.
  • Zonal mean sections by basin of O2; enhanced resolution in upper 1 km.

Table of globally-integrated fluxes

  • Net primary production (PgC/yr)
  • Diatom primary production (%)
  • Sinking POC at 100 m (PgC/yr)
  • Sinking CaCO3 at 100 m (PgC/yr)
  • Rain ratio (CaCO3/POC) 100 m
  • Nitrogen fixation (TgN/yr)
  • Nitrogen deposition (TgN/yr)
  • Denitrification (TgN/yr)
  • N cycle imbalance = deposition + fixation - denitrification (TgN/yr)
  • Air–sea CO2 flux (PgC/yr)
  • Mean ocean oxygen (mmol/m^3)
  • Volume where O2 <80 mmol/m^3 (10^15 m^3)
  • Volume where O2 <60 mmol/m^3 (10^15 m^3)
  • Volume where O2 <5 mmol/m^3 (10^15 m^3)
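As an example of how one row might be computed, a sketch for the air-sea CO2 flux is below, assuming POP conventions (FG_CO2 in mmol/m^3 cm/s, TAREA in cm^2). The file name is a placeholder and the conversion factors should be double-checked.

import xarray as xr

ds = xr.open_dataset('fg_co2_ann_mean.nc')  # placeholder annual-mean file

# (mmol/m^3 cm/s) * cm^2 = 1e-6 mmol/s = 1e-9 mol C/s
flux_mol_per_s = (ds['FG_CO2'] * ds['TAREA']).sum(dim=('nlat', 'nlon')) * 1e-9

sec_per_yr = 365.0 * 86400.0
fg_co2_PgC_per_yr = flux_mol_per_s * 12.011 * 1e-15 * sec_per_yr  # mol/s -> gC/s -> PgC/yr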

Download WOA2018 data

We want to download the data, interpolate it to the gx1v7 grid, and then generate an intake-esm catalog for it. (Interpolation should result in something similar to /glade/work/mclong/woa2013v2/POP_gx1v7, perhaps with a data structure more akin to /glade/p/cesm/bgcwg/obgc_diag/data/obs_data/WOA2005?)
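A rough sketch of that pipeline (the WOA file and variable names are placeholders; it assumes an xESMF Regridder for the interpolation and pop_tools for the destination grid):

import xarray as xr
import xesmf as xe
import pop_tools

woa = xr.open_dataset('woa18_oxygen_annual.nc')   # placeholder WOA2018 file
dst = pop_tools.get_grid('POP_gx1v7')

regridder = xe.Regridder(
    woa,                                          # WOA files carry 1D lat/lon coords
    dst.rename({'TLONG': 'lon', 'TLAT': 'lat'}),  # xESMF expects lat/lon names
    method='bilinear',
    periodic=True,
)
o2_on_gx1v7 = regridder(woa['o_an'])              # 'o_an' is a placeholder variable name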

code refactor

I am thinking about my OMWG talk...so this list is coming out of me right now. I think my refactor of the nutrient plotting tries to move towards some of these principles.

Principles

Rely on APIs for data access

  • Data access based on hard-coded paths is fragile
  • Sometimes the relationship between the stored data and the actual desired product involves computation
    • For instance, pop_tools.get_grid(...) reads CESM INPUTDATA files via web protocol and
      does the same computations as the model to derive the grid variables. Super portable. Efficient because of Numba! (See the sketch after this list.)
  • Access details are messy
    • Simulations are discretized over an arbitrary number of files (time levels, variables, ensemble
      members) that does not conform to a meaningful conceptual discretization
    • Many file formats, temporal frequencies, spatial resolutions: APIs enable standardization steps to be applied en route
  • An API can be parameterized
    • Flexibility enables more reusable components
    • Build code to perform operations over key dimensions in query returns
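As a concrete, minimal example of the pop_tools point above:

import pop_tools

# derives the gx1v7 grid variables on the fly from CESM inputdata (no hard-coded glade paths)
grid = pop_tools.get_grid('POP_gx1v7')
print(grid)   # TLONG, TLAT, TAREA, KMT, REGION_MASK, dz, ...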

Consume and produce xarray datasets

  • xarray.Dataset objects encapsulate data and metadata
  • I/O and cloud storage formats are well supported
  • Operators enable rapid dimension reduction, interpolation, resampling, etc.

Plotting code should not do "computation"

  • Computation should be clearly separate from visualization
  • The data behind plots should be cached
  • If the data to make plots is available, we can build web-visualization maps around it

Isolate dependencies on glade

  • Data behind an authentication layer precludes reproducibility

To be continued....

Older version of xESMF

Is there a reason for using an older version of xESMF (v0.1.1) instead of the more recent Pangeo-led releases (v0.5+)?

table of parameters

I would like to include a table of model parameters in the paper.

In the Latex document we have a table:

\begin{table}[p!]
\caption{Model parameters (from pandas).}
\centering
\begin{tabular}{lrll}
\toprule
                          Parameter &  Value &              Units &                         Description \\
\midrule
   $\mu_{\mathrm{ref},\mathrm{sp}}$ &    5.0 &  $\mathrm{d^{-1}}$ &  Maximum C-spec growth rate at Tref \\
 $\mu_{\mathrm{ref},\mathrm{diat}}$ &    5.0 &  $\mathrm{d^{-1}}$ &  Maximum C-spec growth rate at Tref \\
 $\mu_{\mathrm{ref},\mathrm{diaz}}$ &    2.5 &  $\mathrm{d^{-1}}$ &  Maximum C-spec growth rate at Tref \\
\bottomrule
\end{tabular}
\end{table}

The "from pandas" leads me to believe this was auto generated. Can we revive this workflow and expand to include other parameters?

clean up timeseries access

ann_avg_utils has an unnecessary input parameter: experiment_longnames

Also, the user pulls things like units out of the module, then passes them back in via this API. I would prefer that the routine return a dictionary of xarray datasets with the units conversion already applied. Right now I am applying the unit conversion post facto:

for var in variables:
    for exp in cesm2_exp:
        ds = dsets[var][exp]

        # convert from the units recorded in the file to the desired final units
        var_units = ds[var].attrs['units']
        var_data_new_units = (
            ds[var].data * pint_units[var_units]
        ).to(final_units[var]).magnitude

        # overwrite the data and update the units attribute (prettified if possible)
        ds[var].values = var_data_new_units
        new_units = final_units[var]
        if new_units in pretty_units:
            new_units = pretty_units[new_units]
        ds[var].attrs['units'] = new_units

This routine doesn't actually seem to compute time series; rather, it constructs cache file names. I would prefer that the computation happen here, if necessary.

Must we do any concatenation inside this function? It seems like mission creep.
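For concreteness, a sketch of what the data-access routine could do internally so callers never touch pint (the helper name is hypothetical; pint_units, final_units, and pretty_units are the same mappings used above):

def convert_units(da, pint_units, final_units, pretty_units):
    """Return a copy of `da` converted to final_units[da.name], with attrs updated."""
    target = final_units[da.name]
    converted = (da.data * pint_units[da.attrs['units']]).to(target).magnitude

    out = da.copy(data=converted)
    out.attrs['units'] = pretty_units.get(target, target)
    return out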

contour levels and lines for nutrient plots

I should have submitted a PR for the changes I pushed today.

I split nutrient data.ipynb into

  • _data-nutrient-plots.ipynb: produces zarr archives of map, zonal-mean, and global profile datasets
  • nutrient-plots.ipynb: makes map, zonal-mean, and global profile plots

The plots are nearly there, but I would like to refine the contour levels for the surface maps. For the zonal sections, I used the levels from Sarmiento & Gruber (2006) so I think we're good there.

I think we should aim for a number of levels that yields clear delineations on the colorbar.

I would also like to add contour lines to all color plots, preferably with labels. We might need a stride > 1 relative to the color levels.
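A sketch of the contour-line overlay with a stride of 2 relative to the color levels (da, levels, and prj are placeholders):

import matplotlib.pyplot as plt
import cartopy.crs as ccrs

ax = plt.axes(projection=prj)
cf = ax.contourf(da.TLONG, da.TLAT, da, levels=levels, transform=ccrs.PlateCarree())
cl = ax.contour(da.TLONG, da.TLAT, da, levels=levels[::2], colors='k',
                linewidths=0.5, transform=ccrs.PlateCarree())
ax.clabel(cl, fmt='%g', fontsize=7)
plt.colorbar(cf)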

Refactor notebooks to avoid duplicate code

There is a ton of code that I simply copied from one notebook to another (or copied and then made a minor tweak to). Once all the notebooks are up and running, I'd like to go back and see what should be pulled into a .py file or two.

Nutrient limitation maps

Compute biomass-weighted-mean limitation terms in the upper ocean (i.e., top 150 m).

Make a 3-panel plot: maps of the most limiting nutrient for each phytoplankton taxon (diatoms, small phytoplankton, diazotrophs).
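A sketch of the biomass-weighted mean over the top 150 m, where ds is the model output and lim and biomass stand for a 3D limitation term and the matching biomass field (e.g., diatom carbon); all names are placeholders, and POP's z_t/dz are in cm.

upper_150m = ds['z_t'] <= 150.0e2           # 150 m in cm

w = (ds['dz'] * biomass).where(upper_150m)  # thickness-weighted biomass
lim_weighted_mean = (lim * w).sum('z_t') / w.sum('z_t')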

watermark is broken

Something in utils.py breaks watermark. I get the following error from the watermark magic.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-173-35b364ad726b> in <module>
     12 
     13 get_ipython().run_line_magic('load_ext', 'watermark')
---> 14 get_ipython().run_line_magic('watermark', '-d -iv -m -g -h')

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
   2305                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2306             with self.builtin_trap:
-> 2307                 result = fn(*args, **kwargs)
   2308             return result
   2309 

</glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/decorator.py:decorator-gen-126> in watermark(self, line)

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/watermark/watermark.py in watermark(self, line)
    134                 self._get_git_branch(bool(args.machine))
    135             if args.iversions:
--> 136                 self._print_all_import_versions(self.shell.user_ns)
    137             if args.watermark:
    138                 if self.out:

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/watermark/watermark.py in _print_all_import_versions(vars)
    248         longest = max([len(i[0]) for i in to_print] + [0]) + 1
    249         for entry in to_print:
--> 250             print(('%s' % entry[0]).ljust(longest) + entry[1])
    251 
    252 

TypeError: can only concatenate str (not "NoneType") to str

Remove control drift

We should have the option to apply a drift correction computed from the control integration.

This is made more complicated because the historical runs have different branch years.

catalog = intake.open_esm_datastore('data/campaign-cesm2-cmip6-timeseries.json', sep=':')
catalog.search(experiment='historical').df.ctrl_branch_year.unique()
array([601, 631, 661, 501, 691, 721, 751, 781, 811, 841, 871])
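A sketch of one way the correction could work, assuming annual-mean global time series as numpy arrays with companion integer year vectors; branch_year comes from ctrl_branch_year in the catalog, and 1850 is the assumed historical start year.

import numpy as np

# linear fit to the control time series
slope, intercept = np.polyfit(ctrl_years, ctrl_ts, deg=1)

# map each historical year onto the corresponding control year for this member
ctrl_equiv_years = branch_year + (hist_years - 1850)
drift = slope * ctrl_equiv_years + intercept

# remove the drift relative to the branch point
hist_corrected = hist_ts - (drift - drift[0])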

Nutrient comparison plots

We need

  • Plots of surface nutrients (NO3, PO4, SiO3) compared to obs (WOA 2018).
  • Global-mean profiles
  • Zonal mean sections for each basin (Global, Pacific, Atlantic, Indian)

Dump intermediate datasets to disk?

e.g. in the forcing_iron_flux notebook, it would be much faster to dump the annual means (and the global annual means) to disk.

Should there be a tool that precomputes annual means and climatologies and creates an intake catalog? Then the analysis notebooks wouldn't need to repeat the esmlab calls every time they are run.
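A sketch of that caching pattern (the path is a placeholder, and the resample call stands in for whatever esmlab computes):

import os
import xarray as xr

cache_path = 'data/cache/iron_flux_ann_mean.zarr'    # placeholder location

if os.path.exists(cache_path):
    ds_ann = xr.open_zarr(cache_path)
else:
    ds_ann = ds.resample(time='A').mean()            # or the equivalent esmlab call
    ds_ann.to_zarr(cache_path, mode='w')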

Proper time frame to average over?

attn: @matt-long (this issue cropped up in a conversation I had with @klindsay28)

What time period should we use when computing global averages from historical runs for our table? There are several factors at play, some of which conflict with others:

  1. We wanted to compare to Moore et al. (2013) table, which looks at the 1990s
  2. We want a long enough time period to damp out some variability: Lindsay et al. (2014) uses the last 25 years of the CMIP5 period
  3. If we are comparing CMIP5 and CMIP6 data, we should average over the same time period for both
  4. If we are comparing to observations, we should match the time period the observations cover

@klindsay28 thinks there might be guidance from the IPCC on this (e.g. a rule of thumb like "use 1990-2014, and for CMIP5 runs the 2005-2014 data should come from RCP")
