
cesm2-marbl's People

Contributors

hackmd-deploy, matt-long, mnlevy1981


cesm2-marbl's Issues

time series plots

Globally integrated NPP, POC flux, air-sea CO2 flux time series -- full historical and RCPs; use forcing_iron_flux notebook as sample. Maybe single RCP per variable? Alternative would be ensemble mean / spread with all RCPs in same plot.
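The "ensemble mean / spread" variant could look roughly like the sketch below. This is a minimal sketch only: the dsets mapping (scenario name to a DataArray with a member_id dimension), the NPP label, and a plottable time coordinate are all assumptions.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for scenario, da in dsets.items():
    # ensemble statistics across members
    mean = da.mean('member_id')
    std = da.std('member_id')
    mean.plot(ax=ax, label=scenario)
    ax.fill_between(mean['time'].values, (mean - std).values, (mean + std).values, alpha=0.2)
ax.set_ylabel('Globally integrated NPP (PgC/yr)')
ax.legend()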

What should go into the global fluxes table for publication?

Two related questions:

  1. What rows do we want in the table that eventually gets published? (i.e. what variables should we include, and in what order)

  2. What columns do we want in the table that eventually gets published? (i.e. what model(s) / configurations should we include?)

    • Should I add any of the CESM1 columns?
    • Should I remove any of the CESM2 SSP columns?

Additionally, I want to make sure that we are using the same time periods for the historical and preindustrial averages as @klindsay28 uses in his paper -- I think we've settled on 1990 - 2014 for the historical, with CESM1 using 1990 - 2004 as well as 2005 - 2014 from RCP8.5. @klindsay28 is going to do a little analysis to determine the best preindustrial slice(s), and then I'll update this notebook.

intake-esm can't concatenate files

notebooks/forcing_iron_flux.ipynb hangs up on this command:

dq = cesm2.search(experiment=['historical'], variable='IRON_FLUX').to_xarray(chunks={'time': 48})

I get the following error. It's possible that one ensemble member has a different file structure, but I haven't diagnosed it yet. We first need to pin down the problem; we may then need a mechanism in intake_esm to work around it.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-67f3a48652c5> in <module>
----> 1 dq = cesm2.search(experiment=['historical'], variable='IRON_FLUX').to_xarray(chunks={'time': 48})
      2 _, ds2 = dq.popitem()
      3 ds2 = ds2.drop([v for v in ds2.variables if v not in keep_vars])
      4 ds2

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/source.py in to_xarray(self, **kwargs)
    157         _kwargs.update(kwargs)
    158         self.kwargs = _kwargs
--> 159         return self.to_dask()
    160 
    161     def _get_schema(self):

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake/source/base.py in _load_metadata(self)
    115         """load metadata only if needed"""
    116         if self._schema is None:
--> 117             self._schema = self._get_schema()
    118             self.datashape = self._schema.datashape
    119             self.dtype = self._schema.dtype

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/source.py in _get_schema(self)
    166 
    167         if self._ds is None:
--> 168             self._open_dataset()
    169             metadata = {}
    170 

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/cesm.py in _open_dataset(self)
    151             member_column_name='member_id',
    152             variable_column_name='variable',
--> 153             file_fullpath_column_name='file_fullpath',
    154         )

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/source.py in _open_dataset_groups(self, dataset_fields, member_column_name, variable_column_name, file_fullpath_column_name, file_basename_column_name)
    134                         dsets,
    135                         time_coord_name_default=kwargs['time_coord_name'],
--> 136                         override_coords=kwargs['override_coords'],
    137                     )
    138                     var_dsets.append(var_dset_i)

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/intake_esm/aggregate.py in concat_time_levels(dsets, time_coord_name_default, restore_non_dim_coords, override_coords)
    181     objs_to_concat = [first] + rest
    182 
--> 183     ds = xr.concat(objs_to_concat, dim=time_coord_name, coords='minimal')
    184 
    185     new_history = f"\n{datetime.now()} xarray.concat(<ALL_TIMESTEPS>, dim='{time_coord_name}', coords='minimal')"

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    131             "objects, got %s" % type(first_obj)
    132         )
--> 133     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    134 
    135 

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    321                 raise ValueError(
    322                     "variables %r are present in some datasets but not others. "
--> 323                     % absent_merge_vars
    324                 )
    325 

ValueError: variables {'DXU', 'latent_heat_fusion_mks', 'nsurface_u', 'HU', 'rho_sw', 'UAREA', 'radius', 'TAREA', 'HUS', 'vonkar', 'latent_heat_fusion', 'salinity_factor', 'dzw', 'DXT', 'hflux_factor', 'moc_components', 'salt_to_ppt', 'T0_Kelvin', 'HUW', 'sound', 'nsurface_t', 'HTN', 'ANGLE', 'DYU', 'rho_fw', 'salt_to_mmday', 'KMT', 'omega', 'TLAT', 'sea_ice_salinity', 'ULONG', 'KMU', 'DYT', 'momentum_factor', 'stefan_boltzmann', 'HT', 'dz', 'latent_heat_vapor', 'ULAT', 'cp_sw', 'rho_air', 'ocn_ref_salinity', 'transport_regions', 'ppt_to_salt', 'fwflux_factor', 'heat_to_PW', 'cp_air', 'transport_components', 'mass_to_Sv', 'REGION_MASK', 'grav', 'HTE', 'TLONG', 'ANGLET', 'salt_to_Svppt', 'days_in_norm_year', 'sflux_factor'} are present in some datasets but not others. 

cc @andersy005, @mnlevy1981
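One possible workaround while the root cause is being diagnosed (a sketch only, not an intake-esm feature): open the files directly with xarray and drop the time-invariant grid/constant variables listed in the error before concatenating, so every dataset exposes the same variable set. The path pattern and keep_vars list below are placeholders.

import glob
import xarray as xr

files = sorted(glob.glob('/path/to/historical/*IRON_FLUX*.nc'))  # placeholder pattern
keep_vars = {'IRON_FLUX', 'time', 'time_bound', 'TLONG', 'TLAT'}

def drop_extras(ds):
    # keep only the variables common to every file
    return ds.drop([v for v in ds.variables if v not in keep_vars])

ds = xr.open_mfdataset(files, preprocess=drop_extras, combine='nested',
                       concat_dim='time', data_vars='minimal', coords='minimal',
                       chunks={'time': 48})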

PO4 contourf map is wrong

I added code to plot surface nutrients for the paper, but for some reason, the PO4 map from the model is messed up (panel D).

This code (no projection):

cf = plt.contourf(ds_surf_plot.TLONG, ds_surf_plot.TLAT, ds_surf_plot['PO4'], **contour_spec['PO4'])
plt.colorbar(cf)

yields this (the right answer):

[image]

However, adding the projection seems to screw things up:

ax = plt.axes(projection=prj)
cf = ax.contourf(ds_surf_plot.TLONG, ds_surf_plot.TLAT, ds_surf_plot['PO4'],
                 **contour_spec['PO4'],
                 transform=ccrs.PlateCarree())
plt.colorbar(cf)

[image]

The other nutrients seem to plot fine; something about the PO4 field is screwy...but the obs and bias plots seem ok. I am at a loss.
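One debugging step that might help isolate the problem (a sketch only, not a confirmed fix): plot the same field with pcolormesh under the projection. pcolormesh does no interpolation between cells, so if it looks right, the issue is in how contourf handles the curvilinear grid (e.g., the longitude seam or NaN fill values) rather than in the PO4 data itself.

ax = plt.axes(projection=prj)
pc = ax.pcolormesh(ds_surf_plot.TLONG, ds_surf_plot.TLAT, ds_surf_plot['PO4'],
                   transform=ccrs.PlateCarree())
plt.colorbar(pc)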

JJA Chl data not mapping correctly?

On Zulip, @matt-long pointed out:

I would have expected greater coverage at high northern latitudes in JJA. At some point, would be good to confirm that we're not missing something there.

Plots of N cycle terms

Two-panel plot: (a) N fixation, (b) column integral of denitrification. Include global integrals in the titles or caption.
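A rough layout sketch for this figure is below. The 2D fields (nfix_map, denit_map), the precomputed global integrals in TgN/yr (nfix_total, denit_total), and the projection choice are all placeholders.

import matplotlib.pyplot as plt
import cartopy.crs as ccrs

fig, axs = plt.subplots(1, 2, figsize=(12, 4),
                        subplot_kw={'projection': ccrs.Robinson(central_longitude=200)})

panels = [('(a) N fixation', nfix_map, nfix_total),
          ('(b) Column-integrated denitrification', denit_map, denit_total)]

for ax, (label, da, total) in zip(axs, panels):
    pc = ax.pcolormesh(da.TLONG, da.TLAT, da, transform=ccrs.PlateCarree())
    ax.set_title(f'{label}: global integral = {total:.0f} TgN/yr')
    fig.colorbar(pc, ax=ax, orientation='horizontal', shrink=0.8)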

abiotic radiocarbon sections

I would like to include zonal section plots of ABIO_DIC14, comparable to nutrient sections. I think the best observational dataset is GLODAPv1.

These data are available here,
https://www.nodc.noaa.gov/ocads/oceans/glodap/GlopDV.html
though my recollection is there used to be netCDF files available...but those seem to have disappeared.

Fortunately, the data are available on disk at /glade/p/cesm/bgcwg/GLODAP

Proper depth for NPP in tables?

@klindsay28 pointed out there's a discrepancy in how NPP is computed for CESM1 and CESM2 data. CESM1 contains 3D fields for productivity by each autotroph, but those fields are only 150m deep. CESM2 contains pre-computed integrals for 100m and full-depth.

Right now I'm computing the 150m integral of the CESM1 data and reading in the full-depth integral from CESM2. Keith's recommendation would be to compute the 100m integral from CESM1 data and read in the 100m integral from CESM2 as that's the only available apples-to-apples comparison.
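For reference, the 0-100 m integral from the CESM1 3D fields might look like the sketch below, assuming the usual POP conventions (z_t and dz in cm, photoC_* per-autotroph productivity on the 150 m grid). The variable and file names are assumptions to check against the actual output, and the partial layer straddling 100 m is ignored here.

import xarray as xr

ds = xr.open_dataset('cesm1_photoC.nc')  # placeholder file

dz_m = ds['dz'] / 100.0                  # layer thickness, cm -> m
upper_100m = ds['z_t'] <= 100.0e2        # z_t is in cm; 100 m = 1.0e4 cm

photoC_total = ds['photoC_sp'] + ds['photoC_diat'] + ds['photoC_diaz']
photoC_int_100m = (photoC_total * dz_m).where(upper_100m).sum(dim='z_t')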

O2 distributions

  • Maps of 400-600m mean O2 from model and WOA.
  • Zonal mean sections by basin of O2; enhanced resolution in upper 1 km.

Table of globally-integrated fluxes

  • Net primary production (PgC/yr)
  • Diatom primary production (%)
  • Sinking POC at 100 m (PgC/yr)
  • Sinking CaCO3 at 100 m (PgC/yr)
  • Rain ratio (CaCO3/POC) 100 m
  • Nitrogen fixation (TgN/yr)
  • Nitrogen deposition (TgN/yr)
  • Denitrification (TgN/yr)
  • N cycle imbalance = deposition + fixation - denitrification (TgN/yr)
  • Air–sea CO2 flux (PgC/yr)
  • Mean ocean oxygen (mmol/m^3)
  • Volume where O2 <80 mmol/m^3 (10^15 m^3)
  • Volume where O2 <60 mmol/m^3 (10^15 m^3)
  • Volume where O2 <5 mmol/m^3 (10^15 m^3)
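As an example of how one row might be computed, a sketch for the air-sea CO2 flux is below, assuming POP conventions (FG_CO2 in mmol/m^3 cm/s, TAREA in cm^2). The file name is a placeholder and the conversion factors should be double-checked.

import xarray as xr

ds = xr.open_dataset('fg_co2_ann_mean.nc')  # placeholder annual-mean file

# (mmol/m^3 cm/s) * cm^2 = 1e-6 mmol/s = 1e-9 mol C/s
flux_mol_per_s = (ds['FG_CO2'] * ds['TAREA']).sum(dim=('nlat', 'nlon')) * 1e-9

sec_per_yr = 365.0 * 86400.0
fg_co2_PgC_per_yr = flux_mol_per_s * 12.011 * 1e-15 * sec_per_yr  # mol/s -> gC/s -> PgC/yr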

Download WOA2018 data

We want to download the data, interpolate it to the gx1v7 grid, and then generate an intake-esm catalog for it. (Interpolation should result in something similar to /glade/work/mclong/woa2013v2/POP_gx1v7, perhaps with a data structure more akin to /glade/p/cesm/bgcwg/obgc_diag/data/obs_data/WOA2005?)
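A rough sketch of that pipeline (the WOA file and variable names are placeholders; it assumes an xESMF Regridder for the interpolation and pop_tools for the destination grid):

import xarray as xr
import xesmf as xe
import pop_tools

woa = xr.open_dataset('woa18_oxygen_annual.nc')   # placeholder WOA2018 file
dst = pop_tools.get_grid('POP_gx1v7')

regridder = xe.Regridder(
    woa,                                          # WOA files carry 1D lat/lon coords
    dst.rename({'TLONG': 'lon', 'TLAT': 'lat'}),  # xESMF expects lat/lon names
    method='bilinear',
    periodic=True,
)
o2_on_gx1v7 = regridder(woa['o_an'])              # 'o_an' is a placeholder variable name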

code refactor

I am thinking about my OMWG talk...so this list is coming out of me right now. I think my refactor of the nutrient plotting tries to move towards some of these principles.

Principles

Rely on APIs for data access

  • Data access based on hard-coded paths is fragile
  • Sometimes the relationship between the stored data and the actual desired product involves computation
    • For instance, pop_tools.get_grid(...) reads CESM INPUTDATA files via web protocol and
      does the same computations as the model to derive the grid variables. Super portable. Efficient because of Numba! (See the sketch after this list.)
  • Access details are messy
    • Simulations are discretized over an arbitrary number of files (time levels, variables, ensemble
      members) that does not conform to a meaningful conceptual discretization
    • Many file formats, temporal frequencies, spatial resolutions: APIs enable standardization steps to be applied en route
  • An API can be parameterized
    • Flexibility enables more reusable components
    • Build code to perform operations over key dimensions in query returns
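As a concrete, minimal example of the pop_tools point above:

import pop_tools

# derives the gx1v7 grid variables on the fly from CESM inputdata (no hard-coded glade paths)
grid = pop_tools.get_grid('POP_gx1v7')
print(grid)   # TLONG, TLAT, TAREA, KMT, REGION_MASK, dz, ...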

Consume and produce xarray datasets

  • xarray.Dataset objects encapsulate data and metadata
  • I/O and cloud storage formats are well supported
  • Operators enable rapid dimension reduction, interpolation, resampling, etc.

Plotting code should not do "computation"

  • Computation should be clearly separate from visualization
  • The data behind plots should be cached
  • If the data to make plots is available, we can build web-visualization maps around it

Isolate dependencies on glade

  • Data behind an authentication layer precludes reproducibility

To be continued....

Older version of xESMF

Is there a reason for using an older version of xESMF (v0.1.1) instead of the more recent Pangeo-led releases (v0.5+)?

table of parameters

I would like to include a table of model parameters in the paper.

In the Latex document we have a table:

\begin{table}[p!]
\caption{Model parameters (from pandas).}
\centering
\begin{tabular}{lrll}
\toprule
                          Parameter &  Value &              Units &                         Description \\
\midrule
   $\mu_{\mathrm{ref},\mathrm{sp}}$ &    5.0 &  $\mathrm{d^{-1}}$ &  Maximum C-spec growth rate at Tref \\
 $\mu_{\mathrm{ref},\mathrm{diat}}$ &    5.0 &  $\mathrm{d^{-1}}$ &  Maximum C-spec growth rate at Tref \\
 $\mu_{\mathrm{ref},\mathrm{diaz}}$ &    2.5 &  $\mathrm{d^{-1}}$ &  Maximum C-spec growth rate at Tref \\
\bottomrule
\end{tabular}
\end{table}

The "from pandas" leads me to believe this was auto generated. Can we revive this workflow and expand to include other parameters?

clean up timeseries access

ann_avg_utils has an unnecessary input parameter: experiment_longnames

Also, the user pulls things like units out of the module, then passes them back in via this API. I would prefer that the routine return a dictionary of xarray datasets with the units conversion already applied. Right now I am applying the unit conversion post facto:

for var in variables:
    for exp in cesm2_exp:
        ds = dsets[var][exp]

        # convert from the units recorded in the file to the desired final units
        var_units = ds[var].attrs['units']
        var_data_new_units = (
            ds[var].data * pint_units[var_units]
        ).to(final_units[var]).magnitude

        # overwrite the data and update the units attribute (prettified if possible)
        ds[var].values = var_data_new_units
        new_units = final_units[var]
        if new_units in pretty_units:
            new_units = pretty_units[new_units]
        ds[var].attrs['units'] = new_units

This routine doesn't actually seem to compute time series; rather, it constructs cache file names. I would prefer that the computation happen here, if necessary.

Must we do any concatenation inside this function? It seems like mission creep.
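For concreteness, a sketch of what the data-access routine could do internally so callers never touch pint (the helper name is hypothetical; pint_units, final_units, and pretty_units are the same mappings used above):

def convert_units(da, pint_units, final_units, pretty_units):
    """Return a copy of `da` converted to final_units[da.name], with attrs updated."""
    target = final_units[da.name]
    converted = (da.data * pint_units[da.attrs['units']]).to(target).magnitude

    out = da.copy(data=converted)
    out.attrs['units'] = pretty_units.get(target, target)
    return out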

contour levels and lines for nutrient plots

I should have submitted a PR for the changes I pushed today.

I split nutrient data.ipynb into

  • _data-nutrient-plots.ipynb: produces zarr archives of map, zonal-mean, and global profile datasets
  • nutrient-plots.ipynb: makes map, zonal-mean, and global profile plots

The plots are nearly there, but I would like to refine the contour levels for the surface maps. For the zonal sections, I used the levels from Sarmiento & Gruber (2006) so I think we're good there.

I think we should aim for a number of levels that yields clear delineations on the colorbar.

I would also like to add contour lines to all color plots, preferably with labels. We might need a stride > 1 relative to the color levels.
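A sketch of the contour-line overlay with a stride of 2 relative to the color levels (da, levels, and prj are placeholders):

import matplotlib.pyplot as plt
import cartopy.crs as ccrs

ax = plt.axes(projection=prj)
cf = ax.contourf(da.TLONG, da.TLAT, da, levels=levels, transform=ccrs.PlateCarree())
cl = ax.contour(da.TLONG, da.TLAT, da, levels=levels[::2], colors='k',
                linewidths=0.5, transform=ccrs.PlateCarree())
ax.clabel(cl, fmt='%g', fontsize=7)
plt.colorbar(cf)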

Refactor notebooks to avoid duplicate code

There is a ton of code that I simply copied from one notebook to another (or copied and then made a minor tweak to). Once all the notebooks are up and running, I'd like to go back and see what should be pulled into a .py file or two.

Nutrient limitation maps

Compute biomass-weighted-mean limitation terms in the upper ocean (i.e., top 150 m).

Make a 3-panel plot: maps of the most limiting nutrient for each phytoplankton taxon (diatoms, small phytoplankton, diazotrophs).
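A sketch of the biomass-weighted mean over the top 150 m, where ds is the model output and lim and biomass stand for a 3D limitation term and the matching biomass field (e.g., diatom carbon); all names are placeholders, and POP's z_t/dz are in cm.

upper_150m = ds['z_t'] <= 150.0e2           # 150 m in cm

w = (ds['dz'] * biomass).where(upper_150m)  # thickness-weighted biomass
lim_weighted_mean = (lim * w).sum('z_t') / w.sum('z_t')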

watermark is broken

Something in utils.py breaks watermark. I get the following error from the watermark magic.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-173-35b364ad726b> in <module>
     12 
     13 get_ipython().run_line_magic('load_ext', 'watermark')
---> 14 get_ipython().run_line_magic('watermark', '-d -iv -m -g -h')

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
   2305                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2306             with self.builtin_trap:
-> 2307                 result = fn(*args, **kwargs)
   2308             return result
   2309 

</glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/decorator.py:decorator-gen-126> in watermark(self, line)

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/watermark/watermark.py in watermark(self, line)
    134                 self._get_git_branch(bool(args.machine))
    135             if args.iversions:
--> 136                 self._print_all_import_versions(self.shell.user_ns)
    137             if args.watermark:
    138                 if self.out:

/glade/work/mclong/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/watermark/watermark.py in _print_all_import_versions(vars)
    248         longest = max([len(i[0]) for i in to_print] + [0]) + 1
    249         for entry in to_print:
--> 250             print(('%s' % entry[0]).ljust(longest) + entry[1])
    251 
    252 

TypeError: can only concatenate str (not "NoneType") to str

Remove control drift

We should have the option to apply a drift correction computed from the control integration.

This is made more complicated because the historical runs have different branch years.

catalog = intake.open_esm_datastore('data/campaign-cesm2-cmip6-timeseries.json', sep=':')
catalog.search(experiment='historical').df.ctrl_branch_year.unique()
array([601, 631, 661, 501, 691, 721, 751, 781, 811, 841, 871])
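A sketch of one way the correction could work, assuming annual-mean global time series as numpy arrays with companion integer year vectors; branch_year comes from ctrl_branch_year in the catalog, and 1850 is the assumed historical start year.

import numpy as np

# linear fit to the control time series
slope, intercept = np.polyfit(ctrl_years, ctrl_ts, deg=1)

# map each historical year onto the corresponding control year for this member
ctrl_equiv_years = branch_year + (hist_years - 1850)
drift = slope * ctrl_equiv_years + intercept

# remove the drift relative to the branch point
hist_corrected = hist_ts - (drift - drift[0])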

Nutrient comparison plots

We need

  • Plots of surface nutrients (NO3, PO4, SiO3) compared to obs (WOA 2018).
  • Global-mean profiles
  • Zonal mean sections for each basin (Global, Pacific, Atlantic, Indian)

Dump intermediate datasets to disk?

e.g. in the forcing_iron_flux notebook, it would be much faster to dump the annual means (and the global annual means) to disk.

Should there be a tool that precomputes annual means and climatologies and creates an intake catalog? Then the analysis notebooks wouldn't need to repeat the esmlab calls every time they are run.
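A sketch of that caching pattern (the path is a placeholder, and the resample call stands in for whatever esmlab computes):

import os
import xarray as xr

cache_path = 'data/cache/iron_flux_ann_mean.zarr'    # placeholder location

if os.path.exists(cache_path):
    ds_ann = xr.open_zarr(cache_path)
else:
    ds_ann = ds.resample(time='A').mean()            # or the equivalent esmlab call
    ds_ann.to_zarr(cache_path, mode='w')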

Proper time frame to average over?

attn: @matt-long (this issue cropped up in a conversation I had with @klindsay28)

What time period should we use when computing global averages from historical runs for our table? There are several factors at play, some of which conflict with others:

  1. We wanted to compare to Moore et al. (2013) table, which looks at the 1990s
  2. We want a long enough time period to damp out some variability: Lindsay et al. (2014) uses the last 25 years of the CMIP5 period
  3. If we are comparing CMIP5 and CMIP6 data, we should average over the same time period for both
  4. If we are comparing to observations, we should match the time period the observations cover

@klindsay28 thinks there might be guidance from the IPCC on this (e.g. a rule of thumb like "use 1990-2014, and for CMIP5 runs the 2005-2014 data should come from RCP")
