cmip6's Issues

Use Apply function

Objective

Use an apply family function and the function you made in #4 to process multiple files.

  1. A script that is capable of processing data from multiple variables / models / experiments / ensemble members
  2. A csv file that stores the results of the global weighted mean

Purpose

This is the script that we will eventually submit as an sbatch job on pic. It will process a section of the CMIP6 files you downloaded on pic.

Tips

  • You are going to use one of the apply family functions. Read the R documentation on the different options (lapply, mapply, apply, etc.) and decide which one to use.
  • Before you use the apply family function to call #4 you are going to want to develop some code that selects a certain set of CMIP data (temperature for the historical concentration driven run for all models / ensemble members). This section of code is going to be useful for when we create the index (#6).
  • When you use the apply function to process data from multiple models using #4, do not start with a large list of models; use 2 or 3!
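
The pattern in the tips above can be sketched as follows. This is a minimal sketch, not the final script: `process_one_file()` is a hypothetical stand-in for the function from #4 (it returns placeholder rows so the example runs without CDO), and the file names and output location are made up for illustration.

```r
# Hypothetical stand-in for the function from #4. The real function would
# call CDO and return the global weighted annual means for one file.
process_one_file <- function(path) {
  data.frame(file = basename(path), year = 2015:2016, value = NA_real_,
             stringsAsFactors = FALSE)
}

# Hypothetical test set: keep it to 2 or 3 models while developing.
files <- c("tas_Amon_ModelA_historical_r1i1p1f1_gn_185001-201412.nc",
           "tas_Amon_ModelB_historical_r1i1p1f1_gn_185001-201412.nc")

# lapply returns one data frame per file; bind them into a single table.
results_list <- lapply(files, process_one_file)
results <- do.call(rbind, results_list)

# Store the combined results as a csv file.
out_csv <- file.path(tempdir(), "global_mean_results.csv")
write.csv(results, out_csv, row.names = FALSE)
```

lapply is used here because each file is processed independently and the results are easy to combine afterwards; mapply would fit if several arguments had to vary per file.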

Potential Archive Problems

First check out the CMIP6 errata for a list of known data problems and use this issue to track problems/inconsistencies in JGCRI's CMIP6 data archive. (TODO organize these issues by a searchable table??)

CNRM-CM6-1

  • rh output is reported in units of kg CO2 per second per meter squared, whereas all other modeling groups use units of kg C per second per meter squared.
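
If the CNRM-CM6-1 rh values need to be compared with other models, the unit difference comes down to a mass ratio of carbon to CO2. A minimal sketch, assuming standard atomic weights; the function name `co2_to_c` is made up for illustration:

```r
# Molar masses in g/mol (standard atomic weights).
molar_mass_c   <- 12.011
molar_mass_co2 <- 12.011 + 2 * 15.999   # 44.009

# Convert a flux from kg CO2 m-2 s-1 to kg C m-2 s-1.
co2_to_c <- function(flux_kg_co2) {
  flux_kg_co2 * molar_mass_c / molar_mass_co2
}

co2_to_c(1)   # roughly 0.273 kg C per kg CO2
```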

Do we need to split up the archive?

Are we going to run into issues with storage space on pic / GitHub?

Potential data to deprioritize for download

  • hourly data
  • the global values -- are we sure that they contain good values? some seem to be corrupt

issue regridding one tos netcdf

/pic/projects/GCAM/CMIP6/archive/tos/tos_Omon_IPSL-CM6A-LR_ssp245_r1i1p1f1_gn_201501-210012.nc is having problems being regridded from its native resolution up to 2.5 degrees

When trying to regrid on pic, the error "cdo remapbil (Abort): Unsupported grid type: generic" is returned.

command used for regridding
/share/apps/netcdf/4.3.2/gcc/4.4.7/bin/cdo remapbil,/pic/projects/GCAM/CMIP5-CHartin/CMIP5_RCP45/tos/gridtype.txt /pic/projects/GCAM/CMIP6/archive/tos/tos_Omon_IPSL-CM6A-LR_ssp245_r1i1p1f1_gn_201501-210012.nc /pic/projects/GCAM/Abigail/regridded_cmip6_2p5deg/tos_Omon_rgr2p5-IPSL-CM6A-LR_ssp245_r1i1p1f1_gn_201501-210012.nc

Not pressing, we just ended up using other models for the project this came up in.
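
If this comes up again, one way to flag such files before regridding is to look at the grid description. The sketch below assumes the text printed by `cdo griddes <file>` contains a `gridtype = ...` line per grid; `parse_gridtype` is a hypothetical helper, and the sample lines are hand-written rather than real output from the IPSL file.

```r
# Extract the grid type(s) from the text printed by `cdo griddes <file>`.
parse_gridtype <- function(griddes_lines) {
  hits <- grep("^[[:space:]]*gridtype[[:space:]]*=", griddes_lines, value = TRUE)
  trimws(sub("^[^=]*=", "", hits))
}

# On pic the lines would come from something like:
#   griddes_lines <- system2("cdo", c("griddes", nc_path), stdout = TRUE)
# Here a hand-written sample stands in for that output.
sample_lines <- c("#", "# gridID 1", "#", "gridtype  = generic", "gridsize  = 1")
grid_types <- parse_gridtype(sample_lines)
if (any(grid_types == "generic")) {
  message("generic grid found, remapbil is likely to abort on this file")
}
```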

Put test scripts under version control

Objective

Incorporate the scripts you developed that use R (only) and cdo (only) to calculate and plot the global annual mean for a single model / experiment.

  1. An R script that processes the netcdf using R only
  2. A txt file that contains the CDO commands you used to process the results.
  3. An R script that plots the results from scripts 1 and 2 against one another.

Purpose

Become familiar with git and pull requests

Tips

  • This is a great resource for working with git https://git-scm.com/book/en/v2
  • You are going to want to create your own branch
  • When you are done you are going to open a pull request (PR) and ping me to review it

Sequence of commands -> R function

Objective

Translate the sequence of commands that process monthly gridded netcdf into a data frame of annual global values into a function.

  1. A function that processes gridded monthly data to weighted global annual mean data for a single model / variable / scenario / ensemble member
  2. Proof that the function worked

Purpose

This function is going to be the main function that we use to process the CMIP netcdfs from monthly gridded data to annual global data. This function will be used to process the sets of netcdf files (40 files or more).

Tips

Here are some things to think about as you design this function.

  • What arguments are we going to want this function to have? What needs to change depending on the netcdf that is being processed? What things do not need to be changed and can be hard coded in?
  • You are going to want to use a combination of R and CDO, so this function should contain a system2 call.
  • Debugging CDO errors can be a pain, what sorts of things will break a CDO pipeline? What can we do to provide informational error messages? Or prevent errors from being thrown?
  • How do we know that the function worked?
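
One possible shape for such a function, as a sketch only: the function name, its arguments, and the operator chain (`yearmean` applied to `fldmean`) are assumptions for illustration, not the agreed design. The point is that the checks before and after the system2 call turn opaque CDO failures into messages that say what went wrong.

```r
# Sketch of a wrapper around a CDO pipeline with up-front checks.
# The operator chain (yearmean of fldmean) is an assumption for illustration.
global_annual_mean <- function(nc_in, nc_out, cdo_path = Sys.which("cdo")) {
  # Fail early with messages that say what went wrong.
  if (!file.exists(nc_in)) stop("input netcdf not found: ", nc_in)
  if (file.size(nc_in) == 0) stop("input netcdf is empty: ", nc_in)
  if (!nzchar(cdo_path)) stop("cdo executable not found on the PATH")

  # Chained operators: fldmean runs first, then yearmean.
  args <- c("-yearmean", "-fldmean", shQuote(nc_in), shQuote(nc_out))
  status <- system2(cdo_path, args = args)

  # A non-zero exit status or a missing output file both count as failure.
  if (status != 0) stop("cdo exited with status ", status, " for ", nc_in)
  if (!file.exists(nc_out)) stop("cdo finished but wrote no output: ", nc_out)
  invisible(nc_out)
}
```

Checking that the output file exists is only a first answer to "how do we know that the function worked"; comparing a few values against the R-only calculation would be a stronger test.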

Data request - CMIP6 ocean temperature data.

Timescale: Annual is all we need; we can process monthly to annual if that's available instead.

Variable: tos and tas (tas is needed to calculate the global average T in each year only).

Model: 2-5 distinct models. GFDL-CM4 is already on pic. We're aiming to capture a range of behavior so are ambivalent about actual model choice to start.

Experiment: sspN45 and pre-industrial control (so we can use the values to calculate delta temperatures)

@cahartin @kdorheim

Modify the Python Script

Is there some way that we can parse the shell script to remove files that are already downloaded on pic and are in the index?

Is there some way that we can make this into a function so that the user has the option to source the file without modifying the inputs?

py python_script.py keep = [something]
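
The filtering step itself does not depend on the language. Below is a sketch in R, to match the other scripts in the repo, assuming each download line in the shell script mentions the netcdf file name; `drop_downloaded`, the script lines, and the example URLs are all hypothetical.

```r
# Drop lines from a download script that mention files we already have.
# `script_lines` and `have` are hypothetical inputs for illustration.
drop_downloaded <- function(script_lines, have) {
  mentions_existing <- vapply(
    script_lines,
    function(line) any(vapply(have, grepl, logical(1), x = line, fixed = TRUE)),
    logical(1),
    USE.NAMES = FALSE
  )
  script_lines[!mentions_existing]
}

# Made-up script contents; in practice these would come from readLines().
script_lines <- c("#!/bin/bash",
                  "'a.nc' 'http://example.org/a.nc'",
                  "'b.nc' 'http://example.org/b.nc'")
have <- c("a.nc")   # file names already listed in the index
kept <- drop_downloaded(script_lines, have)
```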

Variables to Download

@cahartin thought of some more variables we will want to download

  • ph
  • ocean heat flux
  • atmospheric co2
  • tos
  • hfls
  • rlus
  • rsus
  • rlds
  • rsds
  • tas
  • pr
  • hfss
  • areacella
  • sftlf
  • npp

Index

Objective

We want something that we can use to look up what we have on pic / where it lives.

  1. Code that can be used to find the netcdf files on pic.
  2. A csv file that lives on pic / on git that is easy to check out to see what data we have.
  • Code that can help us figure out what data we are missing and still need to download?
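
Much of the index can be read straight off the file names, since CMIP6 file names are underscore-separated (variable, table, model, experiment, ensemble member, grid label, and an optional time range). A minimal sketch; the function name `parse_cmip6_name` and the column names are choices made here for illustration:

```r
# Split CMIP6 file names into one row of metadata per file.
parse_cmip6_name <- function(paths) {
  stems <- sub("\\.nc$", "", basename(paths))
  parts <- strsplit(stems, "_", fixed = TRUE)
  fields <- c("variable", "table", "model", "experiment",
              "ensemble", "grid", "time")
  rows <- lapply(parts, function(p) {
    length(p) <- length(fields)   # pad with NA when the time range is absent
    as.data.frame(stats::setNames(as.list(p), fields),
                  stringsAsFactors = FALSE)
  })
  data.frame(file = paths, do.call(rbind, rows), stringsAsFactors = FALSE)
}

# On pic the paths would come from list.files(archive_dir, pattern = "\\.nc$",
# recursive = TRUE, full.names = TRUE); one known path is used here instead.
paths <- "/pic/projects/GCAM/CMIP6/archive/tos/tos_Omon_IPSL-CM6A-LR_ssp245_r1i1p1f1_gn_201501-210012.nc"
index <- parse_cmip6_name(paths)
```

The resulting data frame can be written with write.csv and then subset by variable, experiment, or model instead of searching the archive each time.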

Purpose

Now instead of developing code in #5 to find all the files to process we can use this csv file.

Tips

  • Think about what sort of information someone who is processing or using your CMIP data will want to know.
  • What sort of documentation do we need for the index?

Empty NETCDF files

It turns out that some of the netcdf files that have been downloaded are ... empty! Whether this is an issue with the cmip6 portals or occurred during an incomplete download, this is horrendous and must be corrected.

  • Need to go through the current files and remove any file that is not in netcdf format
  • Add/implement some sort of quality check for the netcdf files before adding them to the archive
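
A cheap first check is to look at the first bytes of each file: classic netCDF files start with the characters "CDF", and netCDF-4 files are HDF5 files that start with a fixed 8-byte signature. A minimal sketch in base R; the function name `is_netcdf` is made up here:

```r
# Return TRUE when the file starts with a netCDF classic or HDF5 signature.
is_netcdf <- function(path) {
  if (!file.exists(path) || file.size(path) < 8) return(FALSE)
  magic <- readBin(path, what = "raw", n = 8)
  classic <- identical(magic[1:3], charToRaw("CDF"))
  hdf5 <- identical(magic, as.raw(c(0x89, 0x48, 0x44, 0x46,
                                    0x0d, 0x0a, 0x1a, 0x0a)))
  classic || hdf5
}

# Example with an empty file, which should be rejected.
empty_file <- tempfile(fileext = ".nc")
file.create(empty_file)
is_netcdf(empty_file)
```

This catches empty files and files that are really something else (for example an HTML error page), but a download that was cut off part way can still pass, so opening the file with ncdf4 or cdo remains the stronger check.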

Submit and Run PIC Script on PIC

Objective

Process all of the temperature data from the concentration runs for the future scenarios.

  1. A .R script that can run on PIC
  2. A .zsh script that can be submitted as a PIC job
  3. A .csv file that contains global annual mean values

Purpose

These are the temperature values that we are interested in and that will be used by other projects, including to calibrate Hector! Later on we are going to visualize these results.

Tips

  1. Check out the Confluence resources about setting up an sbatch job https://confluence.pnnl.gov/confluence/display/RC/Submitting+a+Job+with+SLURM.
  2. Submit a test job (a .zsh script that will only process a few files)
  3. Submit the full job and process the temperature data files

Check if daily files are getting updated

Only daily files from SSP1-1.9 appear to have been downloaded for tasmax/tasmin (there are more SSPs available for pr). Wondering if there is something that prevents updating the daily output for some variables, or if it is a problem at the source. SSP1-1.9 is a Tier 2 experiment in ScenarioMIP so I would expect other SSPs to have had daily data processed and submitted before it.

Thanks!

Area Weights Attributes

Problem/FUNNN fact! If a netcdf output data file does not have the cell_measures field then cdo cannot be used to calculate the weighted mean :( and the user has no ability to overwrite the gridded area using setgridarea. This is a problem because it means that even when we apply some sort of land or ocean area weights the weighted averages will be the same...

How to check whether the output data file has the cell area information:

nc_path <- "some/cmip/output/data.nc"  # placeholder path
nc <- ncdf4::nc_open(nc_path)
atts <- ncdf4::ncatt_get(nc, "var")     # replace "var" with the variable name, e.g. "tas"
ncdf4::nc_close(nc)
atts$cell_measures

Ideally this will return something like...

atts$cell_measures
[1] "area: areacella"

but in some instances atts$cell_measures is NULL, ah!

Possible Solutions

  • Depending on how many files are returned we could remove them from the CMIP6 archive...
  • Prevent cdo functions from performing weighted averages on files that do not have the cell_measures attribute (I think we could technically do the mean in R with the specific weights applied; the downside is that those files would be pretty large to open in R)
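
For the second option, the weighted mean itself is simple once the values and weights are in memory. A sketch for a single time step, assuming the values, cell areas (for example from areacella), and an optional land or ocean fraction (for example from sftlf) have already been read into matrices of the same shape; the function name and arguments are hypothetical:

```r
# Area-weighted mean of one time step.
# values: lon x lat matrix; area: matching matrix of cell areas;
# frac: optional matching matrix of land (or ocean) fraction.
weighted_global_mean <- function(values, area, frac = 1) {
  w <- area * frac
  ok <- !is.na(values) & !is.na(w)   # skip masked cells
  sum(values[ok] * w[ok]) / sum(w[ok])
}

# Tiny made-up grid to show the effect of the fraction weights.
values <- matrix(c(1, 2, 3, 4), nrow = 2)
area   <- matrix(1, nrow = 2, ncol = 2)
land   <- matrix(c(1, 0, 0, 0), nrow = 2)
weighted_global_mean(values, area)         # all cells weighted equally
weighted_global_mean(values, area, land)   # only the first cell counts
```

Because the weights are normalized by their sum, it does not matter whether the fraction is stored as 0 to 1 or as a percentage.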

Add documentation

As of right now there is some documentation but it needs to be improved.

  • update the README so that it contains info about the repo contents
  • add wiki pages that contain information about the CMIP6 data, how to work with the archive and request new data

Visualize and Analyze Results

Objective

This is the fun part: let's take a look at the data. There are some things that we are going to need to be on the lookout for. Do the results make sense? If not, do we think it has to do with our processing code or not?

  1. Comparison of model output for the different scenarios
  2. How do the scenarios for each model compare with one another?
  3. Any other plot that you are interested in

Purpose

Check out the results we have been working to get!

Tips
