cct-datascience / ed2-mandifore

PEcAn.ED2 runs using MANDIFORE site, patch, cohort, and met data

R 97.25% Shell 2.75%

ed2-mandifore's Introduction

ed2-mandifore

The goal of ed2-mandifore is to run ED2 with Setaria in already grown, complex ecosystems using weather data from MANDIFORE sites.

Reproducibility

This project uses renv to manage package dependencies. This is essential for reproducing this work as the version of PEcAn.ED2 used is installed from a pull request that will likely never be merged. Using a different version of PEcAn.ED2 will result in errored runs. Run renv::restore() to install dependencies.

Setup scripts

Scripts 00 and 01 have already been run and have generated the data in data/. They do not need to be run again.

Start by sourcing 02_setup-runs.R to generate files in the transect/ directory.
In the terminal, navigate to a particular run (e.g. ./transect/MANDIFORE-SEUS-352/pine) and start the job as a background process with ./run.sh.
Follow the checklist below to check that the job is running correctly.

Job Start Checklist

Do all of this before starting the next job!

Is the R output being saved to workflow.Rout?
Are all the expected run/ and out/ folders created locally?
Find settings_checked.xml and look through it.
Are all the expected run/ and out/ folders created on the HPC?
Once the job starts on the HPC, record the SLURM job ID. It is not printed anywhere in the logs, so you will need to manually copy and paste it somewhere (e.g. into the pid.nohup file that has the local PID)
Spot-check the log files for multiple runs to see that simulation has started
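
The local items on this checklist could be partly automated. The sketch below is not part of the repository: `check_job_start` is a hypothetical helper, and it assumes that workflow.Rout, run/ and out/ sit directly under the run directory. The HPC-side checks and the SLURM job ID still have to be done by hand.

```shell
# Hypothetical helper (not part of the repository): checks the local items
# from the job start checklist for one run directory.
# Usage: check_job_start ./transect/MANDIFORE-SEUS-352/pine
check_job_start() {
  local dir="${1:-.}" rc=0 d

  # workflow.Rout should exist and be non-empty once R output is being saved.
  if [ -s "$dir/workflow.Rout" ]; then
    echo "ok: workflow.Rout"
  else
    echo "MISSING: workflow.Rout"
    rc=1
  fi

  # settings_checked.xml may sit in a subfolder, so search for it.
  if [ -n "$(find "$dir" -type f -name settings_checked.xml 2>/dev/null | head -n 1)" ]; then
    echo "ok: settings_checked.xml"
  else
    echo "MISSING: settings_checked.xml"
    rc=1
  fi

  # Assumption: run/ and out/ live directly under the run directory and
  # each should hold at least one subfolder.
  for d in run out; do
    if [ -n "$(find "$dir/$d" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | head -n 1)" ]; then
      echo "ok: $d/ contains folders"
    else
      echo "MISSING: folders in $d/"
      rc=1
    fi
  done

  return "$rc"
}
```

The function returns a non-zero status if anything is missing, so it can gate the start of the next job. Reading through settings_checked.xml remains a manual step.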

...now you can start another job.

Job analytics

If you want to know how long a job took on the HPC you can use:

sacct -j <jobid> -o Start,End,Elapsed

To check remaining compute hours:

va

ed2-mandifore's Issues

Decide on disturbance type in patch files

The trk variable for the .pss files is set at 1, I think, but maybe it should be different for different patches/ecosystems? I don't really know how it works

https://github.com/EDmodel/ED2/wiki/Initial-conditions#files-types-and-formats-for-nlied_init_mode6

Training data for gridded model

Increase runs to more sites besides N/S transect to get data to train a gridded model like in cct-datascience/model-vignettes#89.

Add correctness tests to PEcAn PR #3129

And possibly run tests with old version to compare.

Old pss and css files not compatible with modern ED2

The history files from Mike were created with ED2.1.0 at some point (not sure exactly when) and are meant to work with IED_INIT_MODE=3, which is now deprecated in favor of IED_INIT_MODE=6. The .pss files have a different format with the new mode. Also, the .css files contain some PFTs that have been replaced (12, 13, 14), so these need to be replaced with some modern PFTs. Other changes might be necessary also.

Fix messed up git situation

Accidentally committed a large file on #19, now can't push to GitHub. Untracked large file, but still can't push. Can I squash commits or something??

Some jobs not starting using run.sh

Some jobs apparently don't start when using the run.sh script. The jobid is NULL and the workflow. Not sure if there is a pattern to this or if it's stochastic.

More than 3 simultaneous HPC jobs hits CPU limit?

It seems like we are either getting close to the compute limit, or there's some other limit about the number of simultaneous jobs/nodes/cores.

(puma) [ericrscott@wentletrap ~]$ squeue -u ericrscott
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           6753242  standard PEcAn-SA ericrsco PD       0:00      2 (AssocGrpCPUMinutesLimit)
           6753209  standard PEcAn-SA ericrsco  R   17:28:25      2 r1u11n2,r4u29n2
           6753173  standard PEcAn-SA ericrsco  R   17:40:42      2 r1u26n1,r2u32n1
           6753207  standard PEcAn-SA ericrsco  R   17:34:33      2 r4u08n2,r4u13n2
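
In the output above, the pending job is held with the reason AssocGrpCPUMinutesLimit, which in Slurm refers to a CPU-minutes limit on the account rather than a cap on the number of jobs. To see quickly which jobs are held and why, something like the following might help. `pending_jobs` is a hypothetical helper; it assumes the default squeue column layout shown above and a reason string without spaces.

```shell
# Hypothetical helper: read default-format `squeue` output on stdin and
# print "<jobid> <reason>" for every pending (PD) job.
# Assumes the state is in column 5 and the reason is the last column.
pending_jobs() {
  awk '$5 == "PD" {
    gsub(/[()]/, "", $NF)   # strip the parentheses around the reason
    print $1, $NF
  }'
}
```

Usage would be `squeue -u "$USER" | pending_jobs`.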

Figure out what to do about PEcAn breaking allometry equations

Currently PEcAn overwrites ED2 defaults with its own defaults resulting in errored runs.

The simplest fix is to just turn off this behavior in PEcAn and let ED2 use its own defaults, but this will never be merged into the develop branch.

Keep a fork/branch that we install PEcAn.ED2 from
Try to fix PEcAn.ED2 in a less invasive way:
- re-name ED2IN.r2.2.0.github to ED2IN.rgit so history.rgit is used for defaults
- possibly edit ED2IN to use ED 2.2.0 defaults for everything (i.e. IMETRAD, IALLOM and ISTRUCT_GROWTH_SCHEME) (compare with what other ED2IN files in inst/ do)
- update suspected bad values in history.rgit

Figure out if SA output is broken

The sensitivity.output...Rdata file loads dataframes with all NAs in most recent run

Figure out why numbers seem wrong

NPP seems way too high
AGB seems way too low
transpiration seems way too low

Inspect raw data coming from .h5 files
Triple-check calculations and conversions
Ask in ED2 discussions if I'm getting the units and conversions right

revise work plan

Revise work plan and get approval from David/Kristina before starting more runs

New sites for model training data

Select 20 sites from SEUS that we haven't run yet
Generate run files
Change number of cores (programmatically in setup script?) to encumber fewer CPU hours
Start all runs

Update singularity container on Welsch (?)

Now that PecanProject/pecan#3155 is merged, should I update the container on Welsch? I'd have to wait until a set of runs finish to do that. Maybe not worth it until these 9 sites are finished.

Get Setaria run working at a more complex site

Currently planting Setaria in an already existing forest is only working for a site with a single PFT. Need to figure out if it works in other sites and if not, why not.

Get list of ED2 pfts

Create table with columns: pft number, number of sites in PNW, number of sites in SEUS

Remove irrelevant parameters from Setaria PFT

mort2
growth_resp_factor
leaf_width
root_respiration_rate
Vm_low_temp

Marked as not relevant to current study. Remove these from SetariaWT pft to use ED2 defaults.

Separate Patches

The initial idea of having three patches representing three distinct ecosystems didn't appear to work. After just a few timepoints in the simulation it appears that all PFTs were in all patches. Rather than figuring out how to make patches not interact and not split, it might be better to just do these as separate runs. It'll take ~3x longer, but we already know how to do it.

turn on SA

Gridded predictions and figures

Train gridded model with data from #35. Use that model to predict AGB of Setaria across the southeast US. Create a heatmap.

Modify workflow.R to delete SA runs for non-setaria PFTs

Need to also modify runlist, and this should be fine according to Mike

Reproduce modeling results

Work on understanding Kristina's code and getting it to run with new results.

https://github.com/cct-datascience/model-vignettes/blob/master/ED2/ensembles_modeling/predict_growth.Rmd

Figure out PFT mappings

Files from Mike have ED2 PFT numbers, but we need to create a lookup table to decide which PEcAn PFTs these match, potentially taking into account whether the sites are in the PNW (Pacific Northwest) or SEUS (southeastern US). Also keep in mind that 3 PFTs are "upgraded":

12 -> 9
13 -> 10
14 -> 3

Although I'm suspicious about that last one, since in modern PEcAn ED PFT 3 is broadleaf_evergreen_tropical_tree and it doesn't make sense that it occurs in the PNW sites.
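
If the three "upgrades" listed above are accepted, applying them to a cohort file could look roughly like this. `remap_pfts` is a hypothetical helper, and it assumes a whitespace-delimited file whose first line is a header naming one column `pft`. The 14 -> 3 mapping is included only because it is listed above; it is still in question.

```shell
# Hypothetical helper: rewrite the pft column of a whitespace-delimited
# cohort (.css) table, applying the mapping 12 -> 9, 13 -> 10, 14 -> 3.
# Assumes the first line is a header that names one column "pft".
# Prints the result to stdout; the input file is not modified.
remap_pfts() {
  awk '
    BEGIN { map[12] = 9; map[13] = 10; map[14] = 3 }
    NR == 1 {
      # Locate the pft column by name instead of hard-coding its position.
      for (i = 1; i <= NF; i++) if ($i == "pft") col = i
      print
      next
    }
    col && ($col in map) { $col = map[$col] }
    { print }
  ' "$1"
}
```

Rows that are rewritten come out single-space delimited, because awk rebuilds a line when a field changes. Untouched rows keep their original spacing.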

Work out code for data wrangling and plotting

Can start work on this after brainstorming figures (https://github.com/danforthcenter/sentinel-detection/issues/421)

Extract and clean data from .nc files
re-create SA plots so they are customizable
create timeseries plots of ensemble analysis

Jobs started with run.sh don't use `renv`

R CMD BATCH doesn't load .Rprofile, so doesn't use renv and uses different package versions.

Solution:
Add source(".Rprofile") at the top of workflow.R. This doesn't work, though, unless you provide an absolute path or call setwd() first, and neither is a great solution. Maybe something like setwd("../..") would work?
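
One way to avoid hard-coding the path might be to let run.sh work out the project root itself. The sketch below assumes that the project root is the nearest parent directory containing renv.lock. `find_renv_root` is a hypothetical helper, not something in the repository.

```shell
# Hypothetical helper: walk up from a starting directory until a folder
# containing renv.lock is found, then print its absolute path.
# Returns non-zero if no such folder exists.
find_renv_root() {
  local dir
  dir="$(cd "${1:-.}" && pwd -P)" || return 1
  while :; do
    if [ -f "$dir/renv.lock" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    # Stop at the filesystem root.
    [ "$dir" = "/" ] && return 1
    dir="$(dirname "$dir")"
  done
}
```

run.sh could then export the result under some variable name (for example a hypothetical `PROJECT_ROOT`) so that workflow.R can read it with Sys.getenv() and build an absolute path. Whether renv activates cleanly when sourced that way is untested here.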

April MSR

Make a slide for the April MSR

Split workflow.R

Split workflow.R into setup script, submit runs script, and process results script. This will hopefully reduce fragility and allow for some manual checking while maintaining some automation.

Also create checklists (could be programmatic, could be literal list) for each stage.

"Production" runs

Longer runs, more ensembles, all the sites, SA with ± 1, 2, 3 SD, etc.

Simplify setup code

Use bash script to edit met headers instead of complicated R code.

copy to HPC
Edit files in place with sed -i
cct-datascience/organization#1049 (comment)
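
A minimal sketch of the sed approach, assuming the edit needed is to swap a local path prefix for the HPC one inside each met header. `fix_met_header` and the example paths are hypothetical.

```shell
# Hypothetical helper: replace one path prefix with another in a met header.
# Usage: fix_met_header <header file> <old prefix> <new prefix>
fix_met_header() {
  local file="$1" old="$2" new="$3"
  # "|" is the sed delimiter so slashes in the paths need no escaping.
  # Assumes neither prefix contains "|", "&" or other sed metacharacters.
  # -i.bak edits in place and keeps a backup copy next to the file.
  sed -i.bak "s|${old}|${new}|g" "$file"
}
```

For example, `fix_met_header ED_MET_DRIVER_HEADER /local/data /hpc/data` would rewrite every occurrence of the first prefix and leave ED_MET_DRIVER_HEADER.bak behind for comparison.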

Create plots for report

Sensitivity / elasticity plots
- For Setaria AGB
- Can be default output by PEcAn
Timeseries plot (median ± uncertainty from ensemble run)
Pointrange, slab, raincloud, or something like that showing median ± uncertainty AGB after 10 yrs (https://mjskay.github.io/ggdist/articles/freq-uncertainty-vis.html)

debug model2netcdf.ED2()

For some reason model2netcdf.ED2() fails for the prairie run at site 1123 for ensemble member 1 at 2003.

> model2netcdf.ED2(
       # .x,
       outdirs[[1]],
       29.365195,
       -82.810137,
       '2002-06-01',
       '2012-06-30',
       c(
         SetariaWT = 1L,
         sentinel_ebifarm.c3grass = 5L,
         sentinel_ebifarm.c4grass = 16L,
         ebifarm.forb = 12L
       ),
       process_partial = TRUE
     )

2023-03-20 14:57:05 INFO   [model2netcdf.ED2] : 
   ----- Processing year: 2002 
2023-03-20 14:57:05 INFO   [read_E_files] : 
   *** Reading -E- file *** 
2023-03-20 14:57:15 INFO   [model2netcdf.ED2] : 
   *** Writing netCDF file *** 
2023-03-20 14:57:15 INFO   [model2netcdf.ED2] : 
   ----- Processing year: 2003 
2023-03-20 14:57:15 INFO   [read_E_files] : 
   *** Reading -E- file *** 
2023-03-20 14:57:25 INFO   [model2netcdf.ED2] : 
   *** Writing netCDF file *** 
Error in ncdf4::ncvar_put(nc = nc, varid = varid, vals = vals, start = start,  : 
  ncvar_put: error: you asked to write 48 values, but the passed data array only has 44 entries!
In addition: Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped
  data frame and adjust accordingly.
ℹ The deprecated feature was likely used in the PEcAn.ED2 package.
  Please report the issue to the authors.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

Update singularity container on HPC (again)

Wait for PecanProject/pecan#3140 to be merged and docker image to be pushed

Start ED2 jobs

Decide how to "plant" Setaria

Currently I've added Setaria by editing .css files to include Setaria in every patch with dbh = 0.6 (cm) and n = 1 (plants/m^2). dbh = 0.6 came from what @KristinaRiemer used in previous runs with .css files and n = 1 came from @dlebauer's suggestion. If these numbers are fine, this can be closed. Otherwise let's decide on a method.

Update singularity container on HPC

So that model2netcdf.ED2() runs on the HPC

Select sites

We said we'd try for ~ 50 sites. Someone needs to come up with criteria and/or a sampling procedure for narrowing down the sites.

Try forking just PEcAn.ED2

https://docs.github.com/en/get-started/using-git/splitting-a-subfolder-out-into-a-new-repository

Clone and simplify non-setaria PFTs

Remove most priors from PFTs. Keep a few parameters so SA and ensemble analysis generate variation. Replace uniform priors when possible.

Params to keep:

SLA
Vcmax
stomatal slope
cuticular conductance
quantum efficiency
fineroot2leaf

Some more "custom" pfts like the ebifarm.forb might need to have more priors to make it more distinct from the default ED2 PFT it's based on.

Add cleanup code

Remove .h5 files if .nc files were successfully created at the end of a run so Welsch doesn't fill up.
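
A rough sketch of what that cleanup could look like. `cleanup_h5` is a hypothetical helper, and it treats "at least one non-empty .nc file in the same folder" as a stand-in for "successfully created", which is a weaker check than confirming that every expected year was written.

```shell
# Hypothetical helper: for each directory given, delete *.h5 files only if
# at least one non-empty *.nc file exists in that same directory.
cleanup_h5() {
  local dir
  for dir in "$@"; do
    if [ -n "$(find "$dir" -maxdepth 1 -type f -name '*.nc' -size +0c 2>/dev/null | head -n 1)" ]; then
      find "$dir" -maxdepth 1 -type f -name '*.h5' -exec rm -f {} +
    else
      # Keep the raw output so the run can still be post-processed later.
      echo "skipping $dir: no .nc output found" >&2
    fi
  done
}
```

It could be called on every out/ subfolder at the end of a run, for example `cleanup_h5 out/*/`.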
