
pecanproject / pecan

198 stars · 38 watchers · 228 forks · 398.6 MB

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.

Home Page: www.pecanproject.org

License: Other

R 66.73% Shell 5.22% Fortran 21.28% Makefile 0.11% CSS 0.50% Python 2.09% C++ 0.64% PHP 2.46% JavaScript 0.02% HTML 0.60% TeX 0.09% MATLAB 0.01% C 0.03% Dockerfile 0.23% AMPL 0.01%
ecosystem-model pecan r national-science-foundation ecosystem-science bayesian plants meta-analysis data-science data-assimilation

pecan's People

Contributors

aariq, amanskywalker, ankurdesai, annethomas, apourmok, araiho, ashiklom, ayushprd, bcow, bpbond, crollinson, dlebauer, dongchenz, henrikajasilta, infotroph, istfer, jam2767, kzarada, liamburke24, luke-dramko, mdietze, meetagrawal09, moki1202, mukulmaheshwari, nanu1605, para2x, robkooper, tezansahu, tobeycarman, tonygardella


pecan's Issues

How to use dbfile.input.check (and related functions)

I am not sure if this is a bug, but I can't get dbfile.input.check to work. For example, I would like to find the MsTMIP climate driver data, so I use

dbfile.input.check(1118, "2004-01-01 06:00:00", "2004-01-01 06:00:00", "application/x-netcdf", "MsTMIP driver", con)

But I don't get anything returned.

Also, most of the functions in dbfiles.R are not exported. Is this intentional? Are these functions works in progress (I don't see any tests or other uses of them in the pecan directory)?
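
As a hedged diagnostic (column names follow my reading of the BETY schema of this era and may differ), you can list what is actually registered for that site and compare it with the dates and format strings the check filters on:

library(PEcAn.DB)
## List every input registered for site 1118; if the MsTMIP driver shows up
## here but dbfile.input.check returns nothing, the mismatch is probably in
## the exact start/end dates or the mimetype/format name strings.
inputs <- db.query(paste(
  "SELECT i.id, i.name, i.start_date, i.end_date, f.name AS format",
  "FROM inputs i JOIN formats f ON f.id = i.format_id",
  "WHERE i.site_id = 1118"), con)
print(inputs)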

append, concatenate, rechunk, and register global met drivers

Feature definition

  1. format should be MsTMIP standard (these are MsTMIP drivers so that should be easy)
  2. facilitate code that will quickly read a time series for each point on the global land surface at ~0.5 degree resolution.
    • chunk for fast time series reads.
    • combine multiple variables into a single file
    • combine multiple years into a single file

Rob and I wrote a set of scripts to combine the NCEP 2 degree global data set. These are in pecan/modules/data.atmosphere/inst/scripts/ncep.

We want to do the same with the CRU-NCEP 0.5 degree data, as described in pecan/modules/data.atmosphere/inst/scripts/cruncep/README.md

But the CRU-NCEP 0.5 degree × >100 year data set (~1 TB) requires more careful thought.

Todo:

  1. review scripts in pecan/modules/data.atmosphere/inst/scripts/cruncep/
  2. determine:
    • should we rechunk, and should we use compression? (what is the space vs. I/O speed tradeoff?)
    • what is the "optimal" chunking? (or perhaps this is project specific, depending on the size of regional simulations) (relevant SO question) — see the read-pattern sketch after this list
    • do the formats of these files meet the de facto MsTMIP standards? If not, revise and update the scripts to modify the files
    • What hardware is required / preferred:
    • if possible, this should be done on the 'pecandev.igb.illinois.edu' server (20 GB RAM).
    • If you would benefit from parallel processing or large memory (64 or 1024 GB RAM) and are familiar with queuing, modules, and parallel processing, I can get you an account on this machine: http://help.igb.illinois.edu/Biocluster
  3. revise the scripts and run them based on the results of the above
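
For reference on the chunking question, the access pattern to optimize is a full time series at one grid cell; a minimal ncdf4 sketch (file and variable names are hypothetical):

library(ncdf4)
## Read the full time series for a single ~0.5 degree grid cell.
## Chunking along time makes this read fast; lon/lat-major chunking
## forces it to touch nearly every chunk in the file.
nc <- nc_open("cruncep_tair_1901_2010.nc")   # hypothetical file name
tair <- ncvar_get(nc, "tair",                # hypothetical variable name
                  start = c(200, 150, 1),    # lon index, lat index, time = 1
                  count = c(1, 1, -1))       # one cell, all time steps
nc_close(nc)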

Possible bug with ~/model/ed/R/write.configs.ed.R

@serbinsh @dlebauer I think I found a problem with the way 'write.configs.ed.R' converts leaf_respiration_rate_m2 to the dark respiration factor. To convert, the code 1) re-scales both leaf_respiration_rate_m2 and Vcmax from 25C to 15C, and 2) computes dark_resp_factor = leaf_resp@15C / Vcmax@15C.

However, it seems the code rescales Vcmax from 25C twice; for example, starting at line 53 of the source:

 if('Vcmax' %in% names(trait.samples)) {
    vcmax <- trait.samples[['Vcmax']]
    trait.samples[['Vcmax']] <- arrhenius.scaling(vcmax, old.temp = 25, new.temp = 15)
  }

  ## Convert leaf_respiration_rate_m2 to dark_resp_factor
  if('leaf_respiration_rate_m2' %in% names(trait.samples)) {
    leaf_resp = trait.samples[['leaf_respiration_rate_m2']]
    vcmax <- trait.samples[['Vcmax']]

    ## First scale variables to 15 degC
    trait.samples[['leaf_respiration_rate_m2']] <- 
      arrhenius.scaling(leaf_resp, old.temp = 25, new.temp = 15)
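    ## NOTE: vcmax was already rescaled to 15 degC by the earlier block,
    ## so the next line applies the 25->15 scaling a second time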
    vcmax_15 <- arrhenius.scaling(vcmax, old.temp = 25, new.temp = 15)

    ## Calculate dark_resp_factor -- Will be depreciated when moving from older versions of ED2
    trait.samples[['dark_respiration_factor']] <- trait.samples[['leaf_respiration_rate_m2']]/
      vcmax_15

It seems that Vcmax has already been re-scaled to 15C before it enters the dark respiration factor block, because the previous section of code rescales the Vcmax parameter anyway. I tested this and it seems to be the case. For example, my median posterior value for leaf respiration rate from the meta-analysis is 2.433. If I convert this by hand correctly, assuming Vcmax@25C = 79.357, then I get a dark respiration factor of 0.0306. However, if I rescale Vcmax twice, I get a dark respiration factor of 0.0435, which is exactly what the code is giving me.
Does my reasoning make sense, or am I way off on this? This might explain why I have been getting 'high' dark respiration factor values from PECAN that don't work with my simulation. I'm not sure if the code has recently switched to using Rd0 only, but I am still using the dark respiration factor.
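
If that diagnosis is right, a minimal sketch of the fix (assuming the earlier block has already scaled trait.samples[['Vcmax']] to 15 degC) is to reuse the already-scaled value instead of rescaling it:

## Convert leaf_respiration_rate_m2 to dark_resp_factor
if ('leaf_respiration_rate_m2' %in% names(trait.samples)) {
  ## Scale leaf respiration from 25 to 15 degC
  trait.samples[['leaf_respiration_rate_m2']] <-
    arrhenius.scaling(trait.samples[['leaf_respiration_rate_m2']],
                      old.temp = 25, new.temp = 15)
  ## Vcmax is already at 15 degC here, so divide directly
  trait.samples[['dark_respiration_factor']] <-
    trait.samples[['leaf_respiration_rate_m2']] / trait.samples[['Vcmax']]
}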

quickbuild script to load all source without building.

When dev_mode is on, it uses a library (~/R-dev by default) that is only in .libPaths() while dev_mode() is on (and not in the R_LIBS_USER paths). build.sh should not write to this development library by default.

To get around having to recompile all of the packages, I have a script that just loads all of the functions and data into a dev environment:

#!/usr/bin/env Rscript

library(devtools)
dev_mode(on = TRUE)
lapply(list("utils", "db", "settings", ...package list truncated, ... "all"),
       function(x) install(x, quick = TRUE, local = TRUE, quiet = TRUE))

It would be quicker if it only loaded updated files / packages, but I have already written the above and can submit it; mostly I wanted to write this up as a place for feedback.

An alternative would be to update build.sh to install packages to the dev library and add a -q flag to specify --no-install-vignettes and other speed-enhancing features.
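
For comparison, a hedged variant that skips installation entirely by sourcing each package in place with devtools::load_all() (same abbreviated package list as above):

#!/usr/bin/env Rscript
## Load package code straight from source; nothing is written to any library.
library(devtools)
pkgs <- c("utils", "db", "settings")  # ...remaining packages truncated, as above
invisible(lapply(pkgs, load_all))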

Checking dependencies when modules are required

build.sh will not run without the following on HPC systems:

module load netcdf nco R jags gdal

Some options:

  1. place a message in "check dependencies"? (e.g. instructing the user to echo "module load netcdf nco R jags gdal" >> ~/.bashrc)
  2. add a line to scripts/build.sh to load modules if the module system is in use:
if type module > /dev/null; then 
  module load netcdf/4.3.1.1 nco R/3.0.2 JAGS  gdal/1.10.1
fi

Suggestions?

qsub not found

I get the following error when I perform the 'start.model.run' step in PECAN

bash: qsub: command not found
Error in system2("ssh", c(settings$run$host$name, qsub, file.path(settings$run$host$rundir,  : 
  error in running command

I think it may be a problem with the way I declared my qsub tag in the XML file:

<qsub>qsub -N @NAME@ -o @STDOUT@ -e @STDERR@ -l nodes=1:ppn=1 -l walltime=23:00:00</qsub>

Do you notice anything I am doing wrong in declaring the qsub tag?
(see redmine #1920)
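
One quick check before blaming the tag (a sketch using the same settings list as the workflow): confirm qsub is on the PATH of a non-interactive shell on the remote host, since that is what system2 + ssh gives you:

## If this also reports "command not found", the problem is the remote
## non-interactive PATH (e.g. modules only loaded for login shells),
## not the <qsub> tag.
system2("ssh", c(settings$run$host$name, "which", "qsub"), stdout = TRUE)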

job.sh on remote cluster doesn't have pecan libraries

running job.sh on the BU geo cluster gave the following error:

require (PEcAn.ED2)
Loading required package: PEcAn.ED2
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called 'PEcAn.ED2'
model2netcdf.ED2('/usr4/ugrad/mgianott/geoDemo//15', 71.3225, -156.626, '2000/01/01', '2001/01/01')
Error: could not find function "model2netcdf.ED2"
Execution halted
cp: cannot stat `/usr4/ugrad/mgianott/ED.r82/ED/run/15/README.txt': No such file or directory
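
A possible workaround sketch, assuming the PEcAn packages are installed in a user library on the cluster (the path here is hypothetical): point the remote R session at that library before require():

## Prepend the cluster-side library that actually contains PEcAn.ED2
.libPaths(c("~/R/library", .libPaths()))   # hypothetical path
require(PEcAn.ED2)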

Small Results page tweak

Just thought of a small tweak to 08-finished.php that I think will make it a bit more intuitive. Under "Outputs", could you move the "File" box and the "Show Run Output" button to after "Plot run/year/variable", and rename "Show Run Output" to "Show Run File"? That way the selection boxes for Run, Year, and Variable sit in a row followed by "Plot run/year/variable", and there is less ambiguity in "Show Run Output" (to me, "show output" and "plot output" are too close to synonyms, but "show file" makes the distinction clear).

PEcAn R modules demo - permissions bug

Description: Cannot open a connection to the database or write Rdata files for get.trait.data() in the R modules demo (step 4.6) (writing from path home/carya/pecan/output/PEcAn_1/pft/). Below is one of the twenty warning messages:

In file.remove(old.files[which(file.info(list.files(path = settings$pfts[i]$pft$outdir, ... :
cannot remove file '/home/carya/output//PEcAn_1/pft/1//SLA.model.bug', reason 'Permission denied'

Reproduction steps: Follow the R modules demo handout up to step 4.6. Running get.trait.data() will fail and return multiple error / warning messages.

Steps to fix: change the R/W permissions via the shell, for example:
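
For reference, the fix was along these lines (path taken from the warning above; adjust for your output directory):

chmod -R u+rw /home/carya/output/PEcAn_1/pft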

Error upgrading bety on 1.3.0 virtual machine

When I try to follow the instructions on the wiki for upgrading bety, the sudo bundle install line results in the following error:

Installing nokogiri (1.6.0) 
Gem::InstallError: nokogiri requires Ruby version >= 1.9.2.
An error occurred while installing nokogiri (1.6.0), and Bundler cannot continue.
Make sure that `gem install nokogiri -v '1.6.0'` succeeds before bundling.

I have tried upgrading Ruby, installing rvm, and running sudo bundle update, but none of these fixes the issue.

How should I update tests/workflow.R for simpler debugging?

What I have been doing is putting my test pecan.xml in the working directory, adding a debugonce() call after require(PEcAn.all), e.g.

require(PEcAn.all)
debugonce(write.config.BIOCRO)

and then launching R and using source() to run the file:

source('workflow.R')
  1. Is there a way to launch this script from the command line, and have it stay in an R session if / when it enters a debugging situation?
  2. Is it okay if I convert this to an Rscript (see the sketch below) so that I can launch it with
workflow.R <pecan.xml>
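
For question 2, a minimal sketch of the Rscript form (argument handling is illustrative):

#!/usr/bin/env Rscript
## Proposed command-line form: workflow.R <pecan.xml>
args <- commandArgs(trailingOnly = TRUE)
settings.file <- if (length(args) >= 1) args[1] else "pecan.xml"
require(PEcAn.all)
settings <- read.settings(settings.file)
## ... rest of the existing workflow ...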

How can I keep an ssh tunnel open?

Using a server that doesn't support RSA keys, I'd like to be able to connect without entering my password each time PEcAn exchanges information with the server (rsync, ssh, etc.). What settings do I need?
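
One approach that works without keys (a sketch; requires OpenSSH connection multiplexing, and uses the geo.bu.edu hostname from other threads here as a placeholder) is to authenticate once and let every later ssh/rsync call reuse that connection via ~/.ssh/config:

Host geo.bu.edu
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h:%p
    ControlPersist yes

After the first password-prompted login, subsequent PEcAn ssh/rsync calls to that host share the open connection and should not prompt again.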

workflow should abort if model fails

Currently we don't check the result of a model run. If the model fails, our script should throw an error (exit code != 0). We can either leverage the exit code from the model or check whether the output file was generated.
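
A minimal sketch of both options (the names model.cmd, model.args, and outfile are illustrative, not the actual workflow variables):

## Fail fast on a bad model run, using the exit code and the output file
out <- system2(model.cmd, model.args, stdout = TRUE, stderr = TRUE)
status <- attr(out, "status")   # non-NULL only on a non-zero exit code
if (!is.null(status) || !file.exists(outfile)) {
  stop("model run failed (exit status ", status, ")")
}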

read.settings overwrites pecan.xml

read.settings writes the corrected settings to a pecan.xml file in the current folder. Often this overwrites the pecan.xml that was originally loaded, replacing the simple input file with the fully expanded version.

db.exists leaks database handles

db.exists checks whether the database exists. Right now it opens a connection and returns without closing it. We should also add a simple SELECT statement to test for read access and a simple write to test write permissions (we could rewrite a single field that was fetched, as the write test).
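
A hedged sketch of a leak-free version (db.open/db.close/db.query as used elsewhere in PEcAn.DB; the write test is left as a comment):

db.exists <- function(params) {
  con <- db.open(params)
  on.exit(db.close(con))        # always release the handle
  tryCatch({
    db.query("SELECT 1;", con)  # read test; a single-field rewrite could
                                # follow here as the write test
    TRUE
  }, error = function(e) FALSE)
}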

model "id" tag required in settings file

If I run read.settings with the current biocro settings file (inst/extdata/settings.biocro.xml), I get this error:

Error in mysqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not run statement: Incorrect integer value: '' for column 'model_id' at row 1)
Called from: mysqlExecStatement(conn, statement, ...)

I propose to fix it by setting the default model id to -1 (not all uses of a PEcAn settings file require a model). An alternative would be to use other model tags (like the name or path) to look up the id.

Thoughts?

(github testing) What happens to this issue?

Who receives it?

When received by email, will the reply via email be recorded on github (e.g. without requiring a luddite to open up a browser)?

That would be a huge benefit over Redmine.

Reducing size of T file sent to PECAN

In an effort to speed up my sensitivity analysis runs, I have realized I only need a handful of state/flux variables in my T file to get the results I need from PECAN. Right now roughly 40+ state variables are written to the T files, where I really only need ~5.

I think I can do the following: 1) go into the ED2 source code and turn off output of the T-file state variables I don't need, then 2) edit the model2netcdf.ED2.R script within PECAN to eliminate the conversion of variables I don't need. Does that sound reasonable, or will this cause big problems? Any advice on which files to edit would be appreciated.

New Spatial Features for PostGIS (Discussion)

please update original issue as ideas are added / changed in comments

What are data types that we are dealing with?

  • PALSAR data
  • Met data as points and polygons
  • Soils data (polygon format)
  • Remote sensing

Scenarios / stories for how we will use data?

  • global / regional runs
  • selection of drivers; interpolation / integration of site level and reanalysis data
  • hierarchical sites (flea on fly on leaf on plant in plot in block in field ...)

What new tables do we need?

  • regions (possibly polymorphic) linked to sites, inputs, ??? (see the table sketch after this list)
    • fields: id, site_id, geom, created_at, updated_at, (citation_id / source)
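
A hedged SQL sketch of that table (geometry type and SRID are assumptions, and the polymorphic link is omitted):

CREATE TABLE regions (
  id          bigserial PRIMARY KEY,
  site_id     bigint REFERENCES sites (id),
  geom        geometry(MultiPolygon, 4326),  -- PostGIS geometry; SRID assumed
  citation_id bigint,                        -- or a free-text source column
  created_at  timestamp DEFAULT now(),
  updated_at  timestamp DEFAULT now()
);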

How will we convert this to model regions?

  • can we start assuming that we use bounding boxes?
  • what new tags should we add to the settings file?

I want to use a parameter prior without performing a meta-analysis.

Is there a way to include a parameter in the PEcAn sensitivity analysis but 'turn off' the meta-analysis? In other words, is there a way to use the parameter prior from the database directly and not automatically query trait data if some exist, other than removing the trait data itself? Can this be parameter specific (i.e. still perform the meta-analysis for other parameters)?

web interface could end up using posteriors as inputs

Right now the web interface uses file_id instead of id, and it does not check whether a file is an input rather than a posterior.

This was not really an issue before, but it can now cause trouble because posteriors are also stored in dbfiles: the interface could return files that are posteriors, not inputs.

Enable login to pecan-web

We should enable a way to put pecan-web behind a username/password. We can use the same username/password as stored in the bety database. Most of the code already exists in the db app.

Update parameter density plotting functions to use ggplot2 v >= 0.9.2

The function plot.trait and its dependencies were written using the syntax of ggplot2 v0.9.0.

Mostly this will require changing use of the opts() function to theme() and renaming theme_xx() functions to element_xx(). There is a comprehensive guide here: https://github.com/wch/ggplot2/wiki/New-theme-system

This should be very similar to the changes made when I updated the variance decomposition and sensitivity plots (0dde7f0)

To get a list of the warnings thrown by ggplot2 deprecated functions, try the following:

library(PEcAn.priors)
plot.trait("Vcmax", prior = list('norm',1,1), posterior.sample = rnorm(1000, 1.1, 0.5))
help(package = "PEcAn.priors")
?plot.trait
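
As an illustration of the kind of change involved (hedged; the exact calls inside plot.trait may differ):

## old (ggplot2 < 0.9.2):
##   p + opts(axis.text.x = theme_text(angle = 90))
## new (ggplot2 >= 0.9.2):
p + theme(axis.text.x = element_text(angle = 90))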

start.model.runs not connecting to remote host in 1.3.4rc1

I can run the command that this generates at the Linux prompt and it runs fine, but it fails when run within R. In this context I'm connecting to geo.bu.edu, and I already have an ssh connection open between the VM and the cluster.

All subsequent ssh and rsync commands fail as well.

We were able to get all of this to (almost) work in the previous VM version (all files were copied back and forth but the submission to the queue was failing), but now I can't even get the files to copy.

> system2("ssh", c(settings$run$host$name, "mkdir", 
+                  "-p", file.path(settings$run$host$rundir, run)), 
+         stdout = TRUE)
ssh_askpass: exec(rpostback-askpass): No such file or directory
Permission denied, please try again.
ssh_askpass: exec(rpostback-askpass): No such file or directory
Permission denied, please try again.
ssh_askpass: exec(rpostback-askpass): No such file or directory
Permission denied (password,hostbased).
Warning: running command ''ssh' [email protected] mkdir -p /usr2/faculty/dietze/ED.r82/ED/run/56' had status 255

tests for read.output commented out

The commented-out file contains tests that depend on model-specific code (these belong in the model-specific packages) as well as code that depends on the DB package for read.settings.

The utils package should not depend on any other PEcAn package.

ED failure does not trigger Error

If ED fails to execute, the PEcAn code thinks it finished correctly and does not trigger an error, resulting in the web interface sitting in an endless loop.

ED2 output type and frequency required for PECAN

  1. Does PECAN always require 'T' files in order to produce the sensitivity analysis and variance decomposition plots? If I could output the yearly mean flux of NEE and NPP with the 'Y' file, could PECAN use that instead?

  2. The way PECAN is set up currently, is it even necessary to output any files besides the 'T' files in order to perform the sensitivity analysis? It seems it doesn't use the monthly or yearly files, but I haven't looked at this very closely.

create met2CF.Alma

Eventually we will need a function that converts old ALMA files to the netCDF CF format. Some legacy ALMA files have already been converted to the ED-specific format.

Installing packages with complex dependencies

On Stampede, I was unable to install ncdf4, udunits2, rjags, MCMCpack, and rgdal.

These will all require local installation of dependencies. Has anyone done this? Is the best approach to build all of the dependencies from source? Are there tricks to identifying and loading the correctly configured modules? I currently have this in my .bashrc but am still testing it out:

module load gsl hdf5 parallel-netcdf/4.2.1.1 netcdf nco udunits/2.1.24 R_mkl/3.0.1

In general, it seems that it might be useful to share libraries on such a machine rather than have everyone fiddle around. E.g., if I set /home1/02014/dlebauer/lib/R to read-only and people put .libPaths(c("~/lib/R", "/home1/02014/dlebauer/lib/R")) in their .Rprofile, they wouldn't have to install any dependencies, but could if they wanted to (e.g. a different version of PEcAn, etc.).

Calculation of variance in variance decomposition plot

I noticed that for the variance decomposition plot, the absolute variance is calculated for each parameter rather than the partial variance. This is because of line 145 of 'sensitivity.analysis.R':

partial.variances <- variances #/ sum(variances)

The sum(variances) was commented out, leaving behind the total variance for each parameter. This doesn't fundamentally change the interpretation of the variance decomposition plot, but I was just curious why this was done. Was it to keep the variance units?
