
pecanproject / pecan

198 stars · 38 watchers · 228 forks · 398.6 MB

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.

Home Page: www.pecanproject.org

License: Other

R 66.73% Shell 5.22% Fortran 21.28% Makefile 0.11% CSS 0.50% Python 2.09% C++ 0.64% PHP 2.46% JavaScript 0.02% HTML 0.60% TeX 0.09% MATLAB 0.01% C 0.03% Dockerfile 0.23% AMPL 0.01%
ecosystem-model pecan r national-science-foundation ecosystem-science bayesian plants meta-analysis data-science data-assimilation

pecan's People

Contributors

aariq, amanskywalker, ankurdesai, annethomas, apourmok, araiho, ashiklom, ayushprd, bcow, bpbond, crollinson, dlebauer, dongchenz, henrikajasilta, infotroph, istfer, jam2767, kzarada, liamburke24, luke-dramko, mdietze, meetagrawal09, moki1202, mukulmaheshwari, nanu1605, para2x, robkooper, tezansahu, tobeycarman, tonygardella


pecan's Issues

How to use dbfile.input.check (and related functions)

I am not sure if this is a bug, but I can't get dbfile.input.check to work. For example, I would like to find the MsTMIP climate driver data, so I use

dbfile.input.check(1118, "2004-01-01 06:00:00", "2004-01-01 06:00:00", "application/x-netcdf", "MsTMIP driver", con)

But I don't get anything returned.

Also, most of the functions in dbfiles.R are not exported. Is this intentional? Are these functions works in progress (I don't see any tests or other uses of them in the pecan directory)?
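
As a hedged diagnostic (column names follow my reading of the BETY schema of this era and may differ), you can list what is actually registered for that site and compare it with the dates and format strings the check filters on:

library(PEcAn.DB)
## List every input registered for site 1118; if the MsTMIP driver shows up
## here but dbfile.input.check returns nothing, the mismatch is probably in
## the exact start/end dates or the mimetype/format name strings.
inputs <- db.query(paste(
  "SELECT i.id, i.name, i.start_date, i.end_date, f.name AS format",
  "FROM inputs i JOIN formats f ON f.id = i.format_id",
  "WHERE i.site_id = 1118"), con)
print(inputs)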

append, concatenate, rechunk, and register global met drivers

Feature definition

  1. format should be MsTMIP standard (these are MsTMIP drivers so that should be easy)
  2. facilitate code that will quickly read a time series for each point on the global land surface at ~0.5 degree resolution.
    • chunk for fast time series reads.
    • combine multiple variables into a single file
    • combine multiple years into a single file

Rob and I wrote a set of scripts to combine the NCEP 2 degree global data set. These are in pecan/modules/data.atmosphere/inst/scripts/ncep.

We want to do the same with the CRU-NCEP 0.5 degree data, as described in pecan/modules/data.atmosphere/inst/scripts/cruncep/README.md

But the CRU-NCEP 0.5 degree × >100 year data set (~1 TB) requires more careful thought.

Todo:

  1. review scripts in pecan/modules/data.atmosphere/inst/scripts/cruncep/
  2. determine:
    • should we rechunk, and should we use compression? (what is the space vs. I/O speed tradeoff?)
    • what is the "optimal" chunking? (or perhaps this is project specific, depending on the size of regional simulations) (relevant SO question) — see the read-pattern sketch after this list
    • do the formats of these files meet the de facto MsTMIP standards? If not, revise and update the scripts to modify the files
    • What hardware is required / preferred:
    • if possible, this should be done on the 'pecandev.igb.illinois.edu' server (20 GB RAM).
    • If you would benefit from parallel processing or large memory (64 or 1024 GB RAM) and are familiar with queuing, modules, and parallel processing, I can get you an account on this machine: http://help.igb.illinois.edu/Biocluster
  3. revise the scripts and run them based on the results of the above
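
For reference on the chunking question, the access pattern to optimize is a full time series at one grid cell; a minimal ncdf4 sketch (file and variable names are hypothetical):

library(ncdf4)
## Read the full time series for a single ~0.5 degree grid cell.
## Chunking along time makes this read fast; lon/lat-major chunking
## forces it to touch nearly every chunk in the file.
nc <- nc_open("cruncep_tair_1901_2010.nc")   # hypothetical file name
tair <- ncvar_get(nc, "tair",                # hypothetical variable name
                  start = c(200, 150, 1),    # lon index, lat index, time = 1
                  count = c(1, 1, -1))       # one cell, all time steps
nc_close(nc)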

Possible bug with ~/model/ed/R/write.configs.ed.R

@serbinsh @dlebauer I think I found a problem with the way 'write.configs.ed.R' converts leaf_respiration_rate_m2 to the dark respiration factor. To convert, the code 1) re-scales both leaf_respiration_rate_m2 and Vcmax from 25C to 15C, and 2) computes dark_resp_factor = leaf_resp@15C / Vcmax@15C.

However, it seems the code rescales Vcmax from 25C twice; for example, starting at line 53 of the source:

 if('Vcmax' %in% names(trait.samples)) {
    vcmax <- trait.samples[['Vcmax']]
    trait.samples[['Vcmax']] <- arrhenius.scaling(vcmax, old.temp = 25, new.temp = 15)
  }

  ## Convert leaf_respiration_rate_m2 to dark_resp_factor
  if('leaf_respiration_rate_m2' %in% names(trait.samples)) {
    leaf_resp = trait.samples[['leaf_respiration_rate_m2']]
    vcmax <- trait.samples[['Vcmax']]

    ## First scale variables to 15 degC
    trait.samples[['leaf_respiration_rate_m2']] <- 
      arrhenius.scaling(leaf_resp, old.temp = 25, new.temp = 15)
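    ## NOTE: vcmax was already rescaled to 15 degC by the earlier block,
    ## so the next line applies the 25->15 scaling a second time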
    vcmax_15 <- arrhenius.scaling(vcmax, old.temp = 25, new.temp = 15)

    ## Calculate dark_resp_factor -- Will be depreciated when moving from older versions of ED2
    trait.samples[['dark_respiration_factor']] <- trait.samples[['leaf_respiration_rate_m2']]/
      vcmax_15

It seems that Vcmax has already been re-scaled to 15C before it enters the dark respiration factor block, because the previous section of code rescales the Vcmax parameter anyway. I tested this and it seems to be the case. For example, my median posterior value for leaf respiration rate from the meta-analysis is 2.433. If I convert this by hand correctly, assuming Vcmax@25C = 79.357, then I get a dark respiration factor of 0.0306. However, if I rescale Vcmax twice, I get a dark respiration factor of 0.0435, which is exactly what the code is giving me.
Does my reasoning make sense, or am I way off on this? This might explain why I have been getting 'high' dark respiration factor values from PECAN that don't work with my simulation. I'm not sure if the code has recently switched to using Rd0 only, but I am still using the dark respiration factor.
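
If that diagnosis is right, a minimal sketch of the fix (assuming the earlier block has already scaled trait.samples[['Vcmax']] to 15 degC) is to reuse the already-scaled value instead of rescaling it:

## Convert leaf_respiration_rate_m2 to dark_resp_factor
if ('leaf_respiration_rate_m2' %in% names(trait.samples)) {
  ## Scale leaf respiration from 25 to 15 degC
  trait.samples[['leaf_respiration_rate_m2']] <-
    arrhenius.scaling(trait.samples[['leaf_respiration_rate_m2']],
                      old.temp = 25, new.temp = 15)
  ## Vcmax is already at 15 degC here, so divide directly
  trait.samples[['dark_respiration_factor']] <-
    trait.samples[['leaf_respiration_rate_m2']] / trait.samples[['Vcmax']]
}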

quickbuild script to load all source without building.

When dev_mode is on, it uses a library (~/R-dev by default) that is only in .libPaths() while dev_mode() is on (and not in the R_LIBS_USER paths). build.sh should not write to this development library by default.

To get around having to recompile all of the packages, I have a script that just loads all of the functions and data into a dev environment:

#!/usr/bin/env Rscript

library(devtools)
dev_mode(on = TRUE)
lapply(list("utils", "db", "settings", ...package list truncated, ... "all"),
       function(x) install(x, quick = TRUE, local = TRUE, quiet = TRUE))

It would be quicker if it only loaded updated files / packages, but I have already written the above and can submit it; mostly I wanted to write this up as a place for feedback.

An alternative would be to update build.sh to install packages to the dev library and add a -q flag to specify --no-install-vignettes and other speed-enhancing features.
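
For comparison, a hedged variant that skips installation entirely by sourcing each package in place with devtools::load_all() (same abbreviated package list as above):

#!/usr/bin/env Rscript
## Load package code straight from source; nothing is written to any library.
library(devtools)
pkgs <- c("utils", "db", "settings")  # ...remaining packages truncated, as above
invisible(lapply(pkgs, load_all))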

Checking dependencies when modules are required

build.sh will not run without the following on HPC systems:

module load netcdf nco R jags gdal

Some options:

  1. place a message in "check dependencies"? (e.g. instructing the user to echo "module load netcdf nco R jags gdal" >> ~/.bashrc)
  2. add a line to scripts/build.sh to load modules if the module system is in use:
if type module > /dev/null; then 
  module load netcdf/4.3.1.1 nco R/3.0.2 JAGS  gdal/1.10.1
fi

Suggestions?

qsub not found

I get the following error when I perform the 'start.model.run' step in PECAN

bash: qsub: command not found
Error in system2("ssh", c(settings$run$host$name, qsub, file.path(settings$run$host$rundir,  : 
  error in running command

I think it may be a problem with the way I declared my qsub tag in the XML file:

<qsub>qsub -N @NAME@ -o @STDOUT@ -e @STDERR@ -l nodes=1:ppn=1 -l walltime=23:00:00</qsub>

Do you notice anything I am doing wrong in declaring the qsub tag?
(see redmine #1920)
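
One quick check before blaming the tag (a sketch using the same settings list as the workflow): confirm qsub is on the PATH of a non-interactive shell on the remote host, since that is what system2 + ssh gives you:

## If this also reports "command not found", the problem is the remote
## non-interactive PATH (e.g. modules only loaded for login shells),
## not the <qsub> tag.
system2("ssh", c(settings$run$host$name, "which", "qsub"), stdout = TRUE)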

job.sh on remote cluster doesn't have pecan libraries

running job.sh on the BU geo cluster gave the following error:

require (PEcAn.ED2)
Loading required package: PEcAn.ED2
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called 'PEcAn.ED2'
model2netcdf.ED2('/usr4/ugrad/mgianott/geoDemo//15', 71.3225, -156.626, '2000/01/01', '2001/01/01')
Error: could not find function "model2netcdf.ED2"
Execution halted
cp: cannot stat `/usr4/ugrad/mgianott/ED.r82/ED/run/15/README.txt': No such file or directory
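
A possible workaround sketch, assuming the PEcAn packages are installed in a user library on the cluster (the path here is hypothetical): point the remote R session at that library before require():

## Prepend the cluster-side library that actually contains PEcAn.ED2
.libPaths(c("~/R/library", .libPaths()))   # hypothetical path
require(PEcAn.ED2)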

Small Results page tweak

Just thought of a small tweak to 08-finished.php that I think will make it a bit more intuitive. Under "Outputs", could you move the "File" box and the "Show Run Output" button to after "Plot run/year/variable", and rename "Show Run Output" to "Show Run File"? That way the selection boxes for Run, Year, and Variable sit in a row followed by "Plot run/year/variable", and there is less ambiguity in "Show Run Output" (to me, "show output" and "plot output" are too close to synonyms, but "show file" makes the distinction clear).

PEcAn R modules demo - permissions bug

Description: Cannot open a connection to the database or write Rdata files for get.trait.data() in the R modules demo (step 4.6) (writing from path home/carya/pecan/output/PEcAn_1/pft/). Below is one of the twenty warning messages:

In file.remove(old.files[which(file.info(list.files(path = settings$pfts[i]$pft$outdir, ... :
cannot remove file '/home/carya/output//PEcAn_1/pft/1//SLA.model.bug', reason 'Permission denied'

Reproduction steps: Follow the R modules demo handout up to step 4.6. Running get.trait.data() will fail and return multiple error / warning messages.

Steps to fix: change the R/W permissions via the shell, for example:
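
For reference, the fix was along these lines (path taken from the warning above; adjust for your output directory):

chmod -R u+rw /home/carya/output/PEcAn_1/pft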

Error upgrading bety on 1.3.0 virtual machine

When I try to follow the instructions on the wiki for upgrading bety, the sudo bundle install line results in the following error:

Installing nokogiri (1.6.0) 
Gem::InstallError: nokogiri requires Ruby version >= 1.9.2.
An error occurred while installing nokogiri (1.6.0), and Bundler cannot continue.
Make sure that `gem install nokogiri -v '1.6.0'` succeeds before bundling.

I have tried upgrading Ruby, installing rvm, and running sudo bundle update, but none of these fixes the issue.

How should I update tests/workflow.R for simpler debugging?

What I have been doing is putting my test pecan.xml in the working directory, adding a debugonce() call after require(PEcAn.all), e.g.

require(PEcAn.all)
debugonce(write.config.BIOCRO)

and then launching R and using source() to run the file:

source('workflow.R')
  1. Is there a way to launch this script from the command line, and have it stay in an R session if / when it enters a debugging situation?
  2. Is it okay if I convert this to an Rscript (see the sketch below) so that I can launch it with
workflow.R <pecan.xml>
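
For question 2, a minimal sketch of the Rscript form (argument handling is illustrative):

#!/usr/bin/env Rscript
## Proposed command-line form: workflow.R <pecan.xml>
args <- commandArgs(trailingOnly = TRUE)
settings.file <- if (length(args) >= 1) args[1] else "pecan.xml"
require(PEcAn.all)
settings <- read.settings(settings.file)
## ... rest of the existing workflow ...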

How can I keep an ssh tunnel open?

Using a server that doesn't support RSA keys, I'd like to be able to connect without entering my password each time PEcAn exchanges information with the server (rsync, ssh, etc.). What settings do I need?
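
One approach that works without keys (a sketch; requires OpenSSH connection multiplexing, and uses the geo.bu.edu hostname from other threads here as a placeholder) is to authenticate once and let every later ssh/rsync call reuse that connection via ~/.ssh/config:

Host geo.bu.edu
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h:%p
    ControlPersist yes

After the first password-prompted login, subsequent PEcAn ssh/rsync calls to that host share the open connection and should not prompt again.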

workflow should abort if model fails

Currently we don't check the result of a model run. If the model fails, our script should throw an error (exit code != 0). We can either leverage the exit code from the model or check whether the output file was generated.
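
A minimal sketch of both options (the names model.cmd, model.args, and outfile are illustrative, not the actual workflow variables):

## Fail fast on a bad model run, using the exit code and the output file
out <- system2(model.cmd, model.args, stdout = TRUE, stderr = TRUE)
status <- attr(out, "status")   # non-NULL only on a non-zero exit code
if (!is.null(status) || !file.exists(outfile)) {
  stop("model run failed (exit status ", status, ")")
}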

read.settings overwrites pecan.xml

read.settings writes the corrected settings to a pecan.xml file in the current folder. Often this overwrites the pecan.xml that was originally loaded, replacing the simple input file with the fully expanded version.

db.exists leaks database handles

db.exists checks whether the database exists. Right now it opens a connection and returns without closing it. We should also add a simple SELECT statement to test for read access and a simple write to test write permissions (we could rewrite a single field that was fetched, as the write test).
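
A hedged sketch of a leak-free version (db.open/db.close/db.query as used elsewhere in PEcAn.DB; the write test is left as a comment):

db.exists <- function(params) {
  con <- db.open(params)
  on.exit(db.close(con))        # always release the handle
  tryCatch({
    db.query("SELECT 1;", con)  # read test; a single-field rewrite could
                                # follow here as the write test
    TRUE
  }, error = function(e) FALSE)
}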

model "id" tag required in settings file

If I run read.settings with the current biocro settings file (inst/extdata/settings.biocro.xml), I get this error:

Error in mysqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not run statement: Incorrect integer value: '' for column 'model_id' at row 1)
Called from: mysqlExecStatement(conn, statement, ...)

I propose to fix it by setting the default model id to -1 (not all uses of a PEcAn settings file require a model). An alternative would be to use other model tags (like the name or path) to look up the id.

Thoughts?

(github testing) What happens to this issue?

Who receives it?

When received by email, will the reply via email be recorded on github (e.g. without requiring a luddite to open up a browser)?

That would be a huge benefit over Redmine.

Reducing size of T file sent to PECAN

In an effort to speed up my sensitivity analysis runs, I have realized I only need a handful of state/flux variables in my T file to get the results I need from PECAN. Right now roughly 40+ state variables are written to the T files, where I really only need ~5.

I think I can do the following: 1) go into the ED2 source code and turn off output of the T-file state variables I don't need, then 2) edit the model2netcdf.ED2.R script within PECAN to eliminate the conversion of variables I don't need. Does that sound reasonable, or will this cause big problems? Any advice on which files to edit would be appreciated.

New Spatial Features for PostGIS (Discussion)

please update original issue as ideas are added / changed in comments

What are data types that we are dealing with?

  • PALSAR data
  • Met data as points and polygons
  • Soils data (polygon format)
  • Remote sensing

Scenarios / stories for how we will use data?

  • global / regional runs
  • selection of drivers; interpolation / integration of site level and reanalysis data
  • hierarchical sites (flea on fly on leaf on plant in plot in block in field ...)

What new tables do we need?

  • regions (possibly polymorphic) linked to sites, inputs, ??? (see the table sketch after this list)
    • fields: id, site_id, geom, created_at, updated_at, (citation_id / source)
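
A hedged SQL sketch of that table (geometry type and SRID are assumptions, and the polymorphic link is omitted):

CREATE TABLE regions (
  id          bigserial PRIMARY KEY,
  site_id     bigint REFERENCES sites (id),
  geom        geometry(MultiPolygon, 4326),  -- PostGIS geometry; SRID assumed
  citation_id bigint,                        -- or a free-text source column
  created_at  timestamp DEFAULT now(),
  updated_at  timestamp DEFAULT now()
);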

How will we convert this to model regions?

  • can we start assuming that we use bounding boxes?
  • what new tags should we add to the settings file?

I want to use a parameter prior without performing a meta-analysis.

Is there a way to include a parameter in the PEcAn sensitivity analysis but 'turn off' the meta-analysis? In other words, is there a way to use the parameter prior from the database directly and not automatically query trait data if some exist, other than removing the trait data itself? Can this be parameter specific (i.e. still perform the meta-analysis for other parameters)?

web interface could end up using posteriors as inputs

Right now the web interface uses file_id instead of id, and it does not check whether a file is an input rather than a posterior.

This was not really an issue before, but it can now cause trouble because posteriors are also stored in dbfiles: the interface could return files that are posteriors, not inputs.

Enable login to pecan-web

We should enable a way to put pecan-web behind a username/password. We can use the same username/password as stored in the bety database. Most of the code already exists in the db app.

Update parameter density plotting functions to use ggplot2 v >= 0.9.2

The function plot.trait and its dependencies were written using the syntax of ggplot2 v0.9.0.

Mostly this will require changing use of the opts() function to theme() and renaming theme_xx() functions to element_xx(). There is a comprehensive guide here: https://github.com/wch/ggplot2/wiki/New-theme-system

This should be very similar to the changes made when I updated the variance decomposition and sensitivity plots (0dde7f0)

To get a list of the warnings thrown by ggplot2 deprecated functions, try the following:

library(PEcAn.priors)
plot.trait("Vcmax", prior = list('norm',1,1), posterior.sample = rnorm(1000, 1.1, 0.5))
help(package = "PEcAn.priors")
?plot.trait
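
As an illustration of the kind of change involved (hedged; the exact calls inside plot.trait may differ):

## old (ggplot2 < 0.9.2):
##   p + opts(axis.text.x = theme_text(angle = 90))
## new (ggplot2 >= 0.9.2):
p + theme(axis.text.x = element_text(angle = 90))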

start.model.runs not connecting to remote host in 1.3.4rc1

I can run the command that this generates at the Linux prompt and it runs fine, but it fails when run within R. In this context I'm connecting to geo.bu.edu, and I already have an ssh connection open between the VM and the cluster.

All subsequent ssh and rsync commands fail as well.

We were able to get all of this to (almost) work in the previous VM version (all files were copied back and forth but the submission to the queue was failing), but now I can't even get the files to copy.

> system2("ssh", c(settings$run$host$name, "mkdir", 
+                  "-p", file.path(settings$run$host$rundir, run)), 
+         stdout = TRUE)
ssh_askpass: exec(rpostback-askpass): No such file or directory
Permission denied, please try again.
ssh_askpass: exec(rpostback-askpass): No such file or directory
Permission denied, please try again.
ssh_askpass: exec(rpostback-askpass): No such file or directory
Permission denied (password,hostbased).
Warning: running command ''ssh' [email protected] mkdir -p /usr2/faculty/dietze/ED.r82/ED/run/56' had status 255

tests for read.output commented out

The commented-out file contains tests that depend on model-specific code (these belong in the model-specific packages) as well as code that depends on the DB package for read.settings.

The utils package should not depend on any other PEcAn package.

ED failure does not trigger Error

If ED fails to execute, the PEcAn code thinks it finished correctly and does not trigger an error, resulting in the web interface sitting in an endless loop.

ED2 output type and frequency required for PECAN

  1. Does PECAN always require 'T' files in order to produce the sensitivity analysis and variance decomposition plots? If I could output the yearly mean flux of NEE and NPP with the 'Y' file, could PECAN use that instead?

  2. The way PECAN is set up currently, is it even necessary to output any files besides the 'T' files in order to perform the sensitivity analysis? It seems it doesn't use the monthly or yearly files, but I haven't looked at this very closely.

create met2CF.Alma

Eventually we will need a function that converts old ALMA files to the netCDF CF format. Some legacy ALMA files have already been converted to the ED-specific format.

Installing packages with complex dependencies

On Stampede, I was unable to install ncdf4, udunits2, rjags, MCMCpack, and rgdal.

These will all require local installation of dependencies. Has anyone done this? Is the best approach to build all of the dependencies from source? Are there tricks to identifying and loading the correctly configured modules? I currently have this in my .bashrc but am still testing it out:

module load gsl hdf5 parallel-netcdf/4.2.1.1 netcdf nco udunits/2.1.24 R_mkl/3.0.1

In general, it seems that it might be useful to share libraries on such a machine rather than have everyone fiddle around. E.g., if I set /home1/02014/dlebauer/lib/R to read-only and people put .libPaths(c("~/lib/R", "/home1/02014/dlebauer/lib/R")) in their .Rprofile, they wouldn't have to install any dependencies, but could if they wanted to (e.g. a different version of PEcAn, etc.).

Calculation of variance in variance decomposition plot

I noticed that for the variance decomposition plot, the absolute variance is calculated for each parameter rather than the partial variance. This is because of line 145 of 'sensitivity.analysis.R':

partial.variances <- variances #/ sum(variances)

The sum(variances) was commented out, leaving behind the total variance for each parameter. This doesn't fundamentally change the interpretation of the variance decomposition plot, but I was just curious why this was done. Was it to keep the variance units?
