
portalcasting

Logo: a hexagonal badge on a light grey-blue background, with "portalcasting" lettered at the top; the main image is a drawn all-black rodent standing on two feet, fishing rod in hand and brown fishing hat on head, next to a tan and green tackle box.

Badges: R-CMD-check · Docker · Codecov test coverage · Lifecycle: maturing · Project Status: Active (the project has reached a stable, usable state and is being actively developed) · License · DOI · NSF-1929730 · JOSS

Overview

The portalcasting package provides a comprehensive system for developing, deploying, and evaluating ecological models that forecast changes in ecological systems over time. It focuses in particular on the Portal Project, a long-term study of mammal population and community dynamics.

Core Dependencies

The portalcasting package depends on the PortalData and portalr packages.

  • PortalData is the collection of all the Portal project data.
  • portalr is a collection of functions to summarize the Portal data.

The portalcasting package integrates the PortalData repository and the portalr data-management package into a streamlined pipeline, which is used to forecast rodent populations at the Portal site.

The functionality of portalcasting extends beyond its deployment, as its functions are portable. This allows users to establish a fully-functional replica repository on either a local or remote machine, facilitating the development and testing of new models within a sandbox environment.
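That sandbox workflow might look roughly like the following sketch. The directory path here is arbitrary and the exact argument names should be checked against the package documentation; `setup_dir()` and `portalcast()` are portalcasting functions referenced elsewhere in this document.

```r
# Hypothetical sandbox session (a sketch, not the canonical workflow):
# build a replica directory locally, then run the forecasting pipeline in it.
library(portalcasting)

main <- "~/portalcasting_sandbox"  # any writable local folder (assumption)
setup_dir(main = main)             # download data and scaffold the directory
portalcast(main = main)            # fit models and write out forecasts
```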

Current deployment:

The portal-forecasts repository houses tools that leverage the portalcasting pipeline to generate weekly forecasts. The forecasts are then showcased on the Portal Forecasts website, which offers users an interactive interface for exploring the forecasting results. The source code for the website is hosted on GitHub. Additionally, the portal-forecasts repository archives the forecasts on both GitHub and Zenodo.

Docker Container

We leverage a Docker container to enable reproducibility of the Portal forecasting. Presently, we use a Docker image of the software environment to create a container for running the code. The image is automatically rebuilt when there is a new portalcasting release, tagged with both the latest and version-specific (vX.X.X) tags, and pushed to DockerHub.

Because the latest image is updated with releases, the current main branch code in portalcasting is typically, but not necessarily always, being executed within the predictions repository.

The API is under active development, and contributions are welcome.

Installation

You can install the package from GitHub:

install.packages("remotes")
remotes::install_github("weecology/portalcasting")

You will need to install rjags and JAGS.

macOS users should install rjags after reading the instructions in the package's README file, or consult the macOS installation thread on the JAGS discussion forum for help.

install.packages("rjags", configure.args="--enable-rpath")

Production environment

If you wish to spin up a local container from the latest portalcasting image (to ensure that you are using a copy of the current production environment for implementation of the portalcasting pipeline), you can run

sudo docker pull weecology/portalcasting

from a shell on a computer with Docker installed.

Usage

Get started with the "how to set up a Portal Predictions directory" vignette.

If you are interested in adding a model to the preloaded set of models, see the "adding a model and data" vignette. That document also details how to expand the datasets available to new and existing models.

Developer and Contributor notes

We welcome any contributions in the form of models or pipeline changes.

For the workflow, please check out the contribution and code of conduct pages.

Acknowledgements

This project is developed in active collaboration with DAPPER Stats.

The motivating study—the Portal Project—has been funded nearly continuously since 1977 by the National Science Foundation, most recently by DEB-1622425 to S. K. M. Ernest. Much of the computational work was supported by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4563 to E. P. White.

We thank Heather Bradley for logistical support, John Abatzoglou for assistance with climate forecasts, and James Brown for establishing the Portal Project.

portalcasting's People

Contributors

arfon · ethanwhite · gmyenni · ha0ye · henrykironde · juniperlsimonis · patdumandan · skmorgane


portalcasting's Issues

Add or swap in shorter run version of `portalcast` in Getting Started

Running portalcast() is pretty slow, and since it's a command listed in the Getting Started vignette, folks are likely to just run it (like I just did). It would probably be useful to provide (either as the only option or as an alternative) a version with a quicker run time for those just starting to get involved in the project.
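A hedged sketch of what a quicker alternative could look like; whether `portalcast()` accepts a `models` argument with this exact name is an assumption based on the model-selection discussion elsewhere in this document.

```r
# Hypothetical quick-start run: fit only one fast model rather than the
# whole prefab set. The `models` argument name is an assumption.
portalcast(models = "AutoArima")
```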

`pass_and_call`

  • consider the issues from the first attempt at this approach
  • work in a confined space to get the code working well for complex toy examples first

get off AIC for ensemble building

AIC-based comparison by definition requires that all models were fitted to the same data set, and since some of the models don't handle unequal steps between surveys, we have to interpolate abundances for ALL models.

The sooner we can move to out-of-sample validation of some sort, the sooner we can actually leverage models that don't require equally spaced surveys.
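A generic sketch of the idea (base R only, not the portalcasting API; the series and model orders are stand-ins): hold out the end of the series and score each candidate by out-of-sample RMSE, so models need not share a fitting data set.

```r
# Generic out-of-sample comparison sketch: two ARIMA variants scored by
# RMSE on a 12-point holdout, instead of comparing AIC on interpolated data.
y     <- as.numeric(AirPassengers)  # stand-in series (144 monthly values)
train <- y[1:132]
test  <- y[133:144]

fit1 <- arima(train, order = c(1, 1, 1))
fit2 <- arima(train, order = c(0, 1, 0))  # random-walk baseline

rmse <- function(fit, test) {
  pred <- predict(fit, n.ahead = length(test))$pred
  sqrt(mean((pred - test)^2))
}

c(arima111 = rmse(fit1, test), rw = rmse(fit2, test))
```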

soften model running errors

build wrappers to catch any errors that would break the full analysis pipeline (as exemplified by pevGARCH throwing errors because of missing covariates and breaking the whole run, even though the other models fit fine)
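A minimal sketch of such a wrapper (the function and argument names here are hypothetical):

```r
# Run one model's fitting function, returning NULL with a warning instead of
# erroring, so a single failing model cannot take down the whole cast.
safe_fit <- function(fit_fun, ...) {
  tryCatch(
    fit_fun(...),
    error = function(e) {
      warning("model fit failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Failed models come back as NULL and can be dropped afterwards, e.g.:
# fits <- Filter(Negate(is.null), lapply(fit_funs, safe_fit))
```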

`model_names` API thoughts

Both @skmorgane and I found the model_names API a bit confusing. We like the idea of having the ability to both have a prefab list and add additional models, so maybe it's just a docs thing.

UX issues that we experienced:

  1. I read the docs down through "character value of the type of model (currently only support for "prefab" and "wEnsemble"). Use NULL to build a custom set from scratch via add." and thought - Oh, that's a complicated config feature, I think I'll stop. Obviously I should have kept going, but if I didn't then it's likely that others won't either.
  2. We both expected the first argument to be where we would list the models we wanted to run.
  3. We felt like in the sandbox the number 1 thing we would want to do is run our model a bunch of times and only optionally later add in comparisons to other models.

With that in mind our thought was that an API like this might be a good option:

model_names(models, add_model_set = NULL)

So then users can do initial development like

model_names("mymodel")

Test against a primary competitor

model_names(c("mymodel", "ARIMA"))

And then do a full comparison to existing models

model_names("mymodel", add_model_set = "prefab")
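A hypothetical implementation of that proposed interface, for discussion only; the prefab model names are taken from elsewhere in this document, and the function body is an illustration, not the package's code.

```r
# Sketch of the proposed model_names() API: named models first,
# with an optional prefab set appended.
model_names <- function(models = NULL, add_model_set = NULL) {
  model_sets <- list(
    prefab = c("AutoArima", "ESSS", "nbGARCH", "pevGARCH")
  )
  extra <- if (is.null(add_model_set)) NULL else model_sets[[add_model_set]]
  unique(c(models, extra))
}

model_names("mymodel")                            # initial development
model_names(c("mymodel", "AutoArima"))            # one primary competitor
model_names("mymodel", add_model_set = "prefab")  # full comparison
```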

improve test runtimes

the tests have long runtimes (requiring a hack on Travis due to quietness, and the skipping of some tests except locally), although much of that is caused by the repeated downloading of the raw data.
judicious reorganization of both the codebase and the testing setup should allow tests to execute much faster.

for example, the directory should really only ever need to be made from scratch once, and then should be re-initiable from the PortalData subdirectory if needed; that subdirectory should itself only ever need to be downloaded once

also, a simple messaging function would remove the need to run so much code twice, with and without quieting (the quieting or not can happen separately from the actual execution)

"Adding a Model" vignette

I'm currently working through the process of trying to add a new model, and getting confused by the specific implementation steps.

One of the main challenges with documentation is that portalcasting has (at least) 2 audiences:

  1. the people who are running the Portal Forecasting website within the lab and need to know how the infrastructure is connected
  2. contributors / end-users of portalcasting who may want to run the models on different subsets of the code or add new models.

For 2., as @gmyenni put it, "the process of adding a new model should be plug-and-play", but it's not clear that this is the case right now, and the directions for adding a new model aren't quite at that step yet (or at least, the "Adding a Model" vignette does not clarify the steps for me).

As a basic example, the vignette starts by mentioning a data/ and models/ subdirectory, but they don't exist as part of the repo. (and they should be created through the setup_dir() function)

So, I think the higher-level organization of the vignette should be:

  • (assume installation of the package and dependencies)
  • set up the portalcasting folders and data locally
  • example for adding a new model to read in data and make forecasts
  • running the new model locally (ideally, without having to run all of the models in portalcasting -- not sure how much of this configuration should be done at this step or is part of the portalcasting codebase to be setup earlier, e.g. when setting up the folders)
  • steps for moving the model files into the portalcasting codebase (as a PR, etc.)

include edge case testing for GARCH models

the 0-abundance case and failed fitting for non-0 but nearly-0 abundances should be included in testing for the three GARCH models
can probably include this as part of tidying up those functions while building better model utilities for folks making new models
or maybe just do it on its own?

improve error messaging around folder locations

currently the error messaging when you try to read from a non-existent directory (for example, if you are working in another spot, but forget to set main) is a bit gnarles. it could stand to be tidied and made more explicit


allow differentiation among forecasts within a day

currently, forecasts are named by date, which prevents having multiple forecasts made on the same date (if a forecast needs to be fixed but the old one should be retained for posterity, for example)
a time stamp needs to be added in a way that works simply for file naming, etc.
and all previous files will need to be updated
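One way that file naming could work, as a sketch (the helper name and prefix are hypothetical): a sortable date-time stamp lets same-day forecasts coexist while preserving chronological ordering.

```r
# Hypothetical helper: name casts with a sortable date-time stamp instead
# of the date alone, so two forecasts made on one day get distinct files.
cast_filename <- function(dir = ".", prefix = "cast") {
  stamp <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
  file.path(dir, paste0(prefix, "_", stamp, ".csv"))
}

cast_filename()  # e.g. "./cast_<date>_<time>.csv"
```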

change drop_spp to species

currently, the rodent data are based on exclusion rather than inclusion, which isn't intuitive from a user perspective. update to inclusion, and carry it through to full functionality under the hood

incorporate tmnt_type flexibility

presently, some of the package components assume that there are two and only two treatment types ("all" and "controls"), for example the model script writing and cast processing code.
however, the basic data machinery is now much more flexible with respect to treatment type, allowing users to construct and work with data according to their own specs, so we'll need to carry that new capacity through the rest of the code base.

generalize update_list

make update_list work where you pass it a second list
maybe update_list <- function(orig_list, ..., new_list = NULL)
and if new_list isn't NULL then unwind it and use its elements
that way you don't have to pass in each argument as x = x
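A sketch of that generalized signature (the merge semantics, with `...` entries overriding `new_list` entries, are an assumption to be decided):

```r
# Generalized update_list(): accepts named arguments and/or a whole list.
update_list <- function(orig_list, ..., new_list = NULL) {
  updates <- c(new_list, list(...))  # later (named ...) entries win
  for (nm in names(updates)) {
    orig_list[[nm]] <- updates[[nm]]
  }
  orig_list
}

update_list(list(a = 1, b = 2), b = 3, new_list = list(c = 4))
# a = 1, b = 3, c = 4
```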

function documentation examples

from the LDATS code review, which I think we should follow here:

Please add small executable examples in your Rd-files to illustrate the use of the exported function.

\dontrun{} should be only used if the example really cannot be executed (e.g. because of missing additional software, missing API keys, ...) by the user. That's why wrapping examples in \dontrun{} adds the comment ("# Not run:") as a warning for the user.
Please unwrap the examples if they are executable in < 5 sec, or create additionally small toy examples to allow automatic testing, (or replace \dontrun{} with \donttest{}).

When creating the examples please keep in mind that the structure
would be desirable:
\examples{
examples for users and checks:
executable in < 5 sec
\dontshow{
examples for checks:
executable in < 5 sec together with the examples above
not shown to users
}
\donttest{
further examples for users; not used for checks
(f.i. data loading examples )
}
}

`setup_sandbox` function

i've decided it will be nice to have a version of the main setup_dir() function tailored to the sandbox situation: setup_sandbox(). whereas setup_dir() is meant to work on the main pipeline with default settings (which makes it much easier to tinker with code within the package), we want a version that makes things nice for a sandboxing user.
as of now, the only thing that jumps out to me as needed in the sandbox but not done by default is the downloading of the historic forecasts. the historic covariate forecasts and the historic rodent forecasts are housed within the main forecasting repo, so the server doesn't need to spend time downloading those files, and the default in the control lists is to not download them, although it's a simple toggle to do it. the way the current API is set up, it's actually really easy to manage this, so i decided to jump for it and create setup_sandbox(). it's basically a call to setup_dir() but with those download settings set to TRUE, and you interact with it the same way you would interact with setup_dir().

at this point, it's easy to change other defaults, so i was wondering if folks who have interacted with setup_dir() in a sandbox setting (@ethanwhite @skmorgane @ha0ye? who else?) had any feedback/requests/suggestions for changed settings for sandboxing. this function is part of the in-works v0.9.0, but i wanted to get thoughts!

NA related error in `plot_cast_point(with_census = TRUE)`?

Following the Getting Started vignette I run into the following error:

> plot_cast_point(with_census = TRUE)
Error in read_cast(tree, cast_type = cast_type, cast_date = cast_date) : 
  forecasts from NA not available
In addition: Warning message:
In max.default(numeric(0), na.rm = FALSE) :
  no non-missing arguments to max; returning -Inf

I'm guessing this has something to do with NA being interpreted as a null value, given the error, but it also triggers when setting species = "DM".

condense input checking

there's a lot of code space taken up (particularly in the options list functions) with input checking that could be packaged into a tighter set of functions
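A sketch of what a condensed checker could look like (the function name and interface are hypothetical; the real options functions would call something like this instead of repeating inline checks):

```r
# One reusable argument checker: verifies class and length, and reports
# the offending argument by name.
check_arg <- function(x, what = "character", len = 1L) {
  nm <- deparse(substitute(x))
  if (!inherits(x, what) || length(x) != len) {
    stop("`", nm, "` must be a ", what, " of length ", len, call. = FALSE)
  }
  invisible(TRUE)
}

tree_name <- "main"
check_arg(tree_name)               # passes silently
# check_arg(tree_name, "numeric")  # would error with a named message
```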

test all functions

full battery of tests
also add in checks within functions for argument inputs

  • add_addl_future_moons
  • add_ensemble
  • add_future_moons
  • all_options
  • append_cov_fcast_csv
  • append_csv
  • append_past_moons_to_raw
  • AutoArima
  • AutoArima_options
  • base_path
  • cast
  • cast_models
  • cast_options
  • casts
  • check_to_skip
  • classy
  • cleanup_dir
  • clear_tmp
  • combine_forecasts
  • compile_aic_weights
  • covariate_models
  • covariates_options
  • create_dir
  • create_main_dir
  • create_sub_dir
  • create_sub_dirs
  • create_tmp
  • data_options
  • dataout
  • dir_options
  • dirtree
  • download_predictions
  • enforce_rodents_options
  • ESSS
  • ESSS_options
  • fcast0
  • file_path
  • fill_data
  • fill_dir
  • fill_models
  • fill_PortalData
  • fill_predictions
  • forecast_covariates
  • forecast_ndvi
  • forecast_weather
  • format_moons
  • get_climate_forecasts
  • interpolate_abundance
  • is.spcol
  • lag_data
  • main_path
  • make_ensemble
  • metadata_options
  • model_options
  • model_path
  • model_template
  • models
  • models_options
  • models_to_cast
  • moons_options
  • nbGARCH
  • nbGARCH_options
  • pevGARCH
  • pevGARCH_options
  • portalcast
  • PortalData_options
  • predictions_options
  • prep_covariates
  • prep_data
  • prep_fcast_covariates
  • prep_hist_covariates
  • prep_metadata
  • prep_moons
  • prep_rodents
  • prep_weather_data
  • read_data
  • remove_incompletes
  • remove_spp
  • rodent_spp
  • rodents_data
  • rodents_options
  • save_forecast_output
  • setup_dir
  • step_casts
  • step_hind_forward
  • sub_path
  • sub_paths
  • subdirs
  • today
  • transfer_hist_covariate_forecasts
  • transfer_trapping_table
  • trim_moons_fcast
  • trim_treatment
  • update_covariates
  • update_covfcast_options
  • update_data
  • update_rodents
  • verify_models
  • verify_PortalData
  • write_model

message functionality to handle quiet argument

  • add a function that prints a message (or not) based on the quiet argument
  • reduces testing time (stuff is currently re-run with and without quiet)
  • tidies up the code by eliminating the repeated logical operators
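The whole helper is only a few lines; a sketch (the name `messageq` is illustrative):

```r
# Single messaging helper keyed off `quiet`, replacing scattered
# `if (!quiet) message(...)` branches throughout the codebase.
messageq <- function(msg, quiet = FALSE) {
  if (!quiet) {
    message(msg)
  }
  invisible(NULL)
}

messageq("downloading raw data...")                # prints
messageq("downloading raw data...", quiet = TRUE)  # silent
```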

github url download capacity

similar to the NMME and Zenodo API URL building, set up the capacity to point to GitHub, in case components haven't been archived, or perhaps as a built-in backup if a Zenodo download is slow or the server is down
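The URL building itself could be as simple as the following sketch (the helper name, repo, and path here are illustrative, not the package's API):

```r
# Build a raw.githubusercontent.com URL as a fallback download source.
github_raw_url <- function(repo, path, ref = "main") {
  paste0("https://raw.githubusercontent.com/", repo, "/", ref, "/", path)
}

github_raw_url("weecology/PortalData", "version.txt")
```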

Vignette order issue

For the how-to vignette, if you follow along in the current order, the cleanup_dir() command cleans out the species list, which is needed in the plotting steps that follow, yielding an error (and no plots). Here's the error: cannot open file 'C:\Users\skmorgane\Documents\PortalData\Rodents\Portal_rodent_species.csv': No such file or directory. I had @ethanwhite confirm this happened for him as well.

We should either change the order in the vignette so that cleanup_dir() comes at the end of the document, or change cleanup_dir() so that it leaves a copy of the file for the plot functions. If we leave cleanup_dir() as is, we should warn users that they shouldn't run it until they are completely done.

improve function documentation

  • add_addl_future_moons
  • add_ensemble
  • add_future_moons
  • all_options
  • append_cov_fcast_csv
  • append_csv
  • append_past_moons_to_raw
  • AutoArima
  • AutoArima_options
  • base_path
  • cast
  • cast_models
  • cast_options
  • casts
  • check_to_skip
  • classy
  • cleanup_dir
  • clear_tmp
  • combine_forecasts
  • compile_aic_weights
  • covariate_models
  • covariates_options
  • create_dir
  • create_main_dir
  • create_sub_dir
  • create_sub_dirs
  • create_tmp
  • data_options
  • dataout
  • dir_options
  • dirtree
  • download_predictions
  • enforce_rodents_options
  • ESSS
  • ESSS_options
  • fcast0
  • file_path
  • fill_data
  • fill_dir
  • fill_models
  • fill_PortalData
  • fill_predictions
  • forecast_covariates
  • forecast_ndvi
  • forecast_weather
  • format_moons
  • get_climate_forecasts
  • interpolate_abundance
  • is.spcol
  • lag_data
  • main_path
  • make_ensemble
  • metadata_options
  • model_options
  • model_path
  • model_template
  • models
  • models_options
  • models_to_cast
  • moons_options
  • nbGARCH
  • nbGARCH_options
  • pevGARCH
  • pevGARCH_options
  • portalcast
  • PortalData_options
  • predictions_options
  • prep_covariates
  • prep_data
  • prep_fcast_covariates
  • prep_hist_covariates
  • prep_metadata
  • prep_moons
  • prep_rodents
  • prep_weather_data
  • read_data
  • remove_incompletes
  • remove_spp
  • rodent_spp
  • rodents_data
  • rodents_options
  • save_forecast_output
  • setup_dir
  • step_casts
  • step_hind_forward
  • sub_path
  • sub_paths
  • subdirs
  • today
  • transfer_hist_covariate_forecasts
  • transfer_trapping_table
  • trim_moons_fcast
  • trim_treatment
  • update_covariates
  • update_covfcast_options
  • update_data
  • update_rodents
  • verify_models
  • verify_PortalData
  • write_model

set up data classes for the steps starting with the model scripts

the functions for all of the pre-model-running processing require data arguments to be of specific classes (like rodents and covariates), but the classes are lost when the files are written out and then read back in by the model scripts
it would be good to leverage the classes in the model functions, but that will require some validation step and wrapper functions around the csv-reading function to append the classes in a meaningful way (in case the external file gets corrupted or something)
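Such a wrapper could look like the following sketch (the function name, class names, and validation are illustrative, not the package's implementation):

```r
# Read a csv back in for a model script, validate expected columns, and
# re-attach the data class that was lost on write-out.
read_classed_csv <- function(path, data_class, required_cols = NULL) {
  x <- read.csv(path, stringsAsFactors = FALSE)
  missing_cols <- setdiff(required_cols, names(x))
  if (length(missing_cols) > 0) {
    stop("file is missing columns: ", paste(missing_cols, collapse = ", "))
  }
  class(x) <- c(data_class, class(x))
  x
}

# e.g. rodents <- read_classed_csv("data/rodents.csv", "rodents")
```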

name alignments

many of the names of objects and arguments have been aligned with #129, but there are still some key spots that need to be updated
casts should be saved out with a tmnt_type column, not a level column (level was removed as a signifier here, as it conflicts directly with the portalr function argument). enacting this will require some back-compatibility code to make the old files work

illustrate the code flow

use this as a good opportunity to develop a set of visual constructs for representing the codebase

check contributors list

please check the contribs list (basically copied over from portalPredictions) and let me know if I should make any changes
