
bbsbayes2's Introduction

bbsBayes


This README file provides an overview of the functionality that can be accomplished with 'bbsBayes'. It is intended to provide enough information for users to perform, at the very least, replications of status and trend estimates from the Canadian Wildlife Service and/or United States Geological Survey. However, it provides more in-depth and advanced examples for subsetting data, custom regional summaries, and more.


Overview

bbsBayes is a package to perform hierarchical Bayesian analysis of North American Breeding Bird Survey (BBS) data. 'bbsBayes' will run a full model analysis for one or more species that you choose, or you can take more control and specify how the data should be stratified, prepared for JAGS, or modelled.

Installation

Note: bbsBayes has been removed from CRAN, and effectively replaced by bbsBayes2

Install archived package bbsBayes from GitHub

# To install from GitHub:
install.packages("remotes")
remotes::install_github("bbsBayes/bbsBayes")

Basic Status and Trend Analyses

bbsBayes provides functions for every stage of Breeding Bird Survey data analysis.

Data Retrieval

You can download BBS data by running fetch_bbs_data(). This will save it to a package-specific directory on your computer. You must agree to the terms and conditions of data usage before downloading. You only need to run this function once for each annual update of the BBS database.

fetch_bbs_data()

Data Preparation

Stratification

Stratification plays an important role in trend analysis. Use the stratify() function for this job. Set the argument by to stratify by the following options:

  • bbs_cws -- Political region X Bird Conservation Region intersection (Canadian Wildlife Service [CWS] method)

  • bbs_usgs -- Political region X Bird Conservation Region intersection (United States Geological Survey [USGS] method)

  • bcr -- Bird Conservation Region only

  • state -- Political Region only

  • latlong -- Degree blocks (1 degree of latitude X 1 degree of longitude)

stratified_data <- stratify(by = "bbs_cws")

JAGS Data

JAGS models require data supplied as a list whose structure depends on how the model is set up. prepare_jags_data subsets the stratified data for the chosen species and wrangles the relevant data into the list format needed by the JAGS models.

jags_data <- prepare_jags_data(stratified_data,
                               species_to_run = "Barn Swallow",
                               min_max_route_years = 5,
                               model = "gamye",
                               heavy_tailed = TRUE)

Note: This can take a very long time to run

MCMC

Once the data have been prepared for JAGS, the model can be run. The following will run the MCMC with the default number of iterations. Note that this step usually takes a long time (e.g., 6-12 hours, or even days, depending on the species and model). If multiple cores are available, the processing time can be reduced with the argument parallel = TRUE.

mod <- run_model(jags_data = jags_data)

Alternatively, you can set how many iterations, burn-in steps, or adapt steps to use, and whether to run chains in parallel:

jags_mod <- run_model(jags_data = jags_data,
                      n_saved_steps = 1000,
                      n_burnin = 10000,
                      n_chains = 3,
                      n_thin = 10,
                      parallel = FALSE,
                      parameters_to_save = c("n", "n3", "nu", "B.X", "beta.X", "strata", "sdbeta", "sdX"),
                      modules = NULL)

The run_model function generates a large list (a jagsUI object) that includes the posterior draws, convergence information, data, etc.

Convergence

The run_model() function will send a warning if the Gelman-Rubin Rhat cross-chain convergence criterion is > 1.1 for any of the monitored parameters. Re-running the model with a longer burn-in and/or more posterior iterations or a greater thinning rate may improve convergence. The seriousness of these convergence failures is something the user must interpret for themselves. In some cases, some parameters of the model may not be separately estimable, but if no direct inference is drawn from those separate parameters, their convergence may not be necessary. If all or the vast majority of the n parameters have converged (i.e., the warning is only being triggered by other monitored parameters), then inference on population trajectories and trends from the model is reliable.

jags_mod$n.eff #shows the effective sample size for each monitored parameter
jags_mod$Rhat # shows the Rhat values for each monitored parameter

If important monitored parameters have not converged, we recommend inspecting the model diagnostics with the package ggmcmc.

install.packages("ggmcmc")
S <- ggmcmc::ggs(jags_mod$samples,family = "B.X") #samples object is an mcmc.list object
ggmcmc::ggmcmc(S,family = "B.X") ## this will output a pdf with a series of plots useful for assessing convergence. Be warned this function will be overwhelmed if trying to handle all of the n values from a BBS analysis of a broad-ranged species

Alternatively, the shinystan package has some wonderful interactive tools for better understanding convergence issues with MCMC output.

install.packages("shinystan")
my_sso <- shinystan::launch_shinystan(shinystan::as.shinystan(jags_mod$samples, model_name = "My_tricky_model"))

bbsBayes also includes a function to help re-start an MCMC chain, so that you avoid having to wait for an additional burn-in period.

### if jags_mod has failed to converge...
new_initials <- get_final_values(jags_mod)

jags_mod2 <- run_model(jags_data = jags_data,
                       n_saved_steps = 1000,
                       n_burnin = 0,
                       n_chains = 3,
                       n_thin = 10,
                       parallel = FALSE,
                       inits = new_initials,
                       parameters_to_save = c("n", "n3", "nu", "B.X", "beta.X", "strata", "sdbeta", "sdX"),
                       modules = NULL)

Model Predictions

There are a number of tools available to summarize and visualize the posterior predictions from the model.

Annual Indices of Abundance and Population Trajectories

The main monitored parameters are the annual indices of relative abundance within a stratum (i.e., parameters "n[strata,year]"). The time-series of these annual indices form the estimated population trajectories.

indices <- generate_indices(jags_mod = jags_mod,
                            jags_data = jags_data)

By default, this function generates estimates for the continent (i.e., survey-wide) and for the individual strata. However, the user can also select summaries for composite regions (regions made up of collections of strata), such as countries, provinces/states, Bird Conservation Regions, etc. For display, the posterior medians are used for annual indices (instead of the posterior means) due to the asymmetrical distributions caused by the log-linear retransformation.

indices <- generate_indices(jags_mod = jags_mod,
                            jags_data = jags_data,
                            regions = c("continental",
                                        "national",
                                        "prov_state",
                                        "stratum"))
# also "bcr" and "bcr_by_country" are options

Population Trends

Population trends can be calculated from the series of annual indices of abundance. The trends are expressed as geometric mean rates of change (%/year) between two points in time: $\text{Trend} = \left(\frac{n_{\text{Maxyear}}}{n_{\text{Minyear}}}\right)^{1/(\text{Maxyear} - \text{Minyear})}$, reported as a percentage, i.e., $100 \times (\text{Trend} - 1)$.
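As a quick illustration of this calculation (a minimal sketch in base R, with made-up index values):

# hypothetical annual indices for a single region
n_1970 <- 0.75  # index in the first year (Min_year)
n_2018 <- 0.50  # index in the last year (Max_year)

# geometric mean annual rate of change
trend_ratio <- (n_2018 / n_1970)^(1 / (2018 - 1970))

# expressed as %/year
trend_percent <- 100 * (trend_ratio - 1)
trend_percent  # approximately -0.84 %/year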

trends <- generate_trends(indices = indices,
                          Min_year = 1970,
                          Max_year = 2018)

The generate_trends function returns a dataframe with 1 row for each unit of the region-types requested in the generate_indices function (i.e., 1 for each stratum, 1 continental, etc.). The dataframe has at least 27 columns that report useful information related to each trend, including the start and end year of the trend, lists of included strata, total number of routes, number of strata, mean observed counts, and estimates of the % change in the population between the start and end years.
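For example, a quick way to inspect and save the trend table (base R only; the file name is just an example):

head(trends)   # first few rows of the trend table
names(trends)  # all column names returned
write.csv(trends, "Barn_Swallow_trends_1970-2018.csv", row.names = FALSE)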

The generate_trends function includes some other arguments that allow the user to adjust the quantiles used to summarize uncertainty (e.g., the interquartile range of the trend estimates, or the 67% CIs), as well as include additional calculations, such as the probability a population has declined (or increased) by > X%.

trends <- generate_trends(indices = indices,
                          Min_year = 1970,
                          Max_year = 2018,
                          prob_decrease = c(0,25,30,50),
                          prob_increase = c(0,33,100))

Visualizing Predictions

Population Trajectories

Generate plots of the population trajectories through time. The plot_indices() function produces a list of ggplot figures that can be combined into a single pdf file, or printed to individual devices.

tp <- plot_indices(indices = indices,
                   species = "Barn Swallow",
                   add_observed_means = TRUE)
# pdf(file = "Barn Swallow Trajectories.pdf")
# print(tp)
# dev.off()
print(tp[[1]])
print(tp[[2]])
# etc.

Trend Maps

The trends can be mapped to produce strata maps coloured by species population trends.

mp <- generate_map(trends,
                   select = TRUE,
                   stratify_by = "bbs_cws",
                   species = "Barn Swallow")
print(mp)

Geofacet Trajectories

For stratifications that can be compiled by political regions (i.e., bbs_cws, bbs_usgs, or state), the geofacet_plot function will generate a ggplot object that plots the state- and province-level population trajectories in facets arranged in an approximate geographic layout. These plots offer a concise, range-wide summary of a species' population status and trends.

gf <- geofacet_plot(indices_list = indices,
                    select = TRUE,
                    stratify_by = "bbs_cws",
                    multiple = TRUE,
                    trends = trends,
                    slope = FALSE,
                    species = "Barn Swallow")

# png("BARS_geofacet.png", width = 1500, height = 750, res = 150)
print(gf)
# dev.off()

EXAMPLE - Replicating the CWS status and trend estimates (2018 version onwards)

The CWS analysis, as of the 2018 BBS data-version, uses the GAMYE model. It also monitors two estimates of the population trajectory:

  • one for visualizing the trajectory, which includes the annual fluctuations estimated by the year-effects ("n")

  • another for calculating trends, using a trajectory that removes the annual fluctuations around the smooth ("n3")

The full script to run the CWS analysis for the 2018 BBS data version is accessible here: https://github.com/AdamCSmithCWS/BBS_Summaries

species.eng = "Pacific Wren"

stratified_data <- stratify(by = "bbs_cws") #same as USGS but with BCR7 as one stratum and PEI and Nova Scotia combined into one stratum

jags_data <- prepare_jags_data(strat_data = stratified_data,
                               species_to_run = species.eng,
                               min_max_route_years = 5,
                               model = "gamye",
                               heavy_tailed = TRUE) # heavy-tailed version of gamye model

jags_mod <- run_model(jags_data = jags_data,
                      n_saved_steps = 2000,
                      n_burnin = 10000,
                      n_chains = 3,
                      n_thin = 20,
                      parallel = FALSE,
                      parameters_to_save = c("n", "n3", "nu", "B.X", "beta.X", "strata", "sdbeta", "sdX"),
                      modules = NULL)

# n and n3 allow for index and trend calculations, other parameters monitored to help assess convergence and for model testing 

EXAMPLE - Replicating (approximately) the earlier USGS status and trend estimates (2011 - 2017 data versions)

The USGS analysis, from 2011 through 2017, used the SLOPE model. Future analyses from the USGS will likely use the first-difference model (see Link et al. 2017, https://doi.org/10.1650/CONDOR-17-1.1).

NOTE: the USGS analysis is not run using the bbsBayes package, and so this analysis may not exactly replicate the published version. However, any variations should be very minor.

species.eng = "Pacific Wren"

stratified_data <- stratify(by = "bbs_usgs") #BCR by province/state/territory intersections

jags_data <- prepare_jags_data(strat_data = stratified_data,
                               species_to_run = species.eng,
                               min_max_route_years = 1,
                               model = "slope",
                               heavy_tailed = FALSE) # normal-tailed version of slope model

jags_mod <- run_model(jags_data = jags_data,
                      n_saved_steps = 2000,
                      n_burnin = 10000,
                      n_chains = 3,
                      n_thin = 20,
                      parallel = FALSE,
                      track_n = FALSE,
                      parameters_to_save = c("n2"), # more on this alternative annual index below
                      modules = NULL)

Advanced options and customized models

Alternative Models

The package has (currently) four status and trend models that differ somewhat in the way they model the time-series of observations. The four model options are slope, gam, gamye, and firstdiff.

slope

The slope option estimates the time series as a log-linear regression with random year-effect terms that allow the trajectory to depart from the smooth regression line. It is the model used by the USGS and CWS to estimate BBS trends since 2011. The basic model was first described in 2002 (Link and Sauer 2002; https://doi.org/10.1890/0012-9658(2002)083[2832:AHAOPC]2.0.CO;2) and its application to the annual status and trend estimates is documented in Sauer and Link (2011; https://doi.org/10.1525/auk.2010.09220) and Smith et al. (2014; https://doi.org/10.22621/cfn.v128i2.1565).

    #stratified_data <- stratify(by = "bbs_usgs")
    
    #jags_data_slope <- prepare_jags_data(stratified_data, 
    #                           species_to_run = "American Kestrel",
    #                           min_max_route_years = 3,
    #                           model = "slope")

    #jags_mod_full_slope <- run_model(jags_data = jags_data_slope)
                               
    slope_ind <- generate_indices(jags_mod = jags_mod_full_slope,
                                  jags_data = jags_data_slope,
                                  regions = c("continental"))
    slope_plot = plot_indices(indices = slope_ind,
                         species = "American Kestrel SLOPE",
                         add_observed_means = TRUE)
    #png("AMKE_Slope.png", width = 1500,height = 900,res = 150)
    print(slope_plot)
    #dev.off()
    

gam

The gam option models the time series as a semiparametric smooth using a Generalized Additive Model (GAM) structure. See https://github.com/AdamCSmithCWS/Smith_Edwards_GAM_BBS for more information (full publication coming soon)

    #stratified_data <- stratify(by = "bbs_usgs")
    
    #jags_data_gam <- prepare_jags_data(stratified_data, 
    #                           species_to_run = "American Kestrel",
    #                           min_max_route_years = 3,
    #                           model = "gam")

    #jags_mod_full_gam <- run_model(jags_data = jags_data_gam)
                               
    gam_ind <- generate_indices(jags_mod = jags_mod_full_gam,
                                jags_data = jags_data_gam,
                                regions = c("continental"))
    gam_plot = plot_indices(indices = gam_ind,
                         species = "American Kestrel GAM",
                         add_observed_means = TRUE)
    #png("AMKE_gam.png", width = 1500,height = 900,res = 150)
    print(gam_plot)
    #dev.off()
    

gamye

The gamye option includes the semiparametric smooth used in the gam option, but also includes random year-effect terms that track annual fluctuations around the smooth. This is the model that the Canadian Wildlife Service is now using for the annual status and trend estimates.

    #stratified_data <- stratify(by = "bbs_usgs")
    
    #jags_data_gamye <- prepare_jags_data(stratified_data, 
    #                           species_to_run = "American Kestrel",
    #                           min_max_route_years = 3,
    #                           model = "gamye")

    #jags_mod_full_gamye <- run_model(jags_data = jags_data_gamye)
                               
    gamye_ind <- generate_indices(jags_mod = jags_mod_full_gamye,
                                  jags_data = jags_data_gamye,
                                  regions = c("continental"))
    gamye_plot = plot_indices(indices = gamye_ind,
                         species = "American Kestrel GAMYE",
                         add_observed_means = TRUE)
    #png("AMKE_gamye.png", width = 1500,height = 900,res = 150)
    print(gamye_plot)
    #dev.off()
    

firstdiff

The firstdiff option models the time-series as a random-walk from the first year, so that the first-differences of the sequence of year-effects are random effects with mean = 0 and an estimated variance. This model has been described in Link et al. 2017 https://doi.org/10.1650/CONDOR-17-1.1

    #stratified_data <- stratify(by = "bbs_usgs")
    
    #jags_data_firstdiff <- prepare_jags_data(stratified_data, 
    #                           species_to_run = "American Kestrel",
    #                           min_max_route_years = 3,
    #                           model = "firstdiff")

    #jags_mod_full_firstdiff <- run_model(jags_data = jags_data_firstdiff)
                               
    firstdiff_ind <- generate_indices(jags_mod = jags_mod_full_firstdiff,
                                      jags_data = jags_data_firstdiff,
                                      regions = c("continental"))
    firstdiff_plot = plot_indices(indices = firstdiff_ind,
                         species = "American Kestrel FIRSTDIFF",
                         add_observed_means = TRUE)
    #png("AMKE_firstdiff.png", width = 1500,height = 900,res = 150)
    print(firstdiff_plot)
    #dev.off()
    

Alternate extra-Poisson error distributions

For all of the models, the BBS counts on a given route and year are modeled as Poisson variables with over-dispersion. The over-dispersion approach used here is to add a count-level random effect that adds extra variance to the unit variance:mean ratio of the Poisson. In the prepare_jags_data function, the user can choose between two distributions to model the extra-Poisson variance:

  • the default normal distribution (heavy_tailed = FALSE)
  • an alternative heavy-tailed t-distribution. (heavy_tailed = TRUE)

The heavy-tailed version is well supported for many species, particularly species that are sometimes observed in large groups. Note: the heavy-tailed version can require significantly more time to converge (~2-5 fold increase in processing time).

    #stratified_data <- stratify(by = "bbs_usgs")

    #jags_data_firstdiff <- prepare_jags_data(stratified_data,
    #                           species_to_run = "American Kestrel",
    #                           min_max_route_years = 3,
    #                           model = "firstdiff",
    #                           heavy_tailed = TRUE)

    #jags_mod_full_firstdiff <- run_model(jags_data = jags_data_firstdiff)

Alternate Annual Indices and Resulting Trends

In all the models, the default measure of the annual index of abundance (the yearly component of the population trajectory) is the derived parameter "n". The run_model function monitors n by default, because it is these parameters that form the basis of the estimated population trajectories and trends.

Alternate retransformations

There are two ways of calculating these annual indices for each model. The two approaches differ in the way they calculate the retransformation from the log-scale model parameters to the count-scale predictions. The user can choose using the following arguments in run_model() and generate_indices().

  • the default estimates the mean of the expected counts from the existing combinations of observers and routes in a given stratum and year. This approach retransforms an annual prediction for every observer-route combination in the stratum and then averages across those predictions.
mod <- run_model(... ,
                 parameters_to_save = "n",
                 ... )
indices <- generate_indices(... ,
                            alternate_n = "n",
                            ... )
  • the alternative, parameters_to_save = c("n2"), track_n = FALSE, is actually the standard approach used in the USGS status and trend estimates. It estimates the expected count from a new observer-route combination, assuming the distribution of observer-route effects is approximately normal. This approach uses a log-normal retransformation factor that adds half of the estimated variance of observer-route effects to the log-scale prediction for each year and stratum, then retransforms that log-scale prediction to the count-scale. This is the approach described in Sauer and Link (2011; https://doi.org/10.1525/auk.2010.09220).
mod <- run_model(... ,
                 parameters_to_save = "n2",
                 ... )
indices <- generate_indices(... ,
                            alternate_n = "n2",
                            ... )

The default approach parameters_to_save = c("n") slightly underestimates the uncertainty of the annual indices (slightly narrower CI width). However, we have chosen this approach as the default because:

  • it much more accurately represents the observed mean counts, and so allows for an intuitive interpretation of the annual indices;
  • it more accurately represents the relative contribution of each stratum to the combined (e.g., continental or national) population trajectory and trends. The alternative n2 approach tends to overestimate the observed mean counts, and that bias varies among strata, which affects each stratum's contribution to the combined regional estimates;
  • the small underestimate in the uncertainty of the annual indices does not affect the uncertainty of the trend estimates.

For example, in the figures below, the predicted annual indices (blue line and CI-band) are much more similar to the observed average counts (grey dots) for the Default n approach.

Decomposing the population trajectories for two of the models

For two of the main model types "slope" and "gamye", users can choose two different ways to calculate trajectories and population trends. With these two model types, the population trajectories are composed of two largely independent components, a long-term smooth and the random annual fluctuations around that smooth. Because the two components are largely independent, the population trajectory can be decomposed.
The default approach is to include the annual fluctuations around the linear (slope) or GAM-smooth (gamye) components of the trajectories. These trend estimates are more comprehensive in that they include the full estimated trajectory, but they will vary more between subsequent years (e.g., more variability between a 1970-2017 trend and a 1970-2018 trend), because they include the effects of the annual fluctuations.

mod <- run_model(... ,
                 parameters_to_save = "n",
                 ... )
indices <- generate_indices(... ,
                            alternate_n = "n",
                            ... )

An alternative approach is to decompose the full trajectory and to exclude the annual fluctuations around the linear (slope) or smooth (gamye) components. In this case, the predicted trends will be much more stable between subsequent years. For the CWS status and trend analyses, the visualized population trajectories are calculated using the full trajectory, and the trend estimates are derived from the decomposed trajectory using only the smooth component.

mod <- run_model(... ,
                 parameters_to_save = c("n","n3"),
                 ... )
indices_visualize <- generate_indices(... ,
                                      alternate_n = "n",
                                      ... )
indices_trend_calculation <- generate_indices(... ,
                                              alternate_n = "n3",
                                              ... )

For example, the figure below (produced using a modified version of the standard plotting functions) shows the two kinds of trajectories for Pacific Wren from the 2018 CWS analysis. The light-blue trajectory is the visualized trajectory, including the yearly fluctuations. The orange trajectory is the one used for trend calculations, which includes only the GAM-smooth component. For the kinds of broad-scale status assessments that form the primary use of the published trend estimates, this decomposition is a particularly useful feature of these two models.

The figure below provides another example of the benefits of removing the year-effect annual fluctuations when calculating trends.

Each point on the graph represents the 10-year trend estimate for Wood Thrush in Canada, ending in a given year (e.g., the points at 2015 represent the species' national population trend from 2005-2015). The red and green points are the default trend estimates derived from the full population trajectories of the gamye and slope models. The blue points represent the trends calculated using the decomposed trajectory of the gamye model, including only the smooth component. When the annual fluctuations are included (SLOPE and GAMYE including year-effects), the population trends surpass the IUCN trend-criterion in some years (e.g., 2011), suggesting that if assessed in those years the species would be listed as Threatened (trend in the orange region). However, the more stable trend estimate from the decomposed trajectory (GAMYE - smooth only, in blue) shows that the species is probably best thought of as in decline, but not surpassing the Threatened criterion.

Alternate Measures of Trend and Population Change

The generate_trends() function produces much more than just the trend estimates.

Slope Trends

The default trend calculation is an interval-specific estimate of the geometric mean annual change in the population: $\text{Trend} = \left(\frac{n_{\text{Maxyear}}}{n_{\text{Minyear}}}\right)^{1/(\text{Maxyear} - \text{Minyear})}$. It relies on a comparison of the annual indices in the first and last years of the trend period to quantify the mean rate of population change. However, it ignores the pattern of change between the two end-points.

The user can choose an alternative estimate of change that is calculated by fitting a log-linear slope to the series of all annual indices between the two end-points (e.g., all 11 years in a 10-year trend from 2008-2018). The slope of this line can be expressed as an average annual percent change across the time-period of interest. If working with estimates derived from a model with strong annual fluctuations for which no decomposition is possible (e.g., the "firstdiff" model), this slope-based trend may be a more comprehensive measure of the average population change that is less dependent on the particular end-point years. These slope trends can be added to the trend output table by setting the slope = TRUE argument in generate_trends(). The standard trends are still calculated, but additional columns are added that include the alternate estimates. NOTE: the generate_map() function can also map slope trends with the same slope = TRUE argument.

    #jags_data_firstdiff <- prepare_jags_data(stratified_data,
    #                           species_to_run = "American Kestrel",
    #                           model = "firstdiff")

    #jags_mod_full_firstdiff <- run_model(jags_data = jags_data_firstdiff)

    #firstdiff_ind <- generate_indices(jags_mod = jags_mod_full_firstdiff,
    #                                  jags_data = jags_data_firstdiff,
    #                                  regions = c("continental","stratum"))
    fd_slope_trends_08_18 <- generate_trends(indices = firstdiff_ind,
                                             Min_year = 2008,
                                             Max_year = 2018,
                                             slope = TRUE)
    generate_map(fd_slope_trends_08_18,
                 slope = TRUE,
                 stratify_by = "bbs_usgs")

Percent Change and probability of change

The generate_trends() function also produces estimates of the overall percent-change in the population between the first and last years of the trend-period. This calculation is often easier to interpret than an average annual rate of change. These percent change estimates have associated uncertainty bounds, and so can be helpful for deriving statements such as "between 2008 and 2018, the population has declined by 20 percent, but that estimate is relatively uncertain and the true decline may be as little as 2 percent or as much as 50 percent"

In addition, the function can optionally calculate the posterior conditional probability that a population has changed by at least a certain amount, using the prob_decrease and prob_increase arguments. These values can be useful for deriving statements such as "our model suggests that there is a 95% probability that the species has increased (i.e., > 0% increase) and a 45 percent probability that the species has increased more than 2-fold (i.e., > 100% increase)"

    fd_slope_trends_08_18 <- generate_trends(indices = firstdiff_ind,
                                             Min_year = 2008,
                                             Max_year = 2018,
                                             slope = TRUE,
                                             prob_increase = c(0,100))
                                                       

Custom regional summaries

You can calculate trends and trajectories for custom combinations of strata, such as separate trends for eastern and western populations of Lincoln's Sparrow.

    #stratification <- "bbs_cws"
    #strat_data <- stratify(by = stratification, sample_data = TRUE)
    #jags_data <- prepare_jags_data(strat_data, 
    #                           species_to_run = "Lincoln's Sparrow",
    #                           model = "gamye")

    #jags_mod <- run_model(jags_data = jags_data)

Assuming the above setup has been run, the user can then generate population trajectories using a customized grouping of the original strata.

First extract a dataframe that defines the original strata used in the analysis.

st_comp_regions <- get_composite_regions(strata_type = stratification)

Then add a column to the dataframe that groups the original strata into the desired custom regions.

st_comp_regions$East_West <- ifelse(st_comp_regions$bcr %in% c(7,8,12:14,22:31),"East","West")

st_comp_regions can now be used as the dataframe input to the argument alt_region_names in generate_indices(), with "East_West" as the value for the argument regions. The relevant trends can be calculated using just the generate_trends() function.

east_west_indices <- generate_indices(jags_mod = jags_mod,
                                      jags_data = jags_data,
                                      alt_region_names = st_comp_regions,
                                      regions = "East_West")
east_west_trends <- generate_trends(indices = east_west_indices)

Exporting the JAGS model

You can easily export any of the bbsBayes models to a text file.

model_to_file(model = "slope",
              filename = "my_slope_model.txt")

Then, you can modify the model text (e.g., try a different prior) and run the modified model:

mod <- run_model(... ,
                 model_file_path = "my_modified_slope_model.txt",
                 ... )

Details coming soon...

Modifying the JAGS model and data

You can even export the bbsBayes model as text and modify it to add covariates: for example, a GAM smooth to estimate the effect of day of year on the observations, or an annual weather covariate, or... Then add the relevant covariate data to the jags_data object, and you're off! We'll add some more details and examples soon.
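A rough sketch of that workflow (the covariate name cov, the hand-edits to the model text, and the new parameter beta_cov are all hypothetical; the exact parameterization is up to you):

# export the gamye model to a text file
model_to_file(model = "gamye",
              filename = "gamye_with_covariate.txt")

# edit "gamye_with_covariate.txt" by hand to add a covariate term
# (e.g., beta_cov * cov[i] in the linear predictor, plus a prior on beta_cov)

# add the covariate data to the jags_data list
# (cov is a hypothetical vector with one value per count in the data)
jags_data[["cov"]] <- cov

# run the modified model, monitoring the new parameter as well
jags_mod_cov <- run_model(jags_data = jags_data,
                          model_file_path = "gamye_with_covariate.txt",
                          parameters_to_save = c("n", "beta_cov"))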

Comparing Models

Finally, bbsBayes can be used to run Bayesian cross-validations. For example, the get_final_values() function is useful for providing an efficient starting point for cross-validation runs, without having to wait for another full burn-in period.

A pre-print paper that includes an example of how to implement a cross-validation using bbsBayes: https://doi.org/10.1101/2020.03.26.010215 (the supplementary code is archived on Zenodo).

NOTE: although bbsBayes includes functions to calculate WAIC, recent work has shown that WAIC performs very poorly with the BBS data (https://doi.org/10.1650/CONDOR-17-1.1). We recommend a k-fold cross-validation approach, as in the above zenodo archive.

bbsbayes2's Issues

Improve treatment of coastal survey locations with custom stratifications

Currently, there are 100s or 1000s of routes that can easily be omitted from a custom stratification, depending on the underlying land map used. Often these are routes with start locations close to the coast or on small off-shore islands.
The user can buffer the coast of their custom strata map, but this then influences the area weights applied to these coastal strata.

Possible solutions:

  • stratify() could offer an option to include all routes within a minimum distance of a stratum.
  • The package could include a land-layer that we ensure contains all of the route start locations. Users could then use this base layer to modify their custom stratification so that coastal routes are not excluded. This same base-layer could be used to update the strata within the package (e.g., the latlong stratification, which currently excludes a handful of island routes).

Article - Indices and Trends

The basics have been covered in the "Getting Started" vignette, but eventually it would be good to have a more advanced vignette with a discussion of how to interpret indices and trends.

Add meaningful error message for 1-stratum data-sets

Currently, datasets with only 1 stratum (species with very small ranges and coarse-grained stratifications such as BCRs) throw an obscure error in the run_model() function. The error references a dimension mismatch related to a multi-dimensional array that should have a dimension defined by the number of strata.

To do:

  1. Short-term: add a meaningful error message, including a suggestion to try a finer-grained stratification (e.g., latlong).
  2. Long-term: adjust the code and models so that the models can be fit using only a single stratum.

generate_trends - prob_decrease and prob_increase allowable ranges too narrow

In the generate_trends function, the prob_decrease and prob_increase arguments must allow for the following ranges of %-change:

prob_decrease: c(0,100)
Including 0 as an allowable value ensures that the user can calculate the probability that the population has decreased (i.e., that the trend is negative).

prob_increase: c(0,Inf)
Including 0 for the same reason as above. Also including no upper limit because a population can increase by > 100% (e.g., a population that triples in size = 200% increase). Decreases are limited to 100% (i.e., extinction), but not increases.
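If the allowable ranges are widened as described, a call like the following sketch would cover both cases (argument names as in the existing function; the specific cutoffs are only examples):

trends <- generate_trends(indices = indices,
                          prob_decrease = c(0, 30, 50, 100),  # includes 0 and the maximum possible decrease
                          prob_increase = c(0, 100, 200))     # includes 0 and a tripling (a 200% increase)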

Error in obs_mean

The obs_mean column in the output from generate_indices() is incorrectly generating mostly NA values. I suspect this is because it is calculating a mean(na.rm = FALSE) when it should be using mean(na.rm = TRUE).
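A minimal illustration of the suspected cause:

x <- c(2, NA, 5)
mean(x)                # NA  -- the default na.rm = FALSE propagates missing values
mean(x, na.rm = TRUE)  # 3.5 -- likely the intended behaviour for obs_mean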

Specifying Output Directory for the Model and Stan Sampling Chains?

First of all, thank you to all the authors and contributors -- I find this package extremely useful.

When I run the model for a particular species, I notice that all Stan outputs are automatically generated as .csv files in my working directory (e.g., BBS_STAN_first_diff_nonhier_2023-08-01-3.csv for chain 3). When running models for multiple species in a row, the .csv files seem to interfere with each other because their names are identical and depend only on the model type, not the species.

When I try to export the model to a different directory to avoid the problem, I notice that the Stan outputs are still automatically generated in my working directory, and I am then unable to save the model to my specified directory because compiling the .rds file seems to require the Stan output files, which the function cannot access because they are not in the same directory.

When I try to run two different models in two R sessions, one of them still fails even though all sampling is complete. The error seems to occur at the step of exporting the fitted model, with the message Error: All variables in all chains must have the same length.

I have noticed that in the example code of replicating CWS analyses, which if I understand correctly is a prototype of the workflow in bbsBayes2, there are commands specifying the directory of Stan outputs (line 129 of the linked file), and the script is apparently able to handle multiple species in a row, but I am not sure whether the same options and functionalities are built into bbsBayes2.

It would be great if you have any insights. Thanks again for this amazing package!
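As a possible workaround (a sketch assuming run_model() accepts the output_dir and output_basename arguments used elsewhere in these issues), giving each species its own basename should keep the Stan .csv files from colliding:

species <- "Carolina Wren"

s  <- stratify(by = "bbs_cws", species = species)
p  <- prepare_data(s)
pm <- prepare_model(p, model = "first_diff")

# write the Stan csv files and the saved model to a species-specific location
fit <- run_model(pm,
                 output_dir = "output",
                 output_basename = gsub(" ", "_", species))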

Colour scale in plot_map isn't centered properly

The colour scale in plot_map needs to be anchored at particular values so that the colours are consistent across maps and species, and so that the red-yellow-blue colours consistently reflect decreases-stable-increases, respectively.
The light-yellow colour that represents 2-4% increases in the map below should always be linked to the -0.5 to 0.5% trends.

[Figure: example trend map illustrating the mis-anchored colour scale]

Code to produce the map:

library(bbsBayes2)

species <- "Carolina Wren"

stratification <- "bbs_cws"

model = "gamye"

model_variant <- "spatial"


s <- stratify(by = stratification,
              species = species)


p <- prepare_data(s)

ps <- prepare_spatial(p,
                      strata_map = load_map(stratification))

pm <- prepare_model(ps,
                    model = model,
                    model_variant = model_variant)
fit <- run_model(pm,
                 refresh = 200,
                 adapt_delta = 0.8)

#fit <- readRDS("BBS_STAN_gamye_spatial_2023-01-19.rds") ## I can upload this saved file, if useful.

indices_smooth <- generate_indices(fit,
                            regions = c("continent","stratum"),
                            alternate_n = "n_smooth")

trends <- generate_trends(indices_smooth)


map1 <- plot_map(trends = trends)

print(map1)

hierarchical first difference models need special treatment for 2020

With no data in 2020, the among-strata variance can't be estimated.
Possible options:

  1. An alternative treatment of sdbeta so that it varies by year and therefore shrinks to 0 in 2020.
  2. A hard-coded fixed-at-zero treatment so that there is no among-strata variation in betas in 2020.

Allow for plot_map with custom stratification map

Currently, plot_map() calls load_map() within the function without checking to see if the stratification is one of the internal stratifications, or a custom one.

The result is that the function fails with an error that suggests the user needs to supply one of the standard stratifications.

Allow for within-chain parallelization using reduce_sum()

At least part of the model files (namely, the count ~ poisson_log() or count ~ neg_binomial_2()) can take advantage of Stan's reduce_sum() feature to allow for within-chain parallelization. I am unsure by how much this would speed up performance because it is only a very small part of the sampling, but could be worth a shot.

I am happy to take care of this because I am about to implement it on my version of bbsBayes2, and so I can also add the code into here.

sp package dependencies retiring

The package sp and its dependencies are changing. see the following message that appears when package sp is loaded:

The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
(status 2 uses the sf package in place of rgdal)

This may be largely due to the use of the sp package in the neighbours function. Not sure it appears elsewhere...

Add American Crow and Meadowlarks to combined species table

Merging of the Northwestern Crow and American Crow may not be complete with the 2023 data release. If not, then this species complex will need to be included in the 2023 data release.
Expansion of Chihuahuan Meadowlark, and possible complications with separating the three species in some regions, suggest it might be worth including a Meadowlark (all forms) option.

Also see #79

get_convergence function may be flawed

The tidy application of the posterior::rhat(), posterior::ess_bulk(), and posterior::ess_tail() functions may be flawed.

It may not properly track the chains in the draws object.

Confirm that it generates the same values for rhat, ess_bulk, and ess_tail as the get_summary() function, which uses the built-in cmdstanr::summary() functions.
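A sketch of such a comparison (assuming both functions accept the fitted model object and return data frames with comparable diagnostic columns; the column names used here are assumptions):

conv <- get_convergence(fit)
summ <- get_summary(fit)

# join on the parameter names and compare the two sets of diagnostics
# (column names "variable" and "rhat" are assumed, not confirmed)
comp <- merge(conv, summ, by = "variable", suffixes = c("_conv", "_summ"))
summary(abs(comp$rhat_conv - comp$rhat_summ))  # should be ~0 if the two agree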

ongoing dplyr deprecation warning

Following the recent merging of PR #18, generate_indices() is still producing a dplyr warning.

@steffilazerte, any ideas what we've missed?

inds <- generate_indices(fit, regions = "stratum")
Processing region stratum
Warning message:
Returning more (or less) than 1 row per summarise() group was deprecated in dplyr 1.1.0.
ℹ Please use reframe() instead.
ℹ When switching from summarise() to reframe(), remember that reframe() always returns an ungrouped data frame and adjust accordingly.
ℹ The deprecated feature was likely used in the tidyr package.
Please report the issue at https://github.com/tidyverse/tidyr/issues.
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this warning was generated.

Consider more guidance and informative error msg for combined species without full time-series support

Some help documentation and informative warnings would be useful to help users understand the complexity of time-series information for some of the lumped and split species' complexes.

  1. Alternative start dates for:
  • Traill's Flycatcher complex,
  • Eurasian Collared Dove, Cave Swallow, and others that were missing from the survey for the first ~10 years
  2. Warning messages for Meadowlarks etc., where the ability to separate species or forms may have gotten worse with time (most get better)

Automate the convergence checks into run_model

Given how flexible we've made the tools in this package, it would be wise to automatically assess convergence for every model run.

Include warning messages for users about poor convergence. Also point them to general discussions about model convergence in Stan.

Add option to keep missing data strata for better neighbourhood structures

Add an option to the prepare_spatial function that would allow the user to retain a minimal set of strata, including those with no data, to create a fully connected neighbours graph.

Will require additional parameters in the models to handle the empty strata.
If all missing data strata can be confidently identified with values > n_strata, then the model changes are relatively minor. Should require just one extra data object to ensure that the dimensions of the spatial parameters are appropriately set (e.g., n_strata_extra).

Will also require an additional dependency on the concaveman package to set the minimal set of retained spatial strata.

Also consider adding a no-data indicator to the prepare_spatial map output, so users can check which extra strata have been included.

Error: Cannot find original model file location, please specify `path'

This error has started showing up in the last month or two. Multiple users have mentioned it.

The error occurs during the final saving of the fitted model object in the run_model() function. The model runs fine, the .csv files are stored successfully, it's the save_model portion of run_model() that's failing.
I suspect something has changed in cmdstanr.

A band-aid fix is to re-run the data-preparation functions, then read in the csv files using cmdstanr::as_cmdstan_fit()

For example:

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)
pm <- prepare_model(p, model = "first_diff")
m <- run_model(pm, output_dir = "output", output_basename = "my_model")
run_model throws the following error:
Error: Cannot find original model file location, please specify `path'

To recover the results of the model fitting, you can re-create the object pm by re-running the original set-up code:
s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)
pm <- prepare_model(p, model = "first_diff")
Then create a new fitted model object m_recover
m_recover <- pm

create a vector of paths to the csv files
csv_files <- paste0(paste0("output/",”my_model”, "-",c(1:4),".csv") ) # assuming 4 chains

m_recover[["model_fit"]] <- cmdstanr::as_cmdstan_fit(files = csv_files)
save_model_run(fit_cov,retain_csv = TRUE) # this saves the same .rds file as the default run_model(), but also keeps the csv files (you may decide to delete them later, the .rds saved model object should be sufficient.

This saved m_recover object should work with all of the bbsBayes2 package functions.

Thank you to @izzyjorgensen for helping to identify this error.

Allow for HPDI instead of quantiles

The HPDI is a much better summary of the posterior, particularly for skewed estimates like the annual indices of abundance, than the quantiles. Simple addition to generate_trends and generate_indices.
Consider using something like the following:

  if (hpdi) {
    interval_function <- function(x, probs = 0.025) {
      if (probs < 0.5) {
        q2 <- 1 - (probs * 2)
        i <- 1
      } else {
        q2 <- 1 - ((1 - probs) * 2)
        i <- 2
      }
      y <- HDInterval::hdi(x, q2)[i]
      return(y)
    }
  } else {
    interval_function <- function(x, probs = 0.025) {
      y <- stats::quantile(x, probs)
      return(y)
    }
  }

Retain the route-level coordinates in the raw_data object

Currently, the stratified data object includes the route-level start coordinates, and allows for easy visualization of the routes in relation to a custom stratification. However, the stratified route information includes all routes in the database, not just those with data for a given species.
If the coordinate columns latitude and longitude were retained in the raw_data object of the list output from prepare_data(), then the same functionality would be available at all stages of the model workflow.

Release!

Getting ready for the release!

To Do's

bbsBayes Organization

  • (Brandon) Move bbsBayes to this organization (update links therein)
  • (Anyone) Pin bbsBayes to this organization
  • (Anyone) Update link to bbsBayes in the README (in the .github repo under profile)
    • (Anyone) Edit the placeholder README text

Random

Citations

Vignettes - Need context to finalize

Update README.md

  • (Steffi) R-universe installation instructions
  • (Steffi) R-universe badge
  • (Steffi) Add figures for prettiness
  • (Steffi) Add release to R-universe branch once we have a release

Explore information

  • (Adam/Brandon) CODE_DESIGN.md
  • (Adam/Brandon) RELEASE.R

Release!

  • (Adam/Brandon/Steffi) Create a signed release of the package (v1.0.0 🎉) (see the RELEASE.R script for helpers)
  • (Adam/Brandon) Do a soft release and ask a couple of colleagues to test and find problems
  • (Adam/Brandon) Do full and public release once any bugs/questions/comments have been addressed

Allow for regions with any name if a composite region table is provided

In the generate_indices() function, an error is thrown if the user calls for a composite summary on a column that fits the names of the standard "bbs" strata, even if the user has supplied a table with a composite grouping that should allow for the relevant summaries.

e.g.,

bbs_usgs <- load_map("bbs_usgs") %>%
  select(-area_sq_km) %>%
  rename(bbs_strata = strata_name)

strata_join <- load_map("latlong") %>%
  filter(strata_name %in% fit$meta_strata$strata_name) %>%
  sf::st_join(., bbs_usgs,
              largest = TRUE,
              left = TRUE) %>% # intersection
  sf::st_drop_geometry() %>%
  mutate(bbs_strata = ifelse(is.na(bbs_strata), "other", bbs_strata))

inds_tmp <- generate_indices(tmp, alternate_n = "n",
                             regions = c("country", "bbs_strata"),
                             regions_index = strata_join,
                             hpdi = TRUE,
                             start_year = 1970)

Error: Stratification does not match desired regions:
BCRs and lat-long degree block stratifications cannot be divided into regions with political boundaries ('country', 'prov_state').

