saudiwin / ordbetareg_pack Goto Github PK

Repository for R package ordbetareg, used to fit continuous data with lower and upper bounds.

License: Other

R 100.00%

ordbetareg_pack's Introduction

README

This is the repository for the ordbetareg package, which can be used to fit the ordered beta regression model as specified in Kubinec (2022) (see paper https://osf.io/preprints/socarxiv/2sx6y/). This package is on CRAN, but to use the development version, use this install command (requires the remotes package to be installed first):

remotes::install_github("saudiwin/ordbetareg_pack",build_vignettes=TRUE,dependencies = TRUE)

To learn more about beta regression modeling and where this package fits in, you can read my blog post on the topic. To learn how to use the package, see the introduction vignette by using the vignette("package_introduction", package="ordbetareg") command at the R console or viewing the vignette on CRAN.

If you use the package, please cite it as:

Kubinec, Robert. “Ordered Beta Regression: A Parsimonious, Well-Fitting Model for Continuous Data with Lower and Upper Bounds.” Political Analysis. 2022. https://doi.org/10.1017/pan.2022.20.

ordbetareg_pack's People

Contributors

Stargazers

Watchers

Forkers

bgall vincentarelbundock zentavious

ordbetareg_pack's Issues

error for group-level model for phi

It is currently not possible to specify a group-level model for phi that has no population-level effects ("fixed effects").
Setting

ordbetareg(..., phi_reg = "both")

and a formula for phi like
phi ~(1|country)

will return an error of the type

Error: The following priors do not correspond to any model parameter:
b_phi ~ normal(0,5)
Function 'get_prior' might be helpful to you.

The reason is simply that for phi_reg = "both" the current code will force priors on the population-level coefficients in the model for phi even if there are none. Setting phi_reg = "none" is no help, because then brms will try to set a prior on the global phi, but again, there is no global phi. I have solved that locally for me, but it might be worth to make this more flexible midterm.

make_stancode argument generates error

Not an issue for me per se, but when trying to extract the stancode from an ordbetareg model using the make_stancode argument, causes an error.

library(dplyr)
library(ordbetareg)

data("pew")

# prepare data

model_data <- dplyr::select(pew,therm,
                     education="F_EDUCCAT2_FINAL",
                     region="F_CREGION_FINAL",
                     income="F_INCOME_FINAL")

ord_fit_mean_T <- ordbetareg(formula=therm ~ education + income +
                             (1|region),
                           data=model_data,
                           cores=2,chains=2,
                           make_stancode = TRUE)

Normalizing using the observed bounds of 0 - 100. If these are incorrect, please pass the bounds to use to the true_bounds parameter.
Error in fit_func(formula = formula, data = data, stanvars = ordbeta_mod$stanvars,  : 
  could not find function "fit_func"

However, simply running the model with make_stancode = FALSE and calling stancode(model) after returns Stan code correctly:

ord_fit_mean_F <- ordbetareg(formula=therm ~ education + income +
                             (1|region),
                           data=model_data,
                           iter = 1000,
                           cores=2,chains=2,
                           make_stancode = FALSE)

brms::stancode(ord_fit_mean_F)

A minor bug in the grand scheme of things but thought you would like to be informed. Thank you for the extremely useful package.

Error when prior on the intercept is specified without center = T or center = F

Thanks for the amazing package!

I noticed that when you specify a prior for the intercept using
intercept_prior_mean = ##, intercept_prior_SD = ##,

This gives an minor error if center = TRUE or center = FALSE is not manually specified within the brm formula. Not sure if this is intended or not.

library(tidyverse)
library(ordbetareg)

data("pew")
model_data <- select(pew,
                     therm,
                     sex="F_SEX_FINAL")

run model with priors on the intercept
m <- ordbetareg(formula = bf(therm ~ sex),
                data = model_data, 
                coef_prior_mean = 0,
                coef_prior_SD = 1.5,
                intercept_prior_mean = 0,
                intercept_prior_SD = 1.5,
                seed = 42,
                chains=4,
                iter=2000)`

#>Error` in if (attr(formula$formula, "center")) { : argument is of length zero

Adding center to the bf fixes it:

# adding center = T or center = F fixes the error
m <- ordbetareg(formula = bf(therm ~ sex, center = T),
                data = model_data, 
                coef_prior_mean = 0,
                coef_prior_SD = 1.5,
                intercept_prior_mean = 0,
                intercept_prior_SD = 1.5,
                seed = 42,
                chains=4,
                iter=2000)

Facilitate usage by exporting family

Firstly, thanks for the package and the paper :)

I'm trying to fix mixed ordbeta models with formulas (ideally, different formulas) on the location and on phi, but using the ordbeta() wrapper throws an error:

library(brms)
library(ordbetareg)


data <- datawizard::rescale(mtcars, select="wt", to=c(0, 1))

m <- ordbetareg::ordbetareg(wt ~  qsec + (1+qsec|gear),
                            data=data,
                            phi_reg = "both",
                            algorithm="meanfield",
                            init = 0)
#> Error: The following priors do not correspond to any model parameter: 
#> b_phi ~ normal(0,5)
#> Function 'default_prior' might be helpful to you.

While it might be fixed by changing the priors, I found it more straightforward and flexible to use the ordbeta family, made by the internal loard_ordbetareg() function (note that the default argument for phi_reg doesn't work currently, which should probably be fixed if this gets reexported).

# phi_reg must be specified as default throws an error
ordbeta <- ordbetareg:::.load_ordbetareg(phi_reg = "both") 

f <- brms::bf(wt ~  qsec + (1+qsec|gear),
              phi ~ qsec + (1+qsec|gear),
              family=ordbeta$family)

m <- brms::brm(formula=f, data=data, 
               stanvars=ordbeta$stanvars,
               algorithm="meanfield",
               init = 0)
#> Compiling Stan program...
#> Start sampling
#> Chain 1: ------------------------------------------------------------
#> Chain 1: EXPERIMENTAL ALGORITHM:
#> Chain 1:   This procedure has not been thoroughly tested and may be unstable
#> Chain 1:   or buggy. The interface is subject to change.
#> Chain 1: ------------------------------------------------------------
# ...

^{Created on 2024-05-30 with reprex v2.0.2}

One can also then see and set priors on these models more easily using the standard brms interface.
I'd like to suggest exporting load_ordbetareg() (one could rename it for example make_ordbetafamily()), which would make it more accessible and reusable.

Thanks!

Syntax error with clean install of stack

I got a new computer, so did a clean install of the Stan/rstan/cmdstanr stack. The model that I previously had working is no longer compiling. Maybe something wasn't done quite right in my install? Anyway, I receive the following error:

Syntax error in 'C:/Users/XXXX/AppData/Local/Temp/RtmpsHU8FO/model_42e711f2d392090e5892f6df506b3a25.stan', line 10, column 5 to column 6, parsing error:
   -------------------------------------------------
     8:     *   an integer sequence from start to end
     9:     */
    10:    int[] sequence(int start, int end) {
              ^
    11:      array[end - start + 1] int seq;
    12:      for (n in 1:num_elements(seq)) {
   -------------------------------------------------

An identifier is expected as a function name.

Error: Syntax error found! See the message above for more information.

My R version information is

platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          3.3                              
year           2024                             
month          02                               
day            29                               
svn rev        86002                            
language       R                                
version.string R version 4.3.3 (2024-02-29 ucrt)
nickname       Angel Food Cake

Do you need any other info?

Thanks!

Failure to fit with only intercept model

I noticed that intercept only type models with no slope/offset parameters are not currently possible. Using the example code/data- this model -

ord_fit_mean <- ordbetareg(formula=therm ~ 1 + (1|region),
data=model_data,
cores=2,chains=2)

returns this error -

Error: The following priors do not correspond to any model parameter:
b ~ normal(0,5)
Function 'default_prior' might be helpful to you.

Seems like a possible backend code fix for not pre-specifying b priors without checking the design matrix.

I would be interested in this being fixed to be able to fit a model using ordbetreg.

Thanks!
cheers, Brian

Multivariate Modelling with Multiple DVs

I'm looking for some clarity about ordbetareg's support for multivariate modeling. I've done a fair amount of digging around the source code and have narrowed down what appears to be the issue.

Keeping things simple, I'm trying to fit the following function:

bform1 <- bf(mvbind(Y1, Y2, Y3) ~ X + (1|p|participant.id)) + set_rescor(FALSE)

ord_fit <- ordbetareg(formula=bform1, 
                      data=model_data,
                      cores=4,chains=4,iter=2000,
                      control = list(adapt_delta = 0.95))

The observed behavior is that the model is completed successfully, but an error is thrown after calling fit_func(...) in modeling.R on line 573.

Error in data[[dv_pos]] : recursive indexing failed at level 2

This error stems from the call out_obj$upper_bound <- attr(data[[dv_pos]],'upper_bound') when dv_pos is named list of integers

  Y1  Y2  Y3 
   1   2   3

My question, is ordbetareg intended to support such a function? And if so, what are the intended interpretations of out_obj$upper_bound and out_obj$lower_bound when there could be multiple upper and lower bounds? I will say in my case, I've already normalized all the response variables. In that case, I suspect it'd be ok if line 273 said out_obj$upper_bound <- attr(data[[dv_pos[[1]]]],'upper_bound')

Trouble Specifying No Random Intercept

I have been trying to specify a model formula with ordbetareg() that does not include an intercept by using binary dummies explicitly in the formula. I have found a workaround by running through brm() directly, where family = ord_beta_reg but this is not ideal.

Here is an example (I am aware that this distribution isn't actually ordered beta, but it is faster to do it this way and it shouldn't cause any errors). I want, ideally, a formula like bf(Y ~ 0 + X1 + X2).

Setup:

library(ordbetareg)
library(brms)

Data:

Y = c(rbeta(900, 1.5, 5), rbinom(100, 1, 0.15))
X1 = rbinom(1000, 1, 0.4)
X2 = 1 - X1
df = cbind(Y, X1, X2) %>% as.data.frame()

This is what I would like to do:

fit1 <- ordbetareg(
bf(Y ~ 0 + X1 + X2),
data = df,
iter = 500, warmup = 100,
chains = 1, cores = 1
)

Note that the output of fit1 includes a coefficient for the intercept. This runs, but if you include a prior like extra_prior = set_prior("normal(0.1, 0.1)", class = "b") (which I would like to do) then there is an error about specifying duplicate priors.

I am aware that the documentation for ordbetareg() says "Please avoid using 0 or Intercept in the formula definition." I figured this might just be a feature of the function, so I tried just including the dummy in the intercept and specifying a prior for the intercept.

fit2 <- ordbetareg(
bf(Y ~ 1 + X1),
data = df,
extra_prior = set_prior("normal(0.1, 0.1)", class = "Intercept"),
iter = 500, warmup = 100,
chains = 1, cores = 1
)

This returns an error: Error in nlist(prior, class, coef, group, resp, dpar, nlpar, lb, ub, check) : object 'Intercept' not found. If you run the same line without the extra_prior = argument, there is no issue, and the intercept coefficient is included in the output.

This is my workaround, which works without any issues:

fit3 <- brm(
bf(Y ~ 0 + X1 + X2),
data = df, family = ord_beta_reg,
prior = set_prior("normal(0.1, 0.1)", class = "b"),
iter = 1000, warmup = 500,
chains = 1, cores = 1,
stanvar = stanvar
)

Prior and post-processing function problems for models that predict phi

Hi!

The ordbetareg:::.load_ordbetareg function only allows extra_prior for ‘priors’, not for ‘priors_phireg’ (and the latter also has a hardcoded prior for b coefficients which interferes with set_prior from the outside).

Also, LOO doesn’t work when phi is predicted (at least seems to get stuck for a long time so I stop the process).
Not sure if the following are specific for some characteristic of my model. My formula (without the '0 + Intercept') has the form:

bf( Y ~ factor + (factor | id), phi ~ factor + (factor | id) )

An apparent fix from an adaptation of your early code (https://gist.githubusercontent.com/StaffanBetner/6ec4a10a42d75faf31e612148a8a7789/raw/induced_dirichlet_ordbeta.R) is to change portions of the log_lik and posterior_predict functions, from:

phi <- draws$dpars$phi
to
if(NCOL(draws$dpars$phi)==1){phi <- draws$dpars$phi}else
{phi <- draws$dpars$phi[, i]}

Compilation error when sample_prior=TRUE

I've been working with ZOIB regression models in brms and decided to try the ordered beta model after seeing your posts on the Stan forum and browsing through your paper (which I hope to read in more detail). I included sample_prior=TRUE as an argument in the ordbetareg function, which caused a compilation error. I'm not sure if I should have expected sample_prior_TRUE to work, but I wanted to bring this to your attention in case it's a bug. When I removed the argument, the sampling completed as expected. Now I just have to spend some time getting a better understanding of the model. Thanks for making this model and package available!

Below is a reproducible example using the pew data from the Vignette.

ord_fit_mean <- ordbetareg(formula=therm ~ mo(education)*mo(income) + (1|region), 
                           data=model_data,
                           cores=2,chains=2,iter=1000,
                           sample_prior=TRUE,
                           refresh=0)

Compiling Stan program...
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  induced_dirichlet_rng(vector, int)

Function induced_dirichlet_rng not found.
 error in 'model165e36ef63963_4801811fdbe1356d482a6c93380fb357' at line 148, column 55
  -------------------------------------------------
   146:   real prior_phi = exponential_rng(0.1);
   147:   real prior_sd_1 = student_t_rng(3,0,2.5);
   148:   real prior_thresh = induced_dirichlet_rng([1,1,1]',0);
                                                              ^
   149:   // use rejection sampling for truncated priors
  -------------------------------------------------

Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model '4801811fdbe1356d482a6c93380fb357' due to the above error.
In addition: Warning message:
Rows containing NAs were excluded from the model.

Can we remove the sf dependency?

Thanks for this package. It looks potentially well suited to a project that I'm currently working on.

Unfortunately, I am unable to install ordbetareg because (a) I am working in a CentOS 7 environment, and (b) you have a recursive dependency on sf (via the transformr and gganimate deps). Installing sf on CentOS is huge pain because of the GDAL system dependency, which has to be configured and compiled from source.

I'm unfortunately stuck with CentOS 7 b/c of work requirements, so there's not much I can do about (a). But perhaps there's something we can do about (b)? I don't know if transformr and gganimate are playing an essential role in the core ordbetareg functionality but it seems doubtful to me. At least, requiring a hard dependency on an entire geospatial stack (both R and system-level deps) does not seem desirable for a narrowly-focused econometric package, regardless of the user's operating system.

Is there a way to split out any ancillary functionality into a separate, companion package that would remove the recursive sf dependency from ordbetareg?

Thanks for considering.

Error using LOO with: pointwise = TRUE

Thanks for this incredibly useful package!
I'm running into issues trying to use loo() for model comparison. I have very large models (fitted on imputed data using brm_multiple()) so I'm trying to use pointwise estimates for LOO to avoid reaching maximum memory usage. loo(..., pointwise = FALSE) runs fine on (smaller) ordbetareg models, but with pointwise = TRUE I get the following error:
Error in draws$dpars$mu[, i]
It looks to me like the draws are not in the format that log_lik_ord_beta_reg() expects them to be in. Depending on how loo() extracts them, the specific draws for mu are probably stored in draws$dpars$mu$fe. I had a look at this in the example below (I'm also interested in effects on phi, which I think may have a similar problem). I really have no idea about the inner workings of LOO though.

    require(ordbetareg)
    # Test data ---------------------------------------------------------------
    set.seed(20230202)
    #data where both mu & phi change as a function of x
    dd <- data.frame(yy = round(plogis(
                            rnorm(n = 101, 
                                  mean = -50:50 / 5, 
                                  sd = seq(from = 5, 
                                           to = 0.1, 
                                           length.out = 101)
                                  )), 2),
                    xx = -50:50 / 5)
    # Fit ordered beta model --------------------------------------------------
    ordb_model<- ordbetareg( formula =
                  bf(yy ~ xx, phi ~ xx),
                  phi_reg = TRUE,
                  data = dd,
                  cores = parallel::detectCores()-1,
                  chains =  parallel::detectCores()-1,
                  iter = 1e3,
                  init = '0',
                  backend = 'cmdstanr'
                )

    # Inspect model predictions ----------------------------------------------
    loo_heavy <- loo(ordb_model)
    ## Computed from 3500 by 4901 log-likelihood matrix
    ## 
    ## Estimate     SE
    ## elpd_loo  -9065.8  524.2
    ## p_loo      8283.9  482.7
    ## looic     18131.7 1048.5
    ## ------
    ##   Monte Carlo SE of elpd_loo is NA.
    ## 
    ## Pareto k diagnostic values:
    ##   Count Pct.    Min. n_eff
    ## (-Inf, 0.5]   (good)     4047  82.6%   209       
    ## (0.5, 0.7]   (ok)        198   4.0%   57        
    ## (0.7, 1]   (bad)       183   3.7%   14        
    ## (1, Inf)   (very bad)  473   9.7%   1                

    loo_point <- loo(ordb_model, pointwise = TRUE)
    ##Error in draws$dpars$mu[, i] : incorrect number of dimensions
    loo_sub_point <- loo_subsample(ordb_model)
    ##Error in draws$dpars$mu[, i] : incorrect number of dimensions

    #indeed the draws do not have this shape
    sub_draws <- prepare_predictions(ordb_model, point_estimate = 'median')
    summary(sub_draws$dpars$mu)
    ## Length Class        Mode   
    ## family 18     customfamily list   
    ## ndraws  1     -none-       numeric
    ## nobs    1     -none-       numeric
    ## fe      2     -none-       list   
    ## sp      0     -none-       list   
    ## cs      0     -none-       list   
    ## sm      0     -none-       list   
    ## gp      0     -none-       list   
    ## re      0     -none-       list   
    ## ac      0     -none-       list  

    #estimates are stored in $fe$b
    head(sub_draws$dpars$mu$fe$b)

    log_lik_ord_beta_reg <-
      function(i, draws) {
        
        # mu <- draws$dpars$mu[,i]
        mu <- with( draws$dpars$mu$fe, b%*%X[i,]  ) #something like this to give estimate for that datapoint
        phi <- with( draws$dpars$phi$fe, b%*%X[i,] ) #something like this to give estimate for that datapoint
        mu <- plogis(mu) # expected on the response scale?
        phi <- plogis(phi) # expected on the response scale?
        y <- draws$data$Y[i]
        cutzero <- draws$dpars$cutzero
        cutone <- draws$dpars$cutone
    
        thresh1 <- cutzero
        thresh2 <- cutzero + exp(cutone)
    
        if(y==0) {
          out <- log(1 - plogis(qlogis(mu) - thresh1))
        } else if(y==1) {
          out <- log(plogis(qlogis(mu) - thresh2))
        } else {
          out <- log(plogis(qlogis(mu)-thresh1) - plogis(qlogis(mu) - thresh2)) + dbeta(y,mu*phi,(1-mu)*phi,log=T)
        }
    
        out
    
      }

    sum(log_lik_ord_beta_reg(20,sub_draws))
    ##-1.188826

Normalisation to construct bounds vs data bounds

Love the package and approach - stumbling across it has been perfect because I often deal with bounded continuous data.

Just following up from twitter exchange (https://twitter.com/JamesSteeleII/status/1502963025925619712?s=20&t=cYgXau8Y7vsb0LPlkF2tLw).

I have the case where my construct DV is theoretically bounded on a particular interval (say [0,1]) and so could take values of 0, but because of experimental design never will take that value and so it does not appear in the data despite being possible. It seems as though the normalize function treats whatever the lowest value in the data is and treats that as the lowest bound to normalize to (and conversely the highest value for the upper bound). An easy fix as you noted was to add a row with the lowest bound value and NA for covariates (and conversely that could be done in instances where the highest construct value isn't in the data I suppose). But would be great if this could be specified in the normalize function (and ordbetareg function) itself... something like telling it what the construct bounds are so it can normalize observed values to that interval.

Thanks again!

manual intercepts via `intercept_prior_mean` give error

Since v0.7.0, handing intercept_prior_mean = 0 and intercept_prior_SD = 5 should allow to configure a prior N(0,5) for the intercept. However, the error Error: The following priors do not correspond to any model parameter: b_Intercept ~ normal(0,5) is returned.

Reproducible example:

library(dplyr)
data("pew")

model_data <- select(pew,therm, education="F_EDUCCAT2_FINAL", region="F_CREGION_FINAL", income="F_INCOME_FINAL")

ord_fit_mean <- ordbetareg(formula=therm ~ education + income + (1|region),
    data=model_data,
    intercept_prior_mean = 0,
    intercept_prior_SD  = 5,
    cores=2,chains=2)

Session info

R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_Germany.utf8 LC_CTYPE=English_Germany.utf8 LC_MONETARY=English_Germany.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Germany.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_1.1.1 ordbetareg_0.7.1 brms_2.19.0 Rcpp_1.0.10

loaded via a namespace (and not attached):
[1] TH.data_1.1-1 colorspace_2.1-0 ellipsis_0.3.2 class_7.3-21 estimability_1.4.1 markdown_1.6 base64enc_0.1-3
[8] rstudioapi_0.14 proxy_0.4-27 farver_2.1.1 rstan_2.26.22 DT_0.27 fansi_1.0.4 mvtnorm_1.1-3
[15] bridgesampling_1.1-2 codetools_0.2-19 splines_4.2.3 shinythemes_1.2.0 bayesplot_1.10.0 jsonlite_1.8.4 gganimate_1.0.8
[22] shiny_1.7.4 compiler_4.2.3 emmeans_1.8.5 backports_1.4.1 Matrix_1.5-4 fastmap_1.1.1 cli_3.6.1
[29] later_1.3.0 tweenr_2.0.2 htmltools_0.5.5 prettyunits_1.1.1 tools_4.2.3 igraph_1.4.2 coda_0.19-4
[36] gtable_0.3.3 glue_1.6.2 reshape2_1.4.4 posterior_1.4.1 V8_4.3.0 vctrs_0.6.1 nlme_3.1-162
[43] transformr_0.1.4 crosstalk_1.2.0 tensorA_0.36.2 stringr_1.5.0 ps_1.7.4 mime_0.12 lpSolve_5.6.18
[50] miniUI_0.1.1.1 lifecycle_1.0.3 gtools_3.9.4 MASS_7.3-58.3 zoo_1.8-12 scales_1.2.1 colourpicker_1.2.0
[57] hms_1.1.3 promises_1.2.0.1 Brobdingnag_1.2-9 parallel_4.2.3 sandwich_3.0-2 inline_0.3.19 shinystan_2.6.0
[64] curl_5.0.0 gridExtra_2.3 ggplot2_3.4.2 loo_2.6.0 StanHeaders_2.26.22 stringi_1.7.12 dygraphs_1.1.1.6
[71] e1071_1.7-13 checkmate_2.1.0 pkgbuild_1.4.0 rlang_1.1.0 pkgconfig_2.0.3 matrixStats_0.63.0 distributional_0.3.2
[78] lattice_0.20-45 purrr_1.0.1 sf_1.0-12 rstantools_2.3.1 htmlwidgets_1.6.2 processx_3.8.0 tidyselect_1.2.0
[85] plyr_1.8.8 magrittr_2.0.3 R6_2.5.1 magick_2.7.4 generics_0.1.3 multcomp_1.4-23 DBI_1.1.3
[92] withr_2.2.0 pillar_1.9.0 units_0.8-1 xts_0.13.0 survival_3.5-5 abind_1.4-5 tibble_3.2.1
[99] crayon_1.5.2 KernSmooth_2.23-20 utf8_1.2.3 progress_1.2.2 grid_4.2.3 callr_3.7.3 threejs_0.3.3
[106] digest_0.6.31 classInt_0.4-9 xtable_1.8-4 tidyr_1.3.0 httpuv_1.6.9 RcppParallel_5.1.7 stats4_4.2.3
[113] munsell_0.5.0 shinyjs_2.1.0

Documentation of link functions

At various points in the documentation, it looks like the scale of $\phi$ is described as if its linear predictor is transformed with the logit link function. For example, see below. However, isn't $\phi$ modelled with the log link while $\mu$ uses the logit link?

pp_check_ordbetareg grouped discrete plot shows count of total data rows in every level of grouping variable

Using the group argument in pp_check_ordbetareg should produce a discrete plot that compares the modeled counts in each level of the group variable to the actual counts in each level of the group variable. However, the plot shows the same total count of data rows for every level. Below is a reproducible example:

library(tidyverse)
library(brms)
library(ordbetareg)
library(patchwork)

data("pew")
model_data <- select(pew,therm,age="F_AGECAT_FINAL",
                     sex="F_SEX_FINAL",
                     income="F_INCOME_FINAL",
                     ideology="F_IDEO_FINAL",
                     race="F_RACETHN_RECRUITMENT",
                     education="F_EDUCCAT2_FINAL",
                     region="F_CREGION_FINAL",
                     approval="POL1DT_W28",
                     born_again="F_BORN_FINAL",
                     relig="F_RELIG_FINAL",
                     news="NEWS_PLATFORMA_W28") %>% 
  mutate_at(c("race","ideology","income","approval","sex","education","born_again","relig"), function(c) {
    factor(c, exclude=levels(c)[length(levels(c))])
  }) %>% 
  # need to make these ordered factors for BRMS
  mutate(education=ordered(education),
         income=ordered(income))

ord_fit_mean <- ordbetareg(formula=therm ~ mo(education)*mo(income) +
                             (1|region), 
                           data=model_data,
                           control=list(adapt_delta=0.8),
                           cores=1,chains=1,iter=500,
                           refresh=0)

pp_check_ordbeta(ord_fit_mean, group=ord_fit_mean$data$education)

Default prior on phi

Following up on a comment elsewhere, the default prior for $\phi$ seems informative in typical situations. Thinking about the survey researcher case, although other uses obviously exist for the model, researchers typically offer response scales for survey items that people only choose the extremes on. Usually they would redesign the item to attain more information by adding more response options or making the extremes middling values. Of course, sometimes that doesn't happen and you get extremely bimodal distributions. Yet even in unusual cases, there's still a fair amount of density in the center of the response scale. Perhaps a minimum of 30% of responses falling in (.1, .9) is a reasonable lower bound for what we'd expect given the typical mean response.

We don't really get even close to that unless we move from 0.1 to 0.4 (minimum) as a prior. Does this suggest 0.1 is an extremely informative prior rather than an uninformative one? While this ends up affecting the actual variance/clustering, the issue is that the current prior largely encourages bimodal distributions that avoid concentrated unimodal distributions by assuming people essentially only are using the extreme tails of the distribution.

library(FlexReg)
library(dplyr)
library(tidyr)
library(purrr)
set.seed(123)

# simulation parameters
phi <- c(0.1, 0.4, 3)
mu <- seq(0.1, 0.5, by = 0.1)
sims_n <- 10000
lower <- 0.1
upper <- 0.9

# simulate from mean-precision parameterized beta distributions
grid <- expand_grid(phi, mu)

sims <- map2(.y = grid$phi,
             .x = grid$mu,
             ~ rBeta_mu(n = sims_n, mu = .x, phi = .y))

# calculate density in specified interval
grid %>%
  mutate(pct = map_dbl(sims, ~ mean(ifelse(
    .x < upper & .x > lower, 1, 0
  )))) %>%
  round(2)
#> # A tibble: 15 x 3
#>      phi    mu   pct
#>    <dbl> <dbl> <dbl>
#>  1   0.1   0.1  0.04
#>  2   0.1   0.2  0.06
#>  3   0.1   0.3  0.08
#>  4   0.1   0.4  0.1 
#>  5   0.1   0.5  0.1 
#>  6   0.4   0.1  0.12
#>  7   0.4   0.2  0.2 
#>  8   0.4   0.3  0.27
#>  9   0.4   0.4  0.32
#> 10   0.4   0.5  0.32
#> 11   3     0.1  0.31
#> 12   3     0.2  0.57
#> 13   3     0.3  0.76
#> 14   3     0.4  0.86
#> 15   3     0.5  0.9

Error running sample code

Hello Dr. Kubinec,

Thank you so much for the paper and package. It is very novel to me and I am learning from you.

I run into an error running the sample code on https://cran.r-project.org/web/packages/ordbetareg/vignettes/package_introduction.html , I paste it here. When I set run_model to True,

if(run_model) { ord_fit_mean <- ordbetareg(formula=therm ~ mo(education)*mo(income) + (1|region), data=model_data, control=list(adapt_delta=0.95), cores=1,chains=1,iter=500, refresh=0) } else { data("ord_fit_mean") }

It shows error like this.

Thank you very much.

pp_check() error with ordbetareg brmsfit objects

I've noticed the few times I have fit ordbetareg models that, while I can use plot() to produce trace plots to inspect the model, pp_check() doesn't appear to work. I get the following error each time I try to use the function on an ordbetareg model:

> pp_check(model) 
Using 10 posterior samples for ppc type 'dens_overlay' by default.
Error in checkForRemoteErrors(val) : 
  23 nodes produced errors; first error: invalid arguments