drizopoulos / glmmadaptive Goto Github PK
View Code? Open in Web Editor NEWGLMMs with adaptive Gaussian quadrature
Home Page: https://drizopoulos.github.io/GLMMadaptive/
GLMMs with adaptive Gaussian quadrature
Home Page: https://drizopoulos.github.io/GLMMadaptive/
So I'm trying to compare a two-part model that incorporates random effects with a similar version that does not. I love how wonderful the mixed_model()
function is, but I'm wondering if there's a way to run it without the random effect component.
The normal call (which uses a random effect by group) is below:
mixed_model(Y~ X,
random = ~ 1|group,
data=df,
family = hurdle.lognormal(), n_phis = 1,
zi_fixed = ~ Z")
However, whenever I set random = ~1
or random = NULL
or random = ~0
, I get an error. Is there any way to run this two-part hurdle model without using a random effect? I would really like to see how the model compares without it.
Thank you so much for your time.
I have developed a lognormal hurdle model (m) and have a fake dataframe (fakeDF) that contains median values for all but one model covariate to obtain marginal effects plots. I am unable to use the effectPlotData(m,fakeDF) call. I get the following error:
Error in match.arg(sub_model) :
'arg' should be one of “main”, “zero_part”
When I use predict(m, fakeDF,type_pred="response",type ="mean_subject"), I get different predictions as compared to exp(fakeDF %*% beta).
Can you provide any guidance? Thanks -- Leigh Ann Starcevich
No idea where to locae the source from this error, but this rather simple looking model fails to fit with mixed_model()
. Any ideas, why?
library(GLMMadaptive)
library(lme4)
m1 <- glmer(
cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp,
family = binomial
)
m2 <- mixed_model(
cbind(incidence, size - incidence) ~ period,
random = ~ 1 | herd,
data = cbpp,
family = binomial
)
#> Error in -(X * if (NCOL(y) == 2) y[, 1] else y) + sc: non-conformable arrays
Created on 2019-01-22 by the reprex package (v0.2.1)
Hello,
I first applied the mixed model with the custom beta family available in your website to the data.
Then I tried to use the function resids_plot available in your website:
I get this error message when I use it:
The problem seems to come from the arguments of the function .
Do I need to change them ?
Thanks,
Thomas
Hello,
First off, thank you for this amazing library! It is so helpful, and it's also vital for the research we're doing. We really appreciate you making it available!!
Today I changed a little bit of my model/data and am now getting an error that looks like this:
Error in if (any(zero_ind <- p_yb == 0)) { :
missing value where TRUE/FALSE needed.
Unfortunately, I get this error even if I download and build the library directly from GitHub. And so I searched to discover how it could be possible for the statement "zero_ind <- p_yb == 0.0" (found on line 133 in mixed_fit.R and on lines 32 and 74 in Fit_Funs.R) to return only NA or a list of NAs. If p_yb were all NAs, "any" would also return NA and the "if" statement would fail. Do you know how it is possible for p_yb to consist entirely of NAs? Or is there a recommended action to take when you have NAs in p_yb? Or as an alternative, can you catch the error before this test, perhaps, and help me know what to correct? I'd love to fix my data, but I'm not entirely sure what is wrong. Any chance you could take a look?
Thank you!!
Bradley
I am using the GLMMadaptive package to estimate a two-part mixed model where my groups are geographical regions but cases from the same region may have different weights. As I can see that GLMMadaptive only allow weights matching with the number of groups, would it be possible to use case-specific weights? That is, the length of weights matches the number of rows of data. Thanks!
Also, thanks for creating such a helpful package!
Sorry to post an issue for a question. My question is equivalent to this stackoverflow post. Is it possible to use GLMMadaptive with more than one random effect? For example, with lme4 I might fit a model like
d <- ltm::LSAT
d$respondent <- seq_len(nrow(d))
d <- tidyr::pivot_longer(
d,
-respondent,
names_to = "item",
values_to = "response"
)
m <- lme4::glmer(
response ~ 1 + (1 | respondent) + (1 | item),
data = d,
family = binomial(link = "logit")
)
Is this model possible with GLMMadaptive?
Hi Dimitris,
First of all thank you for this great package. I think it offers the best trade-off between speed, flexibility and ease of use for fitting multilevel left censored models. Much appreciated.
I ran in a problem today though when I wanted to predict values from a model with a not positive definite Hessian matrix. I got the following warning:
Warning message:
In GLMMadaptive::mixed_model(fixed = formula_fixed, random = formula_random, :
Hessian matrix at convergence is not positive definite; unstable solution.
And when trying to use predict on the same data
predict(m, type = 'subject_specific', newdata = df)
I got
Error in y[, 2L] : incorrect number of dimensions
The fixed effect prediction (predict(m, type = 'mean_subject')
works just fine.
Can this be solved or is this intended behaviour? My own hacked together solution that just uses ranef() and fixef() for predictions works just fine FWIW.
Let me know if I need to create a repex with a model that does not converge.
Thank you!
Would it be possible to implement more methods like family()
or formula()
?
Hi Dimitris,
I am currently making another attempt to integrate GLMMadaptive to DHARMa, see florianhartig/DHARMa#96
One point that I noted is that GLMMadaptive does not have a refit function, as in https://www.rdocumentation.org/packages/lme4/versions/1.1-26/topics/refit
I tried to circumvent this via the following code
getRefit.MixMod <- function(object, newresp, ...) {
responsename = colnames(model.frame(object))[1]
newDat = object$data
newDat[, match(responsename,names(newDat))] = newresp
update(object, data = newDat)
}
The issue with that is that this fails for k/n binomial with c(s,f) ~ pred syntax
Any great idea how to solve this with a quick hack that is not super hacky, or do you see an option to supply a refit function in GLMMadaptive?
Maybe I am misunderstanding the max_coef_value
control argument:
dat <- data.frame(xi=rep(1,10), mi=rep(9,10), id=1:10)
dat[1:5,"xi"] <- 4
dat[1:5,"mi"] <- 6
GLMMadaptive::mixed_model(cbind(xi,mi) ~ 1, random = ~ 1 | id, data=dat, family=binomial)
GLMMadaptive::mixed_model(cbind(xi,mi) ~ 1, random = ~ 1 | id, data=dat, family=binomial, control=list(max_coef_value=1))
But I would have expected the intercept to be constrained to <= 1 in the second fit.
Hello Dr. Rizopoulos,
I have a question regarding mixed-effects logistic regression using your very impressive GLMMadpative package:
When creating a model for repeated measures data and the outcome variable has a rare occurrence rate (< 3%), meaning there are a lot of zeroes in the data, would it be appropriate to fit a hurdle or zero-inflated model (e.g., negative binomial)? I know the zi and hurdle models are meant for count data but I read that a binomial regression is first fit to assess the structural nature of the zeroes before fitting the NB thus yielding a better fitting model. I do have access to the counts of the outcome as well, but the true interest is whether we can accurately predict if an event occurs or not; the outcome of interest is not really the number of the particular event considering the data is at an "id and month" level.
I've reviewed King and Zeng (2001) paper about correcting standard errors for logistic regression when analyzing rare events (techniques are drawn from using logistic regression on small sample sizes) but I do not think the corrections are applicable to clustered data - I'm not statistically savvy enough (yet) to develop a proof to fully work through it but the logic seems sound.
In summation, I'm wondering what modeling efforts would be best suited for rare, recurrent, binary events using count covariates for repeated measures using your package?
Tried to implement the residuals for #11, but realised that predict works with mean_subject, but not subject_specific ... is this a bug, or is there a reason why this wouldn't work?
> predict(fittedModel, type_pred = "response", type = "subject_specific")
Error: object of type 'closure' is not subsettable
The residual function doesn't work for a binomial model with response as factor (y/n), I guess you need to convert the factor first before comparing to predictions.
when i have a merMod object from lmer, is there a way i can utilize the predict.MixMod function for dynamic subject-specific predictions?
I just noticed the following when trying to fit a model where the response is constant:
dat <- data.frame(xi=rep(1,10), mi=rep(9,10), id=1:10)
lme4::glmer(cbind(xi,mi) ~ 1 + (1 | id), data=dat, family=binomial)
Throws error (Error: Response is constant
). But:
GLMMadaptive::mixed_model(cbind(xi,mi) ~ 1, random = ~ 1 | id, data=dat, family=binomial)
runs, but gives an estimate larger than 0 for the variance component. Also, the intercept estimate is slightly off (it should be log(1/9)
), although that difference only shows up in the 4th decimal place, so I would consider that negligible.
Switching to Nelder-Mead gives the expected result:
GLMMadaptive::mixed_model(cbind(xi,mi) ~ 1, random = ~ 1 | id, data=dat, family=binomial, control=list(optim_method="Nelder-Mead"))
Now the variance component is estimated to be =~ 0.
This is obviously a degenerate case, but reporting it regardless in case you are interested.
Hello --
I recently reran some analyses that were originally performed in Rv3.5.0 in Rv3.6.1. The marginal coefficient estimates of the models were similar, but the SEs of the marginal coefficient estimates were larger -- nearly twice as large in some cases -- when run in v3.6.1. I see from the Changelog that some changes were made to the marginal_coefs() function for faster implementation -- would this have affected the SEs? If so, which version of the function is more accurate?
Here is the structure and output of one of the models in question:
MM.egg50 <- mixed_model(fixed = Egg.egg ~ Dens.juncta.leaf + Dens.juncta.plant +
Dens.patch.50 + Plant.biomass + Patch.50.biomass + Day.of.year,
random = ~1|Cohort.ID, data = datsc, family = binomial(),
control = list(optim_method = "SANN"), nAGQ = 30)
#output from summary() command is identical
#output from Rv3.5.0
MM.egg50.mc <- marginal_coefs(MM.egg50, std_errors = TRUE)
MM.egg50.mc
Value Std.Err z-value p-value
(Intercept) 3.8304 0.7230 5.2980 < 1e-04
Dens.juncta.leaf 0.6343 0.1733 3.6592 0.00025297
Dens.juncta.plant -2.3122 0.5102 -4.5317 < 1e-04
Dens.patch.50 4.0207 1.0459 3.8443 0.00012091
Plant.biomass -0.4582 0.2464 -1.8597 0.06292951
Patch.50.biomass 0.6686 0.2934 2.2789 0.02267285
Day.of.year -1.7861 0.2829 -6.3135 < 1e-04
#output from Rv3.6.1
MM.egg50.mc <- marginal_coefs(MM.egg50, std_errors = TRUE)
MM.egg50.mc
Estimate Std.Err z-value p-value
(Intercept) 3.8504 0.9785 3.9350 < 1e-04
Dens.juncta.leaf 0.6423 0.3425 1.8750 0.0607878
Dens.juncta.plant -2.3245 0.7137 -3.2570 0.0011259
Dens.patch.50 4.0205 1.3579 2.9607 0.0030695
Plant.biomass -0.4663 0.4778 -0.9760 0.3290571
Patch.50.biomass 0.6710 0.5041 1.3311 0.1831415
Day.of.year -1.8071 0.4642 -3.8930 < 1e-04
Thanks,
Jessie
I attempted to have a random intercept only for the zero-inflated part of the model but I was unable to proceed due to an error saying I needed an argument for "random" (i.e., the count part). It is possible to only have a random intercept for the zi portion?
Hello,
Thanks for this great package!
I am fitting a mixed model for ordinal data (collected from 306 participants at 3 timepoints) and have been able to successfully fit a continuation ratio model under the forward formulation using the example code from: https://drizopoulos.github.io/GLMMadaptive/articles/Ordinal_Mixed_Models.html. However when I try to fit the model under backward formulation (what I would prefer), I run into an issue when using cr_setup() with direction = "backward":
Warning in y == (ncoefs - cuts) :
longer object length is not a multiple of shorter object length
The only thing I can think of as to why the vectors may not be matching is that I have missing data for some participants. I'm not sure why this would be an issue with the backward but not forward formulation set-up.
Any thoughts/guidance would be much appreciated!
Thank you,
Mariann
Is it possible to use weights in mixed_model()
? I haven't found anything in the help files.
Hi Dimitris,
also, I had a question about the weights:
a) if they are simple multipliers to the log likelihood, why can't they be supplied per observation?
b) if weights are supplied, I assume they are ignored in the simulation? Maybe it would be good to throw a warning in this case, as the user should be made aware that the model is fit according to a different likelihood than what it simulates from. This is a problem for DHARMa, but potentially also for other packages / code that uses the simulations for inference.
Best,
Florian
Hello Prof Dimitris Rizopoulos,
Many thanks for developing such a great tool for us.
I'm developing a two-part mixed effects model for semi-continuous data. May I know how to get the effect size of the model? Maybe just like conditional or marginal R square for general mixed effects models?
Have a blessed 2022!
Chenghao
noted this during my unit tests with DHARMa
library(DHARMa)
library(GLMMadaptive)
testData = createData(sampleSize = 200, overdispersion = 0, randomEffectVariance = 0, family = binomial(), binomialTrials = 20)
fittedModel <- mixed_model(cbind(observedResponse1,observedResponse0) ~ Environment1 , random = ~ 1 | group , family = "binomial", data = testData)
simulate(fittedModel)
The hurdle model from pscl package gives this option to predict response, count, probability and zero (which is the ratio that is multiplied by count and gives response). Does "zero_part" from GLMMadaptive package give probability from Poisson or probability from binary logistic regression?
Would it be possible for family()
to also return an element that contains the variance-function of the model? See https://stat.ethz.ch/R-manual/R-patched/library/stats/html/family.html
I have read in the news-file from the dev-version that the effects-package is supported. I tried to compute effects from the first model in your ZI-vignette (https://drizopoulos.github.io/GLMMadaptive/articles/ZeroInflated_and_TwoPart_Models.html), but I get an error:
effects::Effect(c("sex", "time"), fm1)
#> Error: 'family' argument seems not to be a valid family object
Is support not fully working yet?
Hi Dimitris, as you know, I'm trying to integrate GLMMadative in DHARMa.
I'm currently fighting with a somewhat surprising behavior of your predict() function - is it so that predict requires newdata? My expectation from other packages would have been that the function would fall back on the fit data if newdata is not provided, so that you could do, e.g.
fm1 <- mixed_model(fixed = y ~ year * group, random = ~ 1 | id, data = DF,
family = binomial())
predict(fm1)
Dear dr. Rizopoulos
Thanks for writing GLMMadaptive. It's been intuitive to work with, even for somebody who's new to GLMM like myself.
My question is regarding model selection: What approach would you advice in model selection? In your vignette Goodness of Fit you suggest using DHARMa to aid in the selection. With binomial models, only the residuals vs fitted graph is useful to assess fitness. I believe that this is hard, particularly when all models do not fit perfectly. Can you suggest more tools or approaches for selecting models?
Kind regards, Michel
Hi Dimitris,
Thanks a lot for your work on this very useful package.
I would like to use GLMMadaptive
to calculate a cross-validated log-likelihood, i.e., to estimate model parameters using a subset of the observations and then evaluate the marginal log-likelihood for the hold-out set. I was, therefore, wondering if it is possible to do the integration in GLMMadaptive
without any optimization?
Sebastian
Thanks for the awesome package! Not sure if this is an issue, but seems like the best place to post about this: When I try to run a model with some variables, I get the error "$ operator is invalid for atomic vector" and the model fails to run, but I don't see where I am using the $ operator on an atomic vector. Here's the code I'm running:
m1<- mixed_model(DV_zeros~VarA, random = ~1|Subject, data = tmp, family = hurdle.lognormal, zi_fixed = ~VarA, zi_random = ~1|Subject, na.action = na.exclude)
but if I substitute VarA for VarB, which is of the same class, or if I replace VarA in the zi_fixed portion for 1, I can run without an error. I can't figure out why I'm getting this error - I was looking through a lot of your source code but can't seem to find it. Do you know why I'm getting this error? Thanks!
Would it be possible to have the sandwich
argument added directly to the mixed_model
function, rather than as an additional argument in standard generic functions (summary
, vcov
, confint
, etc.).
This would allow for compatibility with broom.mixed
and integration into a tidy workflow.
Love the package, thanks in advance!
Hi developer,
Thanks for creating this package.
I encounter below error when running below code. May I please have your help? Thanks!!
gm1 <- mixed_model(fixed = taxon ~ X + time, random = ~ time|SubjectID, data = df, family=zi.negative.binomial(), zi_fixed = ~ 1)
Error in chol.default(X[[i]], ...) :
the leading minor of order 1 is not positive definite
Hi there,
I was trying to implement the negative binomial model example. However, I got the following error message:
library(GLMMadaptive)
> set.seed(102)
> dd <- expand.grid(f1 = factor(1:3), f2 = LETTERS[1:2], g = 1:30, rep = 1:15,
+ KEEP.OUT.ATTRS = FALSE)
> mu <- 5 * (-4 + with(dd, as.integer(f1) + 4 * as.numeric(f2)))
> dd$y <- rnbinom(nrow(dd), mu = mu, size = 0.5)
>
> gm1 <- mixed_model(fixed = y ~ f1 * f2, random = ~ 1 | g, data = dd,
+ family = negative.binomial())
Error in negative.binomial() : 'theta' must be specified
here is my session information:
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux
Matrix products: default
BLAS/LAPACK: /opt/intel/18.0.3/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] optimParallel_0.8-1 dplyr_0.8.3 data.table_1.12.8 nloptr_1.2.1 optimx_2018-7.10
[6] lme4_1.1-21 Matrix_1.2-17 GLMMadaptive_0.6-9
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 remotes_2.1.0 purrr_0.3.2 splines_3.6.0 lattice_0.20-38
[6] testthat_2.2.1 usethis_1.5.1 rlang_0.4.0 pkgbuild_1.0.5 pillar_1.4.2
[11] glue_1.3.1 withr_2.1.2 sessioninfo_1.1.1 matrixStats_0.55.0 devtools_2.1.0
[16] memoise_1.1.0 fst_0.9.0 callr_3.3.1 ps_1.3.0 curl_4.0
[21] Rcpp_1.0.3 backports_1.1.4 desc_1.2.0 pkgload_1.0.2 fs_1.3.1
[26] packrat_0.5.0 digest_0.6.20 processx_3.4.1 numDeriv_2016.8-1.1 grid_3.6.0
[31] rprojroot_1.3-2 cli_1.1.0 tools_3.6.0 magrittr_1.5 tibble_2.1.3
[36] pacman_0.5.1 crayon_1.3.4 pkgconfig_2.0.2 MASS_7.3-51.4 prettyunits_1.0.2
[41] assertthat_0.2.1 minqa_1.2.4 rstudioapi_0.10 R6_2.4.0 boot_1.3-22
[46] nlme_3.1-139 compiler_3.6.0
Thank you,
Miao
I'm comparing models fitted with your package and glmmTMB, to see how results match or differ. Here you see two examples. In the first, the coefficients from the count-component differ, while the zero-inflation part are identical. In the second example, coefficients both of the count-component and zero-inflated part are (almost) identical.
Do you have any ideas where the differences in the first model come from?
library(glmmTMB)
library(GLMMadaptive)
data("Salamanders")
m1 <- glmmTMB(
count ~ spp + mined + (1 | site),
ziformula = ~ spp + mined,
family = truncated_poisson(),
data = Salamanders
)
m2 <- mixed_model(
count ~ spp + mined,
random = ~1 | site,
zi_fixed = ~ spp + mined,
family = hurdle.poisson(),
data = Salamanders
)
summary(m1)
#> Family: truncated_poisson ( log )
#> Formula: count ~ spp + mined + (1 | site)
#> Zero inflation: ~spp + mined
#> Data: Salamanders
#>
#> AIC BIC logLik deviance df.resid
#> 1790.2 1866.1 -878.1 1756.2 627
#>
#> Random effects:
#>
#> Conditional model:
#> Groups Name Variance Std.Dev.
#> site (Intercept) 0.05318 0.2306
#> Number of obs: 644, groups: site, 23
#>
#> Conditional model:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.06702 0.20605 -0.325 0.7450
#> sppPR -0.52093 0.27872 -1.869 0.0616 .
#> sppDM 0.22458 0.14485 1.550 0.1210
#> sppEC-A -0.19548 0.20122 -0.972 0.3313
#> sppEC-L 0.64672 0.13229 4.889 1.01e-06 ***
#> sppDES-L 0.60514 0.13027 4.645 3.39e-06 ***
#> sppDF 0.04602 0.15397 0.299 0.7650
#> minedno 1.01447 0.18724 5.418 6.02e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Zero-inflation model:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.7556 0.2833 6.196 5.78e-10 ***
#> sppPR 1.6785 0.3964 4.234 2.30e-05 ***
#> sppDM -0.4269 0.3502 -1.219 0.22288
#> sppEC-A 1.1046 0.3684 2.998 0.00272 **
#> sppEC-L -0.4269 0.3502 -1.219 0.22288
#> sppDES-L -0.6716 0.3519 -1.909 0.05631 .
#> sppDF -0.4269 0.3502 -1.219 0.22288
#> minedno -2.4038 0.2089 -11.508 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m2)
#>
#> Call:
#> mixed_model(fixed = count ~ spp + mined, random = ~1 | site,
#> data = Salamanders, family = hurdle.poisson(), zi_fixed = ~spp +
#> mined)
#>
#> Data Descriptives:
#> Number of Observations: 644
#> Number of Groups: 23
#>
#> Model:
#> family: hurdle poisson
#> link: log
#>
#> Fit statistics:
#> log.Lik AIC BIC
#> -1005.88 2045.759 2065.063
#>
#> Random effects covariance matrix:
#> StdDev
#> (Intercept) 2.603956
#>
#> Fixed effects:
#> Estimate Std.Err z-value p-value
#> (Intercept) -3.1554 0.2253 -14.0030 < 1e-04
#> sppPR 0.2035 0.2431 0.8368 0.40271196
#> sppDM 0.9135 0.1587 5.7577 < 1e-04
#> sppEC-A 0.6263 0.1941 3.2277 0.00124803
#> sppEC-L 1.4843 0.1376 10.7857 < 1e-04
#> sppDES-L 0.9575 0.1449 6.6075 < 1e-04
#> sppDF 0.3277 0.1573 2.0832 0.03722871
#> minedno 0.7276 0.1896 3.8370 0.00012453
#>
#> Zero-part coefficients:
#> Estimate Std.Err z-value p-value
#> (Intercept) 1.7556 0.2833 6.1963 < 1e-04
#> sppPR 1.6785 0.3964 4.2341 < 1e-04
#> sppDM -0.4269 0.3502 -1.2190 0.2228629
#> sppEC-A 1.1045 0.3684 2.9981 0.0027163
#> sppEC-L -0.4269 0.3502 -1.2190 0.2228629
#> sppDES-L -0.6716 0.3519 -1.9087 0.0563035
#> sppDF -0.4269 0.3502 -1.2190 0.2228629
#> minedno -2.4038 0.2089 -11.5077 < 1e-04
#>
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#>
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUE
library(glmmTMB)
library(GLMMadaptive)
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 5 # maximum follow-up time
# we constuct a data frame with the design:
# everyone has a baseline measurment, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects non-zero part
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ 1, data = DF)
# design matrices for the fixed and random effects zero part
X_zi <- model.matrix(~ sex, data = DF)
Z_zi <- model.matrix(~ 1, data = DF)
betas <- c(1.5, 0.05, 0.05, -0.03) # fixed effects coefficients non-zero part
shape <- 2 # shape/size parameter of the negative binomial distribution
gammas <- c(-1.5, 0.5) # fixed effects coefficients zero part
D11 <- 0.5 # variance of random intercepts non-zero part
D22 <- 0.4 # variance of random intercepts zero part
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor non-zero part
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, 1, drop = FALSE]))
# linear predictor zero part
eta_zi <- as.vector(X_zi %*% gammas + rowSums(Z_zi * b[DF$id, 2, drop = FALSE]))
# we simulate negative binomial longitudinal data
DF$y <- rnbinom(n * K, size = shape, mu = exp(eta_y))
# we set the extra zeros
DF$y[as.logical(rbinom(n * K, size = 1, prob = plogis(eta_zi)))] <- 0
fm1 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF,
family = zi.poisson(), zi_fixed = ~ sex)
fm2 <- glmmTMB(y ~ sex * time + (1 | id), data = DF,
family = poisson(), ziformula = ~ sex)
summary(fm1)
#>
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~1 | id, data = DF,
#> family = zi.poisson(), zi_fixed = ~sex)
#>
#> Data Descriptives:
#> Number of Observations: 800
#> Number of Groups: 100
#>
#> Model:
#> family: zero-inflated poisson
#> link: log
#>
#> Fit statistics:
#> log.Lik AIC BIC
#> -2223.937 4461.874 4480.11
#>
#> Random effects covariance matrix:
#> StdDev
#> (Intercept) 0.8272898
#>
#> Fixed effects:
#> Estimate Std.Err z-value p-value
#> (Intercept) 1.5276 0.1288 11.8605 < 1e-04
#> sexfemale 0.0174 0.1819 0.0954 0.92396
#> time -0.0040 0.0159 -0.2542 0.79937
#> sexfemale:time 0.0004 0.0230 0.0188 0.98496
#>
#> Zero-part coefficients:
#> Estimate Std.Err z-value p-value
#> (Intercept) -1.2106 0.1411 -8.5811 < 1e-04
#> sexfemale 0.4986 0.1823 2.7348 0.0062411
#>
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#>
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUE
summary(fm2)
#> Family: poisson ( log )
#> Formula: y ~ sex * time + (1 | id)
#> Zero inflation: ~sex
#> Data: DF
#>
#> AIC BIC logLik deviance df.resid
#> 4462.6 4495.4 -2224.3 4448.6 793
#>
#> Random effects:
#>
#> Conditional model:
#> Groups Name Variance Std.Dev.
#> id (Intercept) 0.6841 0.8271
#> Number of obs: 800, groups: id, 100
#>
#> Conditional model:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.5273202 0.1287453 11.863 <2e-16 ***
#> sexfemale 0.0168426 0.1818901 0.093 0.926
#> time -0.0040292 0.0158893 -0.254 0.800
#> sexfemale:time 0.0004784 0.0229638 0.021 0.983
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Zero-inflation model:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -1.2108 0.1413 -8.570 < 2e-16 ***
#> sexfemale 0.4985 0.1826 2.731 0.00632 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Hi this package has been helpful to prototype ideas. Thank you for your work.
In general your parallel implementation using parLapply() to compute the standard errors of marginalized coefs is handy.
But there are situations where the user would want to run this part in serial.
Line 624 in 6e4ed44
Can you add an argument 'lgl_par' in the marginal_coefs.MixMod
function scope that would toggle this portion of the code
to something like the below
if(lgl_par==TRUE){
# or
# if(cores!=1){
cl <- parallel::makeCluster(cores)
res <- parallel::parLapply(cl, blocks, cluster_compute_marg_coefs, tht = tht,
list_thetas = list_thetas, V = V, XX = X, Z = Z,
X_zi = X_zi, Z_zi = Z_zi, M = M,
object = object, compute_marg_coefs = compute_marg_coefs,
chol_transf = chol_transf, link_fun = link_fun, seed = seed)
parallel::stopCluster(cl)
} else {
res <- base::lapply(#cl,
blocks, cluster_compute_marg_coefs, tht = tht,
list_thetas = list_thetas, V = V, XX = X, Z = Z,
X_zi = X_zi, Z_zi = Z_zi, M = M,
object = object, compute_marg_coefs = compute_marg_coefs,
chol_transf = chol_transf, link_fun = link_fun, seed = seed)
}
out$var_betas <- var(do.call("rbind", res))
Currently, without the ifelse parLapply or lapply,
When I set marginal_coefs(cores=1,...)
your function is still internally making a cluster and using parLapply. This is interfereing (opening unnecessary cores) if marginal_coefs() is part of a larger for loop, where the outter loop is parallelized at the top level
# outer parallel simulations where s iterations are parallelized
for s in 1:5000{
...
marginal_coefs(cores=1,...)
...
}
I received an issue related to this package -- rvlenth/emmeans#378. It appears that your emmeans support methods work pretty well, but in a case where you have, say, a binomial family, the returned emmGrid
object does not "know" that a logit link was used. This created some user confusion.
All that is needed is to find the link and include it in the list misc
returned by emm_basis.MixMod()
. Perhaps looking at the code for similar situations, e.g., emmeans:::emm_basis.lm
(which also works for glm
models) will be helpful.
Hi,
I am trying to work with multiply imputed data and the GLMMAdaptive package.
However, it seems that the glance() method is missing for MixMod objects, which complicates pooling and testing of model parameters over multiply imputed data. Is there a glance() method for MixMod in any package?
Kind regards,
Reinier
Dear Prof. Rizopoulos,
I cannot seem to get the truncated normal family to work in a study I am performing.
To recreate the problem, I use the following toy example:
set.seed(123)
fake_data <- data.frame(Y = rnorm(1000), X = rbinom(1000, 1, 0.4), ID = rep(1:500, 2))
fit <- mixed_model(fixed = Y ~ X, random = ~ 1 | ID, data = fake_data, family = censored.normal())
This, just as the real data I am using, returns "Error in y[, 2L] : incorrect number of dimensions".
I am using version 0.8-5 of the package.
Would be grateful for assistance.
Noam
When I use update()
to re-fit a null-model, I get a warning, and the "null-model" seems to be the full-model.
Could you help me regarding how I can refit the model using update()
to get a null-model?
fm1 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF, family = zi.poisson(), zi_fixed = ~ sex)
fm1_null <- update(fm1, "y ~ 1")
#> Warning in mixed_model(fixed = y ~ sex * time, random = ~1 | id, data =
#> DF, : unknown names in control: formula
summary(fm1_null)
#>
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~1 | id, data = DF,
#> family = zi.poisson(), zi_fixed = ~sex, formula = y ~ sex +
#> time + sex:time)
#>
#> Data Descriptives:
#> Number of Observations: 800
#> Number of Groups: 100
#>
#> Model:
#> family: zero-inflated poisson
#> link: log
#>
#> Fit statistics:
#> log.Lik AIC BIC
#> -2223.937 4461.874 4480.11
#>
#> Random effects covariance matrix:
#> StdDev
#> (Intercept) 0.8272898
#>
#> Fixed effects:
#> Estimate Std.Err z-value p-value
#> (Intercept) 1.5276 0.1288 11.8605 < 1e-04
#> sexfemale 0.0174 0.1819 0.0954 0.92396
#> time -0.0040 0.0159 -0.2542 0.79937
#> sexfemale:time 0.0004 0.0230 0.0188 0.98496
#>
#> Zero-part coefficients:
#> Estimate Std.Err z-value p-value
#> (Intercept) -1.2106 0.1411 -8.5811 < 1e-04
#> sexfemale 0.4986 0.1823 2.7348 0.0062411
#>
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#>
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUE
Created on 2019-03-20 by the reprex package (v0.2.1)
I'm not sure whether this is intended or not... But when I fit a model that is not convering, an error occurs (instead of probably a warning?).
E.g. modifying your example from here:
https://drizopoulos.github.io/GLMMadaptive/articles/ZeroInflated_and_TwoPart_Models.html
fm1 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF,
family = negative.binomial(), zi_fixed = ~ sex)
#> Error in while (iter < maxits && !converged) { :
#> missing value where TRUE/FALSE needed
This looks like a code-issue, but I'm not sure...
I'm not sure whether the random slope, or the model fit itself is the cause of this error?
library(GLMMadaptive)
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we constuct a data frame with the design:
# everyone has a baseline measurment, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- drop(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
fm2 <- mixed_model(fixed = y ~ sex * time, random = ~ time || id, data = DF,
family = binomial())
mt <- mean(DF$time)
sdt <- sd(DF$time)
nd <- data.frame(
y = mean(DF$y),
sex = c(rep("male", 3), rep("female", 3)),
time = rep(c(mt - sdt, mt, mt + sdt), 2),
id = NA
)
predict(fm2, newdata = nd, type = "subject_specific", type_pred = "link", se.fit = TRUE)
#> Warning in dbinom(y, 1, mu_y, TRUE): non-integer x = 0.353750
#> Warning in dbinom(y, 1, mu_y, TRUE): non-integer x = 0.353750
#> Warning in dbinom(y, 1, mu_y, TRUE): non-integer x = 0.353750
#> Warning in dbinom(y, 1, mu_y, TRUE): non-integer x = 0.353750
#> Warning in dbinom(y, 1, mu_y, TRUE): non-integer x = 0.353750
#> Warning in dbinom(y, 1, mu_y, TRUE): non-integer x = 0.353750
#> Error in optim(par = b_i, fn = log_post_b, gr = score_log_post_b, method = "BFGS", : initial value in 'vmmin' is not finite
Created on 2019-01-17 by the reprex package (v0.2.1)
I'm running a hurdle lognormal model and I'm interested in the main effects of my categorical variables and the pairwise comparisons. Example of my code:
hln <- mixed_model(missteps_perc ~ Genotype*Session, random = ~ 1 | identifier,
data = data, family = hurdle.lognormal(),
zi_fixed = ~ Genotype*Session, zi_random = ~ 1 | identifier)
emmeans::joint_tests(hln)
emmeans::emmeans(hln, pairwise ~ Genotype | Session)
The latter two only give outputs for the "main" model, though I would also like to have the output for the "zero_part" model. Is there a way to specify this in the emmeans functions? Or is there a way to create a model object for the "zero_part" model only, so I can use that as a model object in the emmeans functions?
For easiness, a code using the dataset in the Zero-Inflated and Two-Part Mixed Effects Models vignette
First, run the first code block under the semi-continuous data part of the vignette, which creates the dataset DF
.
#create categorical time variable
DF$time_categorical[DF$time<2.5] <- "early"
DF$time_categorical[DF$time>=2.5] <- "late"
DF$time_categorical <- as.factor(DF$time_categorical)
#model with interaction in fixed effects zero part and adding nesting in zero part as in model above
km3 <- mixed_model(y ~ sex * time_categorical, random = ~ 1 | id, data = DF,
family = hurdle.lognormal(), n_phis = 1,
zi_fixed = ~ sex * time_categorical, zi_random = ~ 1 | id)
#emmeans for main effects and post-hoc tests
joint_tests(km3)
emmeans(km3, pairwise ~ sex|time_categorical)
Hi Dimitris,
Thanks for fixing the previous issue so quickly. Unfortunately, I ran into another issue on a different dataset. I have reduced the problem to the repex below:
df <- readr::read_csv(here::here("data_w_error.csv"))
#> Rows: 628 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (4): id, n, y1, left_cens
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df |> summary()
#> id n y1 left_cens
#> Min. : 1.0 Min. :1.000 Min. :11.01 Min. :0.00000
#> 1st Qu.:129.0 1st Qu.:1.000 1st Qu.:13.69 1st Qu.:0.00000
#> Median :258.5 Median :1.000 Median :14.09 Median :0.00000
#> Mean :260.2 Mean :1.172 Mean :14.09 Mean :0.01274
#> 3rd Qu.:394.2 3rd Qu.:1.000 3rd Qu.:14.54 3rd Qu.:0.00000
#> Max. :523.0 Max. :3.000 Max. :16.21 Max. :1.00000
df |> dplyr::group_by(left_cens) |> dplyr::tally()
#> # A tibble: 2 × 2
#> left_cens n
#> <dbl> <int>
#> 1 0 620
#> 2 1 8
# doesn't work:
m <- GLMMadaptive::mixed_model(fixed = GLMMadaptive::formula(cbind(y1, left_cens) ~ 1, type = 'fixed'),
random = GLMMadaptive::formula(~ 1|id, type = 'random'),
data = df,
family = GLMMadaptive::censored.normal())
#> Error in eigen(Y, symmetric = TRUE): infinite or missing values in 'x'
# does work:
m <- GLMMadaptive::mixed_model(fixed = GLMMadaptive::formula(cbind(y1, left_cens) ~ 1, type = 'fixed'),
random = GLMMadaptive::formula(~ 1|id, type = 'random'),
data = df |> dplyr::mutate(y1 = y1 - 0.76),
family = GLMMadaptive::censored.normal())
# doesn't work
m <- GLMMadaptive::mixed_model(fixed = GLMMadaptive::formula(cbind(y1, left_cens) ~ 1, type = 'fixed'),
random = GLMMadaptive::formula(~ 1|id, type = 'random'),
data = df |> dplyr::mutate(y1 = y1 - 0.75),
family = GLMMadaptive::censored.normal())
#> Error in eigen(Y, symmetric = TRUE): infinite or missing values in 'x'
Created on 2024-01-16 with reprex v2.1.0
Here's the stubborn dataset that yields the error: data_w_error.csv
Any idea what's going on? I noticed that Hbetas was a matrix of NaNs after the numer_deriv_vec
call.
I'm running version 0.9.2 of GLMMadaptive now on R 4.3.1.
In the mixed_model function, this early if statement:
if (family$family %in% c("zero-inflated poisson", "zero-inflated negative binomial",
"hurdle poisson", "hurdle negative binomial", "hurdle beta",
"zero-inflated binomial") && is.null(zi_fixed)) {
stop("you have defined a family with an extra zero-part;\nat least argument ",
"'zi_fixed' needs to be defined, and potentially also argument 'zi_random'.")
}
should also include the hurdle lognormal family. Otherwise omitting zi_fixed leads to this confusing error:
'data' must be of a vector type, was 'NULL'
Thanks!
Hi,
I don't have questions about this now. Thank you!
Dear Author,
thanks for creating such an amazing package!!! I started to use it and love it, especially for the mixed-effects beta regression. However, recently I could not get realistic emmeans. When I calculate a simple percentage of use a mixed effects model with gaussian family (lmer), I do get more realistic estimates. I know that the estimates supposed to be different, but 32% probability instead of 57% is a big difference, so that I am not certain about the inference. Well, the model itself is most likely fine, because when I plot model predictions, I do get realistic probabilities on the y-axis, but the emmean package does not extract them from mided_model (family = beta.fam()), but produces something bizarre. So, if I may, my question is: how get I extract the contrasts from your model, which has two categorical predictors. Here is the code I use:
library(GLMMadaptive)
gm <- mixed_model(y ~ x1 * x2, random = ~ 1 | id, data = d, family = beta.fam())
plot_model(gm, type = "int")+
scale_y_continuous(labels=scales::percent)
emmeans::emmeans(gm, pairwise ~ x1|x2, type = "response")
Thanks forward for your time!
Kind regards,
Yury
Consider the following reprex from the package article Methods for MixMod Objects
library(GLMMadaptive)
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
fm <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
family = binomial())
Now the estimated variance-covariance matrix of the maximum likelihood estimates of random effects is returned using the vcov() method,
vcov(fm, parm = "var-cov")
#> D_11 D_12 D_22
#> D_11 0.42942062 -0.09963969 0.5065884
#> D_12 -0.09963969 0.03701847 -0.2117451
#> D_22 0.50658839 -0.21174511 1.3651870
Then that package article says about the returned value of vcov
,
The elements of this covariance matrix that correspond to the elements of the covariance matrix of the random effects (i.e., the elements D_xx) are on the log-Cholesky scale.
I am really confused by the above line. What does it mean by The elements of this covariance matrix are on the log-Cholesky scale.?
Please Note that my end goal is to get the standard errors of the estimated random effects that is,
Would you please help me to clear up the confusion and help me to reach my goal? Thanks.
nobs returns for me always the value 10, irrespective of the real size of the data, e.g.
library(GLMMadaptive)
library(DHARMa)
testData = createData(sampleSize = 200, family = poisson())
fittedModel <- mixed_model(observedResponse ~ Environment1, random = ~ 1 | group , family = "poisson", data = testData)
nobs(fittedModel)
I looked at the implementation and I'm not sure if I understand the sense of this - what's the function of the id slot? In any case, shouldn't this link to the model frame?
Line 1302 in ca5a891
Hello Sir,
I would like to try using a hurdle or censored.normal GLMMadaptive mixed model, but the help links for the package in the RStudio documentation pane return an error, and there are several documentation pages with dead links on the package website. This is just a general message to call your attention to it, since you may not be aware that the documentation is not accessible.
Thank you,
Caitlin
With example data:
DF <- I(DF)
fm1 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF,
family = zi.poisson(), zi_fixed = ~ sex)
Error in `[.data.frame`(data, groups) : undefined columns selected
class(DF) <- class(DF)[c(2,1)]
fm1 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF,
family = zi.poisson(), zi_fixed = ~ sex)
exists("fm1")
[1] TRUE
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GLMMadaptive_0.6-8
loaded via a namespace (and not attached):
[1] MASS_7.3-51.4 compiler_3.6.1 parallel_3.6.1 tools_3.6.1 yaml_2.2.0 nlme_3.1-142 grid_3.6.1 matrixStats_0.55.0
[9] lattice_0.20-38
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.