simongrund1 / mitml Goto Github PK

Tools for multiple imputation in multilevel modeling

R 97.49% CSS 2.51%

r imputation missing-data multilevel-models mixed-effects multilevel-data r-package

mitml's Introduction

mitml

Tools for multiple imputation in multilevel modeling

This R package provides tools for multiple imputation of missing data in multilevel modeling. It includes a user-friendly interface to the packages pan and jomo, and several functions for visualization, data management, and the analysis of multiply imputed data sets.

The purpose of mitml is to provide users with a set of effective and user-friendly tools for multiple imputation of multilevel data without requiring advanced knowledge of its statistical underpinnings. Examples and additional information can be found in the official documentation of the package and in the Wiki pages on GitHub.

If you use mitml and have suggestions for improvement, please email me (see here) or file an issue at the GitHub repository.

CRAN version

The official version of mitml is hosted on CRAN and may be found here. The CRAN version can be installed from within R using:

install.packages("mitml")

GitHub version

The version hosted here is the development version of mitml, allowing better tracking of issues and possibly containing features and changes in advance. The GitHub version can be installed using devtools as:

install.packages("devtools")
devtools::install_github("simongrund1/mitml")

mitml's People

Contributors

Stargazers

Watchers

Forkers

seonghobae stefvanbuuren rubenarslan tdjorgensen forrestckoch gerkovink

mitml's Issues

Potential scale reduction (Rhat, imputation phase) - What does psi implies?

Hello,

I am not sure if this is the right place to asks questions. If not, please excuse me.

I am currently using the package mitml to impute multilevel data with missings on two levels and I have a question about the summary. I was wondering if one can only try to increase convergence by increasing n.burn or if it can also cohere with the imputation model itself. However, I am not able to grasp the meaning of the values of the potential scale reduction, especially psi. Can someone maybe help me with this?

Many thanks

Cannot change to mids object.

I created a jomo imputation and then tried the following steps to convert it to a mids object using these resources.

https://rdrr.io/cran/mitml/man/mitmlComplete.html
https://rdrr.io/cran/mitml/man/mitml.list2mids.html

implist <- mitmlComplete(jomo_imputation)

This gives the error "Error in order(attr(x$data, "sort")) : argument 1 is not a vector"

mids_data <- mitml::mitml.list2mids(implist, data = original_data)

This gives the error "Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match" when fill = TRUE.

Any resources with instructions? Thank you.

`tidy` and `glance` methods

I am the developer of the modelsummary package which creates summary tables and plots for statistical models.

To extract info from fitted model, this package uses tidy and glance methods which conform to the broom standard: https://broom.tidymodels.org/

The broom standard is nice because it gives users a single unified interface to extract info from various objects. I'm opening this issue in case you think it would be a good idea to include such methods in your package. The benefit is that your objects would be automatically supported by packages like modelsummary and gtsummary and texreg. Users could then do this:

library(mitml)
library(lme4)
library(modelsummary)

data(studentratings)
fml <- ReadAchiev + ReadDis + SchClimate ~ 1 + (1|ID)
imp <- panImpute(studentratings, formula = fml, silent = TRUE, m = 24)
implist <- mitmlComplete(imp, "all")
fit <- with(implist, lmer(ReadAchiev ~ 1 + ReadDis + (1|ID)))

modelsummary(fit)

Here are minimal working methods:

tidy.mitml.result <- function(x, ...) {
    out <- testEstimates(x)
    out <- as.data.frame(out$estimates[, c(1:3, 5)])
    colnames(out) <- c("estimate", "std.error", "statistic", "p.value")
    out$term <- row.names(out)
    return(out)
}

glance.mitml.result <- function(x, ...) {
    data.frame(nimp = length(x),
               nobs = nobs(x[[1]]))
}

fixed-effect estimate p-values in testEstimates() are from one-sided tests

For example, if model_list is a list of merMod objects from multiply imputed datasets, then extracting pooled p-values of fixed-effect estimates using mice and mitml gives different results:

require(mice)
require(mitml)
mice_pval <- summary(mice::pool(mice::as.mira(model_list), method='other'))[ , 'Pr(>|t|)']
mitml_pval <- mitml::testEstimates(model_list)$estimates[ , 'p.value']

cbind(mice_pval, mitml_pval, mitml_pval*2)

Looking through the testEstimates() function, it's clear that p has not been multiplied by two:

p <- 1 - pt(abs(t), df = v)

thus giving an (undirected) one-sided test. Shouldn't this be

p <- 2*(1 - pt(abs(t), df = v))

instead?

Error in mitmlComplete() with single missing value in categorical variables

When there is only a single missing value in a categorical variable, mitmlComplete() gives the following error:

Error in `[<-.data.frame`(`*tmp*`, ind[fi, ], value = "1") :
 new columns would leave holes after existing columns

D4 test for geneal linear models fitted with `nlme::gls()`

Since the intercept-only mixed model can be represented by a general linear model with a compound symmetry covariance structure, the pooled LRT results from both models should be the same. However, when I use testModels function for pooling the LRTs from general linear models, it does not match with the pooling results from corresponding mixed models. Below is the code, where the mixed model is fitted using lme4::lmer function and the general linear model is fitted using nlme:gls function.

library(mitml)
library(nlme)
library(lme4)
data(studentratings)

fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)

implist <- mitmlComplete(imp)

fit1 <- with(implist, lmer(ReadAchiev ~ ReadDis + (1|ID), REML = FALSE))
fit0 <- with(implist, lmer(ReadAchiev ~ (1|ID), REML = FALSE))

fit1_gls <- with(implist, gls(model = ReadAchiev ~ ReadDis, method = "ML",
                          correlation = corCompSymm(form = ~ 1 | ID)))
fit0_gls <- with(implist, gls(model = ReadAchiev ~ 1, method = "ML",
                          correlation = corCompSymm(form = ~ 1 | ID)))

testModels(fit1, fit0, method = "D4", ariv = "positive")
testModels(fit1_gls, fit0_gls, method = "D4", ariv = "positive", data = implist)
# **results do not match**

Note that: the individual completed-data results are the same, as they should be. You can check, for each completed dataset:

# For the 1st completed datasets:
summary(fit1[[1]])
summary(fit1_gls[[1]])
anova(fit1[[1]], fit0[[1]])
anova(fit1_gls[[1]], fit0_gls[[1]])

In addition:

I downloaded the source code for testModels with all its dependencies. Then I rename it to testModels2 and pool the same models.

source("testModels2-with-its-dependencies.R")  # "testModels2-with-its-dependencies.txt" is attached
testModels2(fit1, fit0, method = "D4", ariv = "positive")
testModels2(fit1_gls, fit0_gls, method = "D4", ariv = "positive", data = implist)

Surprisingly, they resulted in a similar output. But it does not seem to be a good result, since the pooled result does not go with the m individual completed data results. Could someone please explain why is this happening?

testModels2-with-its-dependencies.txt
.

Error in testModels (Error in solve.default(Ubar) : 'a' is 0-diml)

First, thanks so much for your work on mitml! I've been waiting for someone to make pan usable for a long time and to implement multiple df tests for MI.

Unfortunately, I'm getting an error with testEstimates, R output is below. Any thoughts would be greatly appreciated! Thanks, Lee

imp.test <- with(implist, lmer(cbcl_ext_raw ~ Time +(1 + Time|ChildID), REML=F))
imp.g2.ext <- with(implist, lmer(cbcl_ext_raw ~ Time + female + Time:female + (1 + Time|ChildID), REML=F))

testEstimates(imp.test , var.comp=TRUE)

Call:

testEstimates(model = imp.test, var.comp = TRUE)

Final parameter estimates and inferences obtained from 20 imputed data sets.

             Estimate Std.Error   t.value        df   P(>|t|)       RIV       FMI 
(Intercept)     8.808     0.448    19.664   117.838     0.000     0.671     0.411 
Time           -0.123     0.041    -2.983    65.741     0.004     1.163     0.551 

                             Estimate 
Intercept~~Intercept|ChildID   25.031 
Intercept~~Time|ChildID        -0.178 
Time~~Time|ChildID              0.063 
Residual~~Residual             27.086 
ICC|ChildID                     0.480 

Unadjusted hypothesis test as appropriate in larger samples. 

testEstimates(imp.g2.ext , var.comp=TRUE)

Call:

testEstimates(model = imp.g2.ext, var.comp = TRUE)

Final parameter estimates and inferences obtained from 20 imputed data sets.

             Estimate Std.Error   t.value        df   P(>|t|)       RIV       FMI 
(Intercept)     9.206     0.544    16.919   133.098     0.000     0.607     0.387 
Time           -0.147     0.053    -2.782    75.783     0.007     1.003     0.513 
female         -0.828     0.645    -1.284   182.498     0.201     0.476     0.330 
Time:female     0.051     0.068     0.755   104.853     0.452     0.741     0.436 

                             Estimate 
Intercept~~Intercept|ChildID   24.591 
Intercept~~Time|ChildID        -0.156 
Time~~Time|ChildID              0.062 
Residual~~Residual             27.073 
ICC|ChildID                     0.476 

Unadjusted hypothesis test as appropriate in larger samples. 

testModels (imp.test,imp.g2.ext)
Error in solve.default(Ubar) : 'a' is 0-diml

mitml package

I already installed the package but I keep on having this message "could not find function "mitml" how can i fix that

Error message with spatial regression (spml)

Dear Simon Grund,

Please see the example below with a reproducible example.
I try to use mitml with a standard spatial regression spml and receive an error message when trying to run fit1 with formula fm1 which contains an imputed data for unemp1.
If I run fit1 with formula fm2 which does not contain an imputed data, I do not receive an error message.

Ran

# a reproducible example
library(splm)
library(Ecdat)
library(spdep)
library(mitml)


data("Produc", package = "Ecdat")
data("usaww")

#create data with missing valuese
usalw <- mat2listw(usaww)
Produc$unemp1<-Produc$unemp
Produc$unemp1[Produc$unemp>6.2]<-NA #to produce missing observations
usalw=as.matrix(usaww) #spml neads a weighted matrix

#create multiple imputation in variable unemp1
fml<-unemp1~hwy+water+ (1|state)
imp <- panImpute(Produc, formula=fml, n.burn=1000, n.iter=100, m=5)
implist <- mitmlComplete(imp, print=1:5)

fm <- log(gsp) ~ log(pcap) + log(pc) + log(emp)
fm1<- log(gsp) ~ log(pcap) + log(pc) + log(emp)+unemp1
fm2<- log(gsp) ~ log(pcap) + log(pc) + log(emp)+pc 

# using fm1 below will get an error massage in fit1 below
# using fm2 INSTEAD of fm1 you will not recieve an error massage in fit1, however
# this formula does not contain imputation
fit0 <- with(implist, spml(formula = fm, data = Produc, index = NULL,
                           listw = usaww, lag = TRUE, spatial.error = "b", model = "within",
                           method = "eigen", na.action = na.fail))

fit1 <- with(implist, spml(formula = fm1, data = Produc, index = NULL,
                           listw = usaww, lag = TRUE, spatial.error = "b", model = "within",
                           method = "eigen", na.action = na.fail))

testEstimates(fit1)
testModels(fit1, fit0)

Error when viewing output from testEstimates function

This may or may not be an error related to your package, but I receive the error: "Error in fmt[isLarge] <- sub(paste0(postfix, "$"), "e", fmt[isLarge]) : NAs are not allowed in subscripted assignments" when running the final line of code in this example to undertake mediation analysis on multiply imputed datasets. Are you able to clarify for me whether it is an issue with the package?

`jomoImpute` removes `.Random.seed`

The .Random.seed is removed by jomoImpute:

type <- c(-2,0,0,0,0,1,3,1,0,0)
names(type) <- colnames(studentratings)
imp <- jomoImpute(studentratings, type=type, n.burn=100, n.iter=10, m=5)
.Random.seed
Error: object '.Random.seed' not found

It happens also after first running set.seed(1). Cannot see why we would want this to happen.

Estimates and Integration with lme4 package

I have been using the mitml package to calculate variance explained for a set of multilevel models, but noticed a few issues:

When specifying the model using lme4, the estimates are different from when I specify the model using nlme. Specifically, the RB2 calculation changes depending on which package I use.
When I run the function "multilevelR2" for my model that was specified using lme4 I get the following error: Error in multilevelR2(h1.1):Calculation of R-squared statistics not supported for models of class.

feature request: estimated marginal means

As the title implies, I'm interested in calculating marginal means from the results of my pooled multilevel models. If would be wonderful to either see this functionality added to the mitml package, or have the ability to transform the results from testEstimates into an lmer object or something else that can be recognized by packages like emmeans.

glmmTMB: Error when pooling models

When I try to compare fitted glmmTMB models on multiply imputed data (through MICE), I get an error when trying to use the pooled Wald test. There seems to be a problem when parameter estimates and the variance/covariance matrix are extracted:
amices/mice#435

I get the following error when trying to use anova() or D1() from the mice package on the mira objects.

all(p == dim(Uhat[[1]])) is not TRUE

I guess a workaround similarly to the one implemented for polr is necessary?

Print testEstimates to more than 3 decimal places

Hi there - I'm wanting to apply FDR correction to the p-values extracted from mitml. In order to do this, I need greater precision in the p-values printed from testEstimates. I've tried a few workarounds (with my limited programming knowledge) but none have worked for me. I wonder if this is possible? Any help here would be really appreciated.

Thanks,
Louise

Obtaining confidence intervals after multilevel model

Hi,

I'm running a multiple imputation in a three-level imputation model. The code works well and I'm able to obtain reasonable pooled estimates using testEstimates.

However, I get an error when using confint to obtain 95% confidence intervals:
Error in UseMethod("vcov") :
no applicable method for 'vcov' applied to an object of class "c('mitml.result', 'list')"

Any idea how to solve this?

Best wishes,
Sebastián Peña
National Institute for Health and Welfare, Finland

anova doesn't works with coxph function

Good Morning,

I try your function anova.mitml.result with two Cox models (one empty and one with 1 variable), but the function doesn't work. Can you solve this bug

A simple exemple with the data lung from the package survival

library(Amelia)
library(mitml)
library(survival)
data(lung)

lung_mi<-amelia(lung,noms=c(1,3,5,6))
lung_mi2<-amelia2mitml.list(lung_mi)

fit0<-with(lung_mi2,coxph(Surv(time,status)~1))
fit1<-with(lung_mi2,coxph(Surv(time,status)~meal.cal))
anova.mitml.result(fit1,fit0)

Error in max(df) : invalid 'type' (list) of argument

Thank you for advance

Thank you for your work! it will be very usefull in my job !

feature request: mitml.list2mids and mids2mitml

It would sometimes be helpful to be able to switch back and forth between mitml.list, mitml, and mids objects. mitml.lists are most helpful for post-processing using dplyr, but many more out-of-the-box diagnostics can be used on mitml and mids objects.

Parallel computation of multiple imputations by using mitml

Dear Simon,

I was wondering if there is any possibility to run multiple imputation using jomoImpute with parallel processing like futuremice in the mice package does. This would speed up my imputation process.

Thank you very much in advance
Sophie

Obtain original row order in the imputed data

The panImpute function resorts columns and rows. I am trying to figure out how to get the original order, but thus far no luck. What would be the advised way to do this?

library(mitml)
library(mice)

head(nhanes)
type <- c(1, 1, -2, 1)

obj <- panImpute(data = nhanes, type = type, m = 1)

# the object returned by panImpute resorts rows and columns
cmp1 <- mitmlComplete(obj, print = 1)
head(cmp1)

# try 1: get original row and column order
cmp2 <- cmp1[order(attr(obj$data, "sort")), names(nhanes)]
head(cmp2)

# try 2: get original row and column order
cmp3 <- cmp1[attr(obj$data, "sort"), names(nhanes)]
head(cmp3)

simongrund1 / mitml Goto Github PK

mitml's Introduction

mitml

Tools for multiple imputation in multilevel modeling

CRAN version

GitHub version

mitml's People

Contributors

Stargazers

Watchers

Forkers

mitml's Issues

In addition:

Recommend Projects

Recommend Topics

Recommend Org