
mice's People

Contributors

andland, bcjaeger, bfgray3, bgall, cjvanlissa, clbustos, dependabot[bot], dnzmarcio, edbonneville, edoardocostantini, gerkovink, hadley, hanneoberman, jeroen, kkleinke, lukaswallrich, michaelchirico, mingyang-cai, mmaechler, prockenschaub, ralayax, rasel-biswas, rianneschouten, shahabjolani, stefvanbuuren, stephematician, thomvolker, vincentarelbundock, vkhodygo, wibeasley


mice's Issues

don't impute censored values

Hello,
I am imputing time-varying covariates for a survival analysis.
I set my data in a wide format. Is there a way to not impute censored values?

Thanks!

handling of discrepancy between predictorMatrix and method in v3.0

Had been getting this error on mice() calls in lots of existing code:

Error in reformulate(setdiff(vars, j), response = j) : 
  'termlabels' must be a character vector of length at least one

I finally determined that it arises when you pass a non-empty imputation method for a column with no missing values.

# how many missings in each variable?

lapply(nhanes, function(.x){sum(is.na(.x))})

# runs

mice(nhanes2,
     method = c('sample', 'pmm', 'logreg', 'norm'))

# throws error

mice(nhanes2,
     predictorMatrix = quickpred(nhanes2),
     method = c('sample', 'pmm', 'logreg', 'norm'))

# runs if I set method vector to empty string for age, which has no missing values

mice(nhanes2,
     predictorMatrix = quickpred(nhanes2),
     method = c('' ,'pmm' ,'logreg', 'norm'))

If I understand the issue correctly:

Passing a non-empty imputation method for a variable with no missing values was accepted in previous versions of the package, so it may be worth noting in the documentation for the method argument that method and predictorMatrix now MUST agree.

Alternatively, a check before imputation could catch this and give an informative error message.
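The workaround from this report can be automated. The following is a minimal sketch in base R that does not call mice itself; the toy data frame `dat` and the vector `meth` are made up for illustration, with the first column complete as `age` is in nhanes2.

```r
# Toy data: the first column is complete, the others contain NA
# (values are invented for illustration).
dat <- data.frame(
  age = c(1, 2, 3, 1),
  bmi = c(NA, 22.7, NA, 27.2),
  hyp = c(NA, 1, 1, NA),
  chl = c(NA, 187, 187, NA)
)

# Method vector as in the failing call
meth <- c("sample", "pmm", "logreg", "norm")
names(meth) <- names(dat)

# Blank the method for every column that has no missing values
n_missing <- colSums(is.na(dat))
meth[n_missing == 0] <- ""

print(meth)
```

The cleaned vector could then be passed as the `method` argument alongside the `predictorMatrix`.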

Function `md.pattern` does not work with character variables

Hi Stef,

thanks for creating the mice package. Great work, great documentation!

When using md.pattern to inspect my data, I run into a problem caused by character columns. Here is a standalone example based on the iris data set:

First everything is fine and as expected:

# load package
library(mice)

# Load iris data to Global Env.
data(iris)

# Generate missing Values
iris$Species[1:5] <- NA
iris$Sepal.Length[4:10] <- NA
head(iris, n = 11)


# Use `md.pattern` to inspect missing data
md.pattern(iris)  # everything is fine and as expected

Problems arise when the factor variable Species is transformed into character:

# Convert factor variable to character
iris$Species <- as.character(iris$Species)
table(iris$Species, useNA = "a")  # missing are still there
md.pattern(iris)  # generates misleading output with non-informative warning

The unwanted behavior with character variables is due to the call x <- data.matrix(x) in line 57 of https://github.com/stefvanbuuren/mice/blob/master/R/md.pattern.r. According to its documentation, data.matrix converts all values to numeric, which works for factors but is not meaningful for characters.

Therefore I would like to suggest that md.pattern produces a warning when `character` variables are present in the input data.frame and converts them automatically into factors, for example with the following code:

# Suggestion for line 56 in md.pattern.r
if (is.data.frame(x)) {
  if (any(sapply(x, is.character))) {
    x[sapply(x, is.character)] <- lapply(x[sapply(x, is.character)], factor)
    warning('Columns of class `character` transformed into `factor`')
  }
  x <- data.matrix(x)
}
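The effect of the suggested conversion can be checked outside of mice. This is a small base-R sketch on an invented data frame `dat`; it only illustrates the idea and is not the package's code.

```r
# Invented data with one numeric and one character column, each with an NA
dat <- data.frame(
  len = c(5.1, NA, 4.7),
  species = c("setosa", NA, "virginica"),
  stringsAsFactors = FALSE
)

# Convert character columns to factors before calling data.matrix()
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)

m <- data.matrix(dat)

# Factor codes are numeric and the original NAs are preserved
print(m)
print(colSums(is.na(m)))
```

After the conversion each column reports exactly the missing values that were in the input, which is what a missing-data pattern needs.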

Thanks in advance!

For the sake of completeness:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mice_2.46.0     lattice_0.20-35

loaded via a namespace (and not attached):
 [1] MASS_7.3-48     compiler_3.4.3  Matrix_1.2-12   tools_3.4.3     survival_2.41-3 nnet_7.3-12     yaml_2.1.14     Rcpp_0.12.12   
 [9] rpart_4.1-12    splines_3.4.3   grid_3.4.3    

rbind(mids, not-mids) and complete()

The rbind() function seems to incorrectly record the where component when the second argument is not a mids object. Here's a fairly small working example:

I make a small data.frame which has some missing values in the first column. I take the first five rows and impute missing data using these rows only. I then take the last five rows of the original data.frame, rbind them to the mids object, and call complete(). Comparing the output of complete to the original data, I see that some of the data in the last five rows have been spuriously replaced.

x <- rnorm(10)
D <- data.frame(x=x, y=2*x+rnorm(10))
D[2:4, 1] <- NA
D

            x          y
1  -0.5022894 -0.6508769
2          NA -3.6318003
3          NA  1.5217955
4          NA -0.3743209
5  -0.8211472 -2.2264775
6  -1.2333469 -2.8954239
7   1.1697205  2.9399150
8   0.3635420  1.2758400
9   0.1315395 -0.1022368
10 -1.4950373 -2.6565171

D_mids <- mice(D[1:5,])
D_rbind <- mice:::rbind.mids(D_mids, D[6:10,])

complete(D_rbind, 1)
            x          y
1  -0.5022894 -0.6508769
2  -0.5022894 -3.6318003
3  -0.8211472  1.5217955
4  -0.8211472 -0.3743209
5  -0.8211472 -2.2264775
6  -1.2333469 -2.8954239
7  -0.5022894  2.9399150
8  -0.8211472  1.2758400
9  -0.8211472 -0.1022368
10 -1.4950373 -2.6565171

D
            x          y
1  -0.5022894 -0.6508769
2          NA -3.6318003
3          NA  1.5217955
4          NA -0.3743209
5  -0.8211472 -2.2264775
6  -1.2333469 -2.8954239
7   1.1697205  2.9399150
8   0.3635420  1.2758400
9   0.1315395 -0.1022368
10 -1.4950373 -2.6565171

Upon debugging - it seems that the source of the problem is that where is not adjusted to the new size of the data.

> mice:::rbind.mids
function (x, y = NULL, ...) 
{
    call <- match.call()
    if (!is.mids(y)) {
        ...
        data <- rbind(x$data, y)
        where <- x$where
        ...
    } ...
}
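One way to keep `where` in step with the enlarged data is to append rows of FALSE for the new observations. The sketch below uses plain logical matrices rather than a real mids object, and the names `where_old`, `where_new` and `where_all` are hypothetical; it is an illustration of the idea, not the package's fix.

```r
# Rounded copy of the example data from this report
D <- data.frame(
  x = c(-0.50, NA, NA, NA, -0.82, -1.23, 1.17, 0.36, 0.13, -1.50),
  y = c(-0.65, -3.63, 1.52, -0.37, -2.23, -2.90, 2.94, 1.28, -0.10, -2.66)
)
old_rows <- D[1:5, ]   # rows that were imputed
new_rows <- D[6:10, ]  # rows appended afterwards

# Assumption: `where` marks the cells to impute, here all missing cells
where_old <- is.na(old_rows)

# The appended rows must not be overwritten, so flag them all as FALSE
where_new <- matrix(
  FALSE,
  nrow = nrow(new_rows), ncol = ncol(new_rows),
  dimnames = list(rownames(new_rows), colnames(new_rows))
)

where_all <- rbind(where_old, where_new)
print(dim(where_all))
print(sum(where_all))
```

With a `where` matrix of the full size, only the three originally missing cells are marked for replacement.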

Get at the final model used in the MICE iterations?

Dear Stef (et al),
this is not a bug report, but a public "request" for advice.

Context: We use mice on a medium-sized data set of Swiss meteo and bio data (several locations, species, etc.) and mainly need to impute one Y variable (which, however, is also used in lagged form as a predictor) in a linear regression model. Imputations work fine using "pmm" and default least squares regression, though a perfect model would take into account that errors seem to be more heavy-tailed than the Gaussian, and in an ideal world we would use robust regression (e.g. as in robustbase::lmrob()).

To assess the imputations we would like to compare the empirical distribution of the several imputed values with a hypothesized Gaussian of "known" (mu, sigma) = (x' \beta, \sigma) and hence would want to find (\beta, \sigma) from the regression model that was used in mice (but possibly fitting \beta, \sigma using different data, e.g., in a missingness simulation, fitting it to the full (nonmissing) data).
Our problem is that the mice.impute.*() functions which mice() works with do not keep the parameters of the models used, but only return the predicted values. That is perfect for what they are designed to do, but leaves us without a clue about what the final model looked like.

What do you propose?
I assume others have had related wishes in the past, and there already is a perfect solution?

cbind.mids command

Hi,

I'm very new to imputation in R, and the commands cbind.mids and rbind.mids do not seem to be working. I get the commands cbind.data.frame, CBind and cbind2, but not cbind.mids. Is this because I have an older version of mice? Or is it something I'm doing wrong?
This is the command I use:

library(mice)
cbind.mids(imp, add_weight)

and this is the error-message I get:
Error in cbind.mids(imp, add_weight) :
could not find function "cbind.mids"

Best regards,
Synnøve

Multiple executions provide different outcomes -- Inconsistency of example of pg 92

Hi,
I am using the "mice" package to impute the missing values of a large dataset and I've noticed that multiple executions of the mice() function generate significantly different outcomes. I tried to investigate whether that is OK or not and noticed that on page 92 there is an example where two consecutive executions provide the same outcomes. More precisely:

imp1 <- mice(nhanes,maxit=1)
imp2 <- mice.mids(imp1)

yields the same result as

imp <- mice(nhanes,maxit=2)

for example:

> imp$imp$bmi[1,]
     1    2    3    4    5
1 30.1 35.3 33.2 35.3 27.5

> imp2$imp$bmi[1,]
     1    2    3    4    5
1 30.1 35.3 33.2 35.3 27.5

However, when I type the same commands I get

imp$imp$bmi[1,]
1 2 3 4 5
1 29.6 28.7 29.6 29.6 22.5
imp2$imp$bmi[1,]
1 2 3 4 5
1 29.6 33.2 30.1 22.7 25.5

So is the PDF documentation up to date, and if so, can you please explain why I get different results even though I've downloaded the latest version?

Thank you in advance!

Best,
Maria
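Because the imputations are random draws, two runs only agree when they start from the same state of the random number generator. The base-R sketch below shows the general principle with `runif()`; that the page 92 comparison relies on the same mechanism is an assumption, and the variable names are illustrative.

```r
# One run of four draws from a fixed seed
set.seed(123)
one_run <- runif(4)

# The same seed, drawn in two steps: the second call continues the
# stream where the first stopped, like continuing an earlier run
set.seed(123)
first_half <- runif(2)
second_half <- runif(2)
print(identical(one_run, c(first_half, second_half)))

# Without resetting the seed, a fresh run gives different numbers
another_run <- runif(4)
print(identical(one_run, another_run))
```

In mice the starting state can be fixed with the `seed` argument, as in `mice(nhanes, maxit = 2, seed = 123)`.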

Pooled logistic regression with MICE - unable to obtain confint

Hi there

I have been using MICE to generate imputed datasets for pooled logistic regression. I am trying to output odds ratio and 95% confidence intervals. The example code below was working until yesterday when I updated my packages. Unfortunately, re-installation did not fix the issue and I am not sure where I am going wrong.

require(mice)
set.seed(123)

nhanes$hyp <- as.factor(nhanes$hyp)  # Hypertension as factor

# Can run logistic regression on the (non-imputed) dataset
model <- glm(hyp ~ bmi, family = binomial(link = 'logit'), nhanes)
model_or <- exp(cbind(OR = coef(model), confint(model)))  # Output with odds ratio and 95% CI

# Using the nhanes dataset as an example
imputed_data <- mice::mice(nhanes, m = 25, method = "pmm", maxit = 10, seed = 12345)
imputed_model <- with(imputed_data, glm(hyp ~ bmi, family = binomial(link = 'logit')))
imputed_model_summary <- summary(pool(imputed_model))
# Exponentiate the coefficient and the lower and upper 95% CI
imputed_model_OR <- exp(cbind(imputed_model_summary[, "est"], imputed_model_summary[, "lo 95"], imputed_model_summary[, "hi 95"]))
imputed_model_summary <- cbind(imputed_model_OR, imputed_model_summary)  # Combine into a single table

# No longer outputs the 95% CI

Is there a way of outputting 95% CI from the pooled output? Many thanks for any assistance.

Andrew
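For reference, the pooled interval can also be computed by hand from the per-imputation estimates using Rubin's rules. The sketch below is plain base R with invented numbers for a single log-odds coefficient; `q`, `u` and the other names are illustrative and not taken from mice.

```r
# Invented log-odds estimates and squared standard errors for one
# coefficient, one pair per imputed data set
q <- c(0.10, 0.14, 0.08, 0.12, 0.11)
u <- c(0.0040, 0.0044, 0.0038, 0.0041, 0.0042)
m <- length(q)

q_bar <- mean(q)                   # pooled estimate
u_bar <- mean(u)                   # within-imputation variance
b     <- var(q)                    # between-imputation variance
t_var <- u_bar + (1 + 1 / m) * b   # total variance

# Degrees of freedom from Rubin's original formula
r  <- (1 + 1 / m) * b / u_bar
nu <- (m - 1) * (1 + 1 / r)^2

# 95% interval on the log-odds scale, then exponentiated to odds ratios
crit   <- qt(0.975, nu)
ci_log <- q_bar + c(-1, 1) * crit * sqrt(t_var)
or_ci  <- exp(c(OR = q_bar, lower = ci_log[1], upper = ci_log[2]))

print(round(or_ci, 3))
```

The interval is built on the log-odds scale and exponentiated afterwards, mirroring the approach in the code above.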

Error when imputing with predictor matrix after update on mice 3.0.0

Hi!

I use R 3.5 and mice 3.0.0 to impute a subset of the variables in my data based on another subset of variables (all of them numeric). I create a predictor matrix and impute like this (reproducible example with nhanes):

predMatrix <- matrix(rep(0, 4*4), ncol=4)
predMatrix[c(1:2), c(1:2, 4)] <- 1
diag(predMatrix) <- 0 
nhanes.miced <- mice(nhanes, m=30, maxit=30, predictorMatrix = predMatrix, seed=2016)

This approach worked well. After updating mice to v 3.0.0 along with a number of other packages I get a strange error:

Error in reformulate(setdiff(vars, j), response = j) : 
'termlabels' must be a character vector of length at least one

When I switch back to mice version 2.3, the error changes to:

Error in .subset(x, j) : invalid subscript type 'list'

But the code was executable with mice 2.3. Could the problem stem from a package that was also updated and which mice is depending on? Sadly, I have no idea which (other) packages were updated from which version.

I would appreciate any hint,
kind regards, Uwe Remer

University of Stuttgart
Institute for Social Sciences
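The predictor matrix in this report leaves two variables without any predictors, which may be related to the error. The base-R sketch below rebuilds the matrix with row and column names added for readability and reproduces the quoted message directly from `reformulate()`; whether mice 3.0.0 reaches that call for exactly this reason is an assumption.

```r
# Same matrix as in the report, with nhanes variable names attached
vars <- c("age", "bmi", "hyp", "chl")
predMatrix <- matrix(0, nrow = 4, ncol = 4, dimnames = list(vars, vars))
predMatrix[1:2, c(1:2, 4)] <- 1
diag(predMatrix) <- 0

# Variables whose row is all zero have no predictors at all
no_predictors <- rownames(predMatrix)[rowSums(predMatrix) == 0]
print(no_predictors)

# Building a formula without any term labels fails with the message
# quoted in the report
msg <- tryCatch(
  reformulate(character(0), response = "hyp"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If such variables should not be imputed, setting their entry in the `method` vector to an empty string is the workaround described in the earlier report on predictorMatrix and method.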

Parabolic minimum/maximum in impute.quadratic

Hi,

we are currently doing a group presentation on Polynomial Combination and are quite certain that in line 100 of mice.impute.quadratic the term should be y.min <- -b1 / (2 * b2) instead of y.min <- -b1 / 2 * b2.

Kind regards,
Sarem
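The difference between the two expressions comes from operator precedence: without parentheses, R divides by 2 first and then multiplies by b2. A quick base-R check with invented coefficients (`b0`, `b1`, `b2` and the helper `slope` are illustrative only):

```r
# Invented coefficients of the parabola b0 + b1 * y + b2 * y^2
b0 <- 1
b1 <- 4
b2 <- -2

# First derivative, which is zero at the minimum or maximum
slope <- function(y) b1 + 2 * b2 * y

as_written  <- -b1 / 2 * b2     # parsed as (-b1 / 2) * b2
as_proposed <- -b1 / (2 * b2)   # vertex of the parabola

print(c(as_written = as_written, as_proposed = as_proposed))
print(c(slope(as_written), slope(as_proposed)))
```

Only the parenthesized version gives a point where the slope is zero.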

Pool for cox hazard model

Thank you for the great mice package.

When pooling a Cox proportional hazards model,
the result showed an est below zero.

It might be being confused with a linear regression model.

What is the best way to pool Cox proportional hazards models?

library(mice)
library(survival)

#Making database
Tempmice<-mice(imputedmodel,
m=10,maxit=50,meth='pmm', seed=500)

#Cox proportional hazard model
CoxModel.1<- coxph(Tempmice,Surv(FU, Censor==1)~ Factor1+Factor2, method="breslow")

#Pool
fit<-with(data=Tempmice,CoxModel.1)
coxphimputed<-summary(pool(fit))
coxphimputed

Create new variable after imputation

This is a mail I got from Tobias Rolfes:

Date: 20 May 2017 15:48:53 GMT+5:30
Subject: Mice: Create new variable after imputation

Hello Stef,

Thank you very much for creating such a useful package for multiple imputation.

Currently, I am facing the problem that I want to create a new variable after calculating imputations (e.g., sum scores of items) and calculate regressions with the new variable. However, when I do so (cf. the program code below), the originally missing cases are deleted in the regression due to missing values. Do you have an idea how I can solve the problem?

library(mice)
# Generate Data
A <- c(1, 2, 1, NA, 3, 4, 1, 2, 3)
B <- c(2, 3, 2, 3, 4, 4, 1, 2, NA)
C <- c(3, 4, 2, 3, 4, 4, 1, 3, 4)
Data <- data.frame(A,B,C)
# Imputation
imp <- mice(Data, method = "norm", m = 5, maxit = 1)
# Convert to Long
long <- complete(imp, action='long', include=TRUE)
# Generate new variable
long$newvar <- long$B
# Convert back to Mids
imput.short <- as.mids(long)
# Calculate Regression
RegModell0 <- with(imput.short,lm(C ~ A + newvar))
summary(RegModell0)

Many thanks in advance for your answer.

Best,
Tobias

`NA` in `nmis` column in pooled summary table

library("mice")
imp <- mice(nhanes2, print = FALSE)
cmp <- complete(imp)

fit <- with(imp, lm(bmi~age+hyp+chl))
summary(pool(fit))

The column nmis contains NA for hyp2. However, hypertension in the data has 8 missings, which seems inconsistent.

Request: make predictions possible after using pool()

When using mice to generate multiple imputed sets, running a model on each of them (like elastic-net from the glmnet package) and then pooling the estimates into a single set, it's not possible to use predict.glmnet() (or predict()) on the object that comes out of mice::pool().
The prediction functions expect a fitted model object while mice::pool() returns

An object of class mipo, which stands for 'multiple imputation pooled outcome'.

Is it possible to add the ability to make predictions from the return object of mice::pool()?

Error: 'densityplot' is not an exported object from 'namespace:mice'

Dear Stef,

As I can see many before me have already said: Thank you very much for providing the community with this indispensable package for R.

I am experiencing an issue, though, with the densityplot-function, which I am unable to resolve:

When calling mice::densityplot or mice::densityplot.mids I get the following error:
Error: 'densityplot' is not an exported object from 'namespace:mice'

I have tried updating to the Development version (2.37?) but the problem remains.

Let me know if you need any more info on this.

Best regards,
Mikkel

Asking for method = "ppm" gives troubles

The problem appears to be in the check.method function. The following error is returned when I call for mice(boys, method = "ppm")

Error in check.method(setup, data) :
The following functions were not found: mice.impute.ppm, mice.impute.ppm, mice.impute.ppm, mice.impute.ppm, mice.impute.ppm, mice.impute.ppm, mice.impute.ppm, mice.impute.ppm

cran installation has non 0 exit status

Hello,
I tried to install mice from CRAN and got the error message below.
Many thanks for the package, the github version works fine.
Best regards,

  • installing source package ‘mice’ ...
    ** package ‘mice’ successfully unpacked and MD5 sums checked
    ** libs
    g++ -I/usr/local/lib/R/include -DNDEBUG -I"/home/djj/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c RcppExports.cpp -o RcppExports.o
    g++ -I/usr/local/lib/R/include -DNDEBUG -I"/home/djj/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include" -I/usr/local/include -fpic -g -O2 -c match.cpp -o match.o
    g++ -shared -L/usr/local/lib -o mice.so RcppExports.o match.o Welcome to R! Goodbye!
    g++: error: Welcome: No such file or directory
    g++: error: to: No such file or directory
    g++: error: R!: No such file or directory
    g++: error: Goodbye!: No such file or directory
    /usr/local/lib/R/share/make/shlib.mk:6: recipe for target 'mice.so' failed
    make: *** [mice.so] Error 1
    ERROR: compilation failed for package ‘mice’
  • removing ‘/home/djj/R/x86_64-pc-linux-gnu-library/3.4/mice’

The downloaded source packages are in
‘/tmp/Rtmpgam70t/downloaded_packages’
Warning message:
In install.packages("mice") :
installation of package ‘mice’ had non-zero exit status

mice crashes when number of identifiers (id variable) is too big in a long format dataset

I have a long format dataset with 50,000 individuals and 687174 observations. When the id variable is a factor I get this error:

imp <- mice(ex, maxit = 0)
Error in matrix(rep(predictorMatrix[, j], times = n.dummy), ncol = n.dummy) : 
  long vectors not supported yet: memory.c:1668

Is there another way to include the id variable as a fixed predictor?
Thanks!
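For a sense of scale: a factor used as a predictor is typically expanded into one dummy column per level beyond the first. The base-R sketch below shows this on a toy data frame and then does the arithmetic for the sizes in this report; whether mice expands the id variable in exactly this way at the failing line is an assumption.

```r
# Toy long-format data: 6 individuals with 3 observations each
toy <- data.frame(id = factor(rep(1:6, each = 3)), y = seq_len(18))

# The design matrix has an intercept plus (levels - 1) dummy columns
mm <- model.matrix(~ id, data = toy)
print(ncol(mm))

# The same expansion at the scale described in the report
n_obs <- 687174
n_id  <- 50000
cells <- n_obs * (n_id - 1)

print(cells > .Machine$integer.max)  # more cells than an ordinary vector holds
print(cells * 8 / 2^30)              # approximate size in GiB as doubles
```

At that scale a dense dummy matrix is impractical regardless of the exact line that fails.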

lattice generics not properly imported

mice does not import properly the generic lattice functions densityplot(), xyplot(), stripplot() and histogram(), so the user may find error densityplot not found. A temporary fix is

library("lattice")

This needs to be fixed.

Error with pool() from mice v.3.0

With pool() from 'mice' v. 2.46 the following line worked just fine:
pool(as.mira(lmerMod.list))
[This pooled the results from a list of 10 lmer models of imputed data.]

With 'mice' v. 3.0, pool() gives the following error message
Error: Columns 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 must be named

The message is identical for
pool(lmerMod.list)

Any ideas?

Bug in `as.mids()`

I think there is a bug in as.mids() when initializing the mids object:

ini <- mice(data[data[, .imp] == 0, -c(.imp, .id)],
    m = max(as.numeric(data[, .imp])),
    maxit = 0)

The call to max(as.numeric(data[, .imp])) counts one imputation too much, if the original data is identified by .imp=0.
I use R 3.1.1.
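One way such an off-by-one can arise is when `.imp` is stored as a factor, because `as.numeric()` then returns level codes instead of labels. That `.imp` is a factor in the reporter's data is an assumption; the base-R sketch below only shows the mechanism with a made-up vector `imp_id`.

```r
# Made-up .imp column: 0 marks the original data, 1..5 the imputations
imp_id <- factor(rep(0:5, each = 2))

# Level codes run from 1 to 6, so the maximum is one too high
codes_max <- max(as.numeric(imp_id))

# Converting via the labels recovers the intended count of 5
labels_max <- max(as.numeric(as.character(imp_id)))

print(c(codes_max = codes_max, labels_max = labels_max))
```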

mice impute missing values? Output of complete always has NAs.

I am trying to impute values using a linear model with mice. My understanding of mice is that it iterates over the rows. For a column with NAs it uses all other columns as predictors, fits the model, and then samples from this model to fill in the NAs.
Here is an example where I generate some data and then introduce missing data using ampute.

    n <- 100
    xx<-data.frame(x = 1:n + rnorm(n,0,0.1), y =(1:n)*2 + rnorm(n,0,1))
    head(xx)
    res <- (ampute(xx))
    head(res$amp)

The missing data looks like:

            x         y
   1       NA  3.887147
   2 2.157168        NA
   3 2.965164  6.639856
   4 3.848165  8.720441
   5       NA 11.167439
   6       NA 12.835415

Then I am trying to impute the missing data:

   mic <- mice(res$amp,diagnostics = FALSE )

I would expect that there are then no NAs left, but one of the columns always still contains NAs.

 colSums(is.na(complete(mic,1)))

And in which of the two it is rather random.

By running the code above I am getting:

 > colSums(is.na(complete(mic,1)))
  x  y 
  0 30 

but also :

 > colSums(is.na(complete(mic,1)))
  x  y 
 33  0 

Support for data without any missing values in md.pattern()

Dear @stefvanbuuren, thanks for maintaining and developing this package. When a dataset does not contain missing values, it seems that the first column is missing.

library(mice)
#> Loading required package: lattice
md.pattern(mtcars)
#>      mpg cyl disp hp drat wt qsec vs am gear carb  
#> [1,]   1   1    1  1    1  1    1  1  1    1    1 0
#> [2,]   0   0    0  0    0  0    0  0  0    0    0 0

However, my primary interest was to use md.pattern() to analyze missing patterns in a data frame with both numeric and non-numeric columns, which does not give the expected result.

mtcars$disp <- rep(letters, 2)[1:32]
mtcars$disp[c(1, 5, 23)] <- NA
md.pattern(mtcars)
#> Warning in data.matrix(x): NAs introduced by coercion
#>      mpg cyl hp drat wt qsec vs am gear carb disp   
#> [1,]   1   1  1    1  1    1  1  1    1    1    0  1
#> [2,]   0   0  0    0  0    0  0  0    0    0   32 32

I see you are using data.matrix(x) in the source code pretty high up in the call, which I think is where the unexpected behaviour is triggered, but I don't see why md.pattern() should be restricted to numerical data. Thanks.
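Missing-data patterns do not need a numeric conversion at all, because `is.na()` works on any column type. The following is a rough base-R sketch of that idea on an invented data frame; it is not how md.pattern() is implemented, and the names `dat`, `pattern` and `counts` are illustrative.

```r
# Invented data with a numeric and a character column
dat <- data.frame(
  num = c(1, NA, 3, 4),
  chr = c("x", "y", NA, "z"),
  stringsAsFactors = FALSE
)

# One string per row: 1 = observed, 0 = missing, in column order
pattern <- apply(!is.na(dat), 1, function(r) paste(as.integer(r), collapse = ""))

# How often each pattern occurs, and missing values per column
counts <- table(pattern)
n_missing <- colSums(is.na(dat))

print(counts)
print(n_missing)
```

Here the fully observed pattern "11" occurs twice, and each column has one missing value.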

ignore files + build issues

I don't think NAMESPACE should be in the list of files to ignore, since it's an essential part of the package.

Also, some of the other project files might be important for building. I also pushed some changes to Description to quiet warnings (errors?) from the build system.

My install did not rebuild the documentation; info on the tool chain would be useful

Non-reproducibility and failed imputations between versions

Hello,

I recently tried to reproduce results from code I wrote several months ago, and I've run into some issues, primarily that mice isn't imputing any of the missing values I want.

I wanted to raise this as an issue because I had no trouble with these imputations using an earlier version (in the summer of 2017 - I am not sure of the version), and I used them in a prediction analysis to get very reasonable validation scores, so there wasn't anything wrong with the imputations.

The code runs, doesn't give any errors, but none of the imputed datasets now have any of the NAs filled in. This is somehow an issue involving the dataset, as I've tried running examples with mice using other datasets like nhanes, and they work fine.

The only clue I have is from loggedEvents, the results of which I pasted below.

 it im co dep meth out
1   1  1 19  19  pmm   4
2   1  1 26  26  pmm  29
3   1  1 27  27  pmm  30
4   1  2 19  19  pmm   4
5   1  2 26  26  pmm  29
6   1  2 27  27  pmm  30
7   1  3 19  19  pmm   4
8   1  3 26  26  pmm  29
9   1  3 27  27  pmm  30
10  2  1 19  19  pmm   4
11  2  1 26  26  pmm  29
12  2  1 27  27  pmm  30
13  2  2 19  19  pmm   4
14  2  2 26  26  pmm  29
15  2  2 27  27  pmm  30
16  2  3 19  19  pmm   4
17  2  3 26  26  pmm  29
18  2  3 27  27  pmm  30
19  3  1 19  19  pmm   4
20  3  1 26  26  pmm  29
21  3  1 27  27  pmm  30
22  3  2 19  19  pmm   4
23  3  2 26  26  pmm  29
24  3  2 27  27  pmm  30
25  3  3 19  19  pmm   4
26  3  3 26  26  pmm  29
27  3  3 27  27  pmm  30

(The numbers in dep and out are the column names, which I've anonymized as numbers - dataset itself is attached. Not all columns were imputed, the first 3 and last 6 in particular were left out.)
I've read the documentation, and I don't entirely understand what this output means, but I thought it might be informative for the authors and others. What does seem to be the case is that there are only a few problem columns, not all of them, and again, this wasn't an issue previously, when all columns were imputed successfully.

I used the randomForest method originally, though changing to pmm or other methods makes no difference, the values remain NA. I'm no expert in multiple imputation, but I'm quite baffled.

Thanks for any help you can offer!

mice-ex.zip

Predictive mean matching for skewed variables

This is a mail I got from Dianne Venneker:

Date: Monday 22 May 2017 16:46
Subject: Question missing data

Dear Prof. Dr. Stef van Buuren,

I am currently working on a dataset with a lot of missing values.
As an imputation method, I used predictive mean matching.
Some of the variables are highly skewed and I therefore need to perform a log transformation. I am however unsure whether I have to do these transformations before or after the imputation. Do you perhaps have some advice on this matter?

Best wishes,
Dianne Venneker

Empty method leaves missing values in imputed datasets

Dear all,

When excluding a variable from imputation by specifying an empty method, mice 2.30 either gave an error saying that the variable is used for prediction, or ran without issues. In 2.46, the imputation is performed, but variables still have missing values. Is this expected behavior? Should the imputation be complete, or should there be an error or warning?

I have a dataset that was imputed without issues in 2.30 but in 2.46 has only about half of its rows completed, unless I use the default imputation method. The issue can be reproduced using the example dataset in 2.46, see below. One difference between the example and my dataset is that nhanes gives an error in 2.30, whereas my dataset produced no error and was imputed without issues.

Example code in 2.46:

 library(mice)

# Multiple imputation with chl excluded from imputation
imp <- mice(nhanes, method = c("","pmm","pmm",""))
table(is.na(complete(imp)))

FALSE  TRUE 
   76    24 

Enhance reporting tools for glm.mids objects?

Hi all,

Forgive me if this is not the right forum to voice a suggestion. I am new to Github, but an avid useR, moderately familiar with the mice package, and less so multiple imputation in general.

I am interested in extending the S3 object system to improve reporting and programming of multiply imputed GLM analyses. Some S3 methods which I find are useful are: confint and anova. An example of such an implementation, simply using the output from summary, is:

## a confidence interval method for multiply imputed, pooled GLMs
confint.mipo <- function(object, parm, level=0.95, ...) {
  x <- summary(object)
  if (missing(parm))
    parm <- rownames(x)
  b <- x[parm, 'est', drop=F]
  s <- x[parm, 'se']
  lims <- c((1-level)/2, 1-(1-level)/2)
  sweep(x = s %o% qnorm(lims), STATS = b, MARGIN = 1, FUN = '+')
}

Error in mice(nhanes, method="myfunc")

Dear @stefvanbuuren,

I tried to rerun the script you attached to the mice paper (https://www.jstatsoft.org/article/view/v045i03) in JStatSoft but ran into the following error in line 740 of the file v45i03.R: mice(nhanes, method="myfunc")
Error in check.method(setup, data) : The following functions were not found: mice.impute.myfunc, mice.impute.myfunc, mice.impute.myfunc

Do you have an idea?

Thanks in advance

Markus Konkol
http://o2r.info/
Institute for Geoinformatics
Münster, Germany

Strange behaviour with CART on dataset of two columns

Hi Stef, hi Karin,

I'm a Ph.D. student at the University of Michigan and I'm trying to run a simulation on imputing missing data in a minimal setting (2 variables with missing values).

When I'm using meth = "cart" I'm receiving the following error message:

"Error in model.frame.default(formula = yobs ~ ., data = cbind(yobs, xobs),  : 
  'data' must be a data.frame, not a matrix or an array "

I've already searched the Internet for several days and tried different parameter settings, but I still can't find a solution or explanation for this problem. I've included a minimal working example showing that mice with meth="cart" on a dataset with two variables does not work. This data situation is not a problem for "rpart" or meth="norm.predict" in mice. (Adding a third variable produces no error)

It would be great if you could help me with that or point me to resources where I can find a solution to this problem (or at least an explanation of why it is not working).

If you need any further information, please let me know.

Thanks,
Micha

minimal example:

require("mice")
require("rpart")
a = rnorm(n = 200)
b = rnorm(n = 200)
c = rnorm(n = 200)

a[1:20] = NA
b[15:40] = NA
c[30:50] = NA

df1 = data.frame(a,b)
df2 = data.frame(a,b,c)

imp1 = mice(data = df1,m=1, meth = "cart", maxsurrogate=0, maxit = 1) # produces the described error
imp1 = mice(data = df2,m=1, meth = "cart", maxsurrogate=0, maxit = 1) # works fine

imp2 = mice(data = df1,m=1, meth = "norm.predict", maxsurrogate=0, maxit=1)
#complete(imp2)

tree1 = rpart(a~b, data=df1, maxsurrogate=0)
tree2 = rpart(b~a, data=df1, maxsurrogate=0)

end minimal example
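A plausible mechanism for an error that only appears with a single predictor is R's dimension dropping: selecting rows from a one-column data frame returns a plain vector unless `drop = FALSE` is given. Whether this is what happens inside the cart method is an assumption; the base-R sketch below only shows the behaviour, with made-up names.

```r
# Made-up observed outcome and a predictor set with a single column
yobs  <- c(0.3, -1.2, 0.8)
x_one <- data.frame(b = c(1.1, 0.4, -0.7, 2.0))
rows  <- 1:3

dropped <- x_one[rows, ]                # collapses to a numeric vector
kept    <- x_one[rows, , drop = FALSE]  # stays a data frame

# cbind() of two vectors gives a matrix, which model.frame() rejects
print(class(cbind(yobs, dropped)))

# cbind() with a data frame gives a data frame
print(class(cbind(yobs, kept)))
```

With two or more predictor columns nothing is dropped, which would match the observation that adding a third variable avoids the error.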

pool.compare () with glmer models

Hi,

I tried to use pool.compare ( ) with the likelihood ratio test to compare two nested generalized linear mixed models and received the error below,
not meaningful for factors
Error in model.matrix(formula, data) %*% coefs : non-conformable arguments

Looking at the MICE documentation, I understand that pool.compare ( ) can be used to compare glm models. I am wondering if it can be applied to compare models that use glmer () function.

Thank you very much in advance!

Best,
Christina

Min(y) not meaningful for factors error in pool.compare

I am writing with a short bug report on mice’s pool.compare function. I did not have the time to understand the details of the problem but it goes as follows.

When pooling logistic regression models using ‘likelihood’ of pool.compare, the function returns an error when the dependent variable in the glm was a factor instead of a numeric (0,1) variable. The problem occurs because min(y) is used at some point in the function which is not defined for factors.

Based on one of the examples from pool.compare, a code to replicate the problem is:

imp2  <- mice(boys, maxit=2)
fit0 <- with(imp2, glm( as.factor(gen>levels(gen)[1]) ~ hgt+hc,family=binomial))
fit1 <- with(imp2, glm( as.factor(gen>levels(gen)[1]) ~ hgt+hc+reg,family=binomial))
pool.compare(fit1, fit0, method='likelihood', data=imp2)

I hope this helps to improve the code.

Best regards,
Thomas
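The underlying behaviour is easy to reproduce in base R: `min()` is not defined for factors, while a 0/1 recoding of a two-level factor works. A minimal sketch with a made-up response `y`:

```r
# Made-up binary response stored as a factor
y <- factor(c(FALSE, TRUE, TRUE, FALSE))

# min() on a factor raises an error; capture its message
msg <- tryCatch(min(y), error = function(e) conditionMessage(e))
print(msg)

# Recode the two levels as 0 and 1 (valid for two-level factors only)
y01 <- as.integer(y) - 1L
print(min(y01))
```

Recoding the response before fitting is a possible workaround until the function handles factors itself.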

Issues with mice.impute.2l.norm to impute multilevel missing data

The following error appears when I try to impute using mice.impute.2l.norm:

"x + diag(diag(x) * ridge) : non-conformable arrays"

I find that similar code works for mice.impute.2l.pan.

Here is the debug output from RStudio:

[Screenshot: mice debug]

Looking more closely at the problem, it appears that
line 117 of the code results in a matrix with a single element. Can you please confirm?

Please find the attached dataFCS2 and codes to reproduce the problem

ini2<-mice(dataFCS2,maxit=0)
pred<-ini2$predictorMatrix
pred[,c("ID","wave","sep2")]<-0
pred["bmiz",]<-c(-2,1,1,1,0,2,1,0,1,0)
pred["famStruc",]<-c(-2,1,1,1,0,1,1,0,0,1)
meth2<-ini2$method
meth2[c("bmiz","famStruc")]<-"2l.norm"
imp2<-mice(dataFCS2,m=40,meth=meth2,pred=pred)

Multilevel imputation: categorical/non-normal distributed variables

Dear professor van Buuren,

Is it possible to extend the mice package with a function for first-level imputation of categorical and non-normally distributed variables? At the moment this is only available for normally distributed variables, using 2l.pan, 2l.norm and 2l.lmer.

Thank you very much in advance,

Greetings,

Benjamin Gravesteijn

pool.compare with ordered logistic

Hi,

I'm conducting a likelihood ratio test with two ordinal logistic regressions using pool.compare(), but encounter the error that "model matrix %*% coefs" has non-conformable arguments.

Looking at the function pool.compare, I think this occurs because the coefficients outnumber the columns of the model.matrix, since polr() with J ordinal levels produces J-1 intercepts (i.e., J-2 more than the model matrix).

This produces an error in the Llogistic function, which tries to multiply the (n x k) model matrix by the (k + J - 2) coefficients.

Apologies if I'm just applying this incorrectly, but I couldn't find much more guidance online.

Thanks
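The dimension mismatch can be checked outside mice. The sketch below uses MASS::polr with the housing data that ships with MASS, purely as a stand-in for the models in this report. It counts the model matrix columns (k, including the intercept column) and the polr parameters (slopes plus cutpoints), which should come to k + J - 2.

```r
library(MASS)  # ships with R; provides polr() and the housing data

# Ordinal outcome Sat has J = 3 levels
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# Model matrix built from the same formula (includes an intercept column)
X <- model.matrix(Sat ~ Infl + Type + Cont, data = housing)
k <- ncol(X)
J <- nlevels(housing$Sat)

# polr reports slopes in coef() and the J - 1 cutpoints in $zeta
n_coef <- length(coef(fit)) + length(fit$zeta)

print(c(columns = k, parameters = n_coef, expected = k + J - 2))
```

Because the parameter vector is longer than the number of columns, a plain matrix product of the two is non-conformable.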

Labels of factors lost

Adam Nieuzytek alerted me to the following problem:

## labeling bug 27feb2017

library("mice")
imp <- mice(nhanes2)
cmp <- complete(imp)
head(cmp) ## does have labels, just data

# here we lose the category labels
# because this is a contrast, not a category anymore
fit <- lm(bmi~age+hyp+chl, data = cmp)
fit

# however, the following is fine
fit2 <- lm( bmi~age+hyp+chl, data = nhanes2)
fit2

# there is a difference between
str(nhanes2)
str(cmp)

Presumably, somewhere in complete() there is a problem that creates this difference.

"Empty" imputation in mice

Hi Stef,

I ran

library(mice)
data(nhanes)
imp <- mice::mice(nhanes, m=0, maxit=0)

and obtained the error message invalid 'dimnames' given for data frame.

Is this intended? I suppose the problem has been visible since mice 3.0.0.

Alexander

Bug in as.mids2

Hello. I encountered an error when converting a data frame into a mice mids object and then converting the mids object back into a new R data frame. The resulting datasets differ.
I found this error on my own data, and reproduced it using a built-in dataset (epi, from the psych package).
Below is the syntax I used to reproduce the bug.
Thanks.

Imputed example.txt

Using mice package with Oracle R Enterprise (ORE)

I have my data in Oracle 12c. I installed the mice package on the ORE server side and tried to use it with the ore frame, but I receive the following error when trying the md.pattern command:

Error in as.vector(x, mode) : invalid argument 'mode'

When I pull the data into R and run the same command, it works just fine. Please advise whether mice is capable of working with ORE and, if so, I would appreciate help with overcoming the above error.

imputing categorical variable in a multilevel model

Hello,
I have an individual categorical variable (race) in a panel dataset. I would like to impute it and use it as a predictor. I only see the functions:

mice.impute.2lonly.mean
mice.impute.2lonly.norm
mice.impute.2lonly.pmm

If I use pmm (not sure it is the right thing to do), the imputed values would be used as a continuous variable when imputing other variables. Is that correct? If so, an alternative would be to transform that variable into a factor.

I get these warnings quite often:

Warning messages:
1: Some predictor variables are on very different scales: consider rescaling
2: Some predictor variables are on very different scales: consider rescaling

Thank you in advance!

Post-processing: function ifdo() not yet implemented

Dear Stef,

Another problem I encountered today showed itself when I was trying to reproduce some of your code from your book "Flexible Imputation of Missing Data" (fantastic book!).

On pages 135-136 you present two code solutions for the same post-processing setting:

library(mice)
ini <- mice(airquality[, 1:2], maxit = 0)
post <- ini$post
post["Ozone"] <- "imp[[j]][,i] <- squeeze(imp[[j]][,i], c(1,200))"
imp <- mice(airquality[, 1:2], method = "norm.nob", m = 1, maxit = 1, seed = 1, post = post)

Alternative to line 4:
post["Ozone"] <- "ifdo(c(Ozone < 1, Ozone > 200), c(1, 200))"

The former works fine, but for Ozone the latter gives the message: Function ifdo() not yet implemented.

Has the "ifdo" function been withdrawn from the package?

Best regards,
Mikkel
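For reference, the effect that both post-processing strings aim for (forcing imputed Ozone values into the range 1 to 200) can be sketched in base R. The helper name clamp is hypothetical and is not part of mice. It only illustrates the truncation that the squeeze() call above appears to perform, on made-up values.

```r
# Hypothetical helper: truncate values to the interval [bounds[1], bounds[2]]
clamp <- function(x, bounds) {
  pmin(pmax(x, bounds[1]), bounds[2])
}

# Made-up draws, some outside the plausible Ozone range
draws <- c(-12.5, 0.4, 37, 180, 251.3)

# Values below 1 become 1, values above 200 become 200
clamped <- clamp(draws, c(1, 200))
print(clamped)
```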

Row order determines imputations

Let me start by saying I love this package, it's really wonderful software, thank you. Thought I'd bring something to your attention -- my colleague and I lost several hours of work to figuring this out so it certainly wasn't obvious to us, but apologies if this is a known issue.

We couldn't figure out why our replication archive had different imputations to our original, despite using near identical code, and setting seeds.

It turns out the rows of our dataframes were in a different order, and this changes the imputations, even for otherwise identical data. MWE below. Sorry I didn't have time to submit a PR. I suppose it has something to do with random sampling of data somewhere, although setting data.init didn't seem to solve the problem. One solution may be to allow for an "id" argument that sorts data, does imputations, and puts it back into its original ordering?

library(mice)
set.seed(1234)
N <- 1000
data <- data.frame(
  X1 = sample(1:5, N, TRUE),
  X2 = sample(1:5, N, TRUE),
  X3 = sample(0:1, N, TRUE),
  X4 = sample(0:1, N, TRUE),
  id = 1:N
)
# 5% of values missing; id is kept complete so rows can be matched up later
missing <-
  matrix(data = sample(c(TRUE, FALSE), prod(dim(data)), TRUE, c(.05, .95)),
         nrow = nrow(data))
missing[, ncol(data)] <- FALSE
data[missing] <- NA
head(data)
tail(data)
colSums(is.na(data)) / nrow(data)
# reorder datasets
data_1 <- data[sample(data$id),]
data_2 <- data[sample(data$id),]
all(data_1 == data_2)
all(data_1$id %in% data_2$id)
# Show here that imputing the same dataset twice leads to same imputations
imputed_same_order_1 <- complete(mice(data = data, m = 1, seed = 1234))
imputed_same_order_2 <- complete(mice(data = data, m = 1, seed = 1234))
all(imputed_same_order_1 == imputed_same_order_2)
# Different order leads to different imputations
imputed_diff_order_1 <- complete(mice(data = data_1, m = 1, seed = 1234))
imputed_diff_order_2 <- complete(mice(data = data_2, m = 1, seed = 1234))
# Sort into same order (order() gives the row permutation; sort() would only return sorted ids)
imputed_diff_order_1 <- imputed_diff_order_1[order(imputed_diff_order_1$id), ]
imputed_diff_order_2 <- imputed_diff_order_2[order(imputed_diff_order_2$id), ]
all(imputed_diff_order_1 == imputed_diff_order_2)
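The "sort, impute, restore" idea suggested above can be sketched as a small wrapper. Everything here is hypothetical: impute_in_id_order is not a mice function, and toy_impute is a deliberately order-dependent stand-in for a real imputation call. The sketch assumes the id column is complete and unique.

```r
# Hypothetical wrapper: sort by id, impute, then restore the original row order
impute_in_id_order <- function(data, id, impute_fun) {
  ord <- order(data[[id]])                 # permutation that sorts by id
  imputed <- impute_fun(data[ord, , drop = FALSE])
  restored <- imputed[order(ord), , drop = FALSE]  # order(ord) inverts the permutation
  rownames(restored) <- rownames(data)
  restored
}

# Toy stand-in for an imputation routine whose result depends on row order
toy_impute <- function(df) {
  miss <- is.na(df$x)
  df$x[miss] <- seq_len(sum(miss))
  df
}

d  <- data.frame(id = 1:6, x = c(1, NA, 3, NA, NA, 6))
d1 <- d[c(3, 1, 6, 2, 5, 4), ]   # same data, two different row orders
d2 <- d[c(6, 5, 4, 3, 2, 1), ]

r1 <- impute_in_id_order(d1, "id", toy_impute)
r2 <- impute_in_id_order(d2, "id", toy_impute)

# Each id receives the same filled-in value regardless of the input order
print(identical(r1$x[order(r1$id)], r2$x[order(r2$id)]))
```

In practice impute_fun could wrap a call such as complete(mice(...)) with a fixed seed, at the cost of making the result depend on the id ordering instead of the incoming row order.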

Error when you include MaxNWts argument in mice() call and use mice.impute.cart somewhere in the model

I often end up having to use the MaxNWts argument in the mice call when using the polyreg imputation method, as described here. Like so:

df.imputed = mice(data = df.to.impute, m = 5, maxit = 10, MaxNWts = 5000)

However, if the same model also uses the cart imputation method, then the following error will be thrown when that MaxNWts argument gets passed to rpart() in mice.impute.cart.r...

Error in rpart(yobs ~ ., data = cbind(yobs, xobs), method = "anova", control = rpart.control(minbucket = minbucket,  : 
  Argument MaxNWts not matched

Thus it does not seem currently possible to use both the polr method and the cart method in the same imputation model when you need to increase the number of maximum weights. Is there a simple workaround?

Incorrect ordering of coefs and st. errors on using pool on multinom regression

Hi Stef

When using pool on multinomial regressions it returns coefficients and their standard errors in the incorrect order. It also drops the names. This seems due to there being no specific pool method for multinom models, and so it uses the default which doesn't quite work in this case.

From a question on stackoverflow: https://stackoverflow.com/questions/50291766/multinominal-regression-with-imputed-data
with a quick work-around.

thanks, david

Calling mice::mice()

I want to call mice::mice() from inside my own package. However, even at the command line I run into this problem:

mice::mice(airquality)
#> Error in check.method(setup, data): The following functions were not found: mice.impute.pmm, mice.impute.pmm

I have no problems when doing

library(mice)
mice(airquality)

Is this a bug or am I missing something?
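One possible explanation, stated here as an assumption rather than a confirmed diagnosis, is that the check looks up the imputation functions by name on the search path, where a package that is loaded via :: but not attached is invisible. The base R sketch below shows that distinction using the splines package as a stand-in. It does not call mice at all.

```r
# Assumption: functions are resolved by name on the search path.
# 'splines' stands in for any package that is installed but not attached.
fun_name <- "interpSpline"
attached <- "package:splines" %in% search()

# Lookup by name from the global environment: only sees attached packages
found_on_search_path <- exists(fun_name, mode = "function")

# Lookup inside the package namespace: works without attaching
found_in_namespace <- exists(fun_name, envir = asNamespace("splines"),
                             mode = "function")

print(c(attached = attached,
        search_path = found_on_search_path,
        namespace = found_in_namespace))
```

If this is what happens, attaching the package (library(mice), or listing mice under Depends in a calling package) would make the functions visible again, which matches the behaviour reported above.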

Session info
devtools::session_info()
#> Session info --------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.3.0 (2016-05-03)
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  Danish_Denmark.1252         
#>  tz       Europe/Paris                
#>  date     2017-10-02
#> Packages ------------------------------------------------------------------
#>  package   * version date       source        
#>  backports   1.0.4   2016-10-24 CRAN (R 3.3.2)
#>  devtools    1.11.1  2016-04-21 CRAN (R 3.3.0)
#>  digest      0.6.9   2016-01-08 CRAN (R 3.3.0)
#>  evaluate    0.10    2016-10-11 CRAN (R 3.3.0)
#>  htmltools   0.3.5   2016-03-21 CRAN (R 3.3.0)
#>  knitr       1.15.1  2016-11-22 CRAN (R 3.3.0)
#>  lattice     0.20-33 2015-07-14 CRAN (R 3.3.0)
#>  magrittr    1.5     2014-11-22 CRAN (R 3.3.0)
#>  MASS        7.3-45  2016-04-21 CRAN (R 3.3.0)
#>  Matrix      1.2-6   2016-05-02 CRAN (R 3.3.0)
#>  memoise     1.0.0   2016-01-29 CRAN (R 3.3.0)
#>  mice        2.30    2017-02-18 CRAN (R 3.3.3)
#>  nnet        7.3-12  2016-02-02 CRAN (R 3.3.0)
#>  Rcpp        0.12.12 2017-07-15 CRAN (R 3.3.3)
#>  rmarkdown   1.3     2016-12-21 CRAN (R 3.3.0)
#>  rpart       4.1-10  2015-06-29 CRAN (R 3.3.0)
#>  rprojroot   1.2     2017-01-16 CRAN (R 3.3.2)
#>  stringi     1.0-1   2015-10-22 CRAN (R 3.3.0)
#>  stringr     1.0.0   2015-04-30 CRAN (R 3.3.0)
#>  survival    2.39-2  2016-04-16 CRAN (R 3.3.0)
#>  withr       1.0.1   2016-02-04 CRAN (R 3.3.0)
#>  yaml        2.1.13  2014-06-12 CRAN (R 3.3.0)

diagnostics = FALSE

I set diagnostics = FALSE, but mice prints diagnostics anyway.

mic <- mice(res$amp,diagnostics = FALSE, maxit=100)

iter imp variable
1 1 x
1 2 x
1 3 x
1 4 x
1 5 x

Predictive mean matching for non-numerical data?

I try to use mice::mice.impute.pmm for imputation, but get an error when the data is not numerical. Here is an example:

xname <- c('age', 'hgt', 'wgt', 'reg')
r <- stats::complete.cases(boys[, xname])
x <- boys[r, xname]
y <- boys[r, 'tv']
ry <- !is.na(y)
yimp <- mice.impute.pmm(y, ry, x) # Error
yimp2 <- mice.impute.sample(y, ry, x) # works 

I believe pmm worked as I intended before an update but I am not 100% sure. I am now using mice version 2.46.0.

Thanks for looking into this issue,

Andreas

fastpmm

Hi
I can't find fastpmm as a mice.impute method. Is the Rcpp version of pmm no longer available?

cheers

Lara
