covaruber / sommer Goto Github PK

R 35.55% C++ 9.16% HTML 55.29%

average-information mixed-models rcpparmadillo

sommer's Introduction

sommer: Solving Mixed Model Equations in R

Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016; Maier et al., 2015; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.

Installation

You can install the development version of sommer from GitHub:

devtools::install_github('covaruber/sommer')

Vignettes

Development

The sommer package is under active development. If you are an expert in mixed models, statistics or programming and you know how to implement of the following:

the minimum degree ordering algorithm
the symbolic cholesky factorization
factor analytic structure
generalized linear models

please help us to take sommer to the next level. Drop me an email or push some changes through github :)

sommer's People

Contributors

Stargazers

Watchers

sommer's Issues

could get standard errors for the random effects?

rf.sm<-randef(sp.sm)
names(rf.sm)
[1] "u:Tree" "Fam" "u:Row:Col"
rf.sm$Fam$dbh10[1:5]
FamF11 FamF12 FamF13 FamF14 FamF18
0.10463989 0.34510480 0.04833183 0.28591631 0.24046773

from up codes, it did not show the standard errors for the random effects. Is it possible to get them? If OK, how to do?
Thanks.

Best regards
Yuanzhen Lin

multivariate (multi-trait) for modeling GCA

I have an issue with analyzing multi-trait BLUP (trait: OER and BN) for modeling GCA from parent A and parent B.

Herewith I attach some additional information:

Excel file: phenotypic data, parent 1 (A) relationship matrix, parent 2 (B) relationship matrix:
OP_pheno.xlsx
Script:
fm6 <- mmer(cbind(OER, BN) ~ 1,
random= ~ vsr(A, Gu=A_P1) + vsr(B, Gu=A_P2),
rcov= ~ vsr(units,Gtc=unsm(2)),
data=Pheno)

when running the script, I get the following error:

fm6 <- mmer(cbind(OER, BN) ~ 1, random= ~ vsr(A, Gu=A_P1) + vsr(B, Gu=A_P2), rcov= ~ vsr(units,Gtc=unsm(2)), data=Pheno)
Version out of date. Please update sommer to the newest version using:
install.packages('sommer') in a new session
Use the 'dateWarning' argument to disable the warning message.Adding additional levels of Gu in the model matrix of 'A'
Adding additional levels of Gu in the model matrix of 'B'
iteration LogLik wall cpu(sec) restrained
1 -29.5416 21:56:0 0 0
2 -22.6125 21:56:0 0 2
3 -16.8525 21:56:0 0 1
Error: System is singular (V). Aborting the job. Try a bigger number of tolparinv.

Please enlighten me and give me a solution for this problem.

Thank you very much.

Error in H.mat(A, G)

When I try to run the example for H.mat() in RStudio Server, an error happened:
M <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
M[i,] <- sample(c(-1,0,0,1), size=1000, replace=TRUE)
}
rownames(M) <- 1:nrow(M)
v <- sample(1:nrow(M),100)
M2 <- M[v,]

A <- A.mat(M) # assume this is a pedigree-based matrix for the sake of example
G <- A.mat(M2)

H <- H.mat(A,G)
Error in H.mat(A, G) :
subtraction: incompatible matrix dimensions: 0x0 and 100x100

sessionInfo():
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] sommer_4.1.2 crayon_1.3.4 lattice_0.20-41 MASS_7.3-53 Matrix_1.2-18

loaded via a namespace (and not attached):
[1] compiler_3.6.3 tools_3.6.3 rstudioapi_0.11 yaml_2.2.0 Rcpp_1.0.5 grid_3.6.3

However, these codes can be run normally in the same RStudio Desktop version.

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2020.1.217/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] sommer_4.1.2 crayon_1.3.4 lattice_0.20-41 MASS_7.3-53
[5] Matrix_1.3-0

loaded via a namespace (and not attached):
[1] compiler_3.6.3 tools_3.6.3 Rcpp_1.0.5 grid_3.6.3

Facing error at the stage of A.mat

Hello! Thanks for developing this excellent program!
When I try to construct an additive relationship matrix using A.mat, I encountered error issues. The genotype data originated from a .vcf which was converted into .raw data form using plink and performed with protocols as below:

m012= fread("plink.raw")
g012 = m012[,-c(3:6)]
dim(g012)
fid = g012$FID
iid = g012$IID
library(sommer)

setDF(g012)
rownames(g012) = g012$IID
g012$IID = NULL
g012$FID = NULL
Gmat = A.mat(g012-1)

plink.raw:

However, the program reported error like this:

Imputing markers with mean value
Error in A.mat(g012 - 1) :
Not compatible with requested type: [type=list; target=double].

So I am wondering whether it is the "NA" in my data, so I replace all "NA" with 0.
However, nothing changes, what's more, the report "Imputing markers with mean value" was missing if I replace NA.

I don't know what I should do now, could you please give me some suggestions?

Thanks!!

mmer fails when number of observations > 2^15

BUG

How to reproduce

Here's a minimal example with synthetic data. We are fitting one fixed and one random effect for 32800 observations. NOTE: Given the high number of rows, this takes quite a bit to run. The 32000 case is used just to prove the point that values below 2^15 are not affected and may be commented out.

library(conflicted)
library(tidyverse)
library(sommer)


# generate 32800 synthetic observations
data <- tibble(
  fixA = runif(32800, 0, 5),
  ranB = as.factor(sample.int(2, 32800, replace = TRUE))
) %>%
  mutate(val = -2 + 
           fixA * 0.7 + 
           as.numeric(ranB == 1) * rnorm(32800, mean = 1, sd = 0.2) +
           as.numeric(ranB == 0) * rnorm(32800, mean = 1, sd = 0.7) +
           rnorm(32800, 0, 0.01)
  )

print(paste("TEST WITH 1000:", Sys.time()))
# should run just fine with a few observations
model <- mmer(
  fixed = val ~ fixA,
  random = ~ ranB,
  rcov = ~ units,
  data = data %>%  slice_sample(n = 1000), 
  verbose = TRUE
)

print(paste("TEST WITH 32000:", Sys.time()))
# should run just fine with 32000
model <- mmer(
  fixed = val ~ fixA,
  random = ~ ranB,
  rcov = ~ units,
  data = data %>%  slice_sample(n = 32000), 
  verbose = TRUE
)

print(paste("TEST WITH 32800:", Sys.time()))
# crashes with segfault for 32800
model <- mmer(
  fixed = val ~ fixA,
  random = ~ ranB,
  rcov = ~ units,
  data = data, 
  verbose = TRUE
)

Expected behaviour

The code should run all the instances of mmer

Actual behaviour

mmer fails when fitting 32800 observations with a segmentation fault. Noticed this happened around 32000 so I tested different values. I guess this could be a signed 2 byte variable going out of range?

Notes

I found this out when trying to implement a somewhat automated pipeline which had convergence issues with mmec and didn't want to spend time on tweaking the step weight.

If this is intended or is outside the scope of mmer maybe it can be documented as it is quite difficult to debug?

more structure covariance structures?

I wonder if more covariance structures can be used. For example, AR, ARMA, FA(q), FA0(q)

Throw error for singular system

Using sommer I sometimes get this message.

System is singular (V). Stopping the job. Try a bigger number of tolparinv.

It would be really helpful if this was not just a message, but an actual error. That would make it easier to capture it and handle it properly in e.g a function or a loop.

If you want I could create a pull request for this

New Error: Not compatible with requested type

Hi,

I've been using your package for some time and I'm having a new error with exactly the same code and data that used to work before. I cannot find a solution, could you please help?

This is my code:

result <- GWAS(ccp ~ 1, 
               random =~vs(species),
               rcov=~units, gTerm = "u:species",
               M = model_table, data = DT2 )

And this is the output and error:

Performing GWAS evaluation
Error in GWAS(ccp ~ 1, random = ~vs(species, Gu = phyl_covar), rcov = ~units,  : 
  Not compatible with requested type: [type=list; target=double].

DT2 looks like this: (I've tried having the species col both as a factor and as character col)

      species    ccp
1 b_gravinae1 37.217
2    b_juncea 46.286
3     b_napus 46.155
.......

And this is how the model_table looks like:

 >model_table[1:4,1:4]   (there are no NAs anywhere, I checked)
           N0.HOG0000003 N0.HOG0000005 N0.HOG0000006 N0.HOG0000010
b_gravinae             0             1             0             0
b_juncea              11             8             1             0
b_napus               29            10             6             1
b_nigra                0             0             0             0

I haven't run the script in half a year, I wonder if the cause of all of this is updated packages? I've just done install.packages("sommer") and it still doesn't work.

Hope you can help, thanks,
Ricardo

Query regarding MEMMA output

I had a question regarding MEMMA's output. Reading the documentation I got the impression that I could extract variance components and $\hat{u}$ for each random effect specified, but in my attempt to create a genetic model with two random effects, I only see one random component.

Here is a minimal example where I attempt to perform a single random marker test with background genetic effects.

###R and package version
>packageVersion('sommer')
[1] ‘4.1.3’
> R.version$version.string
[1] "R version 4.0.4 (2021-02-15)"

####Define your design and covariance matrices####
Z1<- apply(as.matrix(dat[,c(11:13)]),2,
             function(x) as.integer(x*2)) #your qtl effects based on 3 founders of population
K1<- diag(ncol(Z1)) #Single variance with no covariance among founder effects
dimnames(K1)<-list(colnames(Z1),colnames(Z1)) 
Z2<-model.matrix(~-1+geno,data=dat) #Your background genetic effects
K2<- Gmat # additive covariance matrix
X<- model.matrix(~1+block,data=dat) #Fixed trial effects
Z<- list(qtl=list(Z=Z1,K=K1),
           poly=list(Z=Z2,K=K2))

testMod<-tryCatch({
   MEMMA(Y=dat$yield
       X=X,ZETA=Z,silent=F)
       },
   error=function(c){
      cat(c$message,file = "log.txt",append = T)
      return(NULL)
})

Examining the variances, I get the following:

> testMod$var.comp$Vu
         T.1
T.1 26.11816

And extracting the realised effects, I get an array the same size as the original dataset.

> nrow(testMod$u.hat)==nrow(dat)
[1] TRUE

I was expecting mod$u.hat to only contain the effects from Z1 and Z2 with the length of the output being the column ranks of Z1 and Z2. I suspect that the output of u.hat is actually the sum of all random effects for each listing in dat.

My question is whether I am mistaken in my understanding of MEMMA or have I made a mistake in how I specified these models?

Thanks in advance.

matching (co)variance estimates and standard errors

Thanks for your work on the sommer package.
This is a suggestion / question:

I am trying to estimate the genetic correlation among multiple phenotypes in mmer. This is no problem, via cov2cor(mmer.model$sigma[["u:id"]]). I am also trying to get the standard error of sigma, which is stored in mmer.model$sigmaSE. However, this returns a matrix that is not of the same dimension as the mmer.model$sigma list.

My questions is, how can I extract the correct standard errors for each element in the covariance matrix?
Thanks,
John

For example:

data("DT_cpdata")
A <- A.mat(GT)
ans.m <- mmer(cbind(Yield,color)~1,
              random=~ vs(id, Gu=A)
              + vs(Rowf,Gtc=diag(2))
              + vs(Colf,Gtc=diag(2)),
              rcov=~ vs(units),
              data=DT)

ans.m$sigma

ans.m$sigmaSE

which gives the following output pasted below.

GWAS without P3D exits with error

As the title says, attempting GWAS with exact tests for every SNP does not run to completion. I've tracked this down to lines 227-231 of GWAS.R where the final results are appended to the lastmodel object before it is returned from the function. However, if P3D is FALSE, the conditional block on lines 94-100 is never executed and lastmodel is never created in the scope of the original function, only the scope of scorecalc() as each marker is evaluated.

Error in mmer(y = y, Z = ETA, method = "EMMA") : unused arguments (y = y, Z = ETA)

HI
Good Afternoon
I am trying to reproduce the scenario1 of your SOMMERs package tutorial before applying to my data to Case1:Genotypic and phenotypic data for the parents is available and we want to predict performance of the possible crosses assuming a purely additive model (species with no heterosis) with below code
data(wheatLines) X <- wheatLines$wheatGeno; X[1:5,1:5]; dim(X) Y <- wheatLines$wheatPheno dim(Y) rownames(X) <- rownames(Y) y <- Y[,1] # response grain yield y Z1 <- diag(length(y)) # incidence matrix K <- A.mat(X) # additive relationship matrix ETA <- list(add=list(Z=Z1, K=K)) ans <- mmer(y=y, Z=ETA, method="EMMA") # kinship based summary(ans)

after running this code i am getting error like even after declaration of y and ETA
Error in mmer(y = y, Z = ETA, method = "EMMA") : unused arguments (y = y, Z = ETA)

Can you please help me where it went wrong and how to rectify it for my future work.
Thanking you very much

Cooks distance

Dear Eduardo,

do you think it would be feasible to implement a convenience function for the cooks distance value, as is available for lm and lmer?

Cheers Gregor

Pin function no longer available?

Hello,
I noticed that the pin() function is not longer available. Was there an issues with this function?
Best regards,
Mitchell

Poisson and Negative Bionomial Distributed Traits

Are there any plans to incorporate other distributions (i.e. poisson or neg. binomial) into mmer's function?

Perhaps, similar to https://github.com/glmmTMB/glmmTMB?

bug in vpredict

Hi,
there seems to be a bug in vpredict that prevents the output of class "vpredict.mmer" to behave like a data.frame. If you consider to change this code at the end:

toreturn2 <- data.frame(row.names = tname, Estimate = tvalue, SE = se)
class(toreturn2) <- "vpredict.mmer"

to:

toreturn2 <- data.frame(Estimate = tvalue, SE = se)
rownames(toreturn2 ) <- tname # seemed not to be evaluated correctly before in all cases
class(toreturn2) <- c("vpredict.mmer","data.frame") # allows data.frame inheritance

all data.frame methods should work, too and handling will be eased

Regards,
Johannes

Predict.mmec function issue

Hi Giovanny,

I found that in the predict.mmec, there is a bug when the order of the factor is switched.
If in the model we declare, Name:Env and also Name:Env in the predict.mmec, then we get the correct results. However, if we switch to Env:Name in the predict.mmec, we get incorrect results. I provide the code below:

library(sommer)
library(data.table)
library(plyr) 

options("scipen" = 100,"digits" = 4 ) # set numbering format

The data

data(DT_h2)
DT <- DT_h2
head(DT)

Env <- unique(DT$Env)

##### Make data list for Stage I #####
stgI_list <- matrix(data = list(), nrow = length(Env), ncol = 1, 
                    dimnames = list(Env, c("lsmeans")))

Stage I analysis

###################
##### Stage I #####

for (i in Env) {
  
  Edat <- droplevels(subset(DT, Env == i))
  
  print(i)
  
  mod.1 <- mmer(fixed   = y ~ Name-1,
                  random  = ~ Block,
                  data    = Edat)
  
  blue <- predict(mod.1, classify = "Name") # get the lsmeans and vcov
  
  blue.1 <- data.table(blue$pvals)[, c(2:4)] # grab the lsmeans
  
  names(blue.1) <- c("Name", "Yield_adj", "se") # rename columns
  
  blue.1[ , ':='(smith.w = diag(solve(mod.1$VarBeta)))] # calculate the Smith's weight
  
  stgI_list[[i, "lsmeans" ]] <- blue.1 # put all the results of Stage 1 in the list
  
  rm(Edat, mod.1, blue, blue.1)
}

Collapse the results

stgII_df <- as.data.frame(ldply(stgI_list[, "lsmeans"], 
                                  data.frame, .id = "Env"))

DT::datatable(as.data.frame(stgII_df),class = 'cell-border stripe')

Stage 2 in sommer

vei = 1/var(stgII_df$Yield_adj, na.rm = T)

W <- as(diag(stgII_df$smith.w), Class = "dgCMatrix") 

ans2 <- mmec(Yield_adj ~ Env,
             random = ~Name + Name:Env,
             rcov = ~vsc(isc(units, thetaC = matrix(3,1,1), 
                             theta =matrix(vei,1,1))),
             W = W,
             data = stgII_df)

summary(ans2)$varcomp

Dt <- ans2$Dtable

Dt[1,"include"] = TRUE
Dt[1,"average"] = TRUE
Dt[2,"include"] = TRUE
Dt[2,"average"] = TRUE
Dt[3,"include"] = TRUE
Dt[3,"average"] = TRUE

Dt

Predictions in sommer

p <- predict(ans2, Dtable = Dt, D = "Name:Env") #correct
p1 <- p$pvals

q <- predict(ans2, Dtable = Dt, D = "Env:Name") #incorrect
q1 <- q$pvals

Thank you and best wishes,
Harimurti

Make vpredict a method

It would be nice if vpredict could have a method for sommer, ie. vpredict.sommer.

Right now, if I have both asreml and sommer loaded, I can get error messages when using vpredict if the wrong package is loaded first/second.

scores in result of GWAS is incomplete

GWAS function returns only scores (P values), while according to the function vignette, a data frame with 5 columns should be returned.

Possible mix up between VanRaden and Endleman GRMs

Hi there, thanks for sommer.

I was comparing implementations of the VanRaden GRM estimator across several packages, and I noticed that the the A.mat function in sommer seems to produce different values. On further inspection, A.mat seems to produce the VanRaden estimate when endelman=TRUE and not when endelman=FALSE.

Looking at the source code, I think the estimators may have been mixed up as this looks to be calculating the VanRaden estimator (note the use of population frequencies) and this looks more like the Endelman estimator (note scaling via the mean of diagonal values).

Factor/character vectors when defining random effect matrices

vs() has undocumented behavior on line 469 of FUN_special.R. I noticed this when my attempt to do genomic prediction only returned random effects for my training set. By contrast, the example in the documentation produced the expected output. This has to do with differences in how model.matrix() handles character vs. factor vectors.

If dummy is a factor, then while it only contains entries for records with complete data, it also contains the full levels information for the unfiltered data (dataor in mmer()). model.matrix() retains the levels to produce a design matrix with the appropriate number of columns for estimating random effects of records with missing phenotypes. If dummy is a character vector, then model.matrix() does not have the levels and produces a design matrix for only the training set.

This is easily fixed on the user side by making sure that all categorical variables have the factor type. It would be easiest to highlight this in the documentation for mmer() (and perhaps vs() as well) so that users don't get confused. Because the documentation for mmer() is already quite complex, I think the best solution would be to check silently for character vectors and convert them to factors internally.

multivariate model specification with different fixed effects per trait

Hi.

Thanks for the great package.

I would like to ask how to specify a bi-variate model with different fixed effects for each trait.

Say y1=fix1b1+cov1c1+e1, and y2=fix2b2+cov2c2+e2,

where y1 and y2 are continuous responses, fix1 and fix2 are fixed classification variables, cov1 and cov2 are continuous co-variables, and e1 and e2 are residuals.

The task is to estimate a 2x2 residual co-variance matrix. How would such model be specified?

I could not find any example for this case in the manual.

Running trial and error, I have not reach the point of fitting the co-variables.

I successfully tried mmer(fixed=cbind(y1,y2)~vs(fix1,Gtr=fcm(c(1,0))),data=data).

But I stuck at fitting fix2: mmer(fixed=cbind(y1,y2)~vs(fix1,Gtr=fcm(c(1,0)))+vs(fix2,Gtr=fcm(c(0,1)))-1,data=data) .........

where I am getting message "fixed-effect model matrix is rank deficient so dropping 253 columns / coefficients" which implies that either "fix1" or "fix2" is entirely dropped from the model.

Any help much appreciated.

Best regards.

Issue on the standard error for the BLUP

Dear Giovanny,

I have a question regarding the standard error of BLUPs produced by Sommer. I am attaching a html file made by Rmarkdown.

I tried to analyse a sample data from agridat package (omer.sorghum). An MET data with 18 genotypes, 6 environments, with 4 blocks per environment.

The fixed effect is environment.
The random effects are gen, gen:env, and rep within each environment, which employs diagonal vs or rep:at(env).
The residual variances are heterogeneous environment-specific.

I found that the standard error of BLUPs from Sommer were larger than those from ASReml-R.
I remember in the previous version you have RtermsToForce in the predict.mmer function. However, in the version 4.1.1, this statement is not available anymore.

The BLUPs are identical to ASReml-R BLUPs.

I also checked the RE.used in the predict function, it might be a term is not necessary to be used (env).

I thank you for your attention and help.

Best regards,
Harimurti
Standard-errors-of-BLUPs-Sommer-vs-ASReml-R.docx

fitted values from mmer - fixed effects only

Hello,

I noticed that when mmer() calculates the fitted values, it only does so for the fitted effects, but not random effects (even when using type = "complete" like the documentation indicates). This is different from how other mixed model packages treated the fitted() function (nlme, lme4, asreml) and other common R classes derived from linear models (lm, glm). Is there a way to extract the predictions from the model object across fixed and random variables? If there is another way to extract this, please let me know. Perhaps I am missing something? From looking at your code, it looks like the fitted() function has gone through a few modifications.

Great package, thanks for your work.

Error in predict.mmer Error in `$<-.data.frame`(`tmp`, start, value = c(1, 2, 3, 4, 5, 13, : replacement has 16 rows, data has 8

I have been trying to predict a variable from a model using random regression with the mmer function. However, I am getting the following error:

"Error in `$<-.data.frame`(`*tmp*`, start, value = c(1, 2, 3, 4, 5, 13, : replacement has 16 rows, data has 8"

I tried to find out the error by carefully running the function code and realized that the start and end vectors are longer than the number of lines in the Dtable.

That is, when the start and end vectors are created here:

   start = end = numeric()
   add <- 1
   for (i in 1:length(effectsN)) {
     start[i] = add
     end[i] = start[i] + effectsN[i] - 1
     add = end[i] + 1
   }

They cannot be added here:

     Dtable$start <- start
     Dtable$end <- end

Is there a way to fix this?

Here is some information about my model:

Thank you very much in advance!

Factor analytic covariance structure updates during REML

Hi Giovanny,
Awesome work on this package! Do you have plans on implementing the factor analytic model similar to ASReml where the loadings are updated during REML estimation? If so, I would be happy to contribute.
Cheers!

predict() error in cross-validation

I got this error when I do cross-validation, called "Error in [.data.frame(object$dataOriginal, , x) : undefined columns selected". However, the column I choose is really defined. I think I got this error because the function predict() is fail to work, but I remind I can use it in the former version of sommer.

My code looks like this:

rm(list = ls())
setwd("")
getwd()
library(sommer)
library(nadiv)
library(data.table)

dat <- fread("../ABLUP/data/phenotype/phe520.csv",data.table = F,header = T)
for(i in 1:4) dat[,i] = as.factor(as.character(dat[,i]))
head(dat)
str(dat)
summary(dat)
dat2 <- na.omit(dat)
head(dat2)
str(dat2)
summary(dat2)

ped <- fread("../ABLUP/data/pedigree/ped520.csv",data.table = F)
head(ped)
pped =prepPed(ped)
Amat = as.matrix(makeA(pped))
dim(Amat)

Amat <- as.data.frame(Amat)
Amat[1:5,1:5]
id <- ped$ID
Amat <- Amat[which(rownames(Amat)%in%id),which(colnames(Amat)%in%id)]
Amat <- as.matrix(Amat)
Amat[1:5,1:5]
dim(Amat)

set.seed(2022)
# dd <- dat
dd <- dat2
row.names(dd) <- dd$ID
library(caret)
w <- createFolds(1:length(dd$ID),k = 5)
# w

r_pearson <- NULL
r_pearson2 <- NULL
r_spearman <- NULL
r_unbiased <- NULL
r_MSD <- NULL
time <- system.time(
  for(i in 1:5){
  #i=1
  vv <- dd$ID[w[[i]]]
  vv <- as.character(vv)
  tt <- dd
  tt[vv,]$RFI <- NA
  mod.mm = mmer(RFI ~ Gen + Batch + Sex, 
                random = ~ vs(ID,Gu = Amat),
                rcov = ~ units,
                data= tt)
  ablup <- predict(mod.mm,"ID")$pvals
  row.names(ablup) <- ablup$ID
  r_pearson[i] <- cor(ablup[vv,]$predicted.value,dd[vv,]$RFI,
                      method = "pearson")
  r_pearson2[i] <- (cor(ablup[vv,]$predicted.value,dd[vv,]$RFI,
                        method = "pearson"))^2
  r_spearman[i] <- cor(ablup[vv,]$predicted.value,dd[vv,]$RFI,
                       method = "spearman")
  r_unbiased[i] <- summary(lm(dd[vv,]$RFI~ablup[vv,]$predicted.value-1))$coefficients[1]
  #r_unbiased[i] <- cov(dd[vv,]$RFI,ablup[vv,]$predicted.value)/var(ablup[vv,]$predicted.value)
  r_MSD[i] <- mean((dd[vv,]$RFI-ablup[vv,]$predicted.value)^2)
})[3]

time2 <- time
print(time2)

se <- function(x){
  sd(x)/sqrt(length(x))
}

print(r_pearson)
n1 <- mean(r_pearson);n1e <- se(r_pearson)
print(r_pearson2)
n2 <- mean(r_pearson2);n2e <- se(r_pearson2)
print(r_spearman)
n3 <- mean(r_spearman);n3e <- se(r_spearman)
print(r_unbiased)
n4 <- mean(r_unbiased);n4e <- se(r_unbiased)
print(r_MSD)
n5 <- mean(r_MSD);n5e <- se(r_MSD)
ax <- data.frame(type=c("Accur","Relia","Rank","Ubia","MSD","CPU"),
                 value = c(n1,n2,n3,n4,n5,time2),
                 se = c(n1e,n2e,n3e,n4e,n5e,log10(time2)))
fwrite(ax,"ablup_mod.mm_CV.txt",sep="\t")

The first 30 rows of data looks like this:

structure(list(ID = structure(c(401L, 402L, 403L, 444L, 517L, 
404L, 405L, 406L, 407L, 408L, 409L, 410L, 411L, 412L, 413L, 414L, 
415L, 416L, 417L, 418L, 419L, 420L, 421L, 422L, 423L, 424L, 425L, 
426L, 427L, 428L), .Label = c("FX1_10", "FX1_100", "FX1_1000", 
"FX1_1001", "FX1_1002", "FX1_1003", "FX1_1004", "FX1_1005", "FX1_1006", 
"FX1_1007", "FX1_1008", "FX1_1009", "FX1_1011", "FX1_1012", "FX1_1015", 
"FX1_1016", "FX1_1017", "FX1_1018", "FX1_1019", "FX1_102", "FX1_1020", 
"FX1_1021", "FX1_1022", "FX1_1024", "FX1_1025", "FX1_1026", "FX1_1027", 
"FX1_103", "FX1_1030", "FX1_1031", "FX1_1032", "FX1_1034", "FX1_1035", 
"FX1_1036", "FX1_1038", "FX1_105", "FX1_107", "FX1_108", "FX1_109", 
"FX1_110", "FX1_111", "FX1_114", "FX1_115", "FX1_116", "FX1_117", 
"FX1_118", "FX1_119", "FX1_12", "FX1_120", "FX1_121", "FX1_122", 
"FX1_123", "FX1_124", "FX1_1244", "FX1_1245", "FX1_1246", "FX1_1247", 
"FX1_1248", "FX1_1249", "FX1_125", "FX1_1250", "FX1_1251", "FX1_1252", 
"FX1_1253", "FX1_1255", "FX1_1256", "FX1_1257", "FX1_1258", "FX1_1259", 
"FX1_1260", "FX1_1261", "FX1_1262", "FX1_1263", "FX1_1266", "FX1_1267", 
"FX1_1268", "FX1_1269", "FX1_127", "FX1_1270", "FX1_1271", "FX1_1273", 
"FX1_1274", "FX1_1275", "FX1_1276", "FX1_1277", "FX1_1279", "FX1_128", 
"FX1_1280", "FX1_1281", "FX1_1282", "FX1_1283", "FX1_1284", "FX1_1286", 
"FX1_1288", "FX1_1291", "FX1_1292", "FX1_1293", "FX1_1294", "FX1_1295", 
"FX1_1296", "FX1_1297", "FX1_1298", "FX1_1299", "FX1_13", "FX1_130", 
"FX1_1300", "FX1_1301", "FX1_1302", "FX1_1303", "FX1_1306", "FX1_1307", 
"FX1_1308", "FX1_131", "FX1_1311", "FX1_1312", "FX1_1313", "FX1_1315", 
"FX1_1317", "FX1_1318", "FX1_1319", "FX1_132", "FX1_1320", "FX1_133", 
"FX1_134", "FX1_135", "FX1_136", "FX1_137", "FX1_138", "FX1_139", 
"FX1_14", "FX1_140", "FX1_141", "FX1_142", "FX1_143", "FX1_144", 
"FX1_145", "FX1_146", "FX1_147", "FX1_148", "FX1_149", "FX1_15", 
"FX1_150", "FX1_151", "FX1_152", "FX1_154", "FX1_156", "FX1_157", 
"FX1_159", "FX1_161", "FX1_162", "FX1_163", "FX1_164", "FX1_165", 
"FX1_166", "FX1_167", "FX1_168", "FX1_17", "FX1_170", "FX1_171", 
"FX1_172", "FX1_173", "FX1_174", "FX1_175", "FX1_176", "FX1_177", 
"FX1_179", "FX1_18", "FX1_180", "FX1_181", "FX1_182", "FX1_183", 
"FX1_184", "FX1_185", "FX1_186", "FX1_187", "FX1_189", "FX1_19", 
"FX1_191", "FX1_192", "FX1_193", "FX1_194", "FX1_195", "FX1_196", 
"FX1_197", "FX1_199", "FX1_2", "FX1_20", "FX1_200", "FX1_201", 
"FX1_202", "FX1_205", "FX1_206", "FX1_207", "FX1_209", "FX1_21", 
"FX1_210", "FX1_211", "FX1_212", "FX1_213", "FX1_214", "FX4_A1281", 
"FX4_A1282", "FX4_A1283", "FX4_A1284", "FX4_A1285", "FX4_A1286", 
"FX4_A1287", "FX4_A1288", "FX4_A1289", "FX4_A1290", "FX4_A1291", 
"FX4_A1292", "FX4_A1293", "FX4_A1294", "FX4_A1295", "FX4_A1296", 
"FX4_A1299", "FX4_A130", "FX4_A1301", "FX4_A1302", "FX4_A1303", 
"FX4_A1304", "FX4_A1305", "FX4_A1306", "FX4_A1307", "FX4_A1308", 
"FX4_A1309", "FX4_A131", "FX4_A1310", "FX4_A1311", "FX4_A1312", 
"FX4_A1313", "FX4_A1314", "FX4_A1315", "FX4_A1316", "FX4_A1317", 
"FX4_A1318", "FX4_A1319", "FX4_A1320", "FX4_A1322", "FX4_A1323", 
"FX4_A1324", "FX4_A1325", "FX4_A1326", "FX4_A1327", "FX4_A1329", 
"FX4_A133", "FX4_A1330", "FX4_A1331", "FX4_A1332", "FX4_A1333", 
"FX4_A1334", "FX4_A1335", "FX4_A1336", "FX4_A1337", "FX4_A1338", 
"FX4_A1339", "FX4_A134", "FX4_A1340", "FX4_A1341", "FX4_A1343", 
"FX4_A1344", "FX4_A1347", "FX4_A1349", "FX4_A1350", "FX4_A1351", 
"FX4_A1352", "FX4_A1354", "FX4_A1355", "FX4_A1356", "FX4_A1357", 
"FX4_A136", "FX4_A1361", "FX4_A1362", "FX4_A1365", "FX4_A1367", 
"FX4_A1368", "FX4_A137", "FX4_A1370", "FX4_A1371", "FX4_A1372", 
"FX4_A1373", "FX4_A1374", "FX4_A1375", "FX4_A1376", "FX4_A1377", 
"FX4_A1378", "FX4_A1379", "FX4_A138", "FX4_A1380", "FX4_A1381", 
"FX4_A1382", "FX4_A1384", "FX4_A1386", "FX4_A1387", "FX4_A1388", 
"FX4_A1389", "FX4_A139", "FX4_A1391", "FX4_A1392", "FX4_A1393", 
"FX4_A1396", "FX4_A1397", "FX4_A1398", "FX4_A1399", "FX4_A140", 
"FX4_A1400", "FX4_A1401", "FX4_A1402", "FX4_A1404", "FX4_A1405", 
"FX4_A1406", "FX4_A1409", "FX4_A141", "FX4_A1410", "FX4_A1412", 
"FX4_A1413", "FX4_A1415", "FX4_A1416", "FX4_A1417", "FX4_A1418", 
"FX4_A142", "FX4_A1420", "FX4_A1421", "FX4_A1423", "FX4_A1424", 
"FX4_A1425", "FX4_A1426", "FX4_A1428", "FX4_A143", "FX4_A1431", 
"FX4_A1432", "FX4_A1433", "FX4_A1434", "FX4_A1436", "FX4_A1437", 
"FX4_A1439", "FX4_A144", "FX4_A1441", "FX4_A1442", "FX4_A1444", 
"FX4_A1445", "FX4_A1448", "FX4_A1449", "FX4_A145", "FX4_A1450", 
"FX4_A1452", "FX4_A1453", "FX4_A1455", "FX4_A1456", "FX4_A1458", 
"FX4_A1460", "FX4_A1461", "FX4_A1463", "FX4_A1464", "FX4_A1465", 
"FX4_A1469", "FX4_A1471", "FX4_A1472", "FX4_A1473", "FX4_A1474", 
"FX4_A1476", "FX4_A1477", "FX4_A1479", "FX4_A148", "FX4_A1480", 
"FX4_A1481", "FX4_A1482", "FX4_A1484", "FX4_A1485", "FX4_A1487", 
"FX4_A1488", "FX4_A1489", "FX4_A149", "FX4_A1490", "FX4_A1492", 
"FX4_A1493", "FX4_A1496", "FX4_A1497", "FX4_A1498", "FX4_A150", 
"FX4_A1500", "FX4_A1501", "FX4_A1503", "FX4_A1504", "FX4_A1505", 
"FX4_A1506", "FX4_A1508", "FX4_A1509", "FX4_A151", "FX4_A1511", 
"FX4_A1512", "FX4_A1513", "FX4_A1514", "FX4_A1516", "FX4_A1518", 
"FX4_A1519", "FX4_A152", "FX4_A1521", "FX4_A1522", "FX6_A1", 
"FX6_A10", "FX6_A11", "FX6_A1154", "FX6_A1155", "FX6_A1156", 
"FX6_A1157", "FX6_A1158", "FX6_A1159", "FX6_A1160", "FX6_A1161", 
"FX6_A1162", "FX6_A1163", "FX6_A1164", "FX6_A1165", "FX6_A1167", 
"FX6_A1168", "FX6_A1170", "FX6_A1171", "FX6_A1173", "FX6_A1175", 
"FX6_A1176", "FX6_A1177", "FX6_A1178", "FX6_A1179", "FX6_A1181",
"FX6_A1182", "FX6_A1183", "FX6_A1184", "FX6_A1185", "FX6_A1186", 
"FX6_A1187", "FX6_A1188", "FX6_A1189", "FX6_A1191", "FX6_A1192", 
"FX6_A1193", "FX6_A1194", "FX6_A1195", "FX6_A1196", "FX6_A1197", 
"FX6_A1198", "FX6_A1199", "FX6_A12", "FX6_A1201", "FX6_A1202", 
"FX6_A1204", "FX6_A1206", "FX6_A1208", "FX6_A1209", "FX6_A1210", 
"FX6_A1211", "FX6_A1212", "FX6_A1213", "FX6_A1214", "FX6_A1215", 
"FX6_A1216", "FX6_A1218", "FX6_A1219", "FX6_A1220", "FX6_A1221", 
"FX6_A1222", "FX6_A1225", "FX6_A1226", "FX6_A1227", "FX6_A1228", 
"FX6_A1229", "FX6_A1231", "FX6_A1233", "FX6_A1234", "FX6_A1235", 
"FX6_A1236", "FX6_A1237", "FX6_A1238", "FX6_A1245", "FX6_A1246", 
"FX6_A1247", "FX6_A1248", "FX6_A1249", "FX6_A1250", "FX6_A1251", 
"FX6_A1252", "FX6_A1254", "FX6_A1255", "FX6_A1256", "FX6_A1257", 
"FX6_A1258", "FX6_A1259", "FX6_A1260", "FX6_A1261", "FX6_A1262", 
"FX6_A1263", "FX6_A1265", "FX6_A1266", "FX6_A1267", "FX6_A1269", 
"FX6_A1270", "FX6_A1271", "FX6_A1272", "FX6_A1273", "FX6_A1274", 
"FX6_A1277", "FX6_A1279", "FX6_A1280", "FX6_A1281", "FX6_A1282", 
"FX6_A1283", "FX6_A1284", "FX6_A1287", "FX6_A1288", "FX6_A1289", 
"FX6_A1292", "FX6_A1293", "FX6_A1294", "FX6_A1295", "FX6_A1298", 
"FX6_A13", "FX6_A1300", "FX6_A1301", "FX6_A1305"), class = "factor"), 
    Gen = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L), .Label = c("5", "6", "7"), class = "factor"), 
    Batch = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L), .Label = c("1", "10", "5"), class = "factor"), 
    Sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor"), 
    RFI = c(3.627217181, -1.154734234, -1.74971223, 0.282745056, 
    6.923385569, 5.424798655, 13.8248368, 0.826463544, 3.334980245, 
    10.5229068, 0.896282303, -1.337765508, 5.263643505, -1.452049182, 
    1.192610925, 0.770879296, 2.151910959, 0.863960067, 2.110246559, 
    0.415996505, -5.685602958, 3.105247295, -6.532683774, 3.823433075, 
    4.712153828, 1.907716294, 2.788733274, 1.722785018, -5.163047648, 
    5.127251835), FCR = c(1.801438849, 1.731690141, 1.750359712, 
    1.736486486, 1.797419355, 1.796428571, 1.893055556, 1.734013605, 
    1.746258503, 1.831125828, 1.790540541, 1.82983871, 1.744736842, 
    1.726666667, 1.789051095, 1.707643312, 1.713548387, 1.700653595, 
    1.709615385, 1.699346405, 1.663265306, 1.737086093, 1.656666667, 
    1.8, 1.748701299, 1.734437086, 1.729411765, 1.736734694, 
    1.674657534, 1.778431373)), row.names = c(NA, 30L), class = "data.frame")

The first 30 rows of ped looks like this:

structure(list(ID = c("FX6_A1", "FX6_A10", "FX6_A11", "FX6_A12", 
"FX6_A13", "FX6_A1154", "FX6_A1155", "FX6_A1156", "FX6_A1157", 
"FX6_A1158", "FX6_A1159", "FX6_A1160", "FX6_A1161", "FX6_A1162", 
"FX6_A1163", "FX6_A1164", "FX6_A1165", "FX6_A1167", "FX6_A1168", 
"FX6_A1170", "FX6_A1171", "FX6_A1173", "FX6_A1175", "FX6_A1176", 
"FX6_A1177", "FX6_A1178", "FX6_A1179", "FX6_A1181", "FX6_A1182", 
"FX6_A1183"), Sire = c("FX5_G98", "FX5_G122", "FX5_G152", "FX5_G103", 
"FX5_G44", "FX5_G113", "FX5_G3", "FX5_G22", "FX5_G177", "FX5_G3", 
"FX5_G13", "FX5_G51", "FX5_G85", "FX5_G185", "FX5_G56", "FX5_G158", 
"FX5_G37", "FX5_G56", "FX5_G120", "FX5_G185", "FX5_G46", "FX5_G145", 
"FX5_G71", "FX5_G16", "FX5_G115", "FX5_G87", "FX5_G185", "FX5_G71", 
"FX5_G131", "FX5_G124"), Dam = c("g6m37144", "g6m53086", "g6m22136", 
"g6m3096", "g6m79031", "g6m28131", "g6m68093", "g6m66007", "g6m37146", 
"g6m11071", "g6m33118", "g6m9137", "g6m36107", "g6m53082", "g6m30128", 
"g6m54089", "g6m22124", "g6m79113", "g6m66126", "g6m33130", "g6m54118", 
"g6m19100", "g6m43118", "g6m66016", "g6m19114", "g6m8103", "g6m33130", 
"g6m43118", "g6m71079", "g6m22118")), row.names = c(NA, 30L), class = "data.frame")

Appreciate any advice, Best Regards, Moore.

GL(M)M's as a future target?

I was curious if generalised linear models and/or other models that use link functions were a potential future feature.

Given the sommer's rather large scope, I was surprised to not see any mention of GLM's.

Error in `contrasts<-`(`tmp`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

how to solve this problem

Suppress "fixed-effect model matrix is rank deficient so dropping 1 columns / coefficients"

Hi!
I would like to suppress the "fixed-effect model matrix is rank deficient so dropping 1 columns / coefficients" note, but neither suppressMessages() nor suppressWarnings() will do the job. Oh yeah and verbose = FALSE is also not affecting this. I guess this line could be made an actual warning or if left a cat(...), it could be wrapped in another if-clause adding another argument to silence it?

mmec using weights (W)

Dear Eduardo,

I have tried implementing the new mmec function. For the most part it seems to work great, but I am having issues when using weights. In my application I am using 1/se^2 from a first stage analysis as weights in the "weights" argument of mmer. Switching to mmec I will have to use the weights matrix (W). Using "weights" or "W" delivers the same results in mmer as you can see in the example I have provided (1). mmer and mmec also produce the same output in my example, when I do not use weights (2). The problem arises when I include weights (W) in mmec, it produces very different results, for the variance estimates and random effects (3). Maybe I have misspecified the W matrix? I appreciate any advice.

Best Gregor

`
data(DT_example)
DT <- DT_example
head(DT)

(1) compare mmer results with different weights arguments

DT$wt <- 1/as.numeric(DT$Year )^2
myW <- diag(DT$wt)

ans1 <- mmer(Yield~Env,
random= ~ Name + Env:Name,
rcov= ~ units,
weights = wt,
verbose = FALSE,
data=DT)

ans2 <- mmer(Yield~Env,
random= ~ Name + Env:Name,
rcov= ~ units,
W = myW,
verbose = FALSE,
data=DT)

unlist(ans1$sigma)
unlist(ans2$sigma)

ans1$Beta
ans2$Beta

ans1$U$Name$Yield[1:5]
ans2$U$Name$Yield[1:5]

(2) compare mmer and mmec results without weights

ans1 <- mmer(Yield~Env,
random= ~ Name + Env:Name,
rcov= ~ units,
verbose = FALSE,
data=DT)

ans2 <- mmec(Yield~Env,
random= ~ Name + Env:Name,
rcov= ~ units,
verbose = FALSE,
data=DT)

unlist(ans1$sigma)
unlist(ans2$sigma)

ans1$Beta
ans2$b

ans1$U$Name$Yield[1:5]
ans2$u[1:5]

(3) compare mmer and mmec results with! weights

DT$wt <- 1/as.numeric(DT$Year )^2
myW <- diag(DT$wt)

ans1 <- mmer(Yield~Env,
random= ~ Name + Env:Name,
rcov= ~ units,
W = myW,
verbose = FALSE,
data=DT)

ans2 <- mmec(Yield~Env,
random= ~ Name + Env:Name,
rcov= ~ units,
W = as(myW, Class="sparseMatrix"),
verbose = FALSE,
data=DT)

unlist(ans1$sigma)
unlist(ans2$sigma)

ans1$Beta
ans2$b

ans1$U$Name$Yield[1:5]
ans2$u[1:5]
`

Lack of output in scores

When I run the model, everything seems to run fine. When I look at the output, the model convergence statement is TRUE. The issue is when I open scores, there are only the SNP names and marker effects. No statistical information is given. A sample of the code and data can be provided directly.

Different varcomp estimates between sommer and ASReml-R when it is bounded

Hi Giovanny,

I found different results between sommer and ASReml-R for a simple alpha-design analysis.
The mod$beta already includes intercept and the predict.mmer function results in the same values as the mod$beta.
Also, the varcomp estimates are different from ASReml-R. Perhaps, this is the reason that the results are different.

I provide the code (and results) below.

Upload the package

library(sommer)
library(data.table)
library(agridat) 
library(asreml) 

options("scipen" = 100,"digits" = 4 ) # set numbering format

The data

df <- buntaran.wheat
Env <- levels(factor(paste(df$zone, df$loc, sep = "_")))
i = Env[1] 

i
[1] "middle_07bm27"

Analysis with sommer

Edat <- droplevels(subset(df, Env == i))

mod.1 <- mmer(fixed   = yield ~ gen-1,
                           random  = ~ rep + rep:alpha,
                           rcov = ~ units,      
                            data    = Edat)

summary(mod.1)$varcomp

                      VarComp VarCompSE  Zratio Constraint
rep.yield-yield          -710      4346 -0.1634   Positive
rep:alpha.yield-yield       0      7804  0.0000   Positive
units.yield-yield       49331     13338  3.6985   Positive

mod.1$Beta

  Trait    Effect Estimate
1  yield genG22455    783.7
2  yield genG23286    839.2
3  yield genG24054   1003.2
4  yield genG24521    801.4
5  yield genG25512    896.3
6  yield genG25965    852.1
7  yield genG26742    897.4
8  yield genG26777    829.7
9  yield genG27125    934.2
10 yield genG27130    903.1
11 yield genG27543    425.4
12 yield genG27546    800.7
13 yield genG27548    798.6
14 yield genG27590    986.2
15 yield genG27592    885.2
16 yield genG27593    872.0
17 yield genG27600    910.9
18 yield genG27605    832.1
19 yield genG27669    888.3
20 yield genG28128    805.4
21 yield genG28209    911.1
22 yield genG28949    970.0
23 yield genG28950    841.8

Predict BLUE - sommer

blue <- predict.mmer(mod.1, classify = "gen") # get the lsmeans and vcov
  
blue.1 <- data.table(blue$pvals)[, c(2:4)] # grab the lsmeans
  
names(blue.1) <- c("gen", "Yield_adj", "se") # rename columns

blue.1[ , ':='(smith.w = diag(solve(mod.1$VarBeta)))] # calculate the Smith's weight

blue.1

       gen Yield_adj    se    smith.w
 1: G22455     783.7 154.5 0.00004260
 2: G23286     839.2 108.9 0.00008622
 3: G24054    1003.2 154.5 0.00004260
 4: G24521     801.4 107.4 0.00008930
 5: G25512     896.3 154.5 0.00004260
 6: G25965     852.1 108.9 0.00008622
 7: G26742     897.4 154.5 0.00004260
 8: G26777     829.7 108.9 0.00008622
 9: G27125     934.2 220.3 0.00002078
10: G27130     903.1 108.9 0.00008622
11: G27543     425.3 220.3 0.00002078
12: G27546     800.8 154.5 0.00004260
13: G27548     798.5 126.6 0.00006338
14: G27590     986.2 220.3 0.00002078
15: G27592     885.2 154.5 0.00004260
16: G27593     872.0 108.9 0.00008622
17: G27600     910.9 220.3 0.00002078
18: G27605     832.1 108.9 0.00008622
19: G27669     888.3 154.5 0.00004260
20: G28128     805.4 109.4 0.00008519
21: G28209     911.1 220.3 0.00002078
22: G28949     970.0 154.5 0.00004260
23: G28950     841.8 108.9 0.00008622
       gen Yield_adj    se    smith.w

Analysis with ASReml-R

mod.1.asr <- asreml(fixed   = yield ~ gen,
                random  = ~ rep + rep:alpha,
                data    = Edat)
  
mod.1.asr <- update.asreml(mod.1.asr)

summary(mod.1.asr)$varcomp

             component std.error z.ratio bound %ch
rep           0.004965        NA      NA     B   0
rep:alpha     0.078505        NA      NA     B   0
units!R   49065.721485     11408   4.301     P   0

sm <- summary(mod.1.asr, coef =T)
sm$coef.fixed

            solution std error  z.ratio
gen_G22455      0.00        NA       NA
gen_G23286     69.61     191.8  0.36288
gen_G24054    219.56     221.5  0.99121
gen_G24521     36.51     191.8  0.19033
gen_G25512    112.62     221.5  0.50843
gen_G25965     82.49     191.8  0.43003
gen_G26742    113.71     221.5  0.51335
gen_G26777     60.13     191.8  0.31345
gen_G27125    150.50     271.3  0.55474
gen_G27130    133.57     191.8  0.69631
gen_G27543   -339.50     271.3 -1.25144
gen_G27546     17.09     221.5  0.07714
gen_G27548     27.42     202.2  0.13560
gen_G27590    202.55     271.3  0.74663
gen_G27592    101.57     221.5  0.45853
gen_G27593    102.40     191.8  0.53381
gen_G27600    127.25     271.3  0.46904
gen_G27605     62.56     191.8  0.32612
gen_G27669    104.61     221.5  0.47228
gen_G28128     31.11     191.8  0.16218
gen_G28209    127.42     271.3  0.46968
gen_G28949    186.36     221.5  0.84133
gen_G28950     72.25     191.8  0.37665
(Intercept)   774.26     156.6  4.94326

Predict BLUE - ASReml-R

blue.asr <- predict(mod.1.asr, classify = "gen", vcov = TRUE) # get the lsmeans and vcov

blue.1.asr <- data.table(blue.asr$pvals)[, c(1:3)] # grab the lsmeans

names(blue.1.asr) <- c("gen", "Yield_adj", "se") # rename columns
  
blue.1.asr[ , ':='(smith.w = diag(solve(blue.asr$vcov)))] # calculate the Smith's weight
  
blue.1.asr
       gen Yield_adj    se    smith.w
 1: G22455     774.3 156.6 0.00004076
 2: G23286     843.9 110.8 0.00008152
 3: G24054     993.8 156.6 0.00004076
 4: G24521     810.8 110.8 0.00008152
 5: G25512     886.9 156.6 0.00004076
 6: G25965     856.8 110.8 0.00008152
 7: G26742     888.0 156.6 0.00004076
 8: G26777     834.4 110.8 0.00008152
 9: G27125     924.8 221.5 0.00002038
10: G27130     907.8 110.8 0.00008152
11: G27543     434.8 221.5 0.00002038
12: G27546     791.3 156.6 0.00004076
13: G27548     801.7 127.9 0.00006114
14: G27590     976.8 221.5 0.00002038
15: G27592     875.8 156.6 0.00004076
16: G27593     876.7 110.8 0.00008152
17: G27600     901.5 221.5 0.00002038
18: G27605     836.8 110.8 0.00008152
19: G27669     878.9 156.6 0.00004076
20: G28128     805.4 110.8 0.00008152
21: G28209     901.7 221.5 0.00002038
22: G28949     960.6 156.6 0.00004076
23: G28950     846.5 110.8 0.00008152
       gen Yield_adj    se    smith.w

Best wishes,
Harimurti

issue with a nested factor

Hello,
I get an error in a model having a nested factor. It is based on the following, common design: the yield y of I=20 genotypes is measured in L=2 different locations for T=2 different years, with K=2 blocks in each environment (location, year). The location and year factors are crossed, and the block factor is nested within the location-year interaction. The location, year and block effects are modeled as fixed (b), and the genotypes are modeled as random variables (u).

Below is a reproducible example of a linear mixed model (LMM) with data simulated according to the well-known formula y = X b + Z u + e:

## choose dimensions
I <- 20   # number of genotypes
T <- 2    # number of years
L <- 2    # number of locations
K <- 2    # number of blocks per environment, thus nested in (location,year)

## set factor levels
set.seed(1234)
genos <- sprintf(fmt=paste0("geno%0", floor(log10(I))+1, "i"), 1:I)
years <- as.character(2015:(2015+T-1))
locations <- sapply(1:L, function(l){paste0(sample(LETTERS, 4), collapse="")})
blocks <- LETTERS[1:K]

## make the data frame (first year, then location, then block)
dat <- data.frame(geno=rep(genos, L*K*T),
                  year=rep(years, each=I*L*K),
                  location=rep(rep(locations, each=I*K), times=T),
                  block=rep(rep(blocks, each=I), times=T*L),
                  yield=NA,
                  stringsAsFactors=TRUE)
str(dat)

## make the design matrix of the effects modeled as fixed
X <- model.matrix(~ 1 + location + year + location:year + location:year / block,
                  data=dat)
dim(X)
colnames(X)

## draw fixed effects
b <- rnorm(n=ncol(X), mean=0, sd=2)

## make the design matrix of the effects modeled as random
Z <- model.matrix(~ geno-1, data=dat)
dim(Z)
colnames(Z)

## draw random effects
u <- rnorm(n=ncol(Z), mean=0, sd=2)

## simulate yield data
e <- rnorm(n=nrow(X), mean=0, sd=1)
y <- X %*% b + Z %*% u + e
dat$yield <- y[,1]
str(dat)

Here is how I would like to fit the model with sommer:

library(sommer)
packageVersion("sommer")
fit2 <- mmer(fixed=yield ~ 1 + location + year + location:year + location:year / block,
             random=~geno,
             data=dat)

However, I get the following error with sommer v4.1.2 and R v3.5.3:

iteration    LogLik     wall    cpu(sec)   restrained
    1      -17.5826   16:31:30      0           0
    2      11.804   16:31:30      0           0
    3      16.0294   16:31:30      0           0
    4      16.2564   16:31:30      0           0
    5      16.2582   16:31:30      0           0
    6      16.2582   16:31:30      0           0
Error in mmer(fixed = yield ~ 1 + location + year + location:year + location:year/block,  : 
  addition: incompatible matrix dimensions: 9x9 and 160x160

Am I making a mistake somewhere, or is there a issue with mmer?

Side question: how can I retrieve the design matrices X and Z used by mmer? Could you add an optional argument to the function allowing to retrieve them?

Best regards, and thanks a lot for your great package,
Tim

Error: Mat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD

Dear Eduardo,

We are trying to use A.mat with 75000k lines and 10k markers, getting the following error:
Error: Mat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD
Still seems to work with 50k lines, but 75k doesn't work any more. Must be related to the number of individuals, because I can replicate it with only 5 markers too. rrBLUP::A.mat does not seem to have this problem, but I thought sommer::A.mat might be more efficient?

I am not sure if that is something related to sommer or Rcpp...

Best Gregor

thetaC constraint in isc, dsc, ... functions: lack of documentation

These covariance functions have a parameter
"thetaC: an optional matrix for constraints in the variance components"
But it's not clear what the syntax is and how the matrix constrains the variance components.
Can you elaborate?

Spl2D gives strange number of effects in spatial analysis

In the spatial analysis examples on page 19 and 20 here https://cran.r-project.org/web/packages/sommer/vignettes/sommer.start.pdf why are there 168 rowXcol effects? If the are 13 rows and 36 columns shouldnt there be 468 effects? How can I find the rowXcol effects or is there a different interpretation?
Thank you

Specifying name in random effects term

This is issue (I haven't verified with any other) shown while using certain symbol as first argument of vs().

For example this works well,

bean_compound_symmetry <- mmer(seed_yield_per_ha ~ year, 
                               random = ~ vs(g, Gu = diagonal_gen) + 
                                 vs(interaction(year_bean$year, year_bean$g, sep = ":"), Gu = gy),
                               rcov = ~ units, 
                               data = year_bean, verbose = TRUE)

But throws error when specified as in vignette with following code:

bean_compound_symmetry <- mmer(seed_yield_per_ha ~ year, 
                               random = ~ vs(g, Gu = diagonal_gen) + 
                                 vs(g:year, Gu = year_gen_kronecker), 
                               rcov = ~ units, 
                               data = year_bean, verbose = TRUE)

Traceback shows:

4.
vs(g:year, Gu = year_gen_kronecker) at <text>#1
3.
eval(parse(text = rtermss[u]), data, parent.frame()) 
2.
eval(parse(text = rtermss[u]), data, parent.frame()) 
1.
mmer(seed_yield_per_ha ~ year, random = ~vs(g, Gu = diagonal_gen) + 
    vs(g:year, Gu = year_gen_kronecker), rcov = ~units, data = year_bean, 
    verbose = TRUE)

unsBLUP() function

Hello,

Is the unsBLUP function supposed to be functional? I saw it in the GxE information and it would be very convenient if that function was working.

Thanks!
Neal

Consider adding revision/version tags to repo.

Hi sommer contributors,

I think it would be a good idea to tag releases to CRAN with their corresponding versions. This allows users to checkout, tweak build config flags, and rebuild at a specific revision-level of interest.

Cheers,

-B

how to add permanent environmental effect in random regression model ?

Dear Covarrubias:
I wonder add a fixed term of random regression coefficient for block effect and a random term of random regression coefficient for permanent environmental effect in the model :
ansRR <- mmer(height~1+block, random= ~ vsr(leg(year,2),line), rcov= ~ units, data=df, verbose = FALSE)
but I don't know how to do, can u help me?
In addition, how can I determine the order of the Legendre polynomial fitted.

GWAS entries in CHANGELOG and documentation

Hi @covaruber,

Thank you very much for your work on this amazing package.

The CHANGELOG mentions the following under ## PENDINGS AND PRIORITIES:

enable P3D argument

The GWAS documnentation does already mention the use of P3D=TRUE/FALSE. Does this argument currently have no effect? If so, what is the current behaviour?

Enable GWAS for unimputed markers?

I am not sure how much development effort this would be. mmer can handle missing data, so a current workaround would be to fit individual mmer models with unimputed markers as the explanatory variable. This is clearly impractical for big datasets. Some quick tests suggest there is more to it then simply removing the few lines of code in GWAS that check for missing data.

In addition, the GWAS documentation suggests the marker matrix M must be encoded as -1,0,1. In general, mmer and thus GWAS should be able to cope with different SNP values (e.g. 0,1,2,3,4 for tetraploids or ratio values [0, 1] for alternative alleles over total read depth), at least for additive models? Perhaps this is just an issue for some "convenience" features like MAF checks etc? In this case only small changes might be required (e.g. a warning if marker values are anything else than -1,0,1.

Best wishes,
Konrad

EDIT: Rephrased the last point about marker encoding

the imputed marker matrix is $X, not$imputed

For A.mat() in the help document:
If return.imputed = TRUE, the function returns a list containing
$A
the A matrix

$imputed
the imputed marker matrix

However, the returned matrix is $X, not $imputed.

object 'ref.alt' not found

G1[1:5,1:5]
Marker_1 Marker_2 Marker_3 Marker_4 Marker_5
Blackwell.01 C C A A T
Blackwell.02 C C A A T
Blackwell.03 C C A A T
Blackwell.04 C C A A T
Blackwell.05 C C A A T

G1.num <- atcg1234(G1, ploidy=2, format="ATCG", maf=0.05)
Obtaining reference alleles
|========================================================== | 83%Error in FUN(...) : object 'ref.alt' not found

predict function when spl2D is used

Dear Giovanni and co-authors,

I am recently having issues with the predict function. It seems as if the use of the spl2D() function is causing the issue. Below is an example implemented using one of your example datasets in two versions of sommer. I appreciate any advice.

4.1.2

library(sommer)
data(DT_cpdata)
DT <- DT_cpdata
GT <- GT_cpdata
MP <- MP_cpdata
A <- A.mat(GT)

works

mix <- mmer(Yield~1, random = ~ id , rcov=~units, data=DT, date.warning = FALSE)
iteration LogLik wall cpu(sec) restrained
1 -180.5 14:55:39 0 0
2 -180.5 14:55:39 0 0
3 -180.5 14:55:40 1 0
4 -180.5 14:55:40 1 0
temp <- predict.mmer(object = mix, classify = "id", date.warning = FALSE)
iteration LogLik wall cpu(sec) restrained
1 -180.5 14:55:40 0 0
2 -180.5 14:55:40 0 0
3 -180.5 14:55:40 0 0
4 -180.5 14:55:40 0 0

does not work

mix <- mmer(Yield~1, random = ~ id + spl2D(Row,Col), rcov=~units, data=DT, date.warning = FALSE)#spl2D(Row,Col)
iteration LogLik wall cpu(sec) restrained
1 -174.166 14:55:40 0 0
2 -167.966 14:55:40 0 0
3 -167.006 14:55:40 0 0
4 -166.955 14:55:41 1 0
5 -166.955 14:55:41 1 0
temp <- predict.mmer(object = mix, classify = "id", date.warning = FALSE)
Error in [.data.frame(object$dataOriginal, , x) :
undefined columns selected

4.1.5

works

mix <- mmer(Yield~1, random = ~ id , rcov=~units, data=DT, date.warning = FALSE)
iteration LogLik wall cpu(sec) restrained
1 -180.5 15:0:4 0 0
2 -180.5 15:0:4 0 0
3 -180.5 15:0:4 0 0
4 -180.5 15:0:5 1 0
temp <- predict.mmer(object = mix, classify = "id", date.warning = FALSE)
Version out of date. Please update sommer to the newest version using:
install.packages('sommer') in a new session
Use the 'date.warning' argument to disable the warning message.iteration LogLik wall cpu(sec) restrained
1 -180.5 15:0:5 0 0
2 -180.5 15:0:5 0 0
3 -180.5 15:0:5 0 0
4 -180.5 15:0:5 0 0
Version out of date. Please update sommer to the newest version using:
install.packages('sommer') in a new session
Use the 'date.warning' argument to disable the warning message.

does not work

mix <- mmer(Yield~1, random = ~ id + spl2Da(Row,Col), rcov=~units, data=DT, date.warning = FALSE)#spl2D(Row,Col)
iteration LogLik wall cpu(sec) restrained
1 -174.166 15:0:13 0 0
2 -167.966 15:0:13 0 0
3 -167.006 15:0:13 0 0
4 -166.955 15:0:13 0 0
5 -166.955 15:0:13 0 0
temp <- predict.mmer(object = mix, classify = "id", date.warning = FALSE)
Error in [.data.frame(object$dataOriginal, , x) :
undefined columns selected

does not work

mix <- mmer(Yield~1, random = ~ id + spl2Db(Row,Col), rcov=~units, data=DT, date.warning = FALSE)#spl2D(Row,Col)
iteration LogLik wall cpu(sec) restrained
1 -164.166 15:0:16 1 0
2 -161.888 15:0:16 1 1
3 -160.526 15:0:16 1 2
4 -160.422 15:0:16 1 3
5 -180.5 15:0:16 1 2
temp <- predict.mmer(object = mix, classify = "id", date.warning = FALSE)
Error in [.data.frame(object$dataOriginal, , x) :
undefined columns selected

Can't get mmec to run

I'm trying to see if mmec would be useful for a project I am working on.

If I run this (slightly modified) example:

data(DT_gryphon)
DT <- DT_gryphon
Ag <- A_gryphon

## mmec uses the inverse of the relationship matrix
Agi <- solve(Ag + diag(1e-4,ncol(Ag),ncol(Ag)))
mix1b <- mmec(
  BWT~1,
  random=~vsc(isc(ANIMAL),Gu=Agi),
  rcov=~units,
  data=DT
)

I get the following error:

Error in if (class(Gu) %!in% c("dgCMatrix")) { : 
the condition has length > 1

This may be because of R 4.2.0 any if condition of length > 1 is now an error, and nowadays matrices have both class matrix and array. To avoid this error, check the class of an object using inherits. So in this case I think this would solve the issue:

if (!inherits(Gu, "dgCMatrix")) { ...... }

Looking at the source I can see that I need to pass a sparse matrix, but the above generates its error just before your stop.

covaruber / sommer Goto Github PK

sommer's Introduction

sommer: Solving Mixed Model Equations in R

Installation

Vignettes

Development

sommer's People

Contributors

Stargazers

Watchers

Forkers

sommer's Issues

BUG

How to reproduce

Expected behaviour

Actual behaviour

Notes

The data

Stage I analysis

Collapse the results

Stage 2 in sommer

Predictions in sommer

(1) compare mmer results with different weights arguments

(2) compare mmer and mmec results without weights

(3) compare mmer and mmec results with! weights

Upload the package

The data

Analysis with sommer

Predict BLUE - sommer

Analysis with ASReml-R

Predict BLUE - ASReml-R

works

does not work

works

does not work

does not work

Recommend Projects

Recommend Topics

Recommend Org