
MegaLMM

MegaLMM is a package for fitting multi-trait linear mixed models (MvLMMs) with multiple random effects based on the paper available here:

Please treat this as a Beta version and let me know of issues running the functions.

Installation:

devtools::install_github('deruncie/MegaLMM')

Note: the package requires OpenMP which causes problems on a Mac. I was able to get everything to run by following these instructions: https://mac.r-project.org/openmp/ including the addition to ~/.R/Makevars. Installing on Unix-like machines and maybe Windows should cause fewer problems.

Introduction:

Please see the vignette MultiEnvironmentTrial.Rmd for an introduction to using MegaLMM. It applies MegaLMM to the analysis of a very incomplete multi-environment trial and walks through many of the key functions in the package.

You can simply download the .Rmd file above and run it yourself. If you would like to build the vignette, do:

devtools::install_github('deruncie/MegaLMM', build_opts = c("--no-resave-data", "--no-manual"),force = TRUE,build_vignettes = TRUE)
vignette(topic = 'MultiEnvironmentTrial',package='MegaLMM')

Background

MegaLMM implements multivariate linear mixed models of the form:

Y = XB + Z_1 U_1 + Z_2 U_2 + ... + E

where Y is an n x t matrix of observations for n individuals and t traits, X is a design matrix for b fixed effects (including an intercept), Z_1 through Z_m are design matrices for m random effects, and E is an n x t matrix of residual errors. The random effects U_1 through U_m are mutually independent of each other and of the residuals, but the columns of each U_m matrix can be correlated, and each column marginally follows a multivariate normal distribution with a known covariance matrix K_m.
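As a concrete illustration, data from this model can be simulated in a few lines of base R. This is a toy sketch with one random effect and an identity covariance matrix; all dimensions are chosen arbitrarily and `n_t` stands for the trait count t in the formula above:

```r
set.seed(1)
n <- 50; n_t <- 10; b <- 2            # individuals, traits, fixed effects
X  <- cbind(1, rnorm(n))              # n x b design: intercept + one covariate
B  <- matrix(rnorm(b * n_t), b, n_t)  # b x t fixed-effect coefficients
Z1 <- diag(n)                         # random-effect design: one obs per individual
K1 <- diag(n)                         # known covariance (e.g. kinship); identity here
U1 <- t(chol(K1)) %*% matrix(rnorm(n * n_t), n, n_t)  # each column ~ N(0, K1)
E  <- matrix(rnorm(n * n_t), n, n_t)  # residual errors
Y  <- X %*% B + Z1 %*% U1 + E         # n x t observation matrix
```

With a real kinship matrix in place of `K1`, the `t(chol(K1)) %*% ...` step induces the required correlation among individuals within each trait column.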

MvLMMs like this are notoriously difficult to fit. We address this by re-parameterizing the MvLMM as a mixed effect factor model:

Y = F Lambda + X B_r + Z_1 U_r1 + Z_2 U_r2 + ... + E_r
F = X B_f + Z_1 U_f1 + Z_2 U_f2 + ... + E_f

where F is a n x k matrix of latent factor traits and Lambda is a k x t matrix of factor loadings. This is the model actually fit by MegaLMM. For full details, please see the accompanying paper.
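The computational payoff is that the t x t trait covariance is replaced by a k x t loadings matrix with k much smaller than t: the product F Lambda is a rank-k signal shared across all traits. A toy base-R sketch of this decomposition (sizes illustrative; `F_mat` avoids clashing with R's built-in `F`):

```r
set.seed(2)
n <- 100; n_t <- 50; k <- 5
F_mat  <- matrix(rnorm(n * k), n, k)      # n x k latent factor traits
Lambda <- matrix(rnorm(k * n_t), k, n_t)  # k x t factor loadings
shared <- F_mat %*% Lambda                # rank-k signal shared across traits
resid  <- matrix(rnorm(n * n_t, sd = 0.1), n, n_t)
Y <- shared + resid                       # observations = low-rank signal + noise
```

Even though `Y` has 50 columns, the shared signal is governed by only k = 5 factors, which is what makes estimation tractable for large t.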

Model outline

Figure 1 from Runcie et al 2021.

The unique aspects of MegaLMM relative to other factor models are:

  1. The residuals of Y after accounting for the factors are not assumed to be iid, but are modeled with independent (across traits) LMMs accounting for both fixed and random effects.
  2. The factors themselves are also not assumed to be iid, but are modeled with the same LMMs. This highlights the parallel belief that these latent factors represent traits that we just didn't measure directly.
  3. Each factor is shared by all modeled sources of variation (fixed effects, random effects and residuals), rather than being unique to a particular source.
  4. The factor loadings are strongly regularized to ensure that estimation is efficient. We accomplish this by ordering the factors from most-to-least important using a prior similar to that proposed by Bhattacharya and Dunson (2011).
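Under the Bhattacharya and Dunson (2011) construction, the prior precision of the loadings in each successive factor is a cumulative product of gamma-distributed multipliers, so later factors are shrunk progressively harder. A minimal sketch of that idea (hyperparameter values are illustrative, not MegaLMM's defaults):

```r
set.seed(3)
k <- 10
# multiplicative gamma process: delta_1 ~ Ga(a1, 1), delta_h ~ Ga(a2, 1) for h > 1
delta <- c(rgamma(1, shape = 2, rate = 1),
           rgamma(k - 1, shape = 3, rate = 1))  # E[delta_h] = 3 > 1 for h > 1
tau <- cumprod(delta)       # tau_h = prod_{l <= h} delta_l: loading precision of factor h
lambda_sd <- 1 / sqrt(tau)  # implied prior sd of loadings in factor h
```

Because each multiplier has expectation greater than one (for h > 1), `tau` tends to grow with the factor index and `lambda_sd` tends to shrink, which is what orders the factors from most to least important.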

Contributors

deruncie, jpiaskowski

Issues

How many latent factors to use?

Also a follow-up question: if I understand the methodology correctly, the multivariate LMM in MegaLMM is a reparametrization of the standard multivariate LMM when the number of factors (k) is sufficiently large. However, if the number of factors is not sufficiently large, then I'm guessing the model fitted by MegaLMM is only an approximation of the multivariate LMM. With this in mind, is there any way of knowing how small k can be set before the accuracy of the posterior distributions really starts to deteriorate?

Thanks again,
Timothy

Warning from the setup_model_MegaLMM() function

Hi,
I got a warning (see below) when I ran the setup_model_MegaLMM() function.

"Warning message:
as(<matrix>, "lgTMatrix") is deprecated since Matrix 1.5-0; do as(as(as(., "lMatrix"), "generalMatrix"), "TsparseMatrix") instead "

Do you think it will cause problems for the subsequent steps? I then ran into a problem in initialize_MegaLMM(MegaLMM_state):

Error in (function (cond) :
   error evaluating argument 'x' when selecting a method for function 't': Need S4 class dgCMatrix for a mapped sparse matrix

If you know how to solve this problem, please let me know.

Kind regards,
Yansen

More random effects

Hi Daniel E. Runcie,
My model has three random effects. Does your algorithm support more random effects?
From what I can tell, only two random effects, genetic and residual, can be included in the model.
I'm not sure my reading is correct; maybe I'm missing something.
Please let me know.

Kind Regards,
Yansen

Error for Extending the chain

Hi,
I found my model hadn't really converged after 18.5K iterations, so I would like to extend the chain to 50K iterations, but my program has already stopped.
Following the suggestions in the MultiEnvironmentTrial vignette, I first re-read all the data (phenotypes, K, ...), called setup_model_MegaLMM(), added priors, initialized, etc., and then reloaded the current state to resume the chain where I left off with:

MegaLMM_state$current_state = readRDS('myrun_ID/current_state.rds')

Printing MegaLMM_state gives:

Current iteration: 18500, Posterior_samples: 0
Total time: 1.954428 days

Then I got the error below:

[1] "Run 1"

  |=======                                                               |  10%
Error in record_sample_Posterior_array(current_state[[param]], Posterior[[param]],  : 
  Wrong dimensions of Posterior_array
Calls: sample_MegaLMM ... save_posterior_sample -> record_sample_Posterior_array
Execution halted

Do you have any ideas why I have this problem?

Kind regards,
Yansen

documentation & pkgdown?

Following up on the ASA-CSSA-SSSA symposium discussion, would preparing a PR to build a pkgdown site be useful? I can give it a shot.

Vignette generates warning

R> MegaLMM_state = setup_model_MegaLMM(
+ Y = Y_train,
+ formula = ~ Population + (1|Line),
+ data = sample_data,
+ relmat = list(Line = K),
+ run_parameters=run_parameters,
+ run_ID = sprintf('MegaLMM_fold_%02d',fold_ID)
+ )
Warning message:
as(<matrix>, "lgTMatrix") is deprecated since Matrix 1.5-0; do as(as(as(., "lMatrix"), "generalMatrix"), "TsparseMatrix") instead 

MCMC convergence for latent factor parameters.

Hi,

I want to say a big thanks for putting in the effort of developing this R package. I quite like the approach you have taken to fit the multivariate LMM.

I have a query regarding convergence of the MCMC chains. I ran the code in the rmd file in the vignette but changed lines 172-187 to

n_iter = 1000;  # how many samples to collect at once?
for(i  in 1:70) {
  print(sprintf('Run %d',i))
  MegaLMM_state = sample_MegaLMM(MegaLMM_state,n_iter)  # run MCMC chain n_iter iterations. grainSize is a parameter for parallelization (smaller = more parallelization)
  
  MegaLMM_state = save_posterior_chunk(MegaLMM_state)  # save any accumulated posterior samples in the database to release memory
  print(MegaLMM_state) # print status of current chain
  plot(MegaLMM_state) # make some diagnostic plots. These are saved in a pdf booklet: diagnostic_plots.pdf
  
  # set of commands to run during burn-in period to help chain converge
  if(MegaLMM_state$current_state$nrun < MegaLMM_state$run_parameters$burn || i < 20) {
    MegaLMM_state = reorder_factors(MegaLMM_state,drop_cor_threshold = 0.6) # Factor order doesn't "mix" well in the MCMC. We can help it by manually re-ordering from biggest to smallest
    MegaLMM_state = clear_Posterior(MegaLMM_state)
    print(MegaLMM_state$run_parameters$burn)
  }
}

The diagnostic plots from the plot function are here. I mainly want to focus on the last page of the pdf. Looking at the code, I'm guessing that the different colored lines correspond to the five traits (out of the 100 traits in this dataset) with the largest mean lambda values. It is a bit disconcerting, however, to see for a simulated dataset that the trace plots for some lambdas do not seem to be settling down to a converged distribution but keep jumping around even after a large number of iterations (though factors 1-6 look good). Personally, I would be uncomfortable with these trace plots (to me, this suggests there might be some multi-modality in the posterior distributions). However, I'm curious to know whether you have ways to improve the convergence of the MCMC chains in MegaLMM, or whether such trace plots are not a concern (and if so, why).

Many thanks,
Timothy

Unclear meaning of some posterior sample parameters

Dear,
I found most of the posterior sample parameters unclear when using MegaLMM.
All the parameters are:

#>  [1] "Lambda"       "U_F"          "F"            "delta"        "tot_F_prec"  
#>  [6] "F_h2"         "tot_Eta_prec" "resid_h2"     "B1"           "B2_F"        
#> [11] "B2_R"         "U_R"          "cis_effects"  "Lambda_m_eff" "Eta"

For example, I cannot understand the meaning of "tot_F_prec".
Could you add explanations for all of the parameters? As you know, saving all of them requires a lot of space, so we need to select which ones to save in advance; that way we can save a lot of space when using MegaLMM.

Kind regards,
Yansen
