
MegaLMM

MegaLMM is a package for fitting multi-trait linear mixed models (MvLMMs) with multiple random effects based on the paper available here:

Please treat this as a Beta version and let me know of issues running the functions.

Installation:

devtools::install_github('deruncie/MegaLMM')

Note: the package requires OpenMP which causes problems on a Mac. I was able to get everything to run by following these instructions: https://mac.r-project.org/openmp/ including the addition to ~/.R/Makevars. Installing on Unix-like machines and maybe Windows should cause fewer problems.

Introduction:

Please see the vignette MultiEnvironmentTrial.Rmd for an introduction to using MegaLMM. It applies MegaLMM to the analysis of a very incomplete multi-environment trial and walks through many of the key functions in the package.

You can simply download the .Rmd file above and run it yourself. If you would like to build the vignette, do:

devtools::install_github('deruncie/MegaLMM', build_opts = c("--no-resave-data", "--no-manual"),force = TRUE,build_vignettes = TRUE)
vignette(topic = 'MultiEnvironmentTrial',package='MegaLMM')

Background

MegaLMM implements multivariate linear mixed models of the form:

Y = XB + Z_1 U_1 + Z_2 U_2 + ... + E

where Y is an n x t matrix of observations for n individuals and t traits, X is a design matrix for b fixed effects (including an intercept), Z_1 through Z_m are design matrices for m random effects, and E is an n x t matrix of residual errors. The random effects U_1 through U_m are mutually independent of each other and of the residuals, but the columns of each U_m matrix can be correlated, and each column marginally follows a multivariate normal distribution with a known covariance matrix K_m.
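As a concrete illustration, data from this model can be simulated in a few lines of base R. This is a toy sketch with one random effect and an identity covariance matrix; all dimensions are chosen arbitrarily and `n_t` stands for the trait count t in the formula above:

```r
set.seed(1)
n <- 50; n_t <- 10; b <- 2            # individuals, traits, fixed effects
X  <- cbind(1, rnorm(n))              # n x b design: intercept + one covariate
B  <- matrix(rnorm(b * n_t), b, n_t)  # b x t fixed-effect coefficients
Z1 <- diag(n)                         # random-effect design: one obs per individual
K1 <- diag(n)                         # known covariance (e.g. kinship); identity here
U1 <- t(chol(K1)) %*% matrix(rnorm(n * n_t), n, n_t)  # each column ~ N(0, K1)
E  <- matrix(rnorm(n * n_t), n, n_t)  # residual errors
Y  <- X %*% B + Z1 %*% U1 + E         # n x t observation matrix
```

With a real kinship matrix in place of `K1`, the `t(chol(K1)) %*% ...` step induces the required correlation among individuals within each trait column.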

MvLMMs like this are notoriously difficult to fit. We address this by re-parameterizing the MvLMM as a mixed effect factor model:

Y = F Lambda + X B_r + Z_1 U_r1 + Z_2 U_r2 + ... + E_r
F = X B_f + Z_1 U_f1 + Z_2 U_f2 + ... + E_f

where F is a n x k matrix of latent factor traits and Lambda is a k x t matrix of factor loadings. This is the model actually fit by MegaLMM. For full details, please see the accompanying paper.
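The computational payoff is that the t x t trait covariance is replaced by a k x t loadings matrix with k much smaller than t: the product F Lambda is a rank-k signal shared across all traits. A toy base-R sketch of this decomposition (sizes illustrative; `F_mat` avoids clashing with R's built-in `F`):

```r
set.seed(2)
n <- 100; n_t <- 50; k <- 5
F_mat  <- matrix(rnorm(n * k), n, k)      # n x k latent factor traits
Lambda <- matrix(rnorm(k * n_t), k, n_t)  # k x t factor loadings
shared <- F_mat %*% Lambda                # rank-k signal shared across traits
resid  <- matrix(rnorm(n * n_t, sd = 0.1), n, n_t)
Y <- shared + resid                       # observations = low-rank signal + noise
```

Even though `Y` has 50 columns, the shared signal is governed by only k = 5 factors, which is what makes estimation tractable for large t.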

Model outline

Figure 1 from Runcie et al 2021.

The unique aspects of MegaLMM relative to other factor models are:

  1. The residuals of Y after accounting for the factors are not assumed to be iid, but are modeled with independent (across traits) LMMs accounting for both fixed and random effects.
  2. The factors themselves are also not assumed to be iid, but are modeled with the same LMMs. This highlights the parallel belief that these latent factors represent traits that we just didn't measure directly.
  3. Each factor is shared by all modeled sources of variation (fixed effects, random effects and residuals), rather than being unique to a particular source.
  4. The factor loadings are strongly regularized to ensure that estimation is efficient. We accomplish this by ordering the factors from most-to-least important using a prior similar to that proposed by Bhattacharya and Dunson (2011).
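Under the Bhattacharya and Dunson (2011) construction, the prior precision of the loadings in each successive factor is a cumulative product of gamma-distributed multipliers, so later factors are shrunk progressively harder. A minimal sketch of that idea (hyperparameter values are illustrative, not MegaLMM's defaults):

```r
set.seed(3)
k <- 10
# multiplicative gamma process: delta_1 ~ Ga(a1, 1), delta_h ~ Ga(a2, 1) for h > 1
delta <- c(rgamma(1, shape = 2, rate = 1),
           rgamma(k - 1, shape = 3, rate = 1))  # E[delta_h] = 3 > 1 for h > 1
tau <- cumprod(delta)       # tau_h = prod_{l <= h} delta_l: loading precision of factor h
lambda_sd <- 1 / sqrt(tau)  # implied prior sd of loadings in factor h
```

Because each multiplier has expectation greater than one (for h > 1), `tau` tends to grow with the factor index and `lambda_sd` tends to shrink, which is what orders the factors from most to least important.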

Contributors

deruncie, jpiaskowski

Issues

How many latent factors to use?

Also a follow-up question: if I understand the methodology correctly, the multivariate LMM in MegaLMM is a reparametrization of the standard multivariate LMM when the number of factors (k) is sufficiently large. However, if the number of factors is not sufficiently large, then I'm guessing the model fitted by MegaLMM is only an approximation of the multivariate LMM. With this in mind, is there any way of knowing how small k can be set before the accuracy of the posterior distributions really starts to deteriorate?

Thanks again,
Timothy

Warning from the setup_model_MegaLMM() function

Hi,
I got a warning (see below) when I ran the setup_model_MegaLMM() function.

"Warning message:
as(<matrix>, "lgTMatrix") is deprecated since Matrix 1.5-0; do as(as(as(., "lMatrix"), "generalMatrix"), "TsparseMatrix") instead "

Do you think it will cause problems for the subsequent steps? I then ran into a problem in initialize_MegaLMM(MegaLMM_state):

Error in (function (cond) :
   error evaluating argument 'x' when selecting a method for function 't': Need S4 class dgCMatrix for a mapped sparse matrix

If you know how to solve this problem, please let me know.

Kind regards,
Yansen

More random effects

Hi Daniel E. Runcie,
My model has three random effects. Does your algorithm support more random effects?
From what I can tell, only two random effects, genetic and residual, can be included in the model.
I'm not sure my reading is correct; maybe I'm missing something.
Please let me know.

Kind Regards,
Yansen

Error for Extending the chain

Hi,
I found my model hadn't really converged after 18.5K iterations, so I would like to extend the chain to 50K iterations, but my program has already stopped.
Following the suggestions in the MultiEnvironmentTrial vignette, I first re-read all the data (phenotypes, K, ...), called setup_model_MegaLMM(), added priors, initialized, etc., and then reloaded the current state to resume the chain where I left off with:

MegaLMM_state$current_state = readRDS('myrun_ID/current_state.rds')

Printing MegaLMM_state gives:

Current iteration: 18500, Posterior_samples: 0
Total time: 1.954428 days

Then I got the error below:

[1] "Run 1"

  |=======                                                               |  10%
Error in record_sample_Posterior_array(current_state[[param]], Posterior[[param]],  : 
  Wrong dimensions of Posterior_array
Calls: sample_MegaLMM ... save_posterior_sample -> record_sample_Posterior_array
Execution halted

Do you have any ideas why I have this problem?

Kind regards,
Yansen

documentation & pkgdown?

Following up on the ASA-CSSA-SSSA symposium discussion, would preparing a PR to build a pkgdown site be useful? I can give it a shot.

Vignette generates warning

R> MegaLMM_state = setup_model_MegaLMM(
+ Y = Y_train,
+ formula = ~ Population + (1|Line),
+ data = sample_data,
+ relmat = list(Line = K),
+ run_parameters=run_parameters,
+ run_ID = sprintf('MegaLMM_fold_%02d',fold_ID)
+ )
Warning message:
as(<matrix>, "lgTMatrix") is deprecated since Matrix 1.5-0; do as(as(as(., "lMatrix"), "generalMatrix"), "TsparseMatrix") instead 

MCMC convergence for latent factor parameters.

Hi,

I want to say a big thanks for putting in the effort of developing this R package. I quite like the approach you have taken to fit the multivariate LMM.

I have a query regarding convergence of the MCMC chains. I ran the code in the rmd file in the vignette but changed lines 172-187 to

n_iter = 1000;  # how many samples to collect at once?
for(i  in 1:70) {
  print(sprintf('Run %d',i))
  MegaLMM_state = sample_MegaLMM(MegaLMM_state,n_iter)  # run MCMC chain n_iter iterations. grainSize is a parameter for parallelization (smaller = more parallelization)
  
  MegaLMM_state = save_posterior_chunk(MegaLMM_state)  # save any accumulated posterior samples in the database to release memory
  print(MegaLMM_state) # print status of current chain
  plot(MegaLMM_state) # make some diagnostic plots. These are saved in a pdf booklet: diagnostic_plots.pdf
  
  # set of commands to run during burn-in period to help chain converge
  if(MegaLMM_state$current_state$nrun < MegaLMM_state$run_parameters$burn || i < 20) {
    MegaLMM_state = reorder_factors(MegaLMM_state,drop_cor_threshold = 0.6) # Factor order doesn't "mix" well in the MCMC. We can help it by manually re-ordering from biggest to smallest
    MegaLMM_state = clear_Posterior(MegaLMM_state)
    print(MegaLMM_state$run_parameters$burn)
  }
}

The diagnostic plots from the plot function are here. I mainly want to focus on the last page of the pdf. Looking at the code, I'm guessing that the different colored lines correspond to the five traits (out of the 100 traits in this dataset) with the largest mean lambda values. It is a bit disconcerting, however, to see for a simulated dataset that the trace plots for some lambdas do not seem to be settling down to a converged distribution but keep jumping around even after a large number of iterations (though factors 1-6 look good). Personally, I would be uncomfortable with these trace plots (to me, this suggests there might be some multi-modality in the posterior distributions). However, I'm curious to know whether you have ways to improve the convergence of the MCMC chains in MegaLMM, or whether such trace plots are not a concern (and if so, why).

Many thanks,
Timothy

Unclear meaning of some posterior sample parameters

Dear,
I found most of the posterior sample parameters unclear when using MegaLMM.
All the parameters are:

#>  [1] "Lambda"       "U_F"          "F"            "delta"        "tot_F_prec"  
#>  [6] "F_h2"         "tot_Eta_prec" "resid_h2"     "B1"           "B2_F"        
#> [11] "B2_R"         "U_R"          "cis_effects"  "Lambda_m_eff" "Eta"

For example, I cannot understand the meaning of "tot_F_prec".
Could you add explanations for all of the parameters? As you know, saving all of them requires a lot of space, so we need to select which ones to save in advance; that way we can save a lot of space when using MegaLMM.

Kind regards,
Yansen
