agriutilities

agriutilities is an R package designed to make the analysis of field trials easier and more accessible for everyone working in plant breeding. It provides a simple and intuitive interface for conducting single and multi-environmental trial analysis, with minimal coding required. Whether you’re a beginner or an experienced user, agriutilities will help you quickly and easily carry out complex analyses with confidence. With built-in functions for fitting Linear Mixed Models (LMM), agriutilities is the ideal choice for anyone who wants to save time and focus on interpreting their results.

Installation

From CRAN

install.packages("agriutilities")

From GitHub

You can install the development version of agriutilities from GitHub with:

remotes::install_github("AparicioJohan/agriutilities")

Automatic Data Analysis Pipeline

This is a basic example which shows you how to use some of the functions of the package.

Identify the Experimental Design

The function check_design_met helps us to check the quality of the data and also to identify the experimental design of the trials. This works as a quality check or quality control before we fit any model.

library(agriutilities)
library(agridat)
data(besag.met)
dat <- besag.met
results <- check_design_met(
  data = dat,
  genotype = "gen",
  trial = "county",
  traits = "yield",
  rep = "rep",
  block = "block",
  col = "col",
  row = "row"
)

plot(results, type = "connectivity")

plot(results, type = "missing")

Inspecting the output.

print(results)
---------------------------------------------------------------------
Summary Traits by Trial:
---------------------------------------------------------------------
# A tibble: 6 × 11
  county traits   Min  Mean Median   Max    SD    CV     n n_miss miss_perc
  <fct>  <chr>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <int>  <int>     <dbl>
1 C1     yield   87.9 149.   151.   200.  17.7 0.119   198      6    0.0303
2 C2     yield   24.4  56.1   52.1  125.  18.4 0.328   198      6    0.0303
3 C3     yield   28.2  87.9   89.2  137.  19.7 0.225   198      6    0.0303
4 C4     yield  103.  145.   143.   190.  17.1 0.118   198      6    0.0303
5 C5     yield   66.9 115.   116.   152.  16.4 0.142   198      6    0.0303
6 C6     yield   29.2  87.6   87.8  148.  26.6 0.304   198      6    0.0303

---------------------------------------------------------------------
Experimental Design Detected:
---------------------------------------------------------------------
  county exp_design
1     C1    row_col
2     C2    row_col
3     C3    row_col
4     C4    row_col
5     C5    row_col
6     C6    row_col

---------------------------------------------------------------------
Summary Experimental Design:
---------------------------------------------------------------------
# A tibble: 6 × 9
  county     n n_gen n_rep n_block n_col n_row num_of_reps num_of_gen
  <fct>  <int> <int> <int>   <int> <int> <int> <fct>       <fct>     
1 C1       198    64     3       8    11    18 3_9         63_1      
2 C2       198    64     3       8    11    18 3_9         63_1      
3 C3       198    64     3       8    11    18 3_9         63_1      
4 C4       198    64     3       8    11    18 3_9         63_1      
5 C5       198    64     3       8    11    18 3_9         63_1      
6 C6       198    64     3       8    11    18 3_9         63_1      

---------------------------------------------------------------------
Connectivity Matrix:
---------------------------------------------------------------------
   C1 C2 C3 C4 C5 C6
C1 64 64 64 64 64 64
C2 64 64 64 64 64 64
C3 64 64 64 64 64 64
C4 64 64 64 64 64 64
C5 64 64 64 64 64 64
C6 64 64 64 64 64 64

---------------------------------------------------------------------
Filters Applied:
---------------------------------------------------------------------
List of 1
 $ yield:List of 4
  ..$ missing_50%     : chr(0) 
  ..$ no_variation    : chr(0) 
  ..$ row_col_dup     : chr(0) 
  ..$ trials_to_remove: chr(0)

Single Trial Analysis (STA)

The results of the previous function are used in single_trial_analysis() to fit single trial models. This function can fit, Completely Randomized Designs (CRD), Randomized Complete Block Designs (RCBD), Resolvable Incomplete Block Designs (res-IBD), Non-Resolvable Row-Column Designs (Row-Col) and Resolvable Row-Column Designs (res-Row-Col).

NOTE: It fits models based on the randomization detected.

obj <- single_trial_analysis(results, progress = FALSE)

Inspecting the output.

print(obj)
---------------------------------------------------------------------
Summary Fitted Models:
---------------------------------------------------------------------
    trait  trial heritability        CV    VarGen    VarErr  design
   <char> <char>        <num>     <num>     <num>     <num>  <char>
1:  yield     C1         0.73  6.022489  87.39848  82.86095 row_col
2:  yield     C2         0.37 17.104998  25.80684 108.68546 row_col
3:  yield     C3         0.64 12.357202  83.57907 118.55567 row_col
4:  yield     C4         0.41  8.179408  35.75568 136.21218 row_col
5:  yield     C5         0.80  7.037586 103.79822  66.97523 row_col
6:  yield     C6         0.49 16.632367  71.92232 207.53073 row_col

---------------------------------------------------------------------
Outliers Removed:
---------------------------------------------------------------------
    trait  trial genotype    id outlier
   <char> <fctr>   <fctr> <int>  <lgcl>
1:  yield     C1      G60    50    TRUE

---------------------------------------------------------------------
First Predicted Values and Standard Errors (BLUEs/BLUPs):
---------------------------------------------------------------------
    trait genotype  trial    BLUEs  seBLUEs    BLUPs  seBLUPs         wt
   <char>   <fctr> <fctr>    <num>    <num>    <num>    <num>      <num>
1:  yield      G01     C1 141.4161 6.078858 143.5308 5.249771 0.02706176
2:  yield      G02     C1 157.8110 5.979708 155.8037 5.194547 0.02796663
3:  yield      G03     C1 127.3836 6.091534 133.0256 5.269999 0.02694925
4:  yield      G04     C1 154.8445 6.093866 153.8364 5.270427 0.02692863
5:  yield      G05     C1 163.8950 6.132141 161.1831 5.271809 0.02659352
6:  yield      G06     C1 128.5168 6.087902 133.6857 5.247130 0.02698141

plot(obj, horizontal = TRUE, nudge_y_h2 = 0.12)

plot(obj, type = "correlation")

The returning object is a set of lists with trial summary, BLUEs, BLUPs, heritability, variance components, potential extreme observations, residuals, the models fitted and the data used.

Two-Stage Analysis (MET)

The results of the previous function are used in met_analysis() to fit multi-environmental trial models.

met_results <- met_analysis(obj, vcov = "fa2", progress = FALSE)
Online License checked out Mon Jun 17 09:16:16 2024

Inspecting the output.

print(met_results)
---------------------------------------------------------------------
Trial Effects (BLUEs):
---------------------------------------------------------------------
  trait trial predicted.value std.error    status
1 yield    C1       149.75463  1.357203 Estimable
2 yield    C2        65.99443  1.140633 Estimable
3 yield    C3        90.60647  1.445419 Estimable
4 yield    C4       148.12551  1.172065 Estimable
5 yield    C5       121.77655  1.434386 Estimable
6 yield    C6        88.31651  1.532150 Estimable

---------------------------------------------------------------------
Heritability:
---------------------------------------------------------------------
  trait        h2
1 yield 0.8253127

---------------------------------------------------------------------
First Overall Predicted Values and Standard Errors (BLUPs):
---------------------------------------------------------------------
  trait genotype predicted.value std.error    status
1 yield      G01        110.4107  2.524415 Estimable
2 yield      G02        111.0488  2.534145 Estimable
3 yield      G03        102.6031  2.538533 Estimable
4 yield      G04        115.3953  2.530336 Estimable
5 yield      G05        120.6700  2.545688 Estimable
6 yield      G06        108.6678  2.552319 Estimable

---------------------------------------------------------------------
Variance-Covariance Matrix:
---------------------------------------------------------------------

Correlation Matrix ('fa2'): yield
     C1   C2   C3   C4   C5   C6
C1 1.00 0.54 0.67 0.81 0.93 0.35
C2 0.54 1.00 0.60 0.90 0.52 0.68
C3 0.67 0.60 1.00 0.73 0.70 0.40
C4 0.81 0.90 0.73 1.00 0.83 0.60
C5 0.93 0.52 0.70 0.83 1.00 0.34
C6 0.35 0.68 0.40 0.60 0.34 1.00

Covariance Matrix ('fa2'): yield
      C1    C2    C3    C4     C5    C6
C1 80.39 25.73 53.51 35.80  84.13 25.75
C2 25.73 28.74 28.50 23.86  28.05 29.67
C3 53.51 28.50 78.96 32.26  62.98 28.93
C4 35.80 23.86 32.26 24.54  41.43 24.38
C5 84.13 28.05 62.98 41.43 101.53 27.94
C6 25.75 29.67 28.93 24.38  27.94 66.57

---------------------------------------------------------------------
First Stability Coefficients:
---------------------------------------------------------------------
  trait genotype superiority   static    wricke predicted.value
1 yield      G57    22.56100 33.35260 15.210394        92.44993
2 yield      G29    16.98170 34.39632  4.243540        99.41798
3 yield      G34    16.98545 33.64634  8.152888        99.81112
4 yield      G59    16.68886 34.56884  5.361337        99.91280
5 yield      G31    15.97985 31.29838 12.350068       101.69455
6 yield      G10    15.61917 32.24828 11.747802       102.32827

Exploring Factor Analytic in MET analysis.

pvals <- met_results$trial_effects
model <- met_results$met_models$yield
fa_objt <- fa_summary(
  model = model,
  trial = "trial",
  genotype = "genotype",
  BLUEs_trial = pvals,
  k_biplot = 8,
  size_label_var = 4,
  filter_score = 1
)

fa_objt$plots$loadings_c

fa_objt$plots$biplot

For more information and to learn more about what is described here you may find useful the following sources: Isik, Holland, and Maltecca (2017); Rodriguez-Alvarez et al. (2018).

Code of Conduct

Please note that the agriutilities project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

Isik, Fikret, James Holland, and Christian Maltecca. 2017. Genetic Data Analysis for Plant and Animal Breeding. Vol. 400. Springer.

Rodriguez-Alvarez, Maria Xose, Martin P Boer, Fred A van Eeuwijk, and Paul HC Eilers. 2018. “Correcting for Spatial Heterogeneity in Plant Breeding Experiments with p-Splines.” Spatial Statistics 23: 52–71.

alternative H2 estimation based on lme4

Hi there,
just saw your work on this repository and I already like it a lot. I also saw that your work on H2 Cullis is in progress and maybe you find this useful:

https://github.com/PaulSchmidtGit/Heritability/tree/master/Alternative%20Heritability%20Measures

You can navigate the folders to the H2 Cullis in lme4 there and will find the code below. I know that this example is probably not generally applicable but it might help anyway:

library(agridat)
library(lme4) 
library(psych)
library(tidyverse)


# get example data --------------------------------------------------------
dat <- agridat::john.alpha


# fit model ---------------------------------------------------------------
# random genotype effect
g.ran <- lme4::lmer(data    = dat,
                    formula = yield ~ rep + (1|gen) + (1|rep:block))


# handle model estimates --------------------------------------------------
# to my knowledge, lme4 does not offer a function to
# extract variance-covariance-matrices for BLUPs (a.k.a. prediction error variance [PEV] matrix).
# therefore, I here manually reconstruct mixed model equation for this specific example.
# notice that this solution therefore only works for this specific model!

vc <- g.ran %>% VarCorr %>% as_tibble # extract estimated variance components (vc)

# R = varcov-matrix for error term
n    <- g.ran %>% summary %>% pluck(residuals) %>% length # numer of observations
vc_e <- vc %>% filter(grp=="Residual") %>% pull(vcov)     # error vc
R    <- diag(n)*vc_e                                      # R matrix = I_n * vc_e

# G = varcov-matrx for all random effects
# subset of G regarding genotypic effects
n_g  <- g.ran %>% summary %>% pluck("ngrps") %>% pluck("gen") # number of genotypes
vc_g <- vc %>% filter(grp=="gen") %>% pull(vcov)              # genotypic vc
G_g  <- diag(n_g)*vc_g                                        # gen part of G matrix = I * vc.g

# subset of G regarding incomplete block effects
n_b  <- g.ran %>% summary %>% pluck("ngrps") %>% pluck("rep:block") # number of incomplete blocks
vc_b <- vc %>% filter(grp=="rep:block") %>% pull(vcov)              # incomplete block vc
G_b  <- diag(n_b)*vc_b                                              # incomplete block part of G matrix = I * vc.b

G <- bdiag(G_g, G_b) # G is blockdiagonal with G.g and G.b in this example

# Design Matrices
X <- g.ran %>% getME("X") %>% as.matrix # Design matrix fixed effects
Z <- g.ran %>% getME("Z") %>% as.matrix # Design matrix random effects

# Mixed Model Equation (HENDERSON 1986; SEARLE et al. 2006)
C11 <- t(X) %*% solve(R) %*% X
C12 <- t(X) %*% solve(R) %*% Z
C21 <- t(Z) %*% solve(R) %*% X
C22 <- t(Z) %*% solve(R) %*% Z + solve(G) 

C <- rbind(cbind(C11, C12),  
           cbind(C21, C22)) %>% as.matrix # Combine components into one matrix C

# Mixed Model Equation Solutions 
C_inv <- C %>% solve                             # Inverse of C
C22_g <- C_inv[levels(dat$gen), levels(dat$gen)] # subset of C.inv that refers to genotypic BLUPs

# Mean variance of BLUP-difference from C22 matrix of genotypic BLUPs
one        <- matrix(1, nrow=n_g, ncol=1)      # vector of 1s
P_mu       <- diag(n_g, n_g) - one %*% t(one)  # P_mu = matrix that centers for overall-mean
vdBLUP_sum <- psych::tr(P_mu %*% C22_g)        # sum of all variance of differences = trace of P_mu*C22_g
vdBLUP_avg <- vdBLUP_sum * (2/(n_g*(n_g-1)))   # mean variance of BLUP-difference = divide sum by number of genotype pairs


# H2 Cullis ---------------------------------------------------------------
H2Cullis <- 1 - (vdBLUP_avg / 2 / vc_g)
H2Cullis #0.8091336

apariciojohan / agriutilities Goto Github PK

agriutilities's Introduction

agriutilities

Installation

From CRAN

From GitHub

Automatic Data Analysis Pipeline

Identify the Experimental Design

Single Trial Analysis (STA)

Two-Stage Analysis (MET)

Exploring Factor Analytic in MET analysis.

Code of Conduct

References

agriutilities's People

Contributors

Stargazers

Watchers

Forkers

agriutilities's Issues

Recommend Projects

Recommend Topics

Recommend Org