In the current CRAN version of Familias (version 2.4), any pedigree members that are not present in the datamatrix
are ignored in the calculations even though they may affect the likelihood. More precisely, these persons are considered extra persons that are disconnected from the pedigree. Presumably, any untyped extra persons are pruned when the likelihood is evaluated. The following code labels the untyped pedigree members as extra persons:
|
for (i in pedigrees) { |
|
nPersons <- length(i$sex) |
|
neworder <- rep(0, nPersons) |
|
nExMales <- nExFemales <- 0 |
|
for (j in 1:nPersons) { |
|
mm <- match(i$id[j], persons, nomatch = 0) |
|
if (mm > 0) |
|
neworder[j] <- mm |
|
else if (i$sex[j] == "female") { |
|
nExFemales <- nExFemales + 1 |
|
neworder[j] <- nExFemales |
|
} |
|
else { |
|
nExMales <- nExMales + 1 |
|
neworder[j] <- nExMales |
|
} |
|
} |
When Fst>0 and mutations are possible, the likelihood is affected by adding extra untyped ancestors to the pedigree. Hence, to correctly calculate such a likelihood, it is currently necessary to explicitly add these untyped ancestors to the datamatrix
with NA
observations for their alleles to prevent these persons from being pruned. The example below demonstrates this:
f <- setNames(c(0.3,0.7), c("A","B"))
Fst <- 0.03
locus <- Familias::FamiliasLocus(frequencies = f,
MutationModel = "Proportional",
MutationRate = 0.01)
to_familias_ped <- function(x){
Familias::FamiliasPedigree(id = x$ID,
dadid = x$ID[match(x$FIDX, x$ID)],
momid = x$ID[match(x$MIDX, x$ID)],
sex = ifelse(x$SEX==1,"male","female"))
}
pr <- pr_explicit <- numeric(2)
number_of_generations = 2
for(number_of_generations in 1:2){
ancestral_ped <- pedtools::ancestralPed(number_of_generations)
ancestral_ped_familias <- to_familias_ped(ancestral_ped)
# calculation without explicitly adding untyped persons to datamatrix
data_matrix <- matrix(c("A","A"), ncol = 2, dimnames = list(tail(ancestral_ped$ID, 1)))
pr[number_of_generations] <- Familias::FamiliasPosterior(pedigrees = ancestral_ped_familias, loci = locus,
datamatrix = data_matrix, kinship = Fst)$likelihoods
# calculation with explicitly adding NAs to datamatrix
data_matrix_explicit <- matrix(rep(c(NA,NA), length(ancestral_ped$ID)),
ncol = 2, dimnames = list(ancestral_ped$ID))
data_matrix_explicit[tail(ancestral_ped$ID, 1),] <- c("A","A")
pr_explicit[number_of_generations] <- Familias::FamiliasPosterior(pedigrees = ancestral_ped_familias, loci = locus,
datamatrix = data_matrix_explicit, kinship = Fst)$likelihoods
}
# these are not the same
pr # [1] 0.0963 0.0963
pr_explicit # [1] 0.09600357 0.09572109
To work around this issue, the FamiliasPosterior
function could add the untyped persons to the datamatrix
when the kinship
parameter is positive. This requires some further changes because the code relies on every person in the datamatrix
to be present in every pedigree. I believe it is possible to workaround this assumption with minor changes to the code.