
thoree / familias

Probabilities for Pedigrees Given DNA Data

Languages: R 14.37%, C++ 85.52%, C 0.11%

familias's Introduction

Familias 2.6.1

Familias has existed since 2000 as a Windows program and can be downloaded from https://familias.no. The program calculates likelihoods and probabilities used to infer family relationships from autosomal marker data. The core of the Windows program, written in C++, has been available as an R package since 2012. The website also contains references describing the implementation and validation.

No further development of this package is planned. For forensic pedigree analyses and visualisations we instead recommend the pedsuite packages, see https://magnusdv.github.io/pedsuite/. In particular, the package pedFamilias facilitates conversion of .fam files into the pedsuite format.

Installation

Install from GitHub as follows:

# First install devtools if needed
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")

# Install Familias from GitHub
devtools::install_github("thoree/Familias")
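A minimal usage sketch, assuming the package installed above: a single-locus paternity case comparing "AF is the father" against "AF is unrelated". The person names, allele frequencies and genotypes are made up for illustration; the calls mirror the ones used in the issue examples (`FamiliasLocus`, `FamiliasPedigree`, `FamiliasPosterior`), but this is a sketch, not an official example from the package.

```r
library(Familias)

# Hypothetical case: mother, daughter and alleged father (AF), one locus
persons <- c("mother", "daughter", "AF")

# Hypothesis 1: AF is the father of the daughter
isFather <- FamiliasPedigree(id = persons,
                             dadid = c(NA, "AF", NA),
                             momid = c(NA, "mother", NA),
                             sex = c("female", "female", "male"))

# Hypothesis 2: an untyped true father (TF) is the father; AF is unrelated
unrelated <- FamiliasPedigree(id = c(persons, "TF"),
                              dadid = c(NA, "TF", NA, NA),
                              momid = c(NA, "mother", NA, NA),
                              sex = c("female", "female", "male", "male"))

# Made-up allele frequencies; mutation rate 0 means no mutations are modelled
locus <- FamiliasLocus(frequencies = c(A = 0.1, B = 0.2, C = 0.3, D = 0.4),
                       name = "locus1",
                       MutationModel = "Proportional",
                       MutationRate = 0)

# Two columns (one per allele) per locus, one row per typed person
datamatrix <- matrix(c("A", "B",
                       "A", "C",
                       "C", "D"),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(persons, NULL))

result <- FamiliasPosterior(pedigrees = list(isFather = isFather, unrelated = unrelated),
                            loci = locus, datamatrix = datamatrix)
result$posterior
result$LR
```

Here the daughter's C allele must come from her father. AF transmits C with probability 1/2, while a random man transmits it with probability 0.3, so the ratio of the two likelihoods should be 0.5/0.3 ≈ 1.67.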

familias's People

Contributors

mkruijver marsicofl magnusdv

familias's Issues

Frequencies in separate data package

It would be nice to have the NorwegianFrequencies database in a separate package, perhaps along with other frequency databases as well.

This would reduce the dependency load and make the frequency data more accessible for use in other packages.

Blind search

Hi,

I would like to use the Familias R package. It is not clear to me whether the blind search available in the Windows version has also been implemented in the command line version.
Thanks.

Best wishes,
Maria Angela

Untyped persons are ignored if not present in datamatrix

In the current CRAN version of Familias (version 2.4), any pedigree members that are not present in the datamatrix are ignored in the calculations even though they may affect the likelihood. More precisely, these persons are considered extra persons that are disconnected from the pedigree. Presumably, any untyped extra persons are pruned when the likelihood is evaluated. The following code labels the untyped pedigree members as extra persons:

Familias/R/FamiliasPosterior.R

Lines 52 to 68 in c3d1617

for (i in pedigrees) {
  nPersons <- length(i$sex)
  neworder <- rep(0, nPersons)
  nExMales <- nExFemales <- 0
  for (j in 1:nPersons) {
    mm <- match(i$id[j], persons, nomatch = 0)
    if (mm > 0)
      neworder[j] <- mm
    else if (i$sex[j] == "female") {
      nExFemales <- nExFemales + 1
      neworder[j] <- nExFemales
    }
    else {
      nExMales <- nExMales + 1
      neworder[j] <- nExMales
    }
  }
  # (excerpt ends at line 68; the outer loop continues in the source file)
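To make the effect of this loop concrete, here is a standalone base-R reproduction of the same indexing logic, wrapped in a hypothetical function `classify_persons()` (not part of Familias) that also records which members end up as extras. The toy pedigree, in which only the child is typed, is made up for illustration.

```r
# Hypothetical standalone reproduction of the excerpt's logic (not a Familias
# function): typed persons get their row index in `persons`; everyone else is
# numbered separately as an extra male or an extra female.
classify_persons <- function(id, sex, persons) {
  neworder <- numeric(length(id))
  is_extra <- logical(length(id))
  nExMales <- nExFemales <- 0
  for (j in seq_along(id)) {
    mm <- match(id[j], persons, nomatch = 0)
    if (mm > 0) {
      neworder[j] <- mm
    } else if (sex[j] == "female") {
      nExFemales <- nExFemales + 1
      neworder[j] <- nExFemales
      is_extra[j] <- TRUE
    } else {
      nExMales <- nExMales + 1
      neworder[j] <- nExMales
      is_extra[j] <- TRUE
    }
  }
  data.frame(id = id, sex = sex, neworder = neworder, extra = is_extra)
}

# Toy pedigree: grandparents GF/GM, parents F/M, and a typed child
res <- classify_persons(id = c("GF", "GM", "F", "M", "child"),
                        sex = c("male", "female", "male", "female", "male"),
                        persons = "child")
res
```

All untyped members are flagged as extras, and their indices restart at 1 within each sex, i.e. they point into a separate list of extra persons rather than into rows of the datamatrix.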

When Fst > 0 and mutations are possible, adding extra untyped ancestors to the pedigree changes the likelihood. Hence, to calculate such a likelihood correctly, it is currently necessary to add these untyped ancestors explicitly to the datamatrix, with NA observations for their alleles, so that they are not pruned. The example below demonstrates this:

f <- setNames(c(0.3, 0.7), c("A", "B"))

Fst <- 0.03
locus <- Familias::FamiliasLocus(frequencies = f,
                                 MutationModel = "Proportional",
                                 MutationRate = 0.01)

to_familias_ped <- function(x) {
  Familias::FamiliasPedigree(id = x$ID,
                             dadid = x$ID[match(x$FIDX, x$ID)],
                             momid = x$ID[match(x$MIDX, x$ID)],
                             sex = ifelse(x$SEX == 1, "male", "female"))
}

pr <- pr_explicit <- numeric(2)

for (number_of_generations in 1:2) {
  ancestral_ped <- pedtools::ancestralPed(number_of_generations)
  ancestral_ped_familias <- to_familias_ped(ancestral_ped)

  # calculation without explicitly adding untyped persons to datamatrix
  data_matrix <- matrix(c("A", "A"), ncol = 2, dimnames = list(tail(ancestral_ped$ID, 1)))
  pr[number_of_generations] <- Familias::FamiliasPosterior(pedigrees = ancestral_ped_familias, loci = locus,
                                                           datamatrix = data_matrix, kinship = Fst)$likelihoods

  # calculation with explicitly adding NAs to datamatrix
  data_matrix_explicit <- matrix(rep(c(NA, NA), length(ancestral_ped$ID)),
                                 ncol = 2, dimnames = list(ancestral_ped$ID))
  data_matrix_explicit[tail(ancestral_ped$ID, 1), ] <- c("A", "A")

  pr_explicit[number_of_generations] <- Familias::FamiliasPosterior(pedigrees = ancestral_ped_familias, loci = locus,
                                                                    datamatrix = data_matrix_explicit, kinship = Fst)$likelihoods
}

# these are not the same
pr          # [1] 0.0963 0.0963
pr_explicit # [1] 0.09600357 0.09572109

To work around this issue, the FamiliasPosterior function could add the untyped persons to the datamatrix when the kinship parameter is positive. This requires some further changes, because the code relies on every person in the datamatrix being present in every pedigree. I believe it is possible to work around this assumption with minor changes to the code.
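A minimal sketch of the padding step proposed above, written as a hypothetical user-side helper `pad_datamatrix()` (not part of Familias). It assumes a single pedigree object with an `id` component, as used in the excerpt from FamiliasPosterior.R, and a character datamatrix with one row per person. It does not address the multi-pedigree assumption mentioned above.

```r
# Hypothetical helper (not part of Familias): when kinship > 0, add a row of
# NA observations for every pedigree member missing from the datamatrix, so
# that untyped persons stay in the pedigree instead of being treated as extras.
# Assumes a single pedigree with an `id` component and a character matrix.
pad_datamatrix <- function(datamatrix, pedigree, kinship = 0) {
  if (kinship <= 0) return(datamatrix)
  missing_ids <- setdiff(pedigree$id, rownames(datamatrix))
  if (length(missing_ids) == 0) return(datamatrix)
  na_rows <- matrix(NA_character_, nrow = length(missing_ids),
                    ncol = ncol(datamatrix),
                    dimnames = list(missing_ids, colnames(datamatrix)))
  rbind(datamatrix, na_rows)
}
```

In the example above, `pad_datamatrix(data_matrix, ancestral_ped_familias, Fst)` is intended to play the role of `data_matrix_explicit`. The rows come in a different order, with the typed person first.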

Loss of precision in posterior calculations

Large LR values give a posterior of 1 because of precision loss in the division here:
posterior <- posterior / sum(posterior)

For example:

$posterior
   notFather     isFather 
3.684015e-11 1.000000e+00 

$prior
notFather  isFather 
      0.5       0.5 

$LR
  notFather    isFather 
          1 27144299178

This can be fixed by using:
Rmpfr::mpfr(posterior / sum(posterior), 64)
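Before switching to arbitrary precision, it may be worth checking whether the stored double actually equals 1 or only prints that way. A minimal base-R sketch, re-entering the LR and prior values shown above by hand (no Familias call involved):

```r
# LR values copied from the output above; priors are equal (0.5 each)
lr <- c(notFather = 1, isFather = 27144299178)
prior <- c(notFather = 0.5, isFather = 0.5)

posterior <- lr * prior
posterior <- posterior / sum(posterior)

# Default printing uses about 7 significant digits, so isFather shows as 1
print(posterior)
# More digits reveal that the double-precision value is slightly below 1
print(posterior, digits = 15)
posterior[["isFather"]] < 1

# For two hypotheses with equal priors, the small complementary probability
# can be computed directly from the LR instead of as 1 - posterior
1 / (1 + lr[["isFather"]])
```

Converting an already computed double with `Rmpfr::mpfr()` keeps the same value and only displays more digits. Any genuinely higher-precision computation would have to start from the likelihoods themselves.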
