imputer's Introduction

imputeR

Genotyping-by-sequencing (GBS) data typically have a high error rate at heterozygous sites. imputeR is an R package that infers the most likely genotypes from raw GBS data.

Install

Install devtools first, and then use devtools to install imputeR from GitHub.

install.packages("devtools")
library(devtools)
install_github("yangjl/imputeR")
library(imputeR)

Documentation

A vignette can be found here. Documented functions are listed below. For usage information, type ?function_name or help(function_name).

  • impute_parent

Utils

  • getsfs: set up the neutral site frequency spectrum (SFS).
  • error_mx and gen_error_mat: get the genotype error matrices for mom and kids.

imputer's People

Contributors

yangjl, rossibarra

imputer's Issues

phase_parent issues

There are two other issues with phase_parent() that just came to my attention.

Issue 1: See the following errors:

set.seed(13567)
GBS.array <- sim.array(size.array=50, numloci=100, hom.error=0.02, het.error=0.8, rec=0.25, selfing=0.1, imiss=0.5, misscode=3)
GBS.array <- get_true_GBS(GBS.array)
probs <- error_mx(hom.error=0.02, het.error=0.8, imiss=0.5)
system.time(phase <- phase_parent(GBS.array, win_length=2, join_length=2, verbose=F))
user system elapsed
0.31 0.00 0.31

set.seed(13567)
GBS.array <- sim.array(size.array=50, numloci=500, hom.error=0.02, het.error=0.8, rec=0.25, selfing=0.1, imiss=0.5, misscode=3)
GBS.array <- get_true_GBS(GBS.array)
probs <- error_mx(hom.error=0.02, het.error=0.8, imiss=0.5)
system.time(phase <- phase_parent(GBS.array, win_length=2, join_length=2, verbose=F))

>>> join chunks [ 1 and 2, total:2] ...

Error in log(tem) : non-numeric argument to mathematical function
Timing stopped at: 2.02 0 2.02

The only difference between the two runs is numloci (100 vs. 500). I think phase_parent() fails whenever one or more heterozygous sites cannot be phased, and increasing numloci makes it more likely that a simulation contains such sites.
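As a hedged illustration of the error message only: base R's log() raises exactly this error when its argument is not numeric, for example NULL or a character vector. What `tem` holds inside phase_parent() is not shown in this report, so the NULL value below is an assumption, and `safe_log` is a hypothetical guard rather than the package's fix.

```r
# Hypothetical stand-in for the internal variable `tem`; NULL is an assumption.
tem <- NULL

# Capture the error message instead of stopping the script.
msg <- tryCatch(log(tem), error = function(e) conditionMessage(e))
print(msg)

# One possible guard (a sketch, not imputeR code): take the log only of
# non-empty numeric input and return NA otherwise.
safe_log <- function(x) {
  if (is.numeric(x) && length(x) > 0) log(x) else NA_real_
}

safe_log(tem)        # NA instead of an error
safe_log(c(1, 0.5))  # ordinary log of a numeric vector
```

If the failure really does come from unphasable heterozygous sites, a check of this kind would at least turn the crash into a missing value that can be reported.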

Issue 2: Phasing runtime
I originally thought that phasing was quick, but I had forgotten that I was dealing with a small number of sites. When I extrapolate to 500,000 sites, the runtime becomes astronomical:

win_length=2, size.array=50, numloci=100, time=0.98 s, extrapolated to real data: 1.5 hours
win_length=5, size.array=50, numloci=100, time=6.89 s, extrapolated to real data: 11.5 hours
win_length=10, size.array=50, numloci=100, time=434 s, extrapolated to real data: 30 days

The only win_length that seems feasible to me is 2. But I am not sure if win_length=2 is good enough.
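A rough sketch of this kind of extrapolation, assuming runtime grows linearly with the number of loci (an assumption; the post does not say how its estimates were derived). The variable names are illustrative. A purely linear scaling gives figures of the same order of magnitude as those quoted above, but not identical ones.

```r
# Timings reported above, in seconds, for numloci = 100 and size.array = 50.
timings <- data.frame(
  win_length = c(2, 5, 10),
  seconds    = c(0.98, 6.89, 434)
)

sim_loci    <- 100     # loci in the simulated runs
target_loci <- 500000  # loci in the real data, as stated in the post

# Assumption: runtime is proportional to the number of loci.
scale_factor <- target_loci / sim_loci
timings$est_hours <- timings$seconds * scale_factor / 3600
timings$est_days  <- timings$est_hours / 24

print(timings)
```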

Thanks,
CJ

alternative? major?

impute_mom.Rd refers to the number of copies of the alternative allele. Is "alternative" (i.e. non-reference) correct?

copy_mom in master

copy_mom in master needs a change to prevent odd behavior when recombination breakpoints are identical (which happens with few loci or high recombination rates):

recp <- unique(c(1, sort(round(runif(co, min = 2, max = numloci - 1))), numloci + 1))  # breakpoint positions
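A small sketch of why the unique() call matters, using only base R. The values of `numloci` and `co` are made up for illustration, and treating `co` as the number of crossovers is an assumption. With few loci and many draws, rounding the uniform positions necessarily produces repeated breakpoints.

```r
set.seed(1)

numloci <- 5   # illustrative: very few loci
co      <- 10  # assumed to be the number of crossovers; value is made up

# Breakpoints without de-duplication: rounded positions can only be 2, 3 or 4,
# so ten draws must contain repeats.
raw <- c(1, sort(round(runif(co, min = 2, max = numloci - 1))), numloci + 1)

# Breakpoints with de-duplication, as in the proposed change.
recp <- unique(raw)

anyDuplicated(raw) > 0    # TRUE: repeated breakpoints are present
anyDuplicated(recp) == 0  # TRUE: every breakpoint is now distinct
all(diff(recp) > 0)       # TRUE: positions are strictly increasing
```

Repeated breakpoints would define zero-length segments between consecutive positions, which is presumably the source of the odd behavior described above.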

error_mx

In the master branch, error_mx in utils.R should be:

cbind(mx*matrix(c(1, 0, 0), nrow = 3,byrow=F,ncol=3),1/3)

instead of

cbind(mx*matrix(c(1, 0, 0), nrow = 3,byrow=F,ncol=3),1)

throughout, so that the sum across kid genotypes is 1, not 3.
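The difference can be checked with a small sketch. The matrix `mx` below is a made-up 3x3 example, not the package's actual error matrix, and reading "sum across kid genotypes" as the sum of the appended fourth column is an assumption.

```r
# Hypothetical 3x3 error matrix; the values are illustrative only.
mx <- matrix(c(0.98, 0.01, 0.01,
               0.40, 0.20, 0.40,
               0.01, 0.01, 0.98), nrow = 3, byrow = TRUE)

# The mask used in both expressions: c(1, 0, 0) is recycled column by column,
# so the first row is all ones and the other rows are zero.
mask <- matrix(c(1, 0, 0), nrow = 3, ncol = 3, byrow = FALSE)

old_mat <- cbind(mx * mask, 1)      # current version
new_mat <- cbind(mx * mask, 1 / 3)  # proposed version

sum(old_mat[, 4])  # 3
sum(new_mat[, 4])  # 1
```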

load_data

When I did the conversion on B73, about 1/3 of the sites came out as 2 and about 2/3 as 0. This probably doesn't affect us, but I wanted to mention it because it contradicts what you said last time.
-CJ
