stefanedwards / siccuracy
Stefan's imputation accuracy package
New name for rowconcatenate: cbind_SNP.
Do not use cbind.SNP, as this will cause the S3 method dispatcher to call this function for any object of class 'SNP'.
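The naming concern can be illustrated with a small base R sketch. Both functions below are hypothetical stand-ins that only return a label; they are not the package's implementation. The point is merely that base cbind() picks up any visible function named cbind.SNP when an argument has class 'SNP'.

```r
# Hypothetical stand-ins: a dotted name looks like an S3 method for cbind(),
# an underscore name does not.
cbind.SNP <- function(...) "called cbind.SNP"
cbind_SNP <- function(...) "called cbind_SNP"

# Any object carrying class 'SNP' triggers the dispatch.
x <- structure(list(), class = "SNP")

dispatched <- cbind(x)      # base cbind() dispatches to cbind.SNP
explicit   <- cbind_SNP(x)  # only reached when called by name
```

With the underscore name, cbind() on an 'SNP' object keeps its default behaviour and the file-concatenating function is only run when asked for explicitly.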
I have noticed the following:
1: Using the format statement (Iw) does not work if the variable is a real.
2: Fortran throws a fit if trying to read a real-formatted number into an integer. No help there.
Int to int is easy, and so is real to real. But what do we do about cross-overs, i.e. reading integers and outputting reals (easy), and reading reals and outputting integers?
Using commit 2e6503f6, the first two tests of test_convert_phase failed.
Intuitively, the true matrix has samples as rows and SNPs as columns. Data is however retrieved as rows. A simple transposing should do the trick, but needs to be checked in calculations.
imputation_accuracy should also count correctly called genotypes (column-wise, row-wise), and entries where the genotype is missing in true, in imputed, or false.
Allow for a tolerance for comparison (e.g. tol = 0.1), to compare with gene dosages.
Update return value of imputation_accuracy to have:
List of 2
$ snps : data.frame(means, sds, cors, correct, true.na, imputed.na, both.na)
$ samples : data.frame(rowID, cors, correct, true.na, imputed.na, both.na)
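A minimal in-memory sketch of the proposed return value, assuming genotype matrices with samples as rows and SNPs as columns. The function name accuracy_sketch is hypothetical, means and sds are assumed to be taken from the true genotypes, and rowID is assumed to be the row index; the real imputation_accuracy works on files and may differ on all of these points.

```r
# Hypothetical helper, not the package's imputation_accuracy().
accuracy_sketch <- function(true, imputed, tol = 0.1) {
  stopifnot(identical(dim(true), dim(imputed)))
  true_na    <- is.na(true)
  imputed_na <- is.na(imputed)
  both_na    <- true_na & imputed_na
  # A call counts as correct when both values are present and agree within tol.
  correct <- !true_na & !imputed_na & abs(true - imputed) <= tol
  # Correlation that yields NA (without a warning) for constant or empty input.
  safe_cor <- function(x, y) suppressWarnings(cor(x, y, use = "na.or.complete"))
  list(
    snps = data.frame(
      means      = colMeans(true, na.rm = TRUE),   # assumption: from true
      sds        = apply(true, 2, sd, na.rm = TRUE),
      cors       = sapply(seq_len(ncol(true)),
                          function(j) safe_cor(true[, j], imputed[, j])),
      correct    = colSums(correct),
      true.na    = colSums(true_na),
      imputed.na = colSums(imputed_na),
      both.na    = colSums(both_na)
    ),
    samples = data.frame(
      rowID      = seq_len(nrow(true)),            # assumption: row index
      cors       = sapply(seq_len(nrow(true)),
                          function(i) safe_cor(true[i, ], imputed[i, ])),
      correct    = rowSums(correct),
      true.na    = rowSums(true_na),
      imputed.na = rowSums(imputed_na),
      both.na    = rowSums(both_na)
    )
  )
}

# Small example: 3 samples, 4 SNPs.
true <- matrix(c(0, 1, 2,  2, 1, 0,  1, NA, 2,  0, 2, 1), nrow = 3)
imputed <- true
imputed[1, 1] <- 0.05   # dosage within tolerance of the true genotype
imputed[2, 2] <- 2      # wrong call
imputed[3, 4] <- NA     # missing in imputed only
res <- accuracy_sketch(true, imputed, tol = 0.1)
```

Here both.na entries are also counted in true.na and imputed.na; whether the package should count them separately is left open.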
Use plink --bfile <name stem> --recode A to recode a PLINK binary file to a text-formatted file, coding genotypes as 0, 1, and 2. Two issues exist:
Options:
Return:
n as number of rows converted, data.frame with mapping, m as number of columns.
This method will also work for converting files for DMU (although with argument --recode 12 [?]).
The fast concept is not really fast. But the "fast" method has a really low memory footprint, although it is not adaptive.
In the adaptive routine, when providing a row that has no variance, the corresponding element in rowcors disappears. In the fast routine, this is not the case.
Standardization must be FALSE, else it adds variance by scaling and shifting each element in the row separately.
ts <- Siccuracy:::make.test(15, 21)
true <- ts$true
true[2,] <- 2
write.snps(true, ts$truefn)
# No standardization, as this changes each element of row 2 -- and it gets variance!
imputed <- ts$imputed
mat1 <- cor(as.vector(true), as.vector(imputed), use = 'complete.obs')
suppressWarnings(row1 <- sapply(1:nrow(true), function(i) cor(true[i,], imputed[i,], use='na.or.complete')))
suppressWarnings(col1 <- sapply(1:ncol(true), function(i) cor(true[,i], imputed[,i], use='na.or.complete')))
res <- imputation_accuracy(ts$truefn, ts$imputedfn, standardized = FALSE, adaptive = TRUE)
expect_equal(res$matcor, mat1, tolerance=1e-9)
expect_equal(res$rowcors, row1, tolerance=1e-9)
expect_equal(res$colcors, col1, tolerance=1e-9)
res <- imputation_accuracy(ts$truefn, ts$imputedfn, standardized = FALSE, adaptive = FALSE)
expect_equal(res$matcor, mat1, tolerance=1e-9)
expect_equal(res$rowcors, row1, tolerance=1e-9)
expect_equal(res$colcors, col1, tolerance=1e-9)
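Why standardization gives a constant row variance can be shown in base R. This sketch assumes standardization is applied per column (per SNP), using base scale() as a stand-in for the package's own centering and scaling.

```r
# Samples as rows, SNPs as columns; row 2 is constant (every genotype is 2).
true <- matrix(c(0, 2, 1,
                 2, 2, 0,
                 1, 2, 1,
                 0, 2, 0), nrow = 3)

var_before <- var(true[2, ])    # the row is constant

# Each column is shifted by its own mean and divided by its own sd,
# so the identical genotypes in row 2 end up as different values.
scaled <- scale(true)
var_after <- var(scaled[2, ])
```

Because every column has its own mean and standard deviation, the row no longer has zero variance after scaling and its row-wise correlation becomes defined.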
Learn to parse the binary format of PLINK without the need to use --recode A (or --recode 12).
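As a starting point, a sketch of decoding the PLINK .bed genotype block in base R. It assumes the SNP-major layout (magic bytes 0x6c 0x1b followed by 0x01) and codes genotypes as counts of the first allele in the .bim file, which is intended to match --recode A. The function name decode_bed is hypothetical, and the number of samples and SNPs would in practice come from the .fam and .bim files.

```r
# Hypothetical decoder for the contents of a SNP-major PLINK .bed file.
# `bytes` is a raw vector holding the whole file, e.g. from readBin().
decode_bed <- function(bytes, n_samples, n_snps) {
  stopifnot(is.raw(bytes), length(bytes) >= 3)
  if (!identical(bytes[1:3], as.raw(c(0x6c, 0x1b, 0x01))))
    stop("not a SNP-major PLINK .bed file")
  bytes_per_snp <- ceiling(n_samples / 4)
  stopifnot(length(bytes) == 3 + bytes_per_snp * n_snps)
  # Two-bit codes 0, 1, 2, 3 mean: homozygous first allele, missing,
  # heterozygous, homozygous second allele.
  lookup <- c(2L, NA_integer_, 1L, 0L)
  geno <- matrix(NA_integer_, nrow = n_samples, ncol = n_snps)
  for (j in seq_len(n_snps)) {
    start <- 3 + (j - 1) * bytes_per_snp
    b <- as.integer(bytes[start + seq_len(bytes_per_snp)])
    # Each byte holds four genotypes, lowest-order bit pair first.
    codes <- as.vector(rbind(b %% 4L, (b %/% 4L) %% 4L,
                             (b %/% 16L) %% 4L, (b %/% 64L) %% 4L))
    # Padding bits at the end of the last byte are dropped here.
    geno[, j] <- lookup[codes[seq_len(n_samples)] + 1L]
  }
  geno
}

# Hand-built example: 5 samples, 2 SNPs (2 bytes per SNP).
example_bytes <- as.raw(c(0x6c, 0x1b, 0x01, 0xe4, 0x00, 0xff, 0x02))
example_geno <- decode_bed(example_bytes, n_samples = 5, n_snps = 2)
```

For real files the raw vector could be obtained with readBin(path, what = "raw", n = file.size(path)); holding the entire file in memory is a simplification that would not suit the low-memory routines.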
ncol, nSNPs, nAnimals, naval, NAval, numeric.format, etc. are just some of the different names for the same concepts. These need to be streamlined.
The task also requires removing nSNPs and ncol, as setting these to anything other than the actual number of columns will lead to weird results.
Should probably be invisible, and be documented in help.
This problem is not replicated when compiled as x64.
The test routine has at least 1 column that has zero variance. Under x64, the correlation of this column is NA. Only when compiled under i386, and supplying a vector of scaling (just one of center, scaling, or p), does this constant column get an odd correlation.