Giter VIP home page Giter VIP logo

rmvp's Introduction

rMVP CRAN Version

An r package for Memory-efficient, Visualization-enhanced, and Parallel-accelerated Genome-Wide Association Study

#----------------------rMVP v1.1.0 is coming, and stronger again!------------------------#

Repos:

Github: https://github.com/xiaolei-lab/rMVP
Gitee(quick access in China): https://gitee.com/xiaolei-lab/rMVP

Authors:

Design and Maintenance: Lilin Yin#, Haohao Zhang#, Shuhong Zhao, Xinyun Li, and Xiaolei Liu.
Contributors: Zhenshuang Tang, Jingya Xu, Dong Yin, Zhiwu Zhang, Xiaohui Yuan, Mengjin Zhu
Citation: Yin L, Zhang H, Tang Z, Xu J, Yin D, Zhang Z, Yuan X, Zhu M, Zhao S, Li X. rMVP: A Memory-efficient, Visualization-enhanced, and Parallel-accelerated tool for Genome-Wide Association Study, Genomics, Proteomics & Bioinformatics , 2021, 19 (4), 619-628, doi: 10.1016/j.gpb.2020.10.007.

Questions, suggestions, and bug reports are welcome and appreciated: [email protected]

🧰 Relevant software tools for genetic analyses and genomic breeding

📫 HIBLUP: Versatile and easy-to-use GS toolbox. 🍀 SIMER: data simulation for life science and breeding.
🚴‍♂️ KAML: Advanced GS method for complex traits. 🏔️ IAnimal: an omics knowledgebase for animals.
🏊 hibayes: A Bayesian-based GWAS and GS tool. 📊 CMplot: A drawing tool for genetic analyses.

Contents


Installation

back to top

WE STRONGLY RECOMMEND TO link MKL or OpenBLAS with R to accelerate parallel computing. To install rMVP in R:

  • The stable version:
install.packages("rMVP")
  • The latest version:
devtools::install_github("xiaolei-lab/rMVP")

After installed successfully, rMVP can be loaded by typing

library(rMVP)

Typing ?rMVP could get the details of all parameters.

For more help on Windows installation, see the wiki page (Chinese)


Data Preparation

Phenotype

back to top
We suggest to provide the phenotype file, because users needn't to manually pre-treat the order of phenotype and genotype individuals, rMVP could automatically adjust the order of phenotype file to be consistent with genotype file. Note that if the phenotype is provided in data conversion, rMVP will generate a new phenotype file, please remember to load it for analysis rather than original one.

Taxa trait1 trait2 trait3
33-16 101.5 0.25 0
38-11 102.7 0.23 1
4226 101.2 -0.17 1
4722 105.5 NA 0
A188 108.1 0.57 1
A214N 95.13 0.87 0
A239 100.2 -0.16 1

Genotype

PLINK binary

back to top
If you have genotype data in PLINK Binary format (details see http://zzz.bwh.harvard.edu/plink/data.shtml#bed):  

fileBed, name of genotype data in PLINK Binary format
fileKin, TRUE or FALSE, if TRUE, kinship matrix represents relationship among individuals will be calculated
filePC, TRUE or FALSE, if TRUE, principal component analysis will be performed
out, prefix of output file
maxLine, number, the number of markers read into memory

# Full-featured function (Recommended)
MVP.Data(fileBed="plink",
         filePhe=NULL,
         fileKin=FALSE,
         filePC=FALSE,       
         #maxLine=10000,
         out="mvp.plink"
         )
         
# Only convert genotypes
MVP.Data.Bfile2MVP(bfile="plink", out='mvp', maxLine=1e4) # the genotype data should be fully imputed before using this function

VCF

back to top
If you have genotype data in VCF format:
fileVCF, name of genotype data in VCF format
filePhe, name of phenotype data
sep.phe, separator of phenotype file
fileKin, TRUE or FALSE, if TRUE, kinship matrix represents relationship among individuals will be calculated
filePC, TRUE or FALSE, if TRUE, principal component analysis will be performed
out, the prefix of output file

##fileformat=VCFv4.2
##fileDate=20171105
##source=PLINKv1.90
##contig=<ID=1,length=2>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	-9_CZTB0004	-9_CZTB0006	-9_CZTB0008	-9_CZTB0010	-9_CZTB0011	-9_CZTB0012
1	1	10000235	A	C	.	.	PR	GT	0/1	0/0	0/0	0/0	0/0	0/1
1	1	10000345	A	G	.	.	PR	GT	0/0	0/0	0/0	0/0	1/1	1/1
1	1	10004575	G	.	.	.	PR	GT	0/0	0/0	0/0	0/0	0/0	0/0
1	1	10006974	C	T	.	.	PR	GT	0/0	0/0	0/1	1/1	0/1	1/1
1	1	10006986	A	G	.	.	PR	GT	0/0	0/0	0/1	./.	1/1	1/1
# Full-featured function (Recommended)
MVP.Data(fileVCF="myVCF.vcf",
         #filePhe="Phenotype.txt",
         fileKin=FALSE,
         filePC=FALSE,
         out="mvp.vcf"
         )

# Only convert genotypes
MVP.Data.VCF2MVP("myVCF.vcf", out='mvp') # the genotype data should be fully imputed before using this function

Hapmap

back to top
If you have genotype data in Hapmap format:

fileHMP, a string or a string vector, e.g. fileHMP = "hapmap.txt" or fileHMP = c("chr1.hmp.txt", "chr2.hmp.txt", "chr3.hmp.txt")
filePhe, name of phenotype file
sep.phe, separator of phenotype file
fileKin, TRUE or FALSE, if TRUE, kinship matrix represents relationship among individuals will be calculated
filePC, TRUE or FALSE, if TRUE, principal component analysis will be performed
out, the prefix of output file
maxLine, number, the number of markers read into memory

hapmap.txt

rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode 33-16 38-11 4226 4722 A188 ... A239
rs3683945 G/A 1 3197400 + NA NA NA NA NA NA AG AG GG AG GG ... AA
rs3707673 A/G 1 3407393 + NA NA NA NA NA NA GA GA AA GA AA ... GG
rs6269442 G/A 1 3492195 + NA NA NA NA NA NA AG GG GG AG GG ... AA
rs6336442 G/A 1 3580634 + NA NA NA NA NA NA AG AG GG AG GG ... AA
rs13475699 G 1 3860406 + NA NA NA NA NA NA GG GG GG GG GG ... GG
# Full-featured function (Recommended)
MVP.Data(fileHMP="hapmap.txt",
         filePhe="Phenotype.txt",
         sep.hmp="\t",
         sep.phe="\t",
         SNP.effect="Add",
         fileKin=FALSE,
         filePC=FALSE,
         #maxLine=10000,
         out="mvp.hmp"
         )

# Only convert genotypes
MVP.Data.Hapmap2MVP("hapmap.txt", out='mvp') # the genotype data should be fully imputed before using this function

Numeric

back to top
If you have genotype data in Numeric (m * n, m rows and n columns, m is the number of SNPs, n is the number of individuals) format:  

fileNum, name of genotype data in Numeric format
filePhe, name of phenotype file
fileMap, name of map file, a header should be added, e.g. SNP Chr Pos
sep.num, separator of Numeric file
sep.phe, separator of phenotype file
type.geno, the type of data in Numeric file, "char", "integer", or "double"
fileKin, TRUE or FALSE, if TRUE, kinship matrix represents relationship among individuals will be calculated
filePC, TRUE or FALSE, if TRUE, principal component analysis will be performed
out, the prefix of output file
maxLine, number, the number of markers read into memory
auto_transpose, bool, if auto_transpose = TRUE, it is automatically transposed to ensure that the number of rows (markers) is greater than the number of columns (individuals).

Numeric.txt Map.txt
1 1 2 1 2 0
1 1 0 1 0 2
1 2 2 1 2 0
1 1 2 1 2 0
0 0 0 0 0 0
SNP Chr Pos
rs3683945 1 3197400
rs3707673 1 3407393
rs6269442 1 3492195
rs6336442 1 3580634
rs13475699 1 3860406
# Full-featured function (Recommended)
MVP.Data(fileNum="Numeric.txt",
         filePhe="Phenotype.txt",
         fileMap="Map.txt",
         sep.num="\t",
         sep.map="\t", 
         sep.phe="\t",
         fileKin=FALSE,
         filePC=FALSE,
         #maxLine=10000,
         out="mvp.num"
         )

# Only convert genotypes
MVP.Data.Numeric2MVP("Numeric.txt", out='mvp', maxLine=1e4, auto_transpose=T) # the genotype data should be fully imputed before using this function

Genomic relationship matrix (Optional)

back to top
If you have Kinship matrix data that represents the relationship among individuals

fileKin, name of Kinship matrix data, the dimension is n * n (n is sample size), no taxa names included
sep.kin, separator of Kinship file

mvp.kin.txt

0.3032 -0.0193 0.0094 0.0024 0.0381 ... -0.0072
-0.0193 0.274 -0.0243 0.0032 -0.0081 ... 0.0056
0.0094 -0.0243 0.3207 -0.0071 -0.0045 ... -0.0407
0.0024 0.0032 -0.0071 0.321 -0.008 ... -0.0093
0.0381 -0.0081 -0.0045 -0.008 0.3498 ... -0.0238
... ... ... ... ... ... ...
-0.0072 0.0056 -0.0407 -0.0093 -0.0238 ... 0.3436
# read from file
MVP.Data.Kin("mvp.kin.txt", out="mvp", maxLine=1e4, sep='\t')

# calculate from mvp_geno_file
MVP.Data.Kin(TRUE, mvp_prefix='mvp.vcf', out='mvp')

Principal Components (Optional)

back to top
If you have Principal Components data

filePC, name of Principal Components matrix data, the dimension is n * nPC (n is sample size, nPC is number of first columns of PCs), no taxa names and header row included
sep.pc, separator of Principal Components file

mvp.pc.txt

0.010175524 -0.037989071 0.009588312
-0.009138673 -0.036763080 -0.006396714
-0.004723734 -0.047837625 0.021687731
0.012887843 -0.048418352 0.054298850
0.003871951 -0.038070387 0.008020508
-0.079505846 0.005818163 -0.206364549
# read from file
MVP.Data.PC("mvp.pc.txt", out='mvp', sep='\t')

# calculate from mvp_geno_file
MVP.Data.PC(TRUE, mvp_prefix='mvp.vcf', pcs.keep=5)

Data Input

Basic

back to top
At least you should prepare three datasets: genotype, phenotype, and map

genotype, genotype data generated by 'MVP.Data' function
phenotype, phenotype data, the first column is taxa name and second column is phenotype value. Note that if the phenotype is provided in data conversion, rMVP will generate a new phenotype file, please remember to load it for analysis rather than original one
map, SNP map information, the first column is SNP name, the second column is Chromosome ID, the third column is phsical position

genotype <- attach.big.matrix("mvp.geno.desc")
phenotype <- read.table("mvp.phe",head=TRUE)
map <- read.table("mvp.geno.map" , head = TRUE)

Advanced

back to top
You can load the prepared Kinship matrix and Covariates data generated by 'MVP.Data' function
Kinship, Kinship matrix, the dimension of Kinship matrix is n * n (n is sample size), no taxa names included
Covariates, Covariates matrix, the dimension of Covariates matrix is n * nCV (n is sample size, nCV is number of covariates, no taxa names and header row included
NOTE: If pcs have been added in covariate files, PLEASE DO NOT assign value to nPC.GLM, nPC.MLM, nPC.FarmCPU.

```r
Kinship <- attach.big.matrix("mvp.kin.desc")
Covariates_PC <- bigmemory::as.matrix(attach.big.matrix("mvp.pc.desc"))

If you have additional environmental fixed effects (e.g., breed, sex) or covariates (weight), please use it as following:

Covariates <- model.matrix.lm(~as.factor(breed)+as.factor(sex)+as.numeric(weight), data=yourdata, na.action = "na.pass")
# NA is acceptable in 'Covariates'

# if you are supposed to take PC to covariate
Covariates <- cbind(Covariates, Covariates_PC)

NOTE: rMVP has no function of adjusting the order of individuals in covariates. PLEASE make sure the order of individuals in covariates file must be consistent with that in genotype file.

If you have prepared Kinship matrix and Covariates data generated by other software packages, see Kinship and Principal Components


Start GWAS

back to top

Three models are included in MVP package: General Linear Model (GLM), Mixed Linear Model (MLM), and FarmCPU.

phe, phenotype data
geno, genotype data
map, map data
K, Kinship matrix
CV.GLM, Covariates added in GLM
CV.MLM, Covariates added in MLM
CV.FarmCPU, Covariates added in FarmCPU
If you don't want to add PCs as covariates, please comment out the parameters instead of setting the nPC to 0
please attention that if nPC.GLM > 0, no PCs should be added in CV.GLM
nPC.GLM, number of first columns of Principal Components added in GLM
please attention that if nPC.MLM > 0, no PCs should be added in CV.MLM
nPC.MLM, number of first columns of Principal Components added in MLM
please attention that if nPC.FarmCPU > 0, no PCs should be added in CV.FarmCPU
nPC.FarmCPU, number of first columns of Principal Components added in FarmCPU
maxLine, the number of markers handled at a time, smaller value would reduce the memory cost
ncpus, number of CPUs used for parallel computation, If not set, all CPUs will be used by default
vc.method, methods of variance components analysis, three methods are avaiblable, "BRENT", "EMMA", and "HE"
maxLoop, a parameter for FarmCPU only, the maximum iterations allowed in FarmCPU
method.bin, a parameter for FarmCPU only, two options are available: 'static' or 'FaST-LMM'
permutation.threshold, if TRUE, a threshold of permutation will be used in manhattan plot. The phenotypes are permuted to break the relationship with the genotypes. The experiment is replicated for a number of times. A vector of minimum p value of all experiments is recorded and the 95% quantile value of this vector is recommended to be used as significant threshold
permutation.rep, number of permutation replicates, only used when permutation.threshold is TRUE
threshold, 0.05/marker size, a cutoff line on manhattan plot
method, models for association tests, three models are available in MVP, "GLM", "MLM", and "FarmCPU", one or two or three models can be selected for association tests
file.output, a Boolean value or a string vector. If TRUE, output all types of files. If FALSE, no files are output. For string vectors, the available values are c("pmap", "pmap.signal", "plot", "log"). Among them, pmap represents all SNP P-Val files, pmap.signal represents significant SNP files, Plot represents visualization results, and log represents log files.

imMVP <- MVP(
    phe=phenotype,          #NA is acceptable in phenotype
    geno=genotype,
    map=map,
    #K=Kinship,             #if you have pre-computed GRM, please keep there open, otherwise rMVP will compute it automatically
    #CV.GLM=Covariates,     #if you have environmental covariates, please keep all 'CV.*' open
    #CV.MLM=Covariates,
    #CV.FarmCPU=Covariates,
    nPC.GLM=5,              #if you have added PCs into covariates, please keep there closed
    nPC.MLM=3,              #if you don't want to add PCs as covariates, please comment out the parameter instead of setting it to 0.
    nPC.FarmCPU=3,
    maxLine=10000,          #smaller value would reduce the memory cost
    #ncpus=10,
    vc.method="BRENT",      #only works for MLM
    method.bin="static",    # "FaST-LMM", "static" (#only works for FarmCPU)
    threshold=0.05,
    method=c("GLM", "MLM", "FarmCPU"),
    file.output=c("pmap", "pmap.signal", "plot", "log")
)

If you have more than one phenotype

for(i in 2:ncol(phenotype)){
  imMVP <- MVP(
    phe=phenotype[, c(1, i)],
    geno=genotype,
    map=map,
    #K=Kinship,
    #CV.GLM=Covariates,
    #CV.MLM=Covariates,
    #CV.FarmCPU=Covariates,
    nPC.GLM=5,
    nPC.MLM=3,
    nPC.FarmCPU=3,
    maxLine=10000,
    #ncpus=10,
    vc.method="BRENT",
    method.bin="static",
    threshold=0.05,
    method=c("GLM", "MLM", "FarmCPU"),
    file.output=c("pmap", "pmap.signal", "plot", "log")
  )
  gc()
}

Output

back to top
MVP automatically outputs high-quality figures, three types of figure formats are available (".jpg",".pdf",".tiff", default is ".jpg"). Users could also adjust the output figure using about 50 parameters in MVP.Report(). MVP.Report() not only accept the final return of MVP(), but also accepts results from third-party software packages, such as PLINK, GEMMA, GAPIT, TASSEL, and FarmCPU. The result from third-party software packages should at least contain four columns, which are marker name, chromosome, physical position, and P-value of a trait, results of more than one trait could be sequentially appended column by column. Typing ?MVP.Report() to see details of all parameters and typing data(pig60K) or data(cattle50K) to load demo datasets. Type ?MVP.Repory to see parameter details.

> data(pig60K)   #GWAS result of MLM
> data(cattle50K)   #SNP effects calculated from rrblup

> head(pig60K)

          SNP Chromosome Position    trait1     trait2     trait3
1 ALGA0000009          1    52297 0.7738187 0.51194318 0.51194318
2 ALGA0000014          1    79763 0.7738187 0.51194318 0.51194318
3 ALGA0000021          1   209568 0.7583016 0.98405289 0.98405289
4 ALGA0000022          1   292758 0.7200305 0.48887140 0.48887140
5 ALGA0000046          1   747831 0.9736840 0.22096836 0.22096836
6 ALGA0000047          1   761957 0.9174565 0.05753712 0.05753712

> head(cattle50K)

   SNP chr    pos Somatic cell score  Milk yield Fat percentage
1 SNP1   1  59082        0.000244361 0.000484255    0.001379210
2 SNP2   1 118164        0.000532272 0.000039800    0.000598951
3 SNP3   1 177246        0.001633058 0.000311645    0.000279427
4 SNP4   1 236328        0.001412865 0.000909370    0.001040161
5 SNP5   1 295410        0.000090700 0.002202973    0.000351394
6 SNP6   1 354493        0.000110681 0.000342628    0.000105792

In the demo datasets, the first three columns are marker name, chromosome, and physical position, respectively, the rest columns are the P-value or effect of multiple traits. Number of traits is theoretically unlimited.

Phenotype distribution

back to top

phe, phenotype data
file.type, format of output figure
breakNum, number of breaking points for phenotype when plotting distribution
dpi, resolution of output figure

MVP.Hist(phe=phenotype, file.type="jpg", breakNum=18, dpi=300)

SNP-density plot

back to top

plot.type, four options ("d", "c", "m", "q"); if "d", draw SNP-density plot
bin.size, the window size for counting SNP number
bin.max, maximum SNP number, for windows, which has more SNPs than bin.max, will be painted in same color
col, colors for separating windows with different SNP density
file.type, format of output figure
dpi, resolution of output figure

MVP.Report(pig60K[, c(1:3)], plot.type="d", col=c("darkgreen", "yellow", "red"), file.type="jpg", dpi=300)

PCA plot

back to top
pca, the first three columns of principle components
Ncluster, cluster number
class, the class of all individuals, for example: "breed", "location"...
col, colors for each cluster
pch, point shape for each cluster
file.type, format of output figure
dpi, resolution of output figure

pca <- attach.big.matrix("mvp.pc.desc")[, 1:3]
#pca <- prcomp(t(as.matrix(genotype)))$x[, 1:3]
MVP.PCAplot(PCA=pca, Ncluster=3, class=NULL, col=c("red", "green", "yellow"), file.type="jpg")

Manhattan plot in Circular fashion

back to top
For GWAS results:

> MVP.Report(pig60K,plot.type="c",chr.labels=paste("Chr",c(1:18,"X"),sep=""),r=0.4,cir.legend=TRUE,
        outward=FALSE,cir.legend.col="black",cir.chr.h=1.3,chr.den.col="black",file.type="jpg",
        memo="",dpi=300)

> MVP.Report(pig60K,plot.type="c",r=0.4,col=c("grey30","grey60"),chr.labels=paste("Chr",c(1:18,"X"),sep=""),
      threshold=c(1e-6,1e-4),cir.chr.h=1.5,amplify=TRUE,threshold.lty=c(1,2),threshold.col=c("red",
      "blue"),signal.line=1,signal.col=c("red","green"),chr.den.col=c("darkgreen","yellow","red"),
      bin.size=1e6,outward=FALSE,file.type="jpg",memo="",dpi=300)

#Note:
1. if signal.line=NULL, the lines that crosse circles won't be added.
2. if the length of parameter 'chr.den.col' is not equal to 1, SNP density that counts 
   the number of SNP within given size('bin.size') will be plotted around the circle.

For GS/GP results:

> MVP.Report(cattle50K,plot.type="c",LOG10=FALSE,outward=TRUE,matrix(c("#4DAF4A",NA,NA,"dodgerblue4",
            "deepskyblue",NA,"dodgerblue1", "olivedrab3", "darkgoldenrod1"), nrow=3, byrow=TRUE),
            chr.labels=paste("Chr",c(1:29),sep=""),threshold=NULL,r=1.2,cir.chr.h=1.5,cir.legend.cex=0.5,
            cir.band=1,file.type="jpg", memo="",dpi=300,chr.den.col="black")
        
#Note: 
Parameter 'col' can be either vector or matrix, if a matrix, each trait can be plotted in different colors.

Manhattan plot in Rectangular fashion for single trait or method

back to top
For GWAS results:

> MVP.Report(pig60K,plot.type="m",LOG10=TRUE,threshold=NULL,col=c("dodgerblue4","deepskyblue"), cex=0.7,
            chr.den.col=NULL,file.type="jpg",memo="",dpi=300)

> MVP.Report(pig60K, plot.type="m", col=c("dodgerblue4","deepskyblue"), LOG10=TRUE, ylim=NULL,
        threshold=c(1e-6,1e-4), threshold.lty=c(1,2), threshold.lwd=c(1,1), threshold.col=c("black",
        "grey"), amplify=TRUE,chr.den.col=NULL, signal.col=c("red","green"), signal.cex=c(1,1),
        signal.pch=c(19,19),file.type="jpg",memo="",dpi=300)

> MVP.Report(pig60K, plot.type="m", LOG10=TRUE, ylim=NULL, threshold=c(1e-6,1e-4),threshold.lty=c(1,2),
        col=c("grey60","grey30"), threshold.lwd=c(1,1), threshold.col=c("black","grey"), amplify=TRUE,
        chr.den.col=c("darkgreen", "yellow", "red"),bin.size=1e6,signal.col=c("red","green"),
        signal.cex=c(1,1),signal.pch=c(19,19),file.type="jpg",memo="",dpi=300)
        
#Note:
if the length of parameter 'chr.den.col' is bigger than 1, SNP density that counts 
   the number of SNP within given size('bin.size') will be plotted.

For GS/GP results:

> MVP.Report(cattle50K, plot.type="m", band=0, LOG10=FALSE, ylab="Abs(SNP effect)",threshold=0.015,
        threshold.lty=2, threshold.lwd=1, threshold.col="red", amplify=TRUE, signal.col=NULL,
        col=c("dodgerblue4","deepskyblue"), chr.den.col=NULL, file.type="jpg",memo="",dpi=300)

#Note: 
if signal.col=NULL, the significant SNPs will be plotted with original colors.

Manhattan plot in Rectangular fashion for multiple traits or methods

back to top

> MVP.Report(pig60K, plot.type="m", multracks=TRUE, threshold=c(1e-6,1e-4),threshold.lty=c(1,2), 
        threshold.lwd=c(1,1), threshold.col=c("black","grey"), amplify=TRUE,bin.size=1e6,
        chr.den.col=c("darkgreen", "yellow", "red"), signal.col=c("red","green"),signal.cex=c(1,1),
        file.type="jpg",memo="",dpi=300)

a. all traits in a axes:

b. all traits in separated axes:

Q-Q plot for single trait or method

back to top

> MVP.Report(pig60K,plot.type="q",conf.int.col=NULL,box=TRUE,file.type="jpg",memo="",dpi=300)

Q-Q plot for multiple traits or methods

back to top

> MVP.Report(imMVP,plot.type="q",col=c("dodgerblue1", "olivedrab3", "darkgoldenrod1"),threshold=1e6,
        signal.pch=19,signal.cex=1.5,signal.col="red",conf.int.col="grey",box=FALSE,multracks=
        TRUE,file.type="jpg",memo="",dpi=300)


Citation

back to top

For MVP (Please cite MVP package for all the functions):
Yin L, Zhang H, Tang Z, Xu J, Yin D, Zhang Z, Yuan X, Zhu M, Zhao S, Li X. rMVP: A Memory-efficient, Visualization-enhanced, and Parallel-accelerated tool for Genome-Wide Association Study, Genomics, Proteomics & Bioinformatics, 2021, 19 (4), 619-628, doi: 10.1016/j.gpb.2020.10.007.

For calculation of K matrix:
Vanraden, P. M. "Efficient Methods to Compute Genomic Predictions." Journal of Dairy Science 91.11(2008):4414-4423.

For GLM(PC) model:
Price, Alkes L., et al. "Principal components analysis corrects for stratification in genome-wide association studies." Nature genetics 38.8 (2006): 904.

For MLM(K) model:
Yu, Jianming, et al. "A unified mixed-model method for association mapping that accounts for multiple levels of relatedness." Nature genetics 38.2 (2006): 203.

For MLM(PCA+K) model:
Price, Alkes L., et al. "Principal components analysis corrects for stratification in genome-wide association studies." Nature genetics 38.8 (2006): 904.

For FarmCPU model:
Liu, Xiaolei, et al. "Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies." PLoS genetics 12.2 (2016): e1005767.

For variance components:
> HE: Zhou, Xiang. "A unified framework for variance component estimation with summary statistics in genome-wide association studies." The annals of applied statistics 11.4 (2017): 2027.
> EMMA/P3D:
1. Kang, Hyun Min, et al. "Efficient control of population structure in model organism association mapping." Genetics 178.3 (2008): 1709-1723.
2. Zhang, Zhiwu, et al. "Mixed linear model approach adapted for genome-wide association studies." Nature genetics 42.4 (2010): 355.
> Fast-lmm: Lippert, Christoph, et al. "FaST linear mixed models for genome-wide association studies." Nature methods 8.10 (2011): 833.

FAQ and Hints

back to top

🆘 Question1: Failing to install "devtools":

ERROR: configuration failed for package ‘git2r’

removing ‘/Users/acer/R/3.4/library/git2r’

ERROR: dependency ‘git2r’ is not available for package ‘devtools’

removing ‘/Users/acer/R/3.4/library/devtools’

😋 Answer: Please try following codes in terminal:

apt-get install libssl-dev/unstable

🆘 Question2: When installing packages from Github with "devtools", an error occurred:

Error in curl::curl_fetch_disk(url, x$path, handle = handle): Problem with the SSL CA cert (path? access rights?)

😋 Answer: Please try following codes and then try agian.

library(httr)
set_config(config(ssl_verifypeer = 0L))

🆘 Question3: When installing MVP:

Error in lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress, : write failed ERROR: lazy loading failed for package ‘MVP’ removing ‘/home/liuxl/R/x86_64-pc-linux-gnu-library/3.3/MVP’ Warning message: In install.packages("MVP_1.0.1.tar.gz", repos = NULL) : installation of package ‘MVP_1.0.1.tar.gz’ had non-zero exit status

😋 Answer: It is probably an issue caused by disk full, please check disk space.

Questions, suggestions, and bug reports are welcome and appreciated. ➡️

back to top

rmvp's People

Contributors

hyacz avatar kant avatar xiaoleiliubio avatar yinlilin avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

rmvp's Issues

When calculating kinship, "Error: cannot allocate vector of size 56.4 Gb"

There's at least one, possibly two problems here. I'm trying to use MVP.Data to prepare my data in PLINK binary format for FarmCPU. I first noticed something was possibly wrong when MVP had written about 25% of the SNP markers (8.2M total)... at this point, the .geno.bin output stopped increasing in size. It had reached 7GB and stayed at this size from the time the console said "Number of markers written into BIG file" was ~2M to when it was ~8M. The .bed input is 28GB, yet the largest output was this 7GB .geno.bin.

Also, I appear to be unable to make a kinship matrix once writing markers into the BIG file is complete. This problem is shown below, below the command I used.

>MVP.Data(fileBed="my_in_prefix",
+          filePhe="my_pheno_data",
+          fileKin=TRUE,
+          filePC=TRUE,
+          out="my_out_prefix"    
+ )
[1] "Number of Markers Written into BIG File: 8252993"
[1] "Preparation for Genotype data is done!"
[1] "Preparation for PHENOTYPE data is done!"
[1] "Calculate KINSHIP using Vanraden method..."
Error: cannot allocate vector of size 56.4 Gb

This was followed by R crashing in the way it normally does when too much memory is used. I thought 64GB RAM would be enough for our 28GB SNP set, so that's all I asked for. Our admins can quickly add more to the virtual machine I'm working on if needed. If this is the problem, how much RAM should they add for us to be sure the whole process (from MVP.Data through FarmCPU) will work?

I have a kinship matrix prepared by EMMAX-kin, so would it be more wise for me to import and parse that one than to try making a new kinship matrix?

At this point in time, we have to use this large SNP set, which has already been filtered based on 5% maf... We can do more filtering later, but first need to make sure we can replicate results with the 8.2M set across multiple platforms.

Thanks much for your time!

permutation quantile

Hi,

I am new to GitHub, so please excuse me if my question is not issued in a proper way.

I have a question regarding the permutation threshold. If I understand it correctly, the 95% quantile of the p-values vector is used as a significance threshold and I found that to be relatively high or maybe not stringent enough (see below an example with 200 permutations).

Rectangular-Manhattan m_2 MLM_dom

The authors of this paper (https://onlinelibrary.wiley.com/doi/full/10.1111/tpj.14659) used a 95% quantile not of the p-value itself but the -log10(p) value.

image
Figure 5: Arouisse, B., Korte, A., van Eeuwijk, F., & Kruijer, W. (2020). Imputation of 3 million SNPs in the Arabidopsis regional mapping population. The Plant Journal, 102(4), 872–882. https://doi.org/10.1111/tpj.14659
(this is a different trait as the one I tested, but the experiment is in the same species, with a similar amount of genotypes as I used and I obtained the markers from their imputation)

Following that approach I was wondering whether this would correspond to a 5% quantile if the p-value itself is used. Maybe this is a matter of preference, but I was thinking that having a relatively stringent p-value, while still detecting significant results would be a good trade-off between type I and II errors. Maybe I am understanding the math/statistics behind it incorrectly, but I would appreciate it if you could give a short comment.
Thanks a lot in advance!

NA value for se in the GWAS output

Hi,
Thanks for developing this nice package.
I want to know why am I getting NA value for se in some significant SNPs? also, is there a way to calculate proportion of variance explained by the significant SNP?

Thanks for your time!

hc.FarmCPU_signals.txt

Problem to read Numeric data

Hi,

I'm struggling to get my numeric data to read in MVP.

I prepared the numeric format based on the instructions at https://github.com/XiaoleiLiuBio/MVP

It looks like this (Just a portion shown here. The total number of row/SNP is 52157 and total number of column/individual is 68), no header and row names:

NaN | NaN | NaN | NaN | NaN | NaN | NaN
0 | 0 | 0 | 0 | 2 | 0 | NaN
2 | 2 | 2 | 2 | 2 | 2 | 0
2 | 2 | 2 | 2 | 2 | 2 | 2
2 | 0 | 2 | 2 | 2 | 0 | 0
0 | NaN | 0 | 0 | 0 | 2 | 2
NaN | NaN | NaN | NaN | NaN | NaN | NaN
0 | 0 | 0 | 0 | 0 | 0 | 0

I understand that there shouldn't be NaN or missing data in numeric format. I don't know how to overcome this. Could you please help me?

I also got this error:

Error in big.matrix(nrow = numRows, ncol = createCols, type = type, dimnames = list(rowNames, :
A big.matrix must have at least one row and one column

I used this code to try to fix:

fileNum=matrix(1:3546676,ncol=68)

But still got the same error.

May I ask what should I do?

Appreciate your help.

Many thanks.

Regards,
Tina

Error in running MVP: Error in SetAll.bm(x, value)

Hi Xiaolei,
I got geno, pheno data and map ready with MVP.Data function for the run, but when I start the run, I got the following error:

[1] "Input data has 198 individuals, 109914 markers"
[1] "Principal Component Analysis Start ..."
means for first 10 snps:
[1] 0 0 0 0 0 0 0 0 0 0
Error in SetAll.bm(x, value) :
Matrix dimensions do not agree with big.matrix instance set size.

Any idea why this error occurred? Thanks!

PVE value

where the PVE value by each SNP?

thank you!

cite MVP

Hi Xiaolei,
I am wondering what's the best way to cite MVP in my manuscript. Thanks!

Two minor bugs

Hi

Thank you for the recent improvements to this package!

I have two minor things:

  1. When rMVP reports the significance level, it cuts off at 6 decimal places:

    rMVP/R/MVP.r

    Line 307 in 3acd666

    logging.log(paste("Significant level: ", sprintf("%.6f", threshold/m), sep=""), "\n", verbose = verbose)

    In my case I have so many SNPs that rMVP always prints Significant level: 0.000000. Perhaps it would be better to use scientific notation using formatC(0.0000000000005, format = "e", digits = 2)?

  2. The files ending in signal.csv are very useful, is it possible there is a bug when only a single SNP is significant (which happens a lot with FarmCPU)?
    Normally, I get a proper table with proper headers, but when I have only one SNP, I get a strange text-file like this:

"x"
"SNP_5015"
"3"
"47924314"
"-6185.31927432418"
"1015.52184827578"
"1.50018111376143e-08"

i.e., the single-row table got transposed. Funny enough it's just a cbind and a write.csv, I currently don't understand how this happens.

sessionInfo():

R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.10 (Santiago)

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] rMVP_0.99.16     bigmemory_4.5.33 MASS_7.3-51.4

loaded via a namespace (and not attached):
[1] bigmemory.sri_0.1.3 compiler_3.6.1      tools_3.6.1
[4] Rcpp_1.0.2

Submit to Bioconductor / release 0.99.15

There are notes in the build report that seem like true positives, and that should be addressed:

  • seem like straight-forward corrections
    * checking R code for possible problems ... [16s/16s] NOTE
    MVP.Data.Kin: warning in read.big.matrix(fileKin, head = FALSE, type =
      "double", sep = sep): partial argument match of 'head' to 'header'
    MVP.Data.Numeric2MVP: warning in read.big.matrix(num_file, head =
      FALSE, sep = sep): partial argument match of 'head' to 'header'
    MVP.Data.PC: warning in read.big.matrix(filePC, head = FALSE, type =
      "double", sep = sep): partial argument match of 'head' to 'header'
    MVP.FarmCPU: warning in FarmCPU.BIN(Y = phe[, c(1, 2)], GD = geno, GM =
      map, CV = CV, P = myPrior, method = method.bin, b = bin.size, s =
      bin.selection, theLoop = theLoop, bound = bound, ncpus = ncpus):
      partial argument match of 'GD' to 'GDP'
    MVP.FarmCPU: warning in FarmCPU.BIN(Y = phe[, c(1, 2)], GD = geno, GM =
      map, CV = theCV, P = myPrior, method = method.bin, b = bin.size, s =
      bin.selection, theLoop = theLoop, ncpus = ncpus): partial argument
      match of 'GD' to 'GDP'
    MVP.FarmCPU: warning in FarmCPU.Remove(GD = geno, GM = map, seqQTN =
      seqQTN, seqQTN.p = seqQTN.p, threshold = 0.7): partial argument match
      of 'GD' to 'GDP'
    print_info: warning in make_line(welcome, line = linechar, width =
      width): partial argument match of 'line' to 'linechar'
    
  • Undefined global functions or variables: Revo.version getMKLthreads setMKLthreads looks like missing imports
  • seems like mis-use of compiler flags
    * checking use of SHLIB_OPENMP_*FLAGS in Makefiles ... NOTE
      src/Makevars: SHLIB_OPENMP_CXXFLAGS is included in PKG_CXXFLAGS but not in PKG_LIBS
      src/Makevars: SHLIB_OPENMP_CFLAGS is included in PKG_LIBS but linking is by C++
      src/Makevars.win: SHLIB_OPENMP_CXXFLAGS is included in PKG_CXXFLAGS but not in PKG_LIBS
      src/Makevars.win: SHLIB_OPENMP_CFLAGS is included in PKG_LIBS but linking is by C++
    Use of these macros is discussed in sect 1.2.1.1 of Writing R
    Extensions. The macros for different languages may differ so the
    matching macro must be used in PKG_CXXFLAGS (etc) and match that used
    in PKG_LIBS (except for Fortran: see the manual).
    

DESCRIPTION

  • Please provide a Description: field that reads more like a paper abstract, providing the user with an informative summary of the functionality of your package and how it advances current 'state of the art'. Remember that the field can be wrapped to multiple lines, with continuation lines indented, e.g., by 4 spaces.
  • It seems likely that BH, RccpPrgoess, Rcpp, parallel and perahps other packages are not 'seen' by the end user, and should be in Imports: rather than Depends:.

NAMESPACE

  • BH does not have any R functions, so importing it's NAMESPACE does not make sense

vignettes

  • Please arrange for more of your vignette code chunks to be actually evaluated. The problem is that these 'pictures of code' quickly break, and the user is left with the impression that the package functions as advertised when it does not.

R -- the following are specific comments, but issues should be addressed throughout the package

  • MVP.Data.r:77 message("WARNING...") if this is a warning, then use warning() rather than than message().
  • MVP.Data.r:77 please avoid long lines that combine programming idioms, they are difficult to read and comprehend
    if ("sep.vcf" %in% names(parames))
        warning("'sep.vcf' has be DEPRECATED")
    
  • MVP.Data.r:90 use vapply() rather than sapply() to avoid surprises when the length of the first argument is 0.
  • MVP.Data.r:223 use message() rather than cat() to communicate informative messages to users; note that message() does not usually require paste().
  • MVP.Data.r:550 use seq_len() or seq_along() rather than 1:n to avoid surprises when n < 1.
  • MVP.Data.r:555 stop("ERRROR: ...") and similar seems redundant, simply stop("...") would seem sufficient.
  • MVP.Data.r:857 setting options(...) without restoring values to their default is bad style. use
    opts <- options(...)
    on.exit(options(opts))
    
  • MVP.EMMA.Vg.Ve.r:113 another example of using seq_len() rather than 1:(m - 1)
  • MVP.EMMA.Vg.Ve.r: 115 'hoist' loop invariant code like this function definition outside the for loop to avoid redundant computation.
  • MVP.FarmCPU.r:95 another example where use of message() is appropriate
  • MVP.FarmCPU.r:188 reconsider the need for verbose output like this.
  • MVP.FarmCPU.r:300 this code is highly similar (identical?) to other code; write simple functions and reuse rather than copy / paste.
  • MVP.GLM.r:93 please use consistent indentation for readability
  • MVP.PCA.r:230 remove dead code.
  • MVP.Utility.r:59 re-think the value of re-implementing a progress bar; this represents additional code that you must maintain, providing limited incremental value.

General

  • are the files on the master branch of this repository necessary for the package? For instance, are the files in the results folder used?

could not find function "MVP.Data"

Unable execute the code getting error as
Error in MVP.Data(fileHMP = "HapMapwildsoy.hmp.txt", filePhe = "Pheno_269.txt", : could not find function "MVP.Data"

Error in svd(X) : infinite or missing values in 'x'

library(rMVP)
MVP.Data(fileHMP="Filtered genotype.hmp copy.txt",
         filePhe="Phenotypic_data.txt",
         sep.hmp="\t",
         sep.phe="\t",
         SNP.effect="Add",
         fileKin=FALSE,
         filePC=FALSE,
         out="mvp.hmp",
         #priority="memory",
         #maxLine=10000
)

genotype <- attach.big.matrix("mvp.hmp.geno.desc")
phenotype <- read.table("mvp.hmp.phe",head=TRUE)
map <- read.table("mvp.hmp.geno.map" , head = TRUE)

imMVP <- MVP(
  phe=phenotype,
  geno=genotype,
  map=map,
  #K=Kinship,
  #CV.GLM=Covariates,     ##if you have additional covariates, please keep there open.
  #CV.MLM=Covariates,
  #CV.FarmCPU=Covariates,
  nPC.GLM=5,      ##if you have added PC into covariates, please keep there closed.
  nPC.MLM=3,
  nPC.FarmCPU=3,
  priority="speed",
  #ncpus=10,
  vc.method="BRENT",
  maxLoop=10,
  method.bin="FaST-LMM",#"FaST-LMM","EMMA", "static"
  #permutation.threshold=TRUE,
  #permutation.rep=100,
  threshold=0.05,
  method=c("GLM", "MLM", "FarmCPU")
)

This is the output that I am getting

> library(rMVP)
> MVP.Data(fileHMP="Filtered genotype.hmp copy.txt",
+          filePhe="Phenotypic_data.txt",
+          sep.hmp="\t",
+          sep.phe="\t",
+          SNP.effect="Add",
+          fileKin=FALSE,
+          filePC=FALSE,
+          out="mvp.hmp",
+          #priority="memory",
+          #maxLine=10000
+ )
Preparing data for MVP...
Reading file...
inds: 133	markers:1472
Number of threads: 1
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Preparation for GENOTYPE data is done within 0s 
131 common individuals between phenotype and genotype. 
Preparation for PHENOTYPE data is Done within 0s 
Imputing...
out is NULL, impute inplace.
Number of threads: 3
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Impute Genotype File is done!
MVP data prepration accomplished successfully!
Warning messages:
1: In MVP.Data(fileHMP = "Filtered genotype.hmp copy.txt", filePhe = "Phenotypic_data.txt",  :
  SNP.effect has been DEPRECATED.
2: In MVP.Data(fileHMP = "Filtered genotype.hmp copy.txt", filePhe = "Phenotypic_data.txt",  :
  sep.hmp has been DEPRECATED.
> genotype <- attach.big.matrix("mvp.hmp.geno.desc")
> phenotype <- read.table("mvp.hmp.phe",head=TRUE)
> map <- read.table("mvp.hmp.geno.map" , head = TRUE)
> imMVP <- MVP(
+   phe=phenotype,
+   geno=genotype,
+   map=map,
+   #K=Kinship,
+   #CV.GLM=Covariates,     ##if you have additional covariates, please keep there open.
+   #CV.MLM=Covariates,
+   #CV.FarmCPU=Covariates,
+   nPC.GLM=5,      ##if you have added PC into covariates, please keep there closed.
+   nPC.MLM=3,
+   nPC.FarmCPU=3,
+   priority="speed",
+   #ncpus=10,
+   vc.method="BRENT",
+   maxLoop=10,
+   method.bin="FaST-LMM",#"FaST-LMM","EMMA", "static"
+   #permutation.threshold=TRUE,
+   #permutation.rep=100,
+   threshold=0.05,
+   method=c("GLM", "MLM", "FarmCPU")
+ )
====================== Welcome to MVP ======================
      A Memory-efficient, Visualization-enhanced, and       
             Parallel-accelerated Tool For GWAS             
                    __  __  __   __  ___                    
                   |  \/  | \ \ / / | _ \                   
                   | |\/| |  \ V /  |  _/                   
                   |_|  |_|   \_/   |_|   Version: 0.99.17  
  Designed and Maintained by Lilin Yin, Haohao Zhang, and   
  Xiaolei Liu                                               
  Contributors: Zhenshuang Tang, Jingya Xu, Dong Yin,       
  Zhiwu Zhang, Xiaohui Yuan, Mengjin Zhu, Shuhong Zhao,     
  Xinyun Li                                                 
  Contact: [email protected]                      
============================================================
Start: 2020-03-03 15:19:31 
The log has been output to the file: /Users/rudrakshisharma/Desktop/Manhattan plots/MVP.20200303_151931.log 
Input data has 131 individuals, 1472 markers 
Phenotype: Arimont 
Relationship matrix mode in speed 
Scale the genotype matrix 
Computing Z'Z 
Deriving relationship matrix successfully 
Eigen Decomposition 
Deriving PCs successfully 
Number of PCs included in GLM: 5 
Number of PCs included in MLM: 3 
Number of PCs included in FarmCPU: 3 
-------------------------GWAS Start------------------------- 
General Linear Model (GLM) Start... 
scanning...

Mixed Linear Model (MLM) Start... 
Variance components using: BRENT 
Estimated Vg and Ve: NA NA 
Error in svd(X) : infinite or missing values in 'x'
In addition: Warning messages:
1: In MVP.GLM(phe = phe, geno = geno, CV = CV.GLM, cpu = ncpus, bar = bar,  :
  NAs introduced by coercion
2: In MVP.MLM(phe = phe, geno = geno, K = K, eigenK = eigenK, CV = CV.MLM,  :
  NAs introduced by coercion

Can someone please help me on this issue??

Regards
Rudra

Incomplete Manhattan plots

Hi,
I recently got this incomplete manhatten plot while running a rMVP script on a linux server (end of Chr 12 not mapped) This phenomenon does not occur when I am running the same script on my windows working station.
Thanks in advance for the help.

Solyc01g109520 MLM Rectangular-Manhattan dom

Installation problem rMVP

install.packages("rMVP")
Installing package into ‘C:/Users/user/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
package ‘rMVP’ is not available (for R version 3.5.3)

Or the the development version from GitHub:

install.packages("devtools")

devtools::install_github("xiaolei-lab/rMVP")
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘glue’

Or the the development version from GitHub:

install.packages("devtools")

devtools::install_github("xiaolei-lab/rMVP")
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘glue’
devtools::install_github("xiaolei-lab/rMVP")
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘glue’

nPC.GLM, nPC.MLM and nPC.FarmCPU

How to choose values of nPC.GLM, nPC.MLM and nPC.FarmCPU?
And, How to choose values of nPC.GLM, nPC.MLM and nPC.FarmCPU when I use K, CV.GLM, CV.MLM, CV.FarmCPU .

imMVP <- MVP(
    phe=phenotype,Ff
    geno=genotype,
    map=map,
    #K=Kinship,
    #CV.GLM=Covariates,
    #CV.MLM=Covariates,
    #CV.FarmCPU=Covariates,
    nPC.GLM=5,
    nPC.MLM=3,
    nPC.FarmCPU=3,
    priority="speed",
    #ncpus=10,
    vc.method="BRENT",
    maxLoop=10,
    method.bin="FaST-LMM",#"FaST-LMM","EMMA", "static"
    #permutation.threshold=TRUE,
    #permutation.rep=100,
    threshold=0.05,
    method=c("GLM", "MLM", "FarmCPU")
)

multiple trait analysis issues

Hello, Xiaolei,

I found single trait perform correctly, multiple traits analysis running with errors.(version 0.99.10)

for(i in 2:ncol(phenotype)){imMVP <- MVP(phe=phenotype[, c(1, i)],geno=genotype,map=map,K=Kinship,nPC.GLM=5,nPC.MLM=3,nPC.FarmCPU=3,perc=1,priority="speed",vc.method="GEMMA",maxLoop=10,method.bin="FaST-LMM",permutation.threshold=TRUE,permutation.rep=100.threshold=0.05,method=c("GLM", "MLM", "FarmCPU"))}
Error: unexpected symbol in " 2:ncol(phenotype)){imMVP <- MVP(phe=phenotype[, c(1, i)],geno=genotype,map=map,K=Kinship,nPC.GLM=5,nPC.MLM=3,nPC.FarmCPU=3,perc=1,priority="speed",vc.method="GEMMA",maxLoop=10,method.bin="FaS"
Execution halted

Error in big.PCA

Hi,

I have 2 questions:

Q1. I meet this problem at the end when running MVP:-

Error in big.PCA(bigMat = big.geno, pcs.to.keep = pcs.keep) :
could not find function "big.PCA"

Here is my script:

> imMVP <- MVP(
+     phe=phenotype,
+     geno=genotype,
+     map=map,
+     #K=Kinship,
+     #CV.GLM=Covariates,
+     #CV.MLM=Covariates,
+     #CV.FarmCPU=Covariates,
+     nPC.GLM=3,
+     nPC.MLM=3,
+     nPC.FarmCPU=3,
+     perc=1,
+     priority="speed",
+     ncpus=12,
+     vc.method="EMMA",
+     maxLoop=10,
+     method.bin="FaST-LMM",#"FaST-LMM","EMMA", "static"
+     #permutation.threshold=TRUE,
+     #permutation.rep=100,
+     threshold=0.05,
+     method=c("GLM", "MLM", "FarmCPU")
+ )
#--------------------------------------------Welcome to MVP---------------------------------------------# 
#          A Memory-efficient, Visualization-enhanced, and Parallel-accelerated Tool For GWAS           # 
# Version: 1.0.1                                                                                        # 
# Authors: Lilin Yin, Haohao Zhang, Zhiwu Zhang, Xinyun Li, Xiaohui Yuan, Shuhong Zhao, and Xiaolei Liu # 
# Contact: [email protected]                                                                  # 
#-------------------------------------------------------------------------------------------------------# 
[1] "Input data has 34 individuals, 51910 markers"
[1] "Principal Component Analysis Start..."
Error in big.PCA(bigMat = big.geno, pcs.to.keep = pcs.keep) : 
  could not find function "big.PCA"

Q2. I have 38 individuals but it appears that it is reading only 34?

My MVP.Data script looks like this and was successful:

> MVP.Data(fileHMP="~/Desktop/SNP_formatted.hmp.txt",
+          filePhe="~/Desktop/Phenotype.txt",
+          sep.hmp="\t",
+          sep.phe="\t",
+          SNP.effect="Add",
+          fileKin=FALSE,
+          filePC=FALSE,
+          out="mvp.hmp",
+          #priority="memory",
+          #maxLine=10000
+ )
[1] "Preparing data for MVP..."
[1] "Preparation for PHENOTYPE data is done!"
[1] "Output numeric genotype..."
[1] "File: ~/Desktop/myGAPIT/SNP_Rlm7_formatted.hmp.txt ; Total markers:  51910  finished!"
[1] "Preparation for numeric data is done!"
[1] "Output mvp genotype..."
[1] "MVP data prepration accomplished successfully!"

Matrix dimensions do not agree

I added my data using the MVP.data function. I've confirmed that the output files all have the same n value, but when I run the following code, I get a Matrix dimensions error. This error does not occur if I run the model without PCs.
I noticed in the MVP readme, that the Covariate lines were commented out. Is this a problem with the underlying packages?

MVP.Data(fileHMP="data/filtered.hmp.txt",
         filePhe="data/mvp_input_20180511.txt",
         sep.hmp="\t",
         sep.phe="\t",
         SNP.effect="Add",
         fileKin=T,
         filePC=T,
         out="mvp"
)

genotype <- attach.big.matrix("mvp.geno.desc")
phenotype <- read.table("mvp.phe",head=TRUE)
map <- read.table("mvp.map" , head = TRUE)
Kinship <- attach.big.matrix("mvp.kin.desc")
Covariates <- attach.big.matrix("mvp.pc.desc")
for(i in 2:ncol(phenotype)){
  chiles_env_pcs <- MVP(
    phe=phenotype[, c(1, i)],
    geno=genotype,
    map=map,
    K=Kinship,
    CV.GLM=Covariates,
    CV.MLM=Covariates,
    CV.FarmCPU=Covariates,
    nPC.GLM=5,
    nPC.MLM=3,
    nPC.FarmCPU=3,
    perc=1,
    priority="memory",
    ncpus=2,
    vc.method="GEMMA",
    maxLoop=10,
    method.bin="FaST_LMM",#"FaST-LMM","EMMA", "static"
    threshold = 1.895353e-06,
    method=c("GLM", "MLM", "FarmCPU")  )
}

The Error Message:

[1] "Input data has 190 individuals, 30418 markers"
[1] "Principal Component Analysis Start..."
 means for first 10 snps:
 [1] 0 0 0 0 0 0 0 0 0 0
Error in SetElements.bm(x, i, j, value) : 
  Matrix dimensions do not agree with big.matrix instance set size.

Unusual FarmCPU results

Hi,

I ran FarmCPU for multiple traits and I got quite unusual results. I attach a few manhattan plots to illustrate the issue.

Cheers,

Stephan

rectangular-manhattan trait4 farmcpu
rectangular-manhattan trait7 farmcpu
rectangular-manhattan trait10 farmcpu
rectangular-manhattan trait25 farmcpu
rectangular-manhattan trait30 farmcpu
rectangular-manhattan trait31 farmcpu

Changing marker size for bonferoni correction

Hi,

I'm calculated the number of effective SNPs and was wondering how I would be able to change marker size to this value so that I can have a corrected Bonferroni threshold according to the number of effective SNPs. How do I change that in rMVP? I tried doing threshold = 0.05/effective size but that seems to inflate the threshold value even more. Any suggestions would be appreciated!

Feature request: Add N, Allele1 and Allele2 info to MVP output

@XiaoleiLiuBio

Dear professor Liu,

Thank you for providing this wonderful tool.

In practice, many gwas summary based downstream analyses required infomation like "Number of tested samples", "Allele1 (the minor allele)" and "Allele2 (the major allele)". Although these data could be extracted from the input, it is convenient that the MVP could output these data by default.

Best wishes,

Songtao Gui

cannot install the rMVP package

I tried the two methods you write on GitHub but both of them failed.

devtools::install_github("xiaoleiLiubio/[email protected]")
Downloading GitHub repo xiaoleiLiubio/[email protected]
Error: Failed to install 'rMVP' from GitHub:
Could not find tools necessary to compile a package
Call pkgbuild::check_build_tools(debug = TRUE) to diagnose the problem.

install.packages("rMVP_0.99.14.1.tar.gz", repos=NULL)
Warning: invalid package ‘rMVP_0.99.14.1.tar.gz’
Error: ERROR: no packages specified

unexpected numeric constant

The issue occurs when trying to run MVP(). It is as follows:

genotype = attach.big.matrix("/work/schnablelab/jhurst5/mvp2.hmp.numeric")
Error in parse(file = file, keep.source = keep.source) :
mvp2.hmp.numeric:1:9: unexpected numeric constant
1: 0   0
            ^
Calls: attach.big.matrix ... tryCatchOne -> -> dget -> eval -> parse
Execution halted

Nothing seems wrong with the column or row numbers in the data.

If anyone knows what the problem is, your help would be appreciated.

multi phenotype

dear liu:

when I was run a MVP, I figure that you segess code is

> for(i in 2:ncol(phenotype)){
+   imMVP <- MVP(
+     phe=phenotype[, c(1, i)],
+     geno=genotype,
+     map=map,
+     nPC.GLM=5,
+     nPC.MLM=3,
+     nPC.FarmCPU=3,
+     perc=1,
+     priority="memory",
+     ncpus=16,
+     vc.method="EMMA",
+     maxLoop=10,
+     method.bin="FaST-LMM",
+     threshold=0.05,
+     method=c("GLM", "MLM", "FarmCPU"),
+ file="pdf",
+   )
+ }

can I just Run

> imMVP <- MVP(
+     phe=phenotype,
+     geno=genotype,
+     map=map,
+     nPC.GLM=5,
+     nPC.MLM=3,
+     nPC.FarmCPU=3,
+     perc=1,
+     priority="memory",
+     ncpus=16,
+     vc.method="EMMA",
+     maxLoop=10,
+     method.bin="FaST-LMM",
+     threshold=0.05,
+     method=c("GLM", "MLM", "FarmCPU"),
+ file="pdf",
+   )

output permutation threshold

Hi Xiaolei,
thank you for this fantastic tool. I am using permutations to determine a significance threshold for a FarmCPU GWA. I see that the permuted threshold is correctly plotted to the resulting manhattan plots, however I cannot find the numerical value of the threshold saved anywhere in the object returned by MVP() function. Am I missing something? Thank you so much

Matteo

Eigen::internal::tridiagonal_qr_step rfunctions install error!

rfunctions install error!

I install rfunctions
Spectra/LinAlg/TridiagEigen.h:93:49: error: no matching function for call to 'tridiagonal_qr_step(double*&, double*&, int&, int&, Eigen::PlainObjectBase<Eigen::Matrix<double, -1, -1> >::Scalar*, int&)'
Eigen::internal::tridiagonal_qr_step(maind, subd, start, end, evecs.data(), n);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/ilustre/users/dna/.env/lib64/R/library/RcppEigen/include/Eigen/Eigenvalues:39:0,

NA phenotype

Hi Xiaolei,

first - thank you very much for information about your new package. It looks very nice and promising, i am under impressed.

I would like to know is there any way to work with MVP with missing value in phenotype? I generate MVP data (phenotype file have some missing data) and i try to start analysis (only FarmCPU) but i got following error:
when i used FaST-LMM option:

Error in if (llresults[[i]]$LL > LL.save) { :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf
2: In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf
3: In min(P[P != 0], na.rm = TRUE) :
no non-missing arguments to min; returning Inf
4: In if (!is.na(inclosure.size)) { :
the condition has length > 1 and only the first element will be used

and when i use static:

Error in seq.int(from, to, length.out = n) : 'from' must be finite
In addition: Warning messages:
1: In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf
2: In FUN(newX[, i], ...) : no non-missing arguments to min; returning Inf
3: In min(P[P != 0], na.rm = TRUE) :
no non-missing arguments to min; returning Inf
4: In if (!is.na(inclosure.size)) { :
the condition has length > 1 and only the first element will be used
5: In min(myPrior, na.rm = TRUE) :
no non-missing arguments to min; returning Inf
6: In min(P[P != 0], na.rm = TRUE) :
no non-missing arguments to min; returning Inf

You can repeat this error by setting up any value in demo data with NA. In basic FarmCPU package my data works well.

Best,
Marcin

best way to convert SNP genotype to numeric matrix

Hi Xiaolei,
thanks a lot for writing this nice package.
I would like to use MVP for my project, but I am not sure the best way to convert unphased SNP genotype (vcf format, around 60k SNPs) to numeric matrix (0,1,2). Do you which tool I should use to do the conversion.

Indels in HapMap file

Hi,

does indel coded as "--" are intentionally not supported? I have HapMap file, which work properly with TASSEL, however after running MVP.Data() on it, indels are coded same as alternative allele. For exaple line:

[1] "S01_17126" "A/-" "01" "17126" "+" NA NA NA NA NA NA "--" [13] "AA" "--" "AA" "--" "AA" "--" "AA" "AA" "--" "--" "AA" "AA" [25] "--" "AA" "AA" "AA" "AA" "--" "AA" "AA" "AA" "AA" "AA" "AA" [37] "--" "AA" "--" "--" "AA" "--" "--" "--" "--" "--" "--" "AA" [49] "--" "--" "AA" "--" "AA" "AA" "AA" "--" "AA" "AA" "AA" "AA" [61] "AA" "--" "--" "--" "AA" "--" "AA" "--" "AA" "--" "AA" "AA" [73] "--" "--" "--" "--" "--" "AA" "AA" "--" "--" "--" "--" "--" [85] "--" "--" "--" "--" "AA" "--" "--" "AA" "--" "--" "AA" "AA" [97] "--" "--" "AA" "AA" "AA" "--" "--" "--" "AA" "AA" "AA" "AA" [109] "AA" "AA" "AA" "AA" "AA" "--" "AA" "--" "AA" "--" "AA" "AA" [121] "--" "AA" "AA" "AA" "AA" "AA" "--" "AA" "AA" "AA" "--" "AA" [133] "AA" "--" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "AA" "--" "--" [145] "--" "AA" "AA" "AA" "AA" "AA" "--" "AA" "AA" "--" "AA" "AA" [157] "--" "--" "--" "AA" "--" "--" "AA" "AA" "AA" "--" "AA" "AA" [169] "AA" "--" "--" "--" "AA" "AA" "AA" "AA" "AA" "--" "--" "AA" [181] "AA" "AA" "AA" "--" "--" "AA" "--" "AA" "--" "--" "AA" "AA" [193] "--" "AA" "AA" "AA" "--" "AA" "--" "AA" "AA" "AA" "AA" "--" [205] "AA" "--" "--" "--" "AA" "--" "AA" "--" "--" "--" "--" "AA" [217] "--" "A-" "--" "AA" "AA" "AA" "AA" "AA" "AA" "--" "--" "--" [229] "--" "AA" "--" "--" "AA" "--" "--" "--" "--" "AA" "--" "--" [241] "AA" "--" "AA" "AA" "AA" "AA" "--" "AA" "--" "--" "--" "-A" [253] "AA" "AA" "-A" "--" "AA" "--" "AA" "AA" "AA" "--" "AA" "AA" [265] "AA" "AA" "--" "AA" "--" "AA" "AA" "AA" "--" "--" "--" "AA" [277] "--" "AA" "AA" "--" "--" "--" "AA" "--" "--" "--" "--" "--" [289] "AA" "AA" "AA" "AA" "AA" "AA" "--" "AA" "AA" "--" "AA" "AA" [301] "AA" "--" "AA" "AA" "--" "AA" "AA" "AA" "--" "--" "AA" "--" [313] "AA" "AA" "--" "AA" "--" "AA" "--" "--" "AA" "AA" "AA" "AA" [325] "--" "--" "AA" "--" "--" "--" "--" "--" "AA" "--" "--" "AA" [337] "--" "--" "--" "--" "--" "--" "AA" "--" "AA" "AA" "--" "--" [349] "AA" "--" "AA" "--" "AA" "AA" "--" "AA" "AA" "AA" "AA" "AA" [361] "--" "AA" "AA" "--" "--" "AA" "--" "AA" "AA"

in big.matrix object created with MVP.Data looks like this:

[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [76] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [151] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [226] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [301] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

In map file alleles are properly coded:
S01_17126 1 17126 A -

Rest SNPs looks fine. I have used most recent version rMVP installed from github.

Best,
Marcin

figure in ppt format

There is a R package "export" which can export figure to ppt.
So can you add this function to MVP.Report?

library(export)
library(ggplot2)
data <- read.table('~/Desktop/genetic_map/summary_genetic_physical_relationship.txt',sep='\t',header=T)
ggplot(data,aes(x=Physcial,y=Genetic)) + geom_point(size=0.4) 
graph2ppt( file="~/Desktop/genetic_map/genetic_physical.pptx", paper="A4")

Multi-Trait Manhattan Plot Issues

Hello,

I am currently using MVP.Report() to make a multi trait Manhattan plot. There are two issues I am running into:

  1. Is there an argument to change the title of the plot?

  2. Is there an argument to change/add colors to the overlapping point-plot. It appears that there are only 4 colors by default(blue, green, purple and orange) so when a 5th phenotype is added to the plot it goes back to blue, making two different phenotypes have the same color for their respective points.

Any help would be greatly appreciated.

Error in big.matrix

dear Liu,I got a problem in using ‘FarmCPU’ model.
MVP(phe=pheno,geno=geno,map=map,method=c('FarmCPU'))
Input data has 159 individuals, 1565878 markers
|------------------------GWAS Start------------------------|
FarmCPU Start...
Current loop: 1 out of maximum of 10
seqQTN:
NULL
scanning...
number of covariates in current loop is:
0

|> 1.00%
|-> 2.00%
|-> 3.00%
|--> 4.00%
|--> 5.00%
|---> 6.00%
|---> 7.00%
|----> 8.00%
|----> 9.00%
|-----> 10.00%
|-----> 11.00%
|------> 12.00%
|------> 13.00%
|-------> 14.00%
|-------> 15.00%
|--------> 16.00%
|--------> 17.00%
|---------> 18.00%
|---------> 19.00%
|----------> 20.00%
|----------> 21.00%
|-----------> 22.00%
|-----------> 23.00%
|------------> 24.00%
|------------> 25.00%
|-------------> 26.00%
|-------------> 27.00%
|--------------> 28.00%
|--------------> 29.00%
|---------------> 30.00%
|---------------> 31.00%
|----------------> 32.00%
|----------------> 33.00%
|-----------------> 34.00%
|-----------------> 35.00%
|------------------> 36.00%
|------------------> 37.00%
|-------------------> 38.00%
|-------------------> 39.00%
|--------------------> 40.00%
|--------------------> 41.00%
|---------------------> 42.00%
|---------------------> 43.00%
|----------------------> 44.00%
|----------------------> 45.00%
|-----------------------> 46.00%
|-----------------------> 47.00%
|------------------------> 48.00%
|------------------------> 49.00%
|-------------------------> 50.00%
|-------------------------> 51.00%
|--------------------------> 52.00%
|--------------------------> 53.00%
|---------------------------> 54.00%
|---------------------------> 55.00%
|----------------------------> 56.00%
|----------------------------> 57.00%
|-----------------------------> 58.00%
|-----------------------------> 59.00%
|------------------------------> 60.00%
|------------------------------> 61.00%
|-------------------------------> 62.00%
|-------------------------------> 63.00%
|--------------------------------> 64.00%
|--------------------------------> 65.00%
|---------------------------------> 66.00%
|---------------------------------> 67.00%
|----------------------------------> 68.00%
|----------------------------------> 69.00%
|-----------------------------------> 70.00%
|-----------------------------------> 71.00%
|------------------------------------> 72.00%
|------------------------------------> 73.00%
|-------------------------------------> 74.00%
|-------------------------------------> 75.00%
|--------------------------------------> 76.00%
|--------------------------------------> 77.00%
|---------------------------------------> 78.00%
|---------------------------------------> 79.00%
|----------------------------------------> 80.00%
|----------------------------------------> 81.00%
|-----------------------------------------> 82.00%
|-----------------------------------------> 83.00%
|------------------------------------------> 84.00%
|------------------------------------------> 85.00%
|-------------------------------------------> 86.00%
|-------------------------------------------> 87.00%
|--------------------------------------------> 88.00%
|--------------------------------------------> 89.00%
|---------------------------------------------> 90.00%
|---------------------------------------------> 91.00%
|----------------------------------------------> 92.00%
|----------------------------------------------> 93.00%
|-----------------------------------------------> 94.00%
|-----------------------------------------------> 95.00%
|------------------------------------------------> 96.00%
|------------------------------------------------> 97.00%
|-------------------------------------------------> 98.00%
|-------------------------------------------------> 99.00%
|-------------------------------------------------->100.00%
Current loop: 2 out of maximum of 10
Optimizing Pseudo QTNs...
Error in big.matrix(nrow = length(rows), ncol = length(cols), type = type, :
A big.matrix must have at least one row and one column
此外: Warning messages:
1: In FarmCPU.Specify(GI = GM, GP = GP, bin.size = b, inclosure.size = s) :
强制改变过程中产生了NA
2: In ID.GP/bin.size : 长的对象长度不是短的对象长度的整倍数
3: In FarmCPU.Specify(GI = GM, GP = GP, bin.size = b, inclosure.size = s) :
强制改变过程中产生了NA
4: In FarmCPU.Specify(GI = GM, GP = GP, bin.size = b, inclosure.size = s) :
强制改变过程中产生了NA
5: In FarmCPU.Specify(GI = GM, GP = GP, bin.size = b, inclosure.size = s) :
强制改变过程中产生了NA
I'm a beginner and I don't konw how to deal with it, please help me.

installation error - rfunction

Hi Xiaolei,

Im want to test this new GWAS script, however, I have a problem installing the rfunction. What it seems to be the culprit?

Thanks in advance.

Best, Charles

"Warning: running command 'make -f "Makevars.win" -f "C:/PROGRA1/R/R-341.1/etc/x64/Makeconf" -f "C:/PROGRA1/R/R-341.1/share/make/winshlib.mk" SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' SHLIB="rfunctions.dll" WIN=64 TCLBIN=64 OBJECTS="RcppExports.o bigmemory.o cgls.o geninv.o gls.o lanczos.o linop.o lsmr.o multivar_norm.o pnorm.o testing.o utils.o"' had status 2
ERROR: compilation failed for package 'rfunctions'"

use significant SNP as CV, permutation of FarmCPU

Hi Xiaolei,
thank you for develop such a convenient package. I have some questions:
1, I hve read your paper on plos genetics, I think FarmCPU can detect epstasis loci. But new loci ware detected when previous significant SNP added to CV (somebody call this SNP-fix, p value of new loci was just excess the threshold). is that needed to add significant SNP as CV for FarmCPU model?
2, when I use permutation to set threshold on farmcpu model, MVP display "GLM", is that right?
can you tell me? (my statistics backgroud was weak, you just need tell me what was right)
thank you.

Error in solve.default(ww)

Hi Xiaolei,

When I use this command for the FarmCPU GWAS(showed below)

imMVP <- MVP(phe=phenotype,geno=genotype,map=map,K=Kinship,CV.FarmCPU=Covariates,nPC.FarmCPU=3,perc=1,priority="speed",ncpus=8,vc.method="EMMA",maxLoop=10,method.bin="FaST-LMM",permutation.threshold=TRUE,permutation.rep=100,threshold=0.05,method="FarmCPU")


#--------------------------------------Welcome to MVP--------------------------------------# 
#    A Memory-efficient, Visualization-enhanced, and Parallel-accelerated Tool For GWAS    # 
# Version: 1.0.1                                                                           # 
# Authors: Lilin Yin, Haohao Zhang, Zhiwu Zhang, Xinyun Li, Shuhong Zhao, and Xiaolei Liu  # 
# Contact: [email protected]                                                     # 
#------------------------------------------------------------------------------------------# 
[1] "Input data has 378 individuals, 258873 markers"
[1] "Principal Component Analysis Start..."
 means for first 10 snps:
 [1] 0 0 0 0 0 0 0 0 0 0
[1] "GWAS Start..."
[1] "FarmCPU Start ..."
[1] "Current loop: 1 out of maximum of 10"
[1] "seqQTN"
NULL
[1] "scanning..."
[1] "number of covariates in current loop is:"
[1] 8
Error in solve.default(ww) : 
  Lapack routine dgesv: system is exactly singular: U[5,5] = 0

I got an error about Error in solve.default(ww).The kinship and PCA were calculated by MVP.
Please help,Thanks.

Error: $ operator is invalid for atomic vectors

Hi Xiaolei,

I am getting the following error

[1] "Current loop: 2 out of maximum of 10"
[1] "Optimizing Pseudo QTNs..."
Error: $ operator is invalid for atomic vectors
In addition: Warning messages:
1: In ID.GP/bin.size :
 longer object length is not a multiple of shorter object length
2: In if (!is.na(inclosure.size)) { :
  the condition has length > 1 and only the first element will be used
3: In parallel::mclapply(1:m, seqQTN.optimize.parallel, mc.cores = ncpus) :
  all scheduled cores encountered errors in user code
Execution halted

I have tried changing minor things, but cannot figure out what the issue is. Any help would be appreciated.

Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'

Hi,

I'm running an analysis with rMVP with a HapMap formatted SNP file, and it crashes in the PCA step.
The file was converted to HapMap from VCF via Tassel's export function. The same file works fine in GAPIT.

My analysis script:

library(rMVP)
MVP.Data(fileHMP="input.hmp.txt",
         filePhe="Phenotype.csv",
         fileKin=FALSE,
         filePC=FALSE,
         out="mvp.vcf",
         priority="memory",
)


genotype <- attach.big.matrix("mvp.vcf.geno.desc")

phenotype <- read.table("mvp.vcf.phe",head=TRUE)

map <- read.table("mvp.vcf.map" , head = TRUE)


MVP
for(i in 2:ncol(phenotype)){
  imMVP <- MVP(
    phe=phenotype[, c(1, i)],
    geno=genotype,
    map=map,
    nPC.GLM=5,
    nPC.MLM=3,
    nPC.FarmCPU=3,
    perc=1,
    file='tiff',
    priority="memory",,
    ncpus=12,
    vc.method="EMMA",
    maxLoop=10,
    method.bin="FaST-LMM",#"FaST-LMM","EMMA", "static"
    #permutation.threshold=TRUE,
    #permutation.rep=100,
    threshold=0.05,
    method=c("FarmCPU", 'MLM')
  )
}

The STDOUT:

Preparing data for MVP...
Reading file...
inds: 83        markers:4640569
Preparation for GENOTYPE data is done within 1m54s
83 common individuals between phenotype and genotype.
Preparation for PHENOTYPE data is Done within 0s
Imputing...
Impute Genotype File is done!
MVP data prepration accomplished successfully!

The STDERR:

Loading required package: Rcpp
Loading required package: RcppProgress
Loading required package: BH
Loading required package: bigmemory
Loading required package: biganalytics
Loading required package: foreach
Loading required package: biglm
Loading required package: DBI
Loading required package: parallel
Loading required package: MASS
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'
Calls: MVP -> MVP.PCA -> prcomp -> prcomp.default -> svd
Execution halted

Have you seen this before?

mvp.pc.txt and mvp.kin.txt

Hello Xiaolei

I have a problem while creating kinship files. I have loaded "mvp.geno.desc" as genotype in my R, but when I tries "MVP.Data.Kin(TRUE, mvp_prefix='mvp', out='mvp'", it does not work.

>  MVP.Data.PC(TRUE, out='mvp', perc=1, pcs.keep=5)
Error in MVP.Data.PC(TRUE, out = "mvp", perc = 1, pcs.keep = 5) : 
   could not find function "MVP.Data.PC"

The same thing happened when I typed "MVP.Data.PC(TRUE, out='mvp', perc=1, pcs.keep=5)". How can I fix this problem? Otherwise, may I have your example the kinship file and Principal Components files?

Best regards
Leah

Issue with loading vcf file

Hi,

I tried to load my vcf file using the MVP.Data function to convert it to numeric but I always get the the error

"Error in VCF.Numeralization(x = tt2[-c(1:9)][index], impute = SNP.impute) : object 'maxA' not found"

I hope you can help me.

Cheers,

Stephan

Missing genotype, PC, kinship data in MVP.Data output from .vcf

I'm having trouble with my data in MVP that leads to outputs lacking all genotype information per individuals in the GWAS population.

My SNP data was in .tped/.tfam, so I converted it to .vcf with PLINK before loading this data into MVP.Data. Here's the first 10 items from the last line of my .vcf input:
scaffold_724 20159 724_20159 2 1 . . PR GT 0/1

I used this command:

MVP.Data(fileVCF="my_prefix.vcf",
         # used grep in bash and found 1529 header lines with ##, confirmed with head (also bash)
         vcf.jump=1529,
         sep.vcf="\t",
         filePhe="my_pheno_data.txt",
         fileKin=TRUE,
         filePC=TRUE,
         out="my_out")

Everything appeared to be working fine, until I realized when the processing was nearly finished that the output files were only about 200MB combined (and my .vcf input is 28GB). Then, this happened:

... [1] "Number of Markers Written into File: 1: 8252000" [1] "Number of Markers Written into File: 1: 8252993" [1] "Preparation for numeric data is done!" [1] "Preparation for PHENOTYPE data is done!" [1] "Output mvp genotype..." [1] "Calculate KINSHIP using Vanraden method..." [1] "Preparation for Kinship matrix is done!" [1] "Principal Component Analysis Start ..." means for first 10 snps: [1] 0 0 0 0 0 0 0 0 0 0 [1] "Preparation for PC matrix is done!" [1] "MVP data prepration accomplished successfully!" Warning message: In big.PCA(bigMat = big.geno, pcs.to.keep = pcs.keep) : selected too many PCs to keep [5], changing to 2

> Covariates <- attach.big.matrix("my_prefix.pc.desc")
> head(Covariates)
     [,1] [,2]
[1,]    1    0
[2,]    0    1
> head(Kinship)
     [,1] [,2]
[1,]    0    0
[2,]    0    0
--

The genotype file (.map) contains only 3 columns and doesn't show the genotypes for each individual, just the Chr, SNP position and SNP ID.

Thanks for making this tool and for the help!

Do you plan to have Julia package?

Hi,

rMVP is a very excellent package and I was wondering whether I can wrap your R package into a Julia package. Then Julia users are able to use your package.

Thanks,

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.