My genotyping (WGS) data has 1026 samples.
`
genofile
Object of class "SeqVarGDSClass"
File: /analysis/21-WGS/analysis/STAARpipeline/gds_with_AVGDP/chr11.gds (283.1M)
The sgrm matrix is 1026x1026 . I made this using --degree 2 (everything else the same, I know non of my cases are related, and only 1 pair of controls are 1st degree relatives, which the SGRM pipeline identified)
> sgrm[100:105,100:105] ### Example of few lines of the matrix
> 6 x 6 sparse Matrix of class "dsCMatrix"
> 0_HG00151 0_HG00154 0_HG00155 0_HG00157 0_HG00158 0_HG00159
> 0_HG00151 0.5043822 . . . . .
> 0_HG00154 . 0.5009275 . . . .
> 0_HG00155 . . 0.4998904 . . .
> 0_HG00157 . . . 0.4966408 . .
> 0_HG00158 . . . . 0.4979511 .
> 0_HG00159 . . . . . 0.5044529
Phenotype has 1004 samples, with a binary variable, which I've created as follows:
tail(phenotype) #### The PCs here are those that are generated by the SGRM output .score file
FID SNPSEX pheno group_race PC1 PC2 PC3
999 0_NA20821 2 1 EUR -0.02332929 0.01675661 -0.0010476483
1000 0_NA20822 2 1 EUR -0.02330287 0.01741464 0.0001754674
1001 0_NA20826 2 1 EUR -0.02354092 0.01799696 -0.0002679973
1002 0_NA20827 1 1 EUR -0.02345230 0.01764050 -0.0008549907
1003 0_NA20828 2 1 EUR -0.02335791 0.01736181 -0.0007909704
1004 0_NA20832 2 1 EUR -0.02320004 0.01698723 0.0004000635
PC4 PC5 PC6 PC7 PC8
999 -0.003812300 -0.007291464 -0.04352340 0.0010770846 -0.0009705049
1000 -0.003096957 -0.006086234 -0.04654090 0.0005451567 -0.0014752338
1001 -0.004336742 -0.004473674 -0.04533632 0.0002382286 0.0016030985
1002 -0.003471963 -0.004603992 -0.03881987 -0.0001682949 -0.0005360977
1003 -0.003189036 -0.004899655 -0.04413306 0.0002276412 -0.0001268270
1004 -0.003445883 -0.008064412 -0.04676934 0.0007819886 -0.0020845665
....
... PC9 to PC20 ....
```>
My genotyping data is a mix of Europeans, hispanics, and African races, which I've coded as the "group_race" in the phenotype file.
From another answered question I understood that "groups" need only be used if my phenotype is a continuous variable.
So I skipped it.
`## fit null model
obj_nullmodel <- fit_nullmodel(pheno~SNPSEX+PC1+PC2+PC3+PC4+PC5+PC6+PC7+PC8+PC9+PC10, data=phenotype,kins=sgrm,use_sparse=TRUE,
kins_cutoff=0.022,id="FID",family=gaussian(link="identity"),verbose=TRUE)
`
This is the output. I am wondering if this is reasonable.
obj_nullmodel$coefficients
(Intercept) SNPSEX PC1 PC2 PC3 PC4
1.35457458 -0.22051452 -2.45884174 1.31956283 -0.12508581 0.21253838
PC5 PC6 PC7 PC8 PC9 PC10
0.40876625 1.82775475 -0.27736805 -0.14491347 0.03741783 -0.01117903
table( obj_nullmodel$residuals )
0
1004
table( obj_nullmodel$scaled.residuals)
< table of extent 0 >
head(obj_nullmodel$scaled.residuals)
1 2 3 4 5 6
NaN NaN NaN NaN NaN NaN
> obj_nullmodel$converged
[1] TRUE
> obj_nullmodel$sparse_kins
[1] TRUE
> obj_nullmodel$relatedness
[1] TRUE
> obj_nullmodel$use_SPA
[1] FALSE
I'm concerned because when I ran the Individual analysis, I got NaN Pvalues for all variants.
head analysis/results_individual_analysis_sig.csv
"","CHR","POS","REF","ALT","ALT_AF","MAF","N","pvalue","pvalue_log10","Score","Score_se","Est","Est_se"
"NA",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"NA.1",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"NA.2",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"NA.3",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"NA.4",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"NA.5",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"NA.6",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
Here is a one part of the individual analysis just to see the NaNs
ind<- get(load("analysis/DILIN_Individual_Analysis_78.Rdata"))
head(ind)
CHR POS REF ALT ALT_AF MAF N pvalue pvalue_log10 Score Score_se Est. Est_se
989 4 70019914 G A 0.02442672 0.02442672 1004 NaN NaN NaN. 26.30146 NaN 0.03802071
1 4 70019928 G A 0.55276382 0.44723618 1004 NaN NaN NaN. 77.90924 NaN 0.01283545
990 4 70020508 GA G 0.01846307 0.01846307 1004 NaN NaN NaN. 22.09448 NaN 0.04526017
2 4 70020586 G A 0.06000000 0.06000000 1004 NaN NaN NaN. 38.48158 NaN 0.02598646
3 4 70020606 T C 0.09450000 0.09450000 1004 NaN NaN NaN 46.11840 NaN 0.02168332
4 4 70020771 G A 0.09530938 0.09530938 1004 NaN NaN NaN. 46.27219 NaN 0.02161125