Comments (16)
I am building the new version.
Feel free to re-open if it is not really fixed in your setup.
from eggla.
I tried just with slopes and 2 chromosomes and both issues were fixed, thank you!
from eggla.
I tried just with slopes and 2 chromosomes and both issues were fixed, thank you!
And I've just ran it on all phenos and chromosome 21 in ALSPAC with no issues.
Now to run on everything :)
Thanks both!
from eggla.
Could you run the function with clean = FALSE
, and look at one results file in /wrk/anni
, especially the columns you have.
The issue arise at the following lines changed in #97, but I did not have any issues in my tests:
Lines 510 to 538 in 75144ee
Do you know if someone else had the issue?
from eggla.
Of course, I will update when I get some results with clean=FALSE.
As far as I know @burrowsk has tested it with ALSPAC, I understood that was fine?
from eggla.
yes, that's correct - though I do now need to run the GWAS for aucs, slopes and APAR as within three different runs owing to memory issues (even when securing entire high memory nodes). However they do run in ALSPAC.
from eggla.
@annihei FWI, you don't need to run the GWAS for all chromosomes or on all individuals.
I think the issue is simple about your VCF files, which might not contain what @burrowsk and I have (if I recall properly, you do not have the same imputation metrics for instance).
from eggla.
I think it is strange that the issue occurs in a different chromosome every time. Here's now the latest result, in the command I had all traits included as in the first command shown in the above comment (that time it had gone through more chromosomes than now):
run_eggla_gwas(
data = pcs,
results = res,
id_column ="ID",
traits = c("slope_.*", "auc_.*", "^AP_.*", "^AR_.*"),
covariates = c("sex","PC1", "PC2", "PC3", "PC4", "PC5"),
vcfs = list.files(
path = "/.../genetic_files/",
pattern = "\\.vcf$|\\.vcf.gz$",
full.names = TRUE
),
working_directory = "/wrk/anni/",
use_info = TRUE,
bin_path = list(bcftools = "/usr/local/bin/bcftools",plink2 = "/usr/local/bin/plink2"),
bcftools_view_options = NULL,
build = "37",
strand = "+",
info_type = "minimac3 R2",
threads = 1,
quiet = FALSE,
clean = FALSE
)
...
....
....
[chr2.dose.vcf.gz] Results written in "/.../gwas_plink2/d73503fc0464566816b7f320358a6391__chr2.dose.results.gz"
[chr20.dose.vcf.gz] Performing PLINK2 regression ...
[chr20.dose.vcf.gz] Combining results files ...
[chr20.dose.vcf.gz] Extracting INFO ...
[chr20.dose.vcf.gz] Computing PLINK2 missing rate ...
[chr20.dose.vcf.gz] Computing PLINK2 Hardy-Weinberg test ...
Error in data.table::fread(sprintf("%s.vmiss", results_file)) :
File '/.../gwas_plink2/d73503fc0464566816b7f320358a6391__chr20.dose.vmiss' does not exist or is non-readable. getwd()=='/wrk/anni/'
In addition: There were 11 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In FUN(X[[i]], ...) :
Discarded single-line footer: <<20 41158156 20:41158156:T:C T C C T 2728.41 7158 0.38117 0.998252 PC>>
2: In FUN(X[[i]], ...) : Discarded single-line footer: <<2>>
3: In FUN(X[[i]], ...) :
Discarded single-line footer: <<20 41155126 20:41155126:A:T A T T A 7.43195 7158 0.00103827 0.7747 PC3>>
4: In FUN(X[[i]], ...) :
Discarded single-line footer: <<20 41154033 20:41154033:C:T C T T C 52.8987 7158 0.00739015 0.924387 PC2 3579 1.54572e-05 0.0175046 0.00>>
5: In FUN(X[[i]], ...) :
Discarded single-line footer: <<20 41150971 20:41150971:T:G T G G T 0 7158 0 n>>
6: In FUN(X[[i]], ...) : Discarded single-line footer: <<20 41154162>>
7: In FUN(X[[i]], ...) :
Discarded single-line footer: <<20 41156384 20:41156384:T:A T A A T 0.23999 715>>
8: In FUN(X[[i]], ...) :
Discarded single-line footer: <<20 41148313 20:41148313:T:A T A A T 4.75684 7158 0.0>>
9: In FUN(X[[i]], ...) :
Discarded single-line footer: <<20 41152302 20:41152302:G:A G A A G 3.02502 7158 0.00042>>
10: In FUN(X[[i]], ...) :
Discarded single-line footer: <<20 41156241 20:41156241:A:C A>>
11: In FUN(X[[i]], ...) :
Discarded single-line footer: <<20 41157181 20:41157181:T:C T C C T 27.1082 7158 0.00378712 0.929206 PC3 3579 -0.000134311 0.000176839 -0.759515 0.>>
When I compare the files in .../gwas_plink2/ file between chr2 and chr20, they have the same files except these:
only chr2:
d73503fc0464566816b7f320358a6391__chr2.dose.hardy
d73503fc0464566816b7f320358a6391__chr2.dose.smiss
d73503fc0464566816b7f320358a6391__chr2.dose.vmiss
d73503fc0464566816b7f320358a6391__chr2.dose.results.gz
only chr20:
d73503fc0464566816b7f320358a6391__chr20.dose-temporary.pgen
d73503fc0464566816b7f320358a6391__chr20.dose-temporary.psam
d73503fc0464566816b7f320358a6391__chr20.dose-temporary.pvar.zst
from eggla.
Hi @annihei What happens if you only run one set of traits e.g. "slope_.*" on a couple of chromosomes (e.g. 21 & 22 for quickness; you can just move the other chromosome files to a subdirectory so they are not included in the analysis). When I ran the new code on our servers there were memory issues, even when I use entire high memory nodes, so I had to run the traits in three separate runs. I also ran the chromosomes in parallel 6 at a time as running all 22 in parallel would sometimes cause memory issues too.
from eggla.
I think it might be that PLINK2 ran out of memory thus failing to compute the "statistics" files.
But without profiling the job for memory issues, I can't really tell.
Note that your last run failed at
Lines 599 to 608 in 75144ee
I went back to PLINK2 website and the issue might be with PLINK2 itself.
I am completely sure, that's the issue, but the bugs fixed here seems highly related to both of your issues @burrowsk and @annihei
Could you try to update your PLINK2 version?
Internally eggla
uses the 9th of January version, version just before the three bugs were fixed.
29 Mar: Fixed a rare multiallelic-variant .pgen writing bug that could result in an obviously-corrupted .pgen (--validate reports unexpected file size) when REF and ALT1 frequencies are both low.
25 Mar: Fixed --indep-pairwise chrX bug that occurred when some nonfounders were present. --indep-pairwise speed improvement. --indep-order flag added; "--indep-order 2" greatly speeds up --indep-pairwise while slightly changing results, and will become the default in the future. --glm 'provref' optional column set added, distinguishing provisional from known reference alleles; this will be included in the default column set in the future.
3 Mar: Fixed --hardy/--hwe bug that could cause a segfault when chrX and multiallelic variants are present, and sex information is absent. --glm 'omitted' optional column set added, listing only the omitted allele; this (and 'a1freq') will be included in the default column set in the future. --glm 'ax' column set deprecated. Linux builds now avoid requesting more memory than reported by /proc/meminfo's 'MemAvailable' entry (when it exists). --pca now prints a more informative error message when there is a nan in the GRM.
from eggla.
I am upgrading PLINK2 inside eggla
and inside the Docker container to the latest, see #108.
from eggla.
I just released v0.19.2 (see https://github.com/mcanouil/eggla/releases/tag/v0.19.2).
Continuous Integration checks and Docker image build are ongoing.
Let me know, if the issue was really about the segfault in PLINK2.
from eggla.
I managed to analyse the slopes and all chromosomes on one go after the update (didn't try with all phenotypes). I got following warnings:
Warning: --glm 'ax' column is deprecated; you probably want 'omitted' instead.
If you think you need ax's current behavior, contact the developers and
describe your use case.
I'll go through the results with more detail this week as this was the first time I managed to get results after the updates.
gwas-software.txt is missing the plink2 and bcftools versions:
$ cat gwas-software.txt
R version 4.2.2 Patched (2022-11-10 r83330)
from eggla.
This is weird to not have the versions.
You can try to run the piece of code which retriever version, see
Lines 173 to 197 in ce06337
Also could you check the path argument you used for the function? By default it looked for the system PLINK2, see
Lines 94 to 115 in ce06337
You can use
eggla
embedded PLINK2 as in the test example:https://github.com/mcanouil/eggla/blob/main/inst/setup/eggla-example.R
from eggla.
Here is my command:
run_eggla_gwas(
data = pcs,
results = res,
id_column ="ID",
traits = c("slope_.*"),
covariates = c("sex","PC1", "PC2", "PC3", "PC4", "PC5"),
vcfs = list.files(
path = "/.../genetic_files/",
pattern = "\\.vcf$|\\.vcf.gz$",
full.names = TRUE
),
working_directory = "/.../nfbc1966",
use_info = TRUE,
bin_path = list(bcftools = "/usr/local/bin/bcftools",plink2 = "/usr/local/bin/plink2"),
bcftools_view_options = NULL,
build = "37",
strand = "+",
info_type = "minimac3 R2",
threads = 1,
quiet = FALSE,
clean = FALSE
)
And here's for plink and bcftools:
> bin_path = list(bcftools = "/usr/local/bin/bcftools",plink2 = "/usr/local/bin/plink2")
> plink_version <- try(
+ expr = system(
ignore.stdout = TRUE,
ignore.stderr = TRUE
+ command = sprintf("%s --version", bin_path[["plink2"]]),
+ intern = TRUE,
+ ignore.stdout = TRUE,
+ ignore.stderr = TRUE
+ ),
+ silent = TRUE
+ )
> if (inherits(plink_version, "try-error") || !file.exists(bin_path[["plink2"]])) {
+ stop("Please check PLINK2 binary path!")
+ }
> plink_version
character(0)
> bcftools_version <- try(
+ expr = system(
+ command = sprintf("%s --version", bin_path[["bcftools"]]),
+ intern = TRUE,
+ ignore.stdout = TRUE,
+ ignore.stderr = TRUE
+ ),
+ silent = TRUE
+ )
> if (inherits(bcftools_version, "try-error") || !file.exists(bin_path[["bcftools"]])) {
+ stop("Please check BCFTools binary path!")
+ }
> bcftools_version
character(0)
Plink should be fine :
$ /usr/local/bin/plink2
PLINK v2.00a4LM AVX2 Intel (11 Apr 2023) www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang GNU General Public License v3
...
...
...
$ /usr/local/bin/bcftools
Program: bcftools (Tools for variant calling and manipulating VCFs and BCFs)
Version: 1.10.2 (using htslib 1.10.2)
Usage: bcftools [--version|--version-only] [--help] <command> <argument>
...
...
...
from eggla.
@annihei Thanks, for these detailed output. I was able to reproduce both issue: version not being printed in the file and the new warning from PLINK2 GLM.
A fixe is coming in #109
from eggla.
Related Issues (20)
- Replace slopes_dt with aucs_dt in outlier removal [run_eggla_lmm()]
- Inconsistent use of vep and vep_file in run_eggla_gwas() HOT 1
- Title of residual plots in run_eggla_lmm()
- Increase of number of iterations for the EM algorithm `niterEM` HOT 5
- Currently in `egg_model` (and related), warnings are not necessarily captured
- Covariates and slopes HOT 6
- GWAS function output file to tab separated HOT 9
- `growthcleanr` has been archived from CRAN HOT 2
- Update knots from 2,8,12 to 1,8,12
- Check that all diagnostics figures/tables to the output of `run_eggla_lmm()` HOT 2
- Outlier plot legend getting mixed HOT 2
- Error in run_eggla_gwas when use_info=TRUE HOT 8
- All parameters correlations
- INFO_TYPE description given to genotyped SNPs HOT 5
- Incorrect parameter correlations HOT 4
- Release `eggla` `v1.0.0` HOT 5
- HWE_P in run_eggla_gwas() results inconsistent with analysis plan HOT 6
- N and EAF definition HOT 10
- Update default value for compute_apar function HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from eggla.