Giter VIP home page Giter VIP logo

Comments (16)

mcanouil avatar mcanouil commented on June 12, 2024 1

I am building the new version.
Feel free to re-open if it is not really fixed in your setup.

from eggla.

annihei avatar annihei commented on June 12, 2024 1

I tried just with slopes and 2 chromosomes and both issues were fixed, thank you!

from eggla.

burrowsk avatar burrowsk commented on June 12, 2024 1

I tried just with slopes and 2 chromosomes and both issues were fixed, thank you!

And I've just ran it on all phenos and chromosome 21 in ALSPAC with no issues.
Now to run on everything :)

Thanks both!

from eggla.

mcanouil avatar mcanouil commented on June 12, 2024

Could you run the function with clean = FALSE, and look at one results file in /wrk/anni, especially the columns you have.
The issue arise at the following lines changed in #97, but I did not have any issues in my tests:

eggla/R/run_eggla_gwas.R

Lines 510 to 538 in 75144ee

results <- data.table::setnames(
x = data.table::rbindlist(
l = lapply(
X = (function(x) `names<-`(x, sub("[^.]*\\.", "", x)))(
list.files(
path = dirname(results_file),
pattern = sprintf("%s\\..*\\.glm\\..*", basename(results_file)),
full.names = TRUE
)
),
FUN = data.table::fread
),
idcol = "trait_model"
),
old = function(x) sub("^#", "", x)
)[
TEST %in% "ADD", -c("TEST", "T_STAT", "REF", "ALT")
]
data.table::setnames(
x = results,
old = c(
"CHROM", "POS", "ID", "A1", "AX",
"A1_FREQ", "MACH_R2", "OBS_CT", "BETA", "SE", "P", "ERRCODE"
),
new = c(
"CHR", "POS", "SNPID", "EFFECT_ALLELE", "NON_EFFECT_ALLELE",
"EAF", "PLINK2_MACH_R2", "N", "BETA", "SE", "P", "ERRCODE"
)
)

image

Do you know if someone else had the issue?

from eggla.

annihei avatar annihei commented on June 12, 2024

Of course, I will update when I get some results with clean=FALSE.

As far as I know @burrowsk has tested it with ALSPAC, I understood that was fine?

from eggla.

burrowsk avatar burrowsk commented on June 12, 2024

yes, that's correct - though I do now need to run the GWAS for aucs, slopes and APAR as within three different runs owing to memory issues (even when securing entire high memory nodes). However they do run in ALSPAC.

from eggla.

mcanouil avatar mcanouil commented on June 12, 2024

@annihei FWI, you don't need to run the GWAS for all chromosomes or on all individuals.
I think the issue is simple about your VCF files, which might not contain what @burrowsk and I have (if I recall properly, you do not have the same imputation metrics for instance).

from eggla.

annihei avatar annihei commented on June 12, 2024

I think it is strange that the issue occurs in a different chromosome every time. Here's now the latest result, in the command I had all traits included as in the first command shown in the above comment (that time it had gone through more chromosomes than now):

run_eggla_gwas(
  data = pcs,
  results = res,
  id_column ="ID",
  traits = c("slope_.*", "auc_.*", "^AP_.*", "^AR_.*"),
  covariates = c("sex","PC1", "PC2", "PC3", "PC4", "PC5"),
  vcfs = list.files(
      path = "/.../genetic_files/",
      pattern = "\\.vcf$|\\.vcf.gz$",
      full.names = TRUE
    ),
  working_directory = "/wrk/anni/",
  use_info = TRUE,
  bin_path = list(bcftools = "/usr/local/bin/bcftools",plink2 = "/usr/local/bin/plink2"),
  bcftools_view_options = NULL,
  build = "37",
  strand = "+",
  info_type = "minimac3 R2",
  threads = 1,
  quiet = FALSE,
  clean = FALSE
)
...
....
....
[chr2.dose.vcf.gz] Results written in "/.../gwas_plink2/d73503fc0464566816b7f320358a6391__chr2.dose.results.gz"
[chr20.dose.vcf.gz] Performing PLINK2 regression ...
[chr20.dose.vcf.gz] Combining results files ...
[chr20.dose.vcf.gz] Extracting INFO ...
[chr20.dose.vcf.gz] Computing PLINK2 missing rate ...
[chr20.dose.vcf.gz] Computing PLINK2 Hardy-Weinberg test ...
Error in data.table::fread(sprintf("%s.vmiss", results_file)) :
  File '/.../gwas_plink2/d73503fc0464566816b7f320358a6391__chr20.dose.vmiss' does not exist or is non-readable. getwd()=='/wrk/anni/'
In addition: There were 11 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In FUN(X[[i]], ...) :
  Discarded single-line footer: <<20    41158156        20:41158156:T:C T       C       C       T       2728.41 7158    0.38117 0.998252        PC>>
2: In FUN(X[[i]], ...) : Discarded single-line footer: <<2>>
3: In FUN(X[[i]], ...) :
  Discarded single-line footer: <<20    41155126        20:41155126:A:T A       T       T       A       7.43195 7158    0.00103827      0.7747  PC3>>
4: In FUN(X[[i]], ...) :
  Discarded single-line footer: <<20    41154033        20:41154033:C:T C       T       T       C       52.8987 7158    0.00739015      0.924387        PC2     3579    1.54572e-05     0.0175046     0.00>>
5: In FUN(X[[i]], ...) :
  Discarded single-line footer: <<20    41150971        20:41150971:T:G T       G       G       T       0       7158    0       n>>
6: In FUN(X[[i]], ...) : Discarded single-line footer: <<20     41154162>>
7: In FUN(X[[i]], ...) :
  Discarded single-line footer: <<20    41156384        20:41156384:T:A T       A       A       T       0.23999 715>>
8: In FUN(X[[i]], ...) :
  Discarded single-line footer: <<20    41148313        20:41148313:T:A T       A       A       T       4.75684 7158    0.0>>
9: In FUN(X[[i]], ...) :
  Discarded single-line footer: <<20    41152302        20:41152302:G:A G       A       A       G       3.02502 7158    0.00042>>
10: In FUN(X[[i]], ...) :
  Discarded single-line footer: <<20    41156241        20:41156241:A:C A>>
11: In FUN(X[[i]], ...) :
  Discarded single-line footer: <<20    41157181        20:41157181:T:C T       C       C       T       27.1082 7158    0.00378712      0.929206        PC3     3579    -0.000134311    0.000176839   -0.759515       0.>>

When I compare the files in .../gwas_plink2/ file between chr2 and chr20, they have the same files except these:
only chr2:
d73503fc0464566816b7f320358a6391__chr2.dose.hardy
d73503fc0464566816b7f320358a6391__chr2.dose.smiss
d73503fc0464566816b7f320358a6391__chr2.dose.vmiss
d73503fc0464566816b7f320358a6391__chr2.dose.results.gz

only chr20:
d73503fc0464566816b7f320358a6391__chr20.dose-temporary.pgen
d73503fc0464566816b7f320358a6391__chr20.dose-temporary.psam
d73503fc0464566816b7f320358a6391__chr20.dose-temporary.pvar.zst

from eggla.

burrowsk avatar burrowsk commented on June 12, 2024

Hi @annihei What happens if you only run one set of traits e.g. "slope_.*" on a couple of chromosomes (e.g. 21 & 22 for quickness; you can just move the other chromosome files to a subdirectory so they are not included in the analysis). When I ran the new code on our servers there were memory issues, even when I use entire high memory nodes, so I had to run the traits in three separate runs. I also ran the chromosomes in parallel 6 at a time as running all 22 in parallel would sometimes cause memory issues too.

from eggla.

mcanouil avatar mcanouil commented on June 12, 2024

I think it might be that PLINK2 ran out of memory thus failing to compute the "statistics" files.
But without profiling the job for memory issues, I can't really tell.

Note that your last run failed at

eggla/R/run_eggla_gwas.R

Lines 599 to 608 in 75144ee

if (!quiet) message(sprintf("[%s] Computing PLINK2 missing rate ...", basename(vcf)))
system(paste(c(
bin_path[["plink2"]],
"--vcf", vcf_file, "dosage=DS",
"--threads", threads,
"--missing", "vcols=+chrom,+nmissdosage,+nobs,+fmiss",
if (file.exists(sprintf("%s.samples", basename_file))) c("--keep", sprintf("%s.samples", basename_file)),
"--silent",
"--out", results_file
), collapse = " "))

I went back to PLINK2 website and the issue might be with PLINK2 itself.
I am completely sure, that's the issue, but the bugs fixed here seems highly related to both of your issues @burrowsk and @annihei
Could you try to update your PLINK2 version?
Internally eggla uses the 9th of January version, version just before the three bugs were fixed.
image

29 Mar: Fixed a rare multiallelic-variant .pgen writing bug that could result in an obviously-corrupted .pgen (--validate reports unexpected file size) when REF and ALT1 frequencies are both low.
25 Mar: Fixed --indep-pairwise chrX bug that occurred when some nonfounders were present. --indep-pairwise speed improvement. --indep-order flag added; "--indep-order 2" greatly speeds up --indep-pairwise while slightly changing results, and will become the default in the future. --glm 'provref' optional column set added, distinguishing provisional from known reference alleles; this will be included in the default column set in the future.
3 Mar: Fixed --hardy/--hwe bug that could cause a segfault when chrX and multiallelic variants are present, and sex information is absent. --glm 'omitted' optional column set added, listing only the omitted allele; this (and 'a1freq') will be included in the default column set in the future. --glm 'ax' column set deprecated. Linux builds now avoid requesting more memory than reported by /proc/meminfo's 'MemAvailable' entry (when it exists). --pca now prints a more informative error message when there is a nan in the GRM.

from eggla.

mcanouil avatar mcanouil commented on June 12, 2024

I am upgrading PLINK2 inside eggla and inside the Docker container to the latest, see #108.

from eggla.

mcanouil avatar mcanouil commented on June 12, 2024

I just released v0.19.2 (see https://github.com/mcanouil/eggla/releases/tag/v0.19.2).
Continuous Integration checks and Docker image build are ongoing.

Let me know, if the issue was really about the segfault in PLINK2.

from eggla.

annihei avatar annihei commented on June 12, 2024

I managed to analyse the slopes and all chromosomes on one go after the update (didn't try with all phenotypes). I got following warnings:

Warning: --glm 'ax' column is deprecated; you probably want 'omitted' instead.
If you think you need ax's current behavior, contact the developers and
describe your use case.

I'll go through the results with more detail this week as this was the first time I managed to get results after the updates.

gwas-software.txt is missing the plink2 and bcftools versions:

$ cat gwas-software.txt
R version 4.2.2 Patched (2022-11-10 r83330)

from eggla.

mcanouil avatar mcanouil commented on June 12, 2024

This is weird to not have the versions.

You can try to run the piece of code which retriever version, see

eggla/R/run_eggla_gwas.R

Lines 173 to 197 in ce06337

plink_version <- try(
expr = system(
command = sprintf("%s --version", bin_path[["plink2"]]),
intern = TRUE,
ignore.stdout = TRUE,
ignore.stderr = TRUE
),
silent = TRUE
)
if (inherits(plink_version, "try-error") || !file.exists(bin_path[["plink2"]])) {
stop("Please check PLINK2 binary path!")
}
bcftools_version <- try(
expr = system(
command = sprintf("%s --version", bin_path[["bcftools"]]),
intern = TRUE,
ignore.stdout = TRUE,
ignore.stderr = TRUE
),
silent = TRUE
)
if (inherits(bcftools_version, "try-error") || !file.exists(bin_path[["bcftools"]])) {
stop("Please check BCFTools binary path!")
}

Also could you check the path argument you used for the function? By default it looked for the system PLINK2, see

eggla/R/run_eggla_gwas.R

Lines 94 to 115 in ce06337

run_eggla_gwas <- function(
data,
results,
id_column,
traits = c("slope_.*", "auc_.*", "^AP_.*", "^AR_.*"),
covariates,
vcfs,
working_directory,
vep_file = NULL,
use_info = TRUE,
bin_path = list(
bcftools = "/usr/bin/bcftools",
plink2 = "/usr/bin/plink2"
),
bcftools_view_options = NULL,
build = "38",
strand = "+",
info_type = "IMPUTE2 info score via 'bcftools +impute-info'",
threads = 1,
quiet = FALSE,
clean = TRUE
) {

You can use eggla embedded PLINK2 as in the test example:
https://github.com/mcanouil/eggla/blob/main/inst/setup/eggla-example.R

from eggla.

annihei avatar annihei commented on June 12, 2024

Here is my command:

run_eggla_gwas(
  data = pcs,
  results = res,
  id_column ="ID",
  traits = c("slope_.*"),
  covariates = c("sex","PC1", "PC2", "PC3", "PC4", "PC5"),
  vcfs = list.files(
      path = "/.../genetic_files/",
      pattern = "\\.vcf$|\\.vcf.gz$",
      full.names = TRUE
    ),
  working_directory = "/.../nfbc1966",
  use_info = TRUE,
  bin_path = list(bcftools = "/usr/local/bin/bcftools",plink2 = "/usr/local/bin/plink2"),
  bcftools_view_options = NULL,
  build = "37",
  strand = "+",
  info_type = "minimac3 R2",
  threads = 1,
  quiet = FALSE,
  clean = FALSE
)

And here's for plink and bcftools:

> bin_path = list(bcftools = "/usr/local/bin/bcftools",plink2 = "/usr/local/bin/plink2")
>  plink_version <- try(
+    expr = system(
     ignore.stdout = TRUE,
     ignore.stderr = TRUE
+      command = sprintf("%s --version", bin_path[["plink2"]]),
+      intern = TRUE,
+      ignore.stdout = TRUE,
+      ignore.stderr = TRUE
+    ),
+    silent = TRUE
+  )

>  if (inherits(plink_version, "try-error") || !file.exists(bin_path[["plink2"]])) {
+    stop("Please check PLINK2 binary path!")
+  }
> plink_version
character(0)

> bcftools_version <- try(
+    expr = system(
+      command = sprintf("%s --version", bin_path[["bcftools"]]),
+      intern = TRUE,
+      ignore.stdout = TRUE,
+      ignore.stderr = TRUE
+    ),
+    silent = TRUE
+  )
>  if (inherits(bcftools_version, "try-error") || !file.exists(bin_path[["bcftools"]])) {
+    stop("Please check BCFTools binary path!")
+  }
> bcftools_version
character(0)

Plink should be fine :

$ /usr/local/bin/plink2
PLINK v2.00a4LM AVX2 Intel (11 Apr 2023)       www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
...
...
...

$ /usr/local/bin/bcftools

Program: bcftools (Tools for variant calling and manipulating VCFs and BCFs)
Version: 1.10.2 (using htslib 1.10.2)

Usage:   bcftools [--version|--version-only] [--help] <command> <argument>
...
...
...

from eggla.

mcanouil avatar mcanouil commented on June 12, 2024

@annihei Thanks, for these detailed output. I was able to reproduce both issue: version not being printed in the file and the new warning from PLINK2 GLM.
A fixe is coming in #109

from eggla.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.