Hi Xihao, Sorry to keep bothering you. I'm trying to run the associa

Hi Daniel, Apology not necessary and happy to help. <p dir="auto

Step 3.2: Gene-centric noncoding analysis about staarpipeline-tutorial HOT 9 CLOSED

xihaoli commented on September 25, 2024

Step 3.2: Gene-centric noncoding analysis

from staarpipeline-tutorial.

Comments (9)

xihaoli commented on September 25, 2024

Hi Daniel,

Apology not necessary and happy to help.

For 1), it seems like your id.genotype is a NULL object in R to cause the error (lines 73-79). You may need to check whether the sample id in your genotype data

id.genotype <- seqGetData(genofile,"sample.id")

where genofile is your opened aGDS file and the sample id in your phenotype data

phenotype.id <- as.character(obj_nullmodel$id_include)

are aligned well.

For 2), I am not that familiar with LSF cluster, but essentially you need to run

Rscript STAARpipeline_Gene_Centric_Noncoding.r <int number>

in a loop from {1..379} and the results will include all ~20,000 genes across 22 chromosomes. This is also the case for the rest of the analyses in 3.2.

Hope this helps.

Best,
Xihao

from staarpipeline-tutorial.

daniel-hui commented on September 25, 2024

Thank you Xihao. I was able to run STAARpipeline_Gene_Centric_Noncoding.r in a loop to 379. However, the output Rdata files are all very small, 213 bytes or less, and the output for result fields seem to be "NULL". They are attached and uploaded here https://drive.google.com/file/d/1TXES3vdgXOvYGLbs-XNmbe92dtiL8jUT/view?usp=share_link if it's helpful -- if the issue isn't immediately clear please don't feel the need to take a close look, I may try just running everything cleanly from the beginning to see if that resolves things.
out_1.txt

from staarpipeline-tutorial.

xihaoli commented on September 25, 2024

Hi Daniel,

Thanks for sharing and I'll take a look. One thing to confirm- have you checked the sample id in both your genotype and phenotype data and made sure they are aligned? Also, is there an output log file (like the one you pasted above) that is available to share?

Best,
Xihao

from staarpipeline-tutorial.

daniel-hui commented on September 25, 2024

The sample IDs seem to be matched (there was some small issue before I was able to run STAARpipeline_Gene_Centric_Noncoding.r but it ran afterward). I saved the stderr/stdout when I ran STAARpipeline_Gene_Centric_Noncoding.r to out_1.txt (couldn't upload the Rdata file to GitHub). Thanks.

from staarpipeline-tutorial.

xihaoli commented on September 25, 2024

Hi Daniel,

Upon checking the output you shared, it seems like the annotation/filter channel in your aGDS file are all NA's. As such, if you set QC_label <- "annotation/filter" in the STAARpipeline_Gene_Centric_Noncoding.r script, you won't end up with any variant being the "PASS" variant and thus your noncoding analysis results will be empty (NULL). To address this issue, may I ask if all of the variants in your aGDS file have passed QC?

Best,
Xihao

from staarpipeline-tutorial.

daniel-hui commented on September 25, 2024

Are you saying that the SNPs should have the value "PASS" for the "FILTER" field in their VCFs? All the variants I'm using are post-QC but in the specific VCF files I input these values are all just ".". Should I change the value of QC_label? I tried changing it to "" but it gave this error:

Error in seqGetData(genofile, QC_label) :
  '' is not a standard variable name, and the standard format:
    sample.id, variant.id, position, chromosome, allele, genotype
    annotation/id, annotation/qual, annotation/filter
    annotation/info/VARIABLE_NAME, annotation/format/VARIABLE_NAME
    sample.annotation/VARIABLE_NAME, etc
Calls: Gene_Centric_Noncoding -> noncoding -> seqGetData
Execution halted

from staarpipeline-tutorial.

xihaoli commented on September 25, 2024

Hi Daniel,

The post-QC variants should have the value "PASS" in the final GDS/aGDS file for analysis. In your case, given all variants are post-QC, you can create a new channel in your GDS/aGDS file by setting the value of all variants as "PASS". To do this, below is an example script (I will include this note in the STAARpipeline-Tutorial):

library(gdsfmt)
library(SeqArray)

dir_geno <- "/path_to_the_GDS_file/"
gds_file_name_1 <- "freeze.5.chr"
gds_file_name_2 <- ".pass_and_fail.gtonly.minDP0.gds"

for (chr in 1:22){
  print(paste("Chromosome:",chr))
  gds.path <- paste0(dir_geno,gds_file_name_1,chr,gds_file_name_2)
  genofile<-seqOpen(gds.path, readonly = FALSE)
  #genofile
  
  position <- as.integer(seqGetData(genofile, "position"))
  length(position)
  Anno.folder <- index.gdsn(genofile, "annotation/info")
  add.gdsn(Anno.folder, "QC_label", val=factor(rep("PASS", length(position))), compress="LZMA_ra", closezip=TRUE)
  #genofile
  
  seqClose(genofile)
}

Then, in the STAARpipeline_Gene_Centric_Noncoding.r script, you can update it to

QC_label <- "annotation/info/QC_label"

Hope this is clear.

Best,
Xihao

from staarpipeline-tutorial.

daniel-hui commented on September 25, 2024

Ah great, thanks it seems to be working now. Do you have any recommendation on how to output the results to a nice tabular text format? For instance if I read in the Rdata file like this:

get(load("BMI_noncoding_1.Rdata"))

if I enter results_noncoding I can see the results but don't seem to be able to write them to file using write.csv() or write.table().

I can see the different annotation categories like results_noncoding$ but if I try to look at any individual one e.g., like results_noncoding$downstream it outputs NULL. Happy holidays.

from staarpipeline-tutorial.

xihaoli commented on September 25, 2024

Hi Daniel,

You're very welcome. After you have finished running the gene-centric noncoding analysis for the whole genome, you may follow the tutorial on Step 2.2: Summarize gene-centric noncoding analysis results, which provides necessary functions to automatically format results and create Manhattan/QQ plots, etc.

Happy holidays!

Best,
Xihao

from staarpipeline-tutorial.

Step 3.2: Gene-centric noncoding analysis about staarpipeline-tutorial HOT 9 CLOSED

Comments (9)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent