Giter VIP home page Giter VIP logo

Comments (9)

xihaoli avatar xihaoli commented on September 25, 2024

Hi Daniel,

Apology not necessary and happy to help.

For 1), it seems like your id.genotype is a NULL object in R to cause the error (lines 73-79). You may need to check whether the sample id in your genotype data

id.genotype <- seqGetData(genofile,"sample.id")

where genofile is your opened aGDS file and the sample id in your phenotype data

phenotype.id <- as.character(obj_nullmodel$id_include)

are aligned well.

For 2), I am not that familiar with LSF cluster, but essentially you need to run

Rscript STAARpipeline_Gene_Centric_Noncoding.r <int number>

in a loop from {1..379} and the results will include all ~20,000 genes across 22 chromosomes. This is also the case for the rest of the analyses in 3.2.

Hope this helps.

Best,
Xihao

from staarpipeline-tutorial.

daniel-hui avatar daniel-hui commented on September 25, 2024

Thank you Xihao. I was able to run STAARpipeline_Gene_Centric_Noncoding.r in a loop to 379. However, the output Rdata files are all very small, 213 bytes or less, and the output for result fields seem to be "NULL". They are attached and uploaded here https://drive.google.com/file/d/1TXES3vdgXOvYGLbs-XNmbe92dtiL8jUT/view?usp=share_link if it's helpful -- if the issue isn't immediately clear please don't feel the need to take a close look, I may try just running everything cleanly from the beginning to see if that resolves things.
out_1.txt

from staarpipeline-tutorial.

xihaoli avatar xihaoli commented on September 25, 2024

Hi Daniel,

Thanks for sharing and I'll take a look. One thing to confirm- have you checked the sample id in both your genotype and phenotype data and made sure they are aligned? Also, is there an output log file (like the one you pasted above) that is available to share?

Best,
Xihao

from staarpipeline-tutorial.

daniel-hui avatar daniel-hui commented on September 25, 2024

The sample IDs seem to be matched (there was some small issue before I was able to run STAARpipeline_Gene_Centric_Noncoding.r but it ran afterward). I saved the stderr/stdout when I ran STAARpipeline_Gene_Centric_Noncoding.r to out_1.txt (couldn't upload the Rdata file to GitHub). Thanks.

from staarpipeline-tutorial.

xihaoli avatar xihaoli commented on September 25, 2024

Hi Daniel,

Upon checking the output you shared, it seems like the annotation/filter channel in your aGDS file are all NA's. As such, if you set QC_label <- "annotation/filter" in the STAARpipeline_Gene_Centric_Noncoding.r script, you won't end up with any variant being the "PASS" variant and thus your noncoding analysis results will be empty (NULL). To address this issue, may I ask if all of the variants in your aGDS file have passed QC?

Best,
Xihao

from staarpipeline-tutorial.

daniel-hui avatar daniel-hui commented on September 25, 2024

Are you saying that the SNPs should have the value "PASS" for the "FILTER" field in their VCFs? All the variants I'm using are post-QC but in the specific VCF files I input these values are all just ".". Should I change the value of QC_label? I tried changing it to "" but it gave this error:

Error in seqGetData(genofile, QC_label) :
  '' is not a standard variable name, and the standard format:
    sample.id, variant.id, position, chromosome, allele, genotype
    annotation/id, annotation/qual, annotation/filter
    annotation/info/VARIABLE_NAME, annotation/format/VARIABLE_NAME
    sample.annotation/VARIABLE_NAME, etc
Calls: Gene_Centric_Noncoding -> noncoding -> seqGetData
Execution halted

from staarpipeline-tutorial.

xihaoli avatar xihaoli commented on September 25, 2024

Hi Daniel,

The post-QC variants should have the value "PASS" in the final GDS/aGDS file for analysis. In your case, given all variants are post-QC, you can create a new channel in your GDS/aGDS file by setting the value of all variants as "PASS". To do this, below is an example script (I will include this note in the STAARpipeline-Tutorial):

library(gdsfmt)
library(SeqArray)

dir_geno <- "/path_to_the_GDS_file/"
gds_file_name_1 <- "freeze.5.chr"
gds_file_name_2 <- ".pass_and_fail.gtonly.minDP0.gds"

for (chr in 1:22){
  print(paste("Chromosome:",chr))
  gds.path <- paste0(dir_geno,gds_file_name_1,chr,gds_file_name_2)
  genofile<-seqOpen(gds.path, readonly = FALSE)
  #genofile
  
  position <- as.integer(seqGetData(genofile, "position"))
  length(position)
  Anno.folder <- index.gdsn(genofile, "annotation/info")
  add.gdsn(Anno.folder, "QC_label", val=factor(rep("PASS", length(position))), compress="LZMA_ra", closezip=TRUE)
  #genofile
  
  seqClose(genofile)
}

Then, in the STAARpipeline_Gene_Centric_Noncoding.r script, you can update it to

QC_label <- "annotation/info/QC_label"

Hope this is clear.

Best,
Xihao

from staarpipeline-tutorial.

daniel-hui avatar daniel-hui commented on September 25, 2024

Ah great, thanks it seems to be working now. Do you have any recommendation on how to output the results to a nice tabular text format? For instance if I read in the Rdata file like this:

get(load("BMI_noncoding_1.Rdata"))

if I enter results_noncoding I can see the results but don't seem to be able to write them to file using write.csv() or write.table().

I can see the different annotation categories like results_noncoding$ but if I try to look at any individual one e.g., like results_noncoding$downstream it outputs NULL. Happy holidays.

from staarpipeline-tutorial.

xihaoli avatar xihaoli commented on September 25, 2024

Hi Daniel,

You're very welcome. After you have finished running the gene-centric noncoding analysis for the whole genome, you may follow the tutorial on Step 2.2: Summarize gene-centric noncoding analysis results, which provides necessary functions to automatically format results and create Manhattan/QQ plots, etc.

Happy holidays!

Best,
Xihao

from staarpipeline-tutorial.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.