Comments (9)
Hi Daniel,
Apology not necessary and happy to help.
For 1), it seems like your id.genotype
is a NULL
object in R
to cause the error (lines 73-79). You may need to check whether the sample id in your genotype data
id.genotype <- seqGetData(genofile,"sample.id")
where genofile
is your opened aGDS file and the sample id in your phenotype data
phenotype.id <- as.character(obj_nullmodel$id_include)
are aligned well.
For 2), I am not that familiar with LSF cluster, but essentially you need to run
Rscript STAARpipeline_Gene_Centric_Noncoding.r <int number>
in a loop from {1..379} and the results will include all ~20,000 genes across 22 chromosomes. This is also the case for the rest of the analyses in 3.2.
Hope this helps.
Best,
Xihao
from staarpipeline-tutorial.
Thank you Xihao. I was able to run STAARpipeline_Gene_Centric_Noncoding.r
in a loop to 379. However, the output Rdata files are all very small, 213 bytes or less, and the output for result fields seem to be "NULL". They are attached and uploaded here https://drive.google.com/file/d/1TXES3vdgXOvYGLbs-XNmbe92dtiL8jUT/view?usp=share_link if it's helpful -- if the issue isn't immediately clear please don't feel the need to take a close look, I may try just running everything cleanly from the beginning to see if that resolves things.
out_1.txt
from staarpipeline-tutorial.
Hi Daniel,
Thanks for sharing and I'll take a look. One thing to confirm- have you checked the sample id in both your genotype and phenotype data and made sure they are aligned? Also, is there an output log file (like the one you pasted above) that is available to share?
Best,
Xihao
from staarpipeline-tutorial.
The sample IDs seem to be matched (there was some small issue before I was able to run STAARpipeline_Gene_Centric_Noncoding.r
but it ran afterward). I saved the stderr/stdout when I ran STAARpipeline_Gene_Centric_Noncoding.r
to out_1.txt
(couldn't upload the Rdata file to GitHub). Thanks.
from staarpipeline-tutorial.
Hi Daniel,
Upon checking the output you shared, it seems like the annotation/filter
channel in your aGDS file are all NA
's. As such, if you set QC_label <- "annotation/filter"
in the STAARpipeline_Gene_Centric_Noncoding.r script, you won't end up with any variant being the "PASS"
variant and thus your noncoding analysis results will be empty (NULL
). To address this issue, may I ask if all of the variants in your aGDS file have passed QC?
Best,
Xihao
from staarpipeline-tutorial.
Are you saying that the SNPs should have the value "PASS" for the "FILTER" field in their VCFs? All the variants I'm using are post-QC but in the specific VCF files I input these values are all just ".". Should I change the value of QC_label
? I tried changing it to "" but it gave this error:
Error in seqGetData(genofile, QC_label) :
'' is not a standard variable name, and the standard format:
sample.id, variant.id, position, chromosome, allele, genotype
annotation/id, annotation/qual, annotation/filter
annotation/info/VARIABLE_NAME, annotation/format/VARIABLE_NAME
sample.annotation/VARIABLE_NAME, etc
Calls: Gene_Centric_Noncoding -> noncoding -> seqGetData
Execution halted
from staarpipeline-tutorial.
Hi Daniel,
The post-QC variants should have the value "PASS" in the final GDS/aGDS file for analysis. In your case, given all variants are post-QC, you can create a new channel in your GDS/aGDS file by setting the value of all variants as "PASS". To do this, below is an example script (I will include this note in the STAARpipeline-Tutorial):
library(gdsfmt)
library(SeqArray)
dir_geno <- "/path_to_the_GDS_file/"
gds_file_name_1 <- "freeze.5.chr"
gds_file_name_2 <- ".pass_and_fail.gtonly.minDP0.gds"
for (chr in 1:22){
print(paste("Chromosome:",chr))
gds.path <- paste0(dir_geno,gds_file_name_1,chr,gds_file_name_2)
genofile<-seqOpen(gds.path, readonly = FALSE)
#genofile
position <- as.integer(seqGetData(genofile, "position"))
length(position)
Anno.folder <- index.gdsn(genofile, "annotation/info")
add.gdsn(Anno.folder, "QC_label", val=factor(rep("PASS", length(position))), compress="LZMA_ra", closezip=TRUE)
#genofile
seqClose(genofile)
}
Then, in the STAARpipeline_Gene_Centric_Noncoding.r script, you can update it to
QC_label <- "annotation/info/QC_label"
Hope this is clear.
Best,
Xihao
from staarpipeline-tutorial.
Ah great, thanks it seems to be working now. Do you have any recommendation on how to output the results to a nice tabular text format? For instance if I read in the Rdata file like this:
get(load("BMI_noncoding_1.Rdata"))
if I enter results_noncoding
I can see the results but don't seem to be able to write them to file using write.csv()
or write.table()
.
I can see the different annotation categories like results_noncoding$
but if I try to look at any individual one e.g., like results_noncoding$downstream
it outputs NULL
. Happy holidays.
from staarpipeline-tutorial.
Hi Daniel,
You're very welcome. After you have finished running the gene-centric noncoding analysis for the whole genome, you may follow the tutorial on Step 2.2: Summarize gene-centric noncoding analysis results, which provides necessary functions to automatically format results and create Manhattan/QQ plots, etc.
Happy holidays!
Best,
Xihao
from staarpipeline-tutorial.
Related Issues (20)
- fit_nullmodel Output is mostly Null and 0 HOT 16
- Fitting NULL model for binary outcomes HOT 5
- Error in Gene Centric Analysis HOT 1
- Error in results_plof_genome[, "cMAC"] : subscript out of bounds HOT 2
- Followup Question to Issue #28 HOT 2
- STAARpipeline_Gene_Centric_Noncoding HOT 2
- Dynamic Window dim(X) error HOT 3
- Can't annotate individual variant results HOT 2
- [Suggestion-Implementation] Add information to summary and annotations of results HOT 1
- Conditional analysis - Summary Gene Centric Noncoding not running to completion HOT 6
- Ukbiobank Agds files generation HOT 16
- Plots for gene centric ncRNA regions HOT 5
- FATAL ERROR - Too many first alleles as the major allele (~21.5%). HOT 1
- warning messages in generating the annotated GDS (aGDS) file. HOT 3
- Controls / cases counts inverted when using binary model HOT 7
- kinship matrix HOT 2
- variant set in gene-centric coding/noncoding analysis HOT 2
- in the Step 2: Individual (single-variant) analysis, Error in if (chr == 1) { : argument is of length zero HOT 3
- Error : Mat::operator(): index out of bounds & Error in apply(emthr_SCANG_O, 2, max) : HOT 1
- What is the difference between the "results_temp" and "results_m" results? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from staarpipeline-tutorial.