Okay, let me try again. The VCF I should run the function on is an ExAC file, such as ExAC.r0.3.1.sites.vep.vcf.gz from ftp://ftp.broadinstitute.org/pub/ExAC_release/current
That will give me estimates of how common each variant is in the general population, so I can later flag "germline" variants that are unusual in the general population as potential TiN variants. Is that correct?
Hey Erle,
I included this function in the deTiN utilities package. I just noticed this instruction is in the wrong section of the wiki! I'll move it, but have quoted it here:
Call the deTiN function deTiN_utilities.build_exac_pickle(vcf) with your VCF file.
Hi Amaro,
Thanks for your reply and the links.
I'm afraid it is still unclear whether I should use the file that @MagdalenaZZ pointed to, or whether I can use that ExAC file directly as input to deTiN. Can you please clarify which VCF file should be the input to deTiN_utilities.build_exac_pickle? Is it the one at ftp://ftp.broadinstitute.org/pub/ExAC_release/current, or one from the link you provided? Or can I use a file from the link you provided directly, instead of the ExAC file?
Thank you in advance for your answers.
In reply to your side bar: frustration can be avoided if proper documentation of the tools is provided. I think that is in line with the community guidelines.
Marius
Hey Marius,
Good point. I will add that to the wiki page regarding the ExAC file and some of the additional details from our discussion.
Thanks
Amaro
Which VCF file is "your vcf file"? Is it perhaps my somatic variant calls, annotated with the ExAC "AF=" value in the INFO field?
Hi Magdalena,
Sorry for the lack of clarity. The VCF should contain germline SNPs, such as the VCF generated by ExAC, not somatic calls.
Hi,
Any answers here? The wiki entry for the ExAC file is a single line, which is not explicit enough for anyone.
Can you please update the wiki and elaborate a bit more on the sources for the ExAC file and the required input/output?
The inputs for deTiN already seem overly demanding and highly specific with respect to variant callers. At this point I am not sure it is worth the effort.
Thanks
Hey Marius,
You can find VCFs with high frequency germline events here: https://gnomad.broadinstitute.org/downloads
The ExAC file is just a VCF of high-frequency germline events, used to filter out variants. For more on VCFs you can read the documentation. The VCF is most useful when TiN levels are high (>20%) and germline and somatic events are more difficult to distinguish based on DNA read counts.
The input of that function is a single file (the VCF) and the output is a pickle containing a list of sites to filter out. The numerous inputs to deTiN are required to build an accurate model, and unfortunately the variant callers in the field are numerous, with no standard format for their outputs/inputs. I worked with what is commonly used by Gaddy's lab.
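To make that input/output contract concrete, here is a minimal Python sketch of the same idea: read a (possibly gzipped) VCF and pickle a list of sites. This is not deTiN's build_exac_pickle; the helper names collect_sites and write_site_pickle, and the choice of (contig, position) tuples as the site representation, are assumptions for illustration only.

```python
import gzip
import pickle


def collect_sites(lines):
    """Return (contig, position) tuples for every data line of a VCF.

    Hypothetical helper: the tuple layout is an illustrative assumption,
    not necessarily what deTiN stores in its pickle.
    """
    sites = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        contig, pos = line.split("\t", 2)[:2]  # CHROM and POS columns
        sites.append((contig, int(pos)))
    return sites


def write_site_pickle(vcf_path, pickle_path):
    """Hypothetical helper: read a VCF or VCF.gz and pickle its site list."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        sites = collect_sites(handle)
    with open(pickle_path, "wb") as out:
        pickle.dump(sites, out)
    return len(sites)
```

Reloading the list later is just pickle.load on the output file. Whether deTiN's own pickle uses the same layout is not confirmed here, so files actually passed to --exac_data_path should still come from deTiN_utilities.build_exac_pickle.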
As a side bar: being frustrated in GitHub issue threads is not productive and is against the community guidelines.
Hey Marius
Sure. What genome build are you using? Is it hg38? I can't clarify without knowing the details of your setup.
Hi Amaro,
Yes. It is hg38.
Here is the list of input files I have prepared so far; please let me know if they will work:
- --mutation_data_path: from Strelka (runStats.tsv or runStats.xml? I assume the .tsv file)
- --cn_data_path: from GATK ACNV (clean.called.CNVs.seg; clean.cr.seg; clean.modelBegin.seg; clean.modelFinal.seg). Which one should I use? called.CNVs.seg?
- --tumor_het_data and --normal_het_data: instead of GATK output I have VCF files (germline and somatic) from VarScan2. Is that compatible, or is deTiN looking for some GATK-specific tags in the VCF files?
- --exac_data_path: missing
Thank you for your help.
Marius
Hey Marius,
--mutation_data_path: from Strelka (runStats.tsv or runStats.xml? I assume the .tsv file)
Yup, use the TSV file (though I'm not familiar with Strelka outputs, so I don't know what runStats.tsv is). I'm not sure what headers Strelka outputs, but they should be easy enough to match up. The required column names are listed here.
cn_data_path
The input should be a seg file. I haven't kept up with GATK ACNV since I'm no longer at the Broad; it used to output a file with .acs.seg as the suffix. I'm not 100% sure which of those files would be the right one. I think they still generate a file similar to this; maybe this post would be helpful?
tumor_het_data and normal_het_data
I'm not familiar with using VarScan2 as a germline caller, but I would convert these VCFs to TSVs with the following headers: CONTIG, POS, REF_COUNT and ALT_COUNT. There are no specific tags we're looking for there.
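A minimal Python sketch of that conversion follows. It assumes VarScan2-style per-sample FORMAT fields, in which RD is the reference read depth and AD the alternate read depth (other callers differ; GATK, for example, stores both counts comma-separated in AD). The function name vcf_to_het_tsv and its parameters are hypothetical, not part of deTiN.

```python
import csv
import io


def vcf_to_het_tsv(vcf_lines, sample_column=9, ref_key="RD", alt_key="AD"):
    """Convert VCF data lines to a TSV with CONTIG, POS, REF_COUNT, ALT_COUNT.

    Assumption: VarScan2-style FORMAT keys, where RD is the reference depth
    and AD the alternate depth. Change ref_key/alt_key for other callers.
    sample_column is the 0-based column of the sample to read (9 = first).
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CONTIG", "POS", "REF_COUNT", "ALT_COUNT"])
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:RD:AD
        values = fields[sample_column].split(":")
        sample = dict(zip(keys, values))
        writer.writerow([fields[0], fields[1], sample[ref_key], sample[alt_key]])
    return out.getvalue()
```

In a VarScan2 somatic VCF the first sample column (index 9) is usually NORMAL and the next (index 10) TUMOR, so changing sample_column should give both the normal and the tumor table; check the #CHROM header line of your own files to confirm the order.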
--exac_data_path: missing
This file is not strictly required. If you want to generate your own, I would use this VCF and filter for variants with allele fraction > 1%. VCFtools will allow you to do this easily. Once you have done that, run the function as described in the wiki.
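If VCFtools is not at hand, the same AF > 1% filter can be sketched in plain Python. This assumes the population allele frequency is stored in an INFO key named AF, as in the ExAC sites VCF; for multi-allelic sites it keeps the record when any alternate allele passes, which is a design choice rather than something specified here. The function names are hypothetical.

```python
import gzip


def max_allele_frequency(info):
    """Return the largest AF value in a VCF INFO string, or None if absent."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "AF" and value:
            freqs = [float(v) for v in value.split(",") if v != "."]
            return max(freqs) if freqs else None
    return None


def filter_common_variants(lines, min_af=0.01):
    """Yield header lines plus data lines whose AF exceeds min_af."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the VCF header intact
            continue
        fields = line.rstrip("\n").split("\t")
        af = max_allele_frequency(fields[7])  # INFO is the 8th column
        if af is not None and af > min_af:
            yield line


def filter_vcf_file(in_path, out_path, min_af=0.01):
    """Hypothetical helper: write the filtered (plain-text) VCF to out_path."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_common_variants(src, min_af))
```

Records without an AF value are dropped, since their population frequency cannot be checked against the threshold.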
Hi Amaro,
Thank you for all the details.
I will have a look at that post regarding the CNV file.
I think the information you provided should be enough to give it another try.
So --exac_data_path is not a mandatory argument. OK, good to know.
Thank you once again. This would probably be helpful for others too if it were in the README or wiki page.
Marius