Comments (12)

MagdalenaZZ commented on June 16, 2024

Okay, so I try again. The VCF I should run the function on is an ExAC file, such as ExAC.r0.3.1.sites.vep.vcf.gz from ftp://ftp.broadinstitute.org/pub/ExAC_release/current
That will give me estimates of how common each variant is in the general population, so I can later identify "germline" variants that are unusual in the general population, i.e. potential TiN variants. Is that correct?

from detin.

amarotaylor commented on June 16, 2024

Hey Erle,
I included this function in the deTiN utilities package. I just noticed this instruction is in the wrong section of the wiki! I'll move it, but I have quoted it here:

Call the deTiN function deTiN_utilities.build_exac_pickle(vcf) with your vcf file.

MariusGheorghe commented on June 16, 2024

Hi Amaro,

Thanks for your reply and the links.
I'm afraid it is still unclear whether I should use the file that @MagdalenaZZ pointed to, or whether I can use that ExAC file directly as input to deTiN. Can you please clarify? Which VCF file should be the input to deTiN_utilities.build_exac_pickle: the one at ftp://ftp.broadinstitute.org/pub/ExAC_release/current, or one from the link you provided? Or can I use a file from the link you provided directly, instead of the ExAC file?
Thank you in advance for your answers.

As a reply to your sidebar: frustration can be avoided if proper documentation of the tools is provided. I think that is in line with the community guidelines.

Marius

amarotaylor commented on June 16, 2024

Hey Marius,

Good point. I will add that to the wiki page regarding the ExAC file and some of the additional details from our discussion.

Thanks
Amaro

MagdalenaZZ commented on June 16, 2024

Which VCF file is "your vcf file"? Perhaps it is my somatic variant calls, annotated with the ExAC "AF=" in the INFO field?

amarotaylor commented on June 16, 2024

Hi Magdalena,

Sorry for the lack of clarity. The VCF should contain germline SNPs, such as the VCF generated by ExAC, not a somatic VCF.

MariusGheorghe commented on June 16, 2024

Hi,
Any answers here? Your wiki entry for the ExAC file is a single line, which is not explicit enough for anyone.
Can you please update your wiki and elaborate a bit more on the sources for the ExAC file and the required input/output?
The input for deTiN already seems overly demanding and highly specific with respect to variant callers. At this point I am not sure it is worth the effort.
Thanks

amarotaylor commented on June 16, 2024

Hey Marius,

You can find VCFs with high frequency germline events here: https://gnomad.broadinstitute.org/downloads
The ExAC file is just a VCF with high-frequency germline events, used to filter out variants. For more on VCFs you can read the documentation. The VCF is most useful when TiN levels are high (>20%), where germline and somatic events are more difficult to distinguish based on DNA read counts.

The input of that function is a single file (the VCF) and the output is a pickle which contains a list of sites to filter out. The numerous inputs to deTiN are required to build an accurate model and, unfortunately, variant callers in the field are numerous, with no standard format for their outputs/inputs. I worked with what is commonly used by Gaddy's lab.
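To make the "VCF in, pickled list of sites out" idea concrete, here is a minimal Python sketch. It is a hypothetical stand-in, not deTiN's own build_exac_pickle: the function names build_site_pickle and drop_listed_sites, and the choice of (contig, position) tuples as the pickled structure, are assumptions for illustration only.

```python
import gzip
import pickle


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def build_site_pickle(vcf_path, pickle_path):
    """Collect (contig, position) pairs from a germline VCF and pickle the list.

    Hypothetical stand-in for deTiN_utilities.build_exac_pickle: the real
    function may store a different structure in its pickle.
    """
    sites = []
    with open_vcf(vcf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and column-header lines
            fields = line.rstrip("\n").split("\t")
            sites.append((fields[0], int(fields[1])))
    with open(pickle_path, "wb") as out:
        pickle.dump(sites, out)
    return sites


def drop_listed_sites(calls, pickle_path):
    """Return only the calls whose (contig, position) is not in the pickled list."""
    with open(pickle_path, "rb") as handle:
        blocked = set(pickle.load(handle))  # set gives O(1) membership checks
    return [call for call in calls if (call[0], call[1]) not in blocked]
```

Loading the list into a set before filtering keeps each lookup constant-time, which matters when the germline VCF holds millions of records.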

As a sidebar: being frustrated in GitHub issue threads is not productive and is against the community guidelines.

amarotaylor commented on June 16, 2024

Hey Marius

Sure. What genome build are you using? Is it HG38? I can't clarify without knowing the details of your set up.

MariusGheorghe commented on June 16, 2024

Hi Amaro,

Yes. It is hg38.

Here is the list of files I have prepared so far for the input, so please let me know if that would work:

  • --mutation_data_path: from Strelka (runStats.tsv or runStats.xml? I assume the .tsv file)
  • --cn_data_path: from GATK ACNV (clean.called.CNVs.seg; clean.cr.seg; clean.modelBegin.seg; clean.modelFinal.seg). Which one should I use? called.CNVs.seg?
  • --tumor_het_data and --normal_het_data: instead of GATK, I've got VCF files (germline and somatic) from VarScan2. Are they compatible? Or is deTiN looking for some GATK-specific tags in the VCF files?
  • --exac_data_path: missing

Thank you for your help.

Marius

amarotaylor commented on June 16, 2024

Hey Marius,

--mutation_data_path: from Strelka (runStats.tsv or runStats.xml? I assume the .tsv file)

Yup, use the TSV file (though I'm not familiar with Strelka outputs, so I don't know what runStats.tsv is). I'm not sure which headers Strelka outputs, but they should be easy enough to match up. The required column names are listed here.

cn_data_path

The input should be a seg file. I haven't kept up with GATK ACNV since I'm no longer at the Broad; it used to output a file with .acs.seg as a suffix. I'm not 100% sure which of those files would be the right one. I think they still generate a file similar to this; maybe this post would be helpful?

tumor_het_data and normal_het_data

I'm not familiar with using VarScan2 as a germline caller, but I would convert these VCFs to TSVs with the following headers: CONTIG, POS, REF_COUNT and ALT_COUNT. There are no specific tags we're looking for there.
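One way to do that conversion is sketched below in Python. It assumes VarScan2-style per-sample FORMAT fields, where RD is the reference-supporting read depth and AD the variant-supporting read depth (a single integer, unlike GATK's comma-separated AD), and that the sample column is named in the #CHROM header line (e.g. NORMAL or TUMOR in VarScan2 somatic output). The function name varscan_vcf_to_het_tsv is hypothetical.

```python
import csv
import gzip


def varscan_vcf_to_het_tsv(vcf_path, tsv_path, sample):
    """Write CONTIG, POS, REF_COUNT, ALT_COUNT for one sample of a VarScan2 VCF.

    Assumes VarScan2-style FORMAT fields: RD = reads supporting the reference
    allele, AD = reads supporting the variant allele (a single integer).
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    rows = []
    sample_idx = None
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                sample_idx = fields.index(sample)  # locate the sample column
                continue
            values = dict(zip(fields[8].split(":"), fields[sample_idx].split(":")))
            ref, alt = values.get("RD", ""), values.get("AD", "")
            if not (ref.isdigit() and alt.isdigit()):
                continue  # skip records with missing depths ('.')
            rows.append((fields[0], int(fields[1]), int(ref), int(alt)))
    with open(tsv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["CONTIG", "POS", "REF_COUNT", "ALT_COUNT"])
        writer.writerows(rows)
    return rows
```

Run it once per sample (tumor and normal) to produce the two TSVs. The sketch does not filter by genotype, so any restriction to particular sites has to be applied separately.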

--exac_data_path: missing

This file is not strictly required. If you want to generate your own, I would use this VCF and filter for variants with allele fraction > 1%. VCFtools will allow you to do this easily. Once you have done that, run the function as described in the wiki.
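As a plain-Python alternative to the VCFtools route, the filtering step could look like the sketch below. It assumes the population allele frequency is stored under the INFO key AF (one comma-separated value per ALT allele), as in the ExAC sites VCF; the function names parse_info and filter_vcf_by_af are hypothetical.

```python
import gzip


def parse_info(info):
    """Turn a VCF INFO string such as 'AC=5;AF=0.02;DB' into a dict."""
    parsed = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value  # flags such as DB map to ''
    return parsed


def filter_vcf_by_af(in_path, out_path, min_af=0.01):
    """Copy all header lines plus records whose INFO AF exceeds min_af.

    For multi-allelic records AF holds one value per ALT allele; the record
    is kept if any of them exceeds the threshold.
    """
    opener = gzip.open if in_path.endswith(".gz") else open
    kept = 0
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # keep meta-information and column header
                continue
            info = parse_info(line.rstrip("\n").split("\t")[7])
            afs = [float(a) for a in info.get("AF", "").split(",") if a not in ("", ".")]
            if afs and max(afs) > min_af:
                dst.write(line)
                kept += 1
    return kept
```

The filtered (uncompressed) VCF can then be passed to deTiN_utilities.build_exac_pickle as described in the wiki.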

MariusGheorghe commented on June 16, 2024

Hi Amaro,

Thank you for all the details.

I will have a look at that post regarding the CNV file.
I think the information you provided should be enough to give it another try.
So then the --exac_data_path is not a mandatory argument. OK, good to know.

Thank you once again. Maybe this would be helpful for others too if present in the README or Wiki page.

Marius
