Comments (5)

edgardomortiz commented on August 25, 2024

Hi Sun,

The script was designed to convert VCFs into phylogenetic matrices. These matrices usually contain several samples, and the alleles of each genotype are collapsed into a single ambiguity code. Your VCF has only one sample, and the R and Y in the output are IUPAC ambiguity codes (R = G or A, Y = C or T). In other words, your heterozygous genotypes are now represented by an ambiguity code in the output.
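
To make the collapsing step concrete, here is a minimal Python sketch of how a diploid genotype could be turned into a single IUPAC character. It is an illustration only and not the actual vcf2phylip code: the function name `collapse_genotype`, the lookup table `AMBIGUITY`, and the choice of "N" for missing or non-SNP genotypes are all assumptions made for this example.

```python
# Hypothetical helper, not taken from vcf2phylip.
# Keys are the sorted, de-duplicated bases of a genotype.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S",
    "AT": "W", "GT": "K", "AC": "M",
}


def collapse_genotype(gt, ref, alts):
    """Collapse a GT string such as '0/1' into one IUPAC character."""
    alleles = [ref] + list(alts)
    # Treat phased ('|') and unphased ('/') genotypes the same way.
    indices = gt.replace("|", "/").split("/")
    if "." in indices:
        # Assumption: a missing call is written as 'N'.
        return "N"
    bases = "".join(sorted({alleles[int(i)] for i in indices}))
    # Assumption: anything that is not a simple SNP also becomes 'N'.
    return AMBIGUITY.get(bases, "N")


print(collapse_genotype("0/1", "G", ["A"]))  # heterozygous G/A
print(collapse_genotype("0/1", "C", ["T"]))  # heterozygous C/T
print(collapse_genotype("1/1", "G", ["A"]))  # homozygous ALT
```

With a REF of G and an ALT of A, a `0/1` genotype maps to R, and with C and T it maps to Y, which matches the characters seen in the single-sample output.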

As per another user's request (#23), I will add an option to output only the ALT nucleotide instead of representing the heterozygote as an ambiguity code, but I haven't had time to add that code yet. Is this a feature that you would find useful? Or what kind of output were you expecting?

Edgardo

from vcf2phylip.

U201412486 commented on August 25, 2024

Hi Edgardo,
Thank you for your reply.
I do expect the output to be represented by ambiguity codes. However, the script does not recognize the case where homozygous genotypes are represented by a point in a VCF file produced by merging several individual VCF files.
Sun

edgardomortiz commented on August 25, 2024

Would you mind elaborating? I don't quite get the "represented by the point" part. Also, is this VCF merged from several individuals? I am sure I am misunderstanding something.

U201412486 commented on August 25, 2024

Homozygous genotypes are represented by "0/0" in the VCF format, which means the nucleotides are the same as the reference. However, homozygous genotypes are represented by "." after merging multiple VCF files into a single VCF file with the vcf-merge software. The VCF file looks like this:
NC_0009 4411093 . G A 226.50 PASS AC=8;AN=8;BQB=0.786616;DP4=4,0,316,349;DP=778;MQ0F=0;MQ=60;MQB=0.997475;MQSB=1;RPB=0.708333;SF=25,26,27,28;SGB=-0.693147;VDB=0.156993 GT:PL . . . . . . . . . . . . . . . . . . . . . . . . . 1/1:255,255,0 1/1:255,255,0 1/1:255,132,0 1/1:255,255,0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

edgardomortiz commented on August 25, 2024

The . represents a missing genotype in the VCF format. The genotypes are not modified by the merge; what you are seeing is that not all of your samples have genotypes for the same set of sites. Just in case, here is the format specification:
http://samtools.github.io/hts-specs/VCFv4.2.pdf (Page 6)
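
To illustrate the difference between a missing genotype and a homozygous reference genotype, here is a minimal Python sketch that classifies one sample column of a VCF record. It is not code from vcf2phylip or vcf-merge; the function name `classify_sample` and the category labels are hypothetical and chosen only for this example.

```python
# Hypothetical helper, not taken from vcf2phylip or vcf-merge.
def classify_sample(field):
    """Classify a sample column such as '1/1:255,255,0' or '.'."""
    # GT is the first colon-separated subfield when FORMAT starts with GT.
    gt = field.split(":")[0]
    indices = gt.replace("|", "/").split("/")
    if any(i == "." for i in indices):
        # '.' means no call was made for this sample at this site.
        return "missing"
    if len(set(indices)) > 1:
        return "het"
    # All alleles are identical: index 0 is REF, anything else is an ALT.
    return "hom_ref" if indices[0] == "0" else "hom_alt"


# A shortened set of sample columns in the style of the record above.
samples = [".", ".", "1/1:255,255,0", "1/1:255,132,0", "."]
print([classify_sample(s) for s in samples])
```

Under this reading, the `.` columns in the merged record are samples without a call at that site, which is different from an explicit `0/0`.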
