ge's Issues

The switch error results differ between two datasets.

I ran the switch error rate check on the ~7000 trios in ~/meta/trios_for_bcftools_mendelian.ped, using the data in ~/data/phasing_test/shapeit4_full_all_snp_wrong/, in two ways:

  1. directly on gwas.phased.vcf.gz
  2. on child_phased.vcf.gz, which contains only 100 trios

Issues:

  • Only 3625 trios were calculated. Which trios are missing, and why?
  • I expected the same number of tested sites, the same switch rate, and the same number of Mendelian errors for the 100 trios in both 1) and 2), but that is not the case. Why? I need to check the sites first to see where the results differ.
  • The switch error rate is ~1.5% in 1) and 0.5% in 2).
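The first two questions can be approached with plain set differences. The sketch below assumes a standard six-column PED file (family, individual, father, mother, sex, phenotype) and that each run's output can be reduced to a set of child IDs or a set of (chrom, pos) sites. The function names are hypothetical and not part of any existing script.

```python
def trio_children(ped_lines):
    """Return child IDs from PED rows where both parents are given (not '0')."""
    children = set()
    for line in ped_lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 4:
            continue
        _family, child, father, mother = fields[:4]
        if father != "0" and mother != "0":
            children.add(child)
    return children


def compare_sets(expected, observed):
    """Split two collections into (missing, unexpected, shared) sets."""
    expected, observed = set(expected), set(observed)
    return expected - observed, observed - expected, expected & observed


# Toy data: two trios in the PED, only one child in the results.
ped = ["F1 C1 P1 M1 1 0", "F1 P1 0 0 1 0", "F2 C2 P2 M2 2 0"]
missing, unexpected, shared = compare_sets(trio_children(ped), {"C1"})
```

The same `compare_sets` call works for the tested sites of the 100 trios in 1) and 2), given two sets of (chrom, pos) tuples.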

multi-allelic using bcftools norm

This is potentially problematic. In the current version of the merged data, we removed all multi-allelic sites.

Did we keep one of the alleles at each multi-allelic site?

No. I can confirm that all multi-allelic sites have been removed from the latest merged file /data/pipeline2018/run_0001/merge/chr20/merged_m2M2_snps_nohemi_minac1.vcf.gz
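A quick way to double-check this is to scan the records directly. The sketch below works on the text lines of an uncompressed VCF and reports two things: records whose ALT column lists several alleles, and positions that occur more than once, which is how a multi-allelic site appears after being split (for example by bcftools norm -m -). The function name is hypothetical.

```python
from collections import Counter


def multiallelic_summary(vcf_lines):
    """Return (records with several ALT alleles, positions seen more than once)."""
    multi_alt = 0
    positions = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, _ref, alt = line.rstrip("\n").split("\t")[:5]
        if "," in alt:
            multi_alt += 1
        positions[(chrom, pos)] += 1
    duplicated = sum(1 for n in positions.values() if n > 1)
    return multi_alt, duplicated
```

For a file where all multi-allelic sites were removed, both numbers are expected to be zero.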

Why so many flips?

  1. Sanity check:
  • Tabulate the called parent and child genotype combinations and assign every flip to one of the valid entries. The point is that no entry should show an abnormality.

  • Plot flips against AF.

  • Simon suggested looking at the probability of a flip against the allele frequency in the panel.

  • Check the gelsnp dataset, because the problem could be caused by an undersized chunk that contains too few het sites.
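The genotype tabulation from the sanity check could look roughly like this. It is a minimal sketch that assumes biallelic sites, genotypes encoded as alt-allele dosage (0, 1, 2), and one (father, mother, child, flipped) tuple per site. All names are hypothetical.

```python
from collections import Counter
from itertools import product


def transmissible(dosage):
    """Alleles (0 = ref, 1 = alt) a parent with this alt dosage can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[dosage]


def is_mendelian(father, mother, child):
    """True if the child dosage can arise from one allele of each parent."""
    return any(a + b == child
               for a, b in product(transmissible(father), transmissible(mother)))


def tabulate_flips(records):
    """Count sites and flips per (father, mother, child) dosage combination.

    Returns {combination: (n_sites, n_flips, mendelian_consistent)}.
    """
    sites, flips = Counter(), Counter()
    for father, mother, child, flipped in records:
        combo = (father, mother, child)
        sites[combo] += 1
        flips[combo] += bool(flipped)
    return {c: (sites[c], flips[c], is_mendelian(*c)) for c in sites}
```

Entries flagged as not Mendelian-consistent, and the (1, 1, 1) entry, where the trio alone cannot determine the child's phase, are the ones to inspect first.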

liftover 10292 b37 samples

  • The number is based on the data available on 2018-10-07, rare disease, cancer germline cohort, qc_passed, genome_build!=NA.
  • Combined with the 38874 build 38 files, this will make a release of 49166 samples.

singleton

  • Flip rate vs MAF: the purpose is to see how flips are affected by singletons, doubletons, and so on.
  • Remove singletons.
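The flip rate comparison can be sketched by binning sites on minor allele count. The bin boundaries beyond doubleton and the input layout, one (minor_allele_count, flipped) pair per tested site, are assumptions made for illustration.

```python
from collections import defaultdict


def ac_bin(minor_allele_count):
    """Label a site as singleton, doubleton, or one of two wider bins."""
    if minor_allele_count < 1:
        raise ValueError("minor allele count must be at least 1")
    if minor_allele_count == 1:
        return "singleton"
    if minor_allele_count == 2:
        return "doubleton"
    if minor_allele_count <= 10:
        return "3-10"
    return ">10"


def flip_rate_by_bin(sites):
    """Return {bin: flips / sites} for (minor_allele_count, flipped) pairs."""
    totals = defaultdict(lambda: [0, 0])  # bin -> [n_sites, n_flips]
    for mac, flipped in sites:
        entry = totals[ac_bin(mac)]
        entry[0] += 1
        entry[1] += bool(flipped)
    return {b: n_flips / n_sites for b, (n_sites, n_flips) in totals.items()}
```

Running it once on all sites and once with the singleton bin excluded shows how much of the overall flip rate the singletons account for.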

VCF subsetting results are empty.

A large number of the VCF outputs are empty (size: 203 KB).
My first guess is that an error occurred.
My second guess is the chr22 versus 22 naming issue.

  • Pick an empty file and redo the subsetting.
  • Write a script for chr22-style files.
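Both guesses can be tested with a small helper. The sketch below works on the text lines of a VCF: it detects whether contigs are named chr22 or 22, reports a header-only file as having no style, and rewrites a region string to match the file. The function names are hypothetical.

```python
def contig_style(vcf_lines):
    """Return 'chr' or 'plain' based on the first record's contig name.

    Returns None when there are no records at all (a header-only VCF).
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        contig = line.split("\t", 1)[0]
        return "chr" if contig.startswith("chr") else "plain"
    return None


def match_region(region, style):
    """Rewrite a region such as '22:1-1000' to the given contig style."""
    bare = region[3:] if region.startswith("chr") else region
    return "chr" + bare if style == "chr" else bare
```

If the input file reports 'chr' while the subsetting region was given as 22, that mismatch alone would explain an output that contains only a header.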

High Mendelian error

We observed high Mendelian error and a low switch error rate for trio LP3000134-dna_c04.
Why?

flip/switch rate above 50%?

I obtained some flip rates (number of flips / number of switches) above 50%.
Can this happen? It needs investigating.
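Whether this can happen depends on how flips are counted. The sketch below assumes that each het site is labeled 0 or 1 according to which parental haplotype the inferred phase follows, that a switch is a change of label between adjacent het sites, and that a flip is a pair of switches at adjacent sites. These definitions are assumptions and may not match the tool that produced the numbers above.

```python
def switch_positions(orientation):
    """Indices i where the label differs from the one at the previous het site."""
    return [i for i in range(1, len(orientation))
            if orientation[i] != orientation[i - 1]]


def count_flips(switches, overlapping=False):
    """Count pairs of switches at adjacent het sites.

    With overlapping=False each switch belongs to at most one flip.
    With overlapping=True a switch may be shared by two flips.
    """
    flips = 0
    i = 0
    while i + 1 < len(switches):
        if switches[i + 1] == switches[i] + 1:
            flips += 1
            i += 1 if overlapping else 2
        else:
            i += 1
    return flips


# A fully alternating stretch: every adjacent pair of sites is a switch.
orientation = [0, 1, 0, 1, 0]
switches = switch_positions(orientation)
ratio_disjoint = count_flips(switches) / len(switches)
ratio_overlap = count_flips(switches, overlapping=True) / len(switches)
```

Under these assumptions the non-overlapping count can never exceed half the number of switches, so a ratio above 50% would point to overlapping counting, a different definition of a flip, or a bug.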
