vvp-pub's Introduction

VVP

Variant prioritization / burden test. Version 1.5

INSTALL

DEPENDENCIES

  1. GNU Scientific Library (https://www.gnu.org/software/gsl/)
  2. An OpenMP-compatible version of gcc. If your compiler (e.g. clang) does not support OpenMP, remove the -fopenmp flag in the Makefile by changing the line
       CFLAGS = -lz -lm -O3 -lgsl -lgslcblas -fopenmp #-Wall
     to
       CFLAGS = -lz -lm -O3 -lgsl -lgslcblas #-fopenmp #-Wall
  3. zlib (https://zlib.net)
  4. make

BUILD

In the VVP directory:

make

Make will build 2 executables: build_background and VVP

Note: This has been built and run on Mac laptops and Linux servers.

EXAMPLE RUNNING VVP

To see available parameters of the executables, run with the -h option.

Before running VVP, a background must be built. From the VVP directory:

cd example

../build_background -i 1KG_cftr_background.recode.vep.vcf.gz -o 1KG.build -b 2500 -v CSQ,4,6,1,15

The build_background step writes a line to stdout for each variant in the background VCF file. It also creates several output files, including ones with the extensions .bin, .chr_offsets.txt, and .dist. These files contain information used by VVP.

To prioritize variants using VVP (in the example folder):

../VVP -i target_spiked_simple.vcf.gz -d 1KG.build -v CSQ,4,6,1,15 1> target.spiked.vvp.out

target.spiked.vvp.out contains the VVP output.

PREPARE VCF FILE FOR ANALYSIS

The VVP pipeline does not support multiallelic lines; these must first be decomposed. We recommend using vt decompose for this (http://genome.sph.umich.edu/wiki/Vt).

Mandatory preprocessing of a VCF file includes multiallelic decomposition and VEP annotation. It is important to decompose BEFORE annotating because of potential annotation collisions. Our recommended steps are to use vt to decompose and normalize variants, followed by VEP annotation. No special VEP options are required for the variant annotation. Testing has been done with VEP v82.
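Before running vt, it can help to confirm whether a VCF still contains multiallelic lines. The following is a minimal Python sketch, not part of VVP: it assumes a standard tab-separated VCF in which the fifth column (ALT) lists alternate alleles separated by commas. The helper names find_multiallelic and open_vcf are made up for this illustration.

```python
import gzip


def find_multiallelic(lines):
    """Return (chrom, pos, alt) for every record whose ALT column lists several alleles."""
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, alt = fields[0], fields[1], fields[4]
        if "," in alt:  # a comma in ALT means more than one alternate allele
            hits.append((chrom, pos, alt))
    return hits


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    opener = gzip.open if path.endswith((".gz", ".bgz")) else open
    return opener(path, "rt")


# Made-up records for illustration only.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "7\t117199644\t.\tATCT\tA\t.\tPASS\t.\n",
    "7\t117199700\t.\tG\tA,T\t.\tPASS\t.\n",
]
print(find_multiallelic(demo))
```

On a real file, the same function can be called as find_multiallelic(open_vcf("input.vcf.gz")); an empty result suggests the file has already been decomposed.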

VVP BACKGROUND

A prebuilt background based on gnomAD (http://gnomad.broadinstitute.org/) for use with VVP can be downloaded here (2.5GB): https://s3-us-west-2.amazonaws.com/gnomad-vvp-background/gnomad.062717.build.tar.gz

VVP OUTPUT

VVP outputs a tab-delimited file with 31 columns. The columns are the following:

column name   description
chr           chromosome
start         variant start coord
ref           reference allele
var           variant allele
gene          gene id
transcript    transcript id
hemi_score    raw variant score for hemizygous genotype
hemi_vvp      vvp score for hemizygous genotype
nhemi         number of hemizygous individuals
hemi_indvs    list of hemizygous individuals
hemi_nocall   number of hemizygous nocalls
het_score     raw variant score for heterozygous genotype
het_vvp       vvp score for heterozygous genotype
nhet          number of heterozygous individuals
het_indvs     list of heterozygous individuals
het_nocall    number of heterozygous nocalls
hom_score     raw variant score for homozygous genotype
hom_vvp       vvp score for homozygous genotype
nhom          number of homozygous individuals
hom_indvs     list of homozygous individuals
hom_nocall    number of homozygous nocalls
coding_ind    1 if variant is coding, 0 otherwise
indel_ind     1 if variant is an indel, 0 otherwise
aa_score      amino acid weight
n_bhemi       number of hemizygous background individuals
n_bhet        number of heterozygous background individuals
n_bhom        number of homozygous background individuals
n_bnocall     number of alleles nocalled in background
bit_offset    byte offset to background
vid           variant id
ll_weight     optional extra weight
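
The column list above can be used to label the fields of each output line. The following Python sketch is not part of VVP and rests on two assumptions: that every data line carries exactly the 31 columns in the order listed, and that any header line starts with "#". The function name read_vvp_output and the demo row are made up for illustration.

```python
VVP_COLUMNS = [
    "chr", "start", "ref", "var", "gene", "transcript",
    "hemi_score", "hemi_vvp", "nhemi", "hemi_indvs", "hemi_nocall",
    "het_score", "het_vvp", "nhet", "het_indvs", "het_nocall",
    "hom_score", "hom_vvp", "nhom", "hom_indvs", "hom_nocall",
    "coding_ind", "indel_ind", "aa_score",
    "n_bhemi", "n_bhet", "n_bhom", "n_bnocall",
    "bit_offset", "vid", "ll_weight",
]


def read_vvp_output(lines):
    """Turn tab-delimited VVP output lines into dicts keyed by column name."""
    rows = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # assumed: header lines start with '#'
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(VVP_COLUMNS):
            raise ValueError(
                f"line {number}: expected {len(VVP_COLUMNS)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(VVP_COLUMNS, fields)))
    return rows


# A made-up row with placeholder values, only to show the field mapping.
demo_row = "\t".join(
    ["7", "117199644", "ATCT", "A", "CFTR", "ENST00000003084"] + ["0"] * 25
)
for row in read_vvp_output([demo_row + "\n"]):
    print(row["gene"], row["het_vvp"], row["hom_vvp"])
```

With a real run, the lines would come from the file written by VVP, for example open("target.spiked.vvp.out"). All values are kept as strings so that list-valued fields such as het_indvs are not altered.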

vvp-pub's People

Contributors

stevendflygare

vvp-pub's Issues

How to obtain CSQ

Right now I get CSQ through VEP (screenshot not preserved).

But when I run the command

../VVP -i gnomad_chr_1_1000.vcf.gz -d gnomad.062717.build -v CSQ,4,6,1,15 1> target.spiked.vvp.out

The output file target.spiked.vvp.out is empty (screenshot not preserved).

Using build_background for gnomAD v3

Hi,

I am having trouble creating a background using gnomAD v3.

https://storage.googleapis.com/gnomad-public/release/3.0/vcf/genomes/gnomad.genomes.r3.0.sites.vcf.bgz

Command run:

build_background -i gnomad.genomes.r3.0.sites.vcf.bgz -b 1 -v CSQ,4,6,1,15 -o test.out

I receive the error:

#chr    start   ref     var     transcript      hemi_score      het_score       hom_score       nhemi   nhet    nhom    hemi_nocall     het_nocall      hom_nocall      coding_ind
VCF FORMAT PROBLEM:      vcf line has fewer than 10 columns, will skip: chr1:10031, C, AC=0;AN=53780;AF=0.00000e+00;lcr;variant_type=snv;n_alt_alleles=1;ReadPosRankSum=-1.38000e+00;MQRankSum=-5.72000e-01;RAW_MQ=6.39630e+04;DP=26;MQ_DP=52;VarDP=26;MQ=3.50722e+01;QD=2.96154e+00;FS=5.09715e+00;SB=21,6,3,3;InbreedingCoeff=-1.72592e-05;AS_VQSLOD=-9.41040e+00;NEGATIVE_TRAIN_SITE;culprit=AS_MQ;SOR=9.60000e-02;AC_asj_female=0;AN_asj_female=776;AF_asj_female=0.00000e+00;nhomalt_asj_female=0;AC_eas_female=0;AN_eas_female=558;AF_eas_female=0.00000e+00;nhomalt_eas_female=0;AC_afr_male=0;AN_afr_male=6700;AF_afr_male=0.00000e+00;nhomalt_afr_male=0;AC_female=0;AN_female=27974;AF_female=0.00000e+00;nhomalt_female=0;AC_fin_male=0;AN_fin_male=3278;AF_fin_male=0.00000e+00;nhomalt_fin_male=0;AC_oth_female=0;AN_oth_female=430;AF_oth_female=0.00000e+00;nhomalt_oth_female=0;AC_ami=0;AN_ami=350;AF_ami=0.00000e+00;nhomalt_ami=0;AC_oth=0;AN_oth=802;AF_oth=0.00000e+00;nhomalt_oth=0;AC_male=0;AN_male=25806;AF_male=0.00000e+00;nhomalt_male=0;AC_ami_female=0;AN_ami_female=150;AF_ami_female=0.00000e+00;nhomalt_ami_female=0;AC_afr=0;AN_afr=14854;AF_afr=0.00000e+00;nhomalt_afr=0;AC_eas_male=0;AN_eas_male=612;AF_eas_male=0.00000e+00;nhomalt_eas_male=0;AC_sas=0;AN_sas=606;AF_sas=0.00000e+00;nhomalt_sas=0;AC_nfe_female=0;AN_nfe_female=14256;AF_nfe_female=0.00000e+00;nhomalt_nfe_female=0;AC_asj_male=0;AN_asj_male=720;AF_asj_male=0.00000e+00;nhomalt_asj_male=0;AC_raw=2;AN_raw=115882;AF_raw=1.72589e-05;nhomalt_raw=0;AC_oth_male=0;AN_oth_male=372;AF_oth_male=0.00000e+00;nhomalt_oth_male=0;AC_nfe_male=0;AN_nfe_male=10036;AF_nfe_male=0.00000e+00;nhomalt_nfe_male=0;AC_asj=0;AN_asj=1496;AF_asj=0.00000e+00;nhomalt_asj=0;AC_amr_male=0;AN_amr_male=3392;AF_amr_male=0.00000e+00;nhomalt_amr_male=0;nhomalt=0;AC_amr_female=0;AN_amr_female=2454;AF_amr_female=0.00000e+00;nhomalt_amr_female=0;AC_sas_female=0;AN_sas_female=110;AF_sas_female=0.00000e+00;nhomalt_sas_female=0;AC_fin=0;AN_fin=4364;AF_fin=0.00000e+00;nhomalt_
fin=0;AC_afr_female=0;AN_afr_female=8154;AF_afr_female=0.00000e+00;nhomalt_afr_female=0;AC_sas_male=0;AN_sas_male=496;AF_sas_male=0.00000e+00;nhomalt_sas_male=0;AC_amr=0;AN_amr=5846;AF_amr=0.00000e+00;nhomalt_amr=0;AC_nfe=0;AN_nfe=24292;AF_nfe=0.00000e+00;nhomalt_nfe=0;AC_eas=0;AN_eas=1170;AF_eas=0.00000e+00;nhomalt_eas=0;AC_ami_male=0;AN_ami_male=200;AF_ami_male=0.00000e+00;nhomalt_ami_male=0;AC_fin_female=0;AN_fin_female=1086;AF_fin_female=0.00000e+00;nhomalt_fin_female=0;faf95_afr=0.00000e+00;faf99_afr=0.00000e+00;faf95_sas=0.00000e+00;faf99_sas=0.00000e+00;faf95_adj=0.00000e+00;faf99_adj=0.00000e+00;faf95_amr=0.00000e+00;faf99_amr=0.00000e+00;faf95_nfe=0.00000e+00;faf99_nfe=0.00000e+00;faf95_eas=0.00000e+00;faf99_eas=0.00000e+00;vep=C|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|||,C|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|||,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene|||
Segmentation fault (core dumped)

Is build_background not compatible with this new gnomAD version?
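
Two things are visible in the log above: the tool reports fewer than 10 columns on the line, and the INFO field shown carries its annotation under vep= while the command passes CSQ to -v. Whether either of these causes the segmentation fault is not confirmed here. The following Python sketch, which is not part of VVP, checks a VCF for both conditions before a run; the function name preflight and its parameters are made up for illustration.

```python
def preflight(lines, info_tag="CSQ", min_columns=10):
    """Report data lines with too few columns or without the expected INFO tag."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < min_columns:
            problems.append((number, f"only {len(fields)} columns"))
        # INFO is the eighth VCF column: semicolon-separated key=value entries.
        info = fields[7] if len(fields) > 7 else ""
        keys = {entry.split("=", 1)[0] for entry in info.split(";")}
        if info_tag not in keys:
            problems.append((number, f"INFO has no {info_tag} entry"))
    return problems


# A shortened, made-up sites-only record in the style of the log above.
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t10031\t.\tT\tC\t.\tPASS\tAC=0;AN=53780;vep=C|upstream_gene_variant\n",
]
for number, message in preflight(demo):
    print(f"line {number}: {message}")
```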

lock up if VVP is run concurrently

When two or more VVP processes are run concurrently they may all need to open the same background for memory mapping. For some reason the background is opened for read and write rather than for read only and this locks up the processes so everything just hangs.

I've identified the relevant bits of code here
https://github.com/Yandell-Lab/VVP-pub/blob/master/search_binary_bkgrnd.c#L20
https://github.com/Yandell-Lab/VVP-pub/blob/master/search_binary_bkgrnd.c#L24
and here
https://github.com/Yandell-Lab/VVP-pub/blob/master/search_binary_bkgrnd.c#L61
https://github.com/Yandell-Lab/VVP-pub/blob/master/search_binary_bkgrnd.c#L64

I've made a patch which seems to be working although I must admit I have not looked deep into the code.

20c20
<     int fdSrc = open(bin_file, O_RDONLY, 0);
---
>     int fdSrc = open(bin_file, O_RDWR, 0);
24c24
<     mm_bin = (unsigned char *)mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fdSrc, 0);
---
>     mm_bin = (unsigned char *)mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdSrc, 0);
61c61
<     int fdSrc = open(bit_file, O_RDONLY, 0);
---
>     int fdSrc = open(bit_file, O_RDWR, 0);
64c64
<     mm_bits = (unsigned char *)mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fdSrc, 0);
---
>     mm_bits = (unsigned char *)mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdSrc, 0);

Is write access really necessary?
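
For comparison, here is what a read-only mapping looks like in Python. This is only an illustration of the idea using Python's standard mmap module, not the VVP C code, and it does not show whether VVP needs write access. The function name map_read_only and the temporary file are made up for this sketch.

```python
import mmap
import os
import tempfile


def map_read_only(path):
    """Memory-map a file for reading only; the mapping cannot modify the file."""
    with open(path, "rb") as handle:
        # mmap duplicates the descriptor, so the mapping outlives the with-block.
        return mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)


# Stand-in for a background file; the content is made up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"background bytes")
    demo_path = tmp.name

# Two independent read-only mappings of the same file.
first = map_read_only(demo_path)
second = map_read_only(demo_path)
print(first[:10], second[:10])

first.close()
second.close()
os.remove(demo_path)
```

Any attempt to write through such a mapping raises an error, so no mapping can change the underlying file.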

how to interpret output file

Hi,

I've successfully run VVP and got the output (hooray!). How should I interpret the output file, and what does each column mean?

Thanks,

gnomAD vvp non-existent + vcf gives segmentation error

Since the gnomAD VVP build is no longer available, I tried to use the example build (which works with the example provided). It gives a segmentation error on my own VCF file, which I formatted like the one in the example. What should I do to fix this problem?

how to generate percentile scores

Hi

I have successfully run vvp on my vcf file, but the output does not contain percentile scores. How do I compile percentile scores for my gene of interest using CRD curves, as done in the publication? I can't find it in the documentation or in the additional files of the publication.

Thanks,
Odette

request: bioconda

Hi
Great tool - any chance you can or will get this up as a package in bioconda?
Would be great for the bioinformatics and NGS community

Best

Steve

how to use background based on gnomAD ?

hi
I've successfully run VVP, but I do not know how to use the background based on gnomAD. I downloaded gnomad.062717.build.tar.gz and unpacked it, but cannot find gnomad.062717.build.bin or gnomad.062717.build.max.
Can you provide some tips or a command to build the gnomAD background?

WARNING: chromosome not in offsets

Hi Steven,
I tried running VVP on test sample data, but it gives a 'WARNING: chromosome chrY not in offsets' message for every chromosome. I also have not been able to figure out what the CSQ parameter is about. If you could clarify these, it would help my analysis.

command : ./VVP -i /test_sample.vcf -d /VVP-pub-master/gnomad.062717.build -v CSQ,4,6,1,15 1>test_sample.out
error : WARNING: chromosome chrY not in offsets

can't build with gcc

I was able to build this with clang, but building with gcc (v 5.4.0) fails with the messages below.

CMakeFiles/VVP.dir/score_variants.c.o: In function `main':
score_variants.c:(.text+0xf1c): undefined reference to `gzdopen'
score_variants.c:(.text+0xf39): undefined reference to `gzopen'
score_variants.c:(.text+0x1076): undefined reference to `gzgets'
score_variants.c:(.text+0x111a): undefined reference to `gzgets'
CMakeFiles/VVP.dir/score_variant.c.o: In function `compute_score':
score_variant.c:(.text+0x175): undefined reference to `log'
score_variant.c:(.text+0x198): undefined reference to `log'
score_variant.c:(.text+0x1ab): undefined reference to `log'
score_variant.c:(.text+0x1ce): undefined reference to `log'
score_variant.c:(.text+0x1e1): undefined reference to `log'
CMakeFiles/VVP.dir/score_variant.c.o:score_variant.c:(.text+0x204): more undefined references to `log' follow
CMakeFiles/VVP.dir/parse_vcf.c.o: In function `parse_allele_frequency_line':
parse_vcf.c:(.text+0x4661): undefined reference to `gsl_rng_env_setup'
parse_vcf.c:(.text+0x4668): undefined reference to `gsl_rng_taus'
parse_vcf.c:(.text+0x4670): undefined reference to `gsl_rng_alloc'
parse_vcf.c:(.text+0x46a7): undefined reference to `gsl_rng_uniform_int'
parse_vcf.c:(.text+0x47ad): undefined reference to `gsl_rng_uniform_int'
parse_vcf.c:(.text+0x48ad): undefined reference to `gsl_rng_uniform_int'
parse_vcf.c:(.text+0x49a4): undefined reference to `gsl_rng_uniform_int'
parse_vcf.c:(.text+0x4a7a): undefined reference to `gsl_rng_free'
collect2: error: ld returned 1 exit status
CMakeFiles/VVP.dir/build.make:250: recipe for target 'VVP' failed
make[2]: *** [VVP] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/VVP.dir/all' failed
make[1]: *** [CMakeFiles/VVP.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
