sigven / gvanno
Generic human DNA variant annotation pipeline
Hi,
Is there currently a way to pull in HGMD annotations? I see that gvanno does rely on VEP, which is capable of doing this.
Thanks,
James
=> ERROR [stage-1 18/53] RUN cpanm Test::Object PPI::Document Task::Weaken Test::SubCalls Test::Object DBI DBD::mysql Archive::Zip Perl::Critic Set::IntervalTree 15.2s
[stage-1 18/53] RUN cpanm Test::Object PPI::Document Task::Weaken Test::SubCalls Test::Object DBI DBD::mysql Archive::Zip Perl::Critic Set::IntervalTree:
0.796 --> Working on Test::Object
0.796 Fetching http://www.cpan.org/authors/id/E/ET/ETHER/Test-Object-0.08.tar.gz ... OK
0.943 Configuring Test-Object-0.08 ... OK
1.283 Building and testing Test-Object-0.08 ... OK
2.352 Successfully installed Test-Object-0.08
2.514 PPI::Document is up to date. (1.277)
2.514 Task::Weaken is up to date. (1.06)
2.514 --> Working on Test::SubCalls
2.514 Fetching http://www.cpan.org/authors/id/E/ET/ETHER/Test-SubCalls-1.10.tar.gz ... OK
2.607 Configuring Test-SubCalls-1.10 ... OK
2.985 ==> Found dependencies: Hook::LexWrap
2.985 --> Working on Hook::LexWrap
2.985 Fetching http://www.cpan.org/authors/id/E/ET/ETHER/Hook-LexWrap-0.26.tar.gz ... OK
3.045 Configuring Hook-LexWrap-0.26 ... OK
3.376 Building and testing Hook-LexWrap-0.26 ... OK
4.457 Successfully installed Hook-LexWrap-0.26
4.541 Building and testing Test-SubCalls-1.10 ... OK
5.609 Successfully installed Test-SubCalls-1.10
5.937 DBI is up to date. (1.643)
5.937 --> Working on DBD::mysql
5.937 Fetching http://www.cpan.org/authors/id/D/DV/DVEEDEN/DBD-mysql-5.003.tar.gz ... OK
6.038 Configuring DBD-mysql-5.003 ... ! Configure failed for DBD-mysql-5.003. See /root/.cpanm/work/1703234489.7/build.log for details.
6.358 N/A
6.358 --> Working on Archive::Zip
6.358 Fetching http://www.cpan.org/authors/id/P/PH/PHRED/Archive-Zip-1.68.tar.gz ... OK
6.412 Configuring Archive-Zip-1.68 ... OK
6.726 Building and testing Archive-Zip-1.68 ... OK
15.08 Successfully installed Archive-Zip-1.68
15.19 Perl::Critic is up to date. (1.152)
15.19 Set::IntervalTree is up to date. (0.12)
15.19 4 distributions installed
ERROR: failed to solve: process "/bin/sh -c cpanm Test::Object PPI::Document Task::Weaken Test::SubCalls Test::Object DBI DBD::mysql Archive::Zip Perl::Critic Set::IntervalTree" did not complete successfully: exit code: `
Fixing this would require me to dig deep into how gvanno works, and I can't find the time, but I wanted to give you a heads-up.
Thanks for your work on gvanno so far, and hopefully you can fix this too!
Hi,
I have installed gvanno using Singularity on my Mac.
I am getting the following error.
My command:
python3 gvanno-1.4.1/gvanno.py --query_vcf gvanno-1.4.1/examples/example.grch38.vcf.gz --gvanno_dir gvanno-1.4.1 --output_dir run_example --sample_id example --genome_assembly grch38 --container singularity --force_overwrite
2021-04-15 15:43:00 - gvanno-start - INFO - --- Germline variant annotation (gvanno) workflow ----
2021-04-15 15:43:00 - gvanno-start - INFO - Sample name: example
2021-04-15 15:43:00 - gvanno-start - INFO - Genome assembly: grch38
2021-04-15 15:43:00 - gvanno-validate-input - INFO - STEP 0: Validate input data
Error for command "exec": unknown shorthand flag: 'W' in -W
Can someone assist?
thanks,
Nandan
Hi,
I am attempting to run gvanno 1.4.1. When I run the test data with the example command I am getting an error from Docker. Do you have any idea what might be causing this?
python ./gvanno-1.4.1/gvanno_print.py --query_vcf /scratch/gvanno/gvanno-1.4.1/examples/example.grch38.vcf.gz
--gvanno_dir /scratch/gvanno/gvanno-1.4.1 --output_dir /scratch/
--sample_id example --genome_assembly grch38 --container docker --force_overwrite
docker: Error response from daemon: error while creating mount source path '/scratch': mkdir : permission denied.
From the README:
The gvanno workflow accepts a single input file:
An unannotated, single-sample VCF file
Is single-sample specified because this is more efficient than consuming a large multi-sample file? Or is there any additional reason?
Hello,
Is there an easy way to run gvanno without installing Python 3.6? For example, is it possible with a Docker container that has Python 3.6?
When I run the test on grch37, it works fine:
python ~/gvanno-0.7.0/gvanno.py --input_vcf ~/gvanno-0.7.0/examples/example.vcf.gz ~/gvanno-0.7.0 ~/gvanno-0.7.0/examples grch37 ~/gvanno-0.7.0/gvanno.toml example
But if I try to run grch38, I get the following error:
-------------------- EXCEPTION --------------------
MSG: ERROR: Specified FASTA file/directory /usr/local/share/vep/data/homo_sapiens/95_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz not found
STACK Bio::EnsEMBL::Variation::Utils::FastaSequence::setup_fasta /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/Utils/FastaSequence.pm:182
STACK Bio::EnsEMBL::VEP::BaseVEP::fasta_db /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:477
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:129
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /opt/vep/src/ensembl-vep/vep:225
Date (localtime) = Wed Mar 20 04:34:09 2019
Ensembl API version = 95
---------------------------------------------------
Both grch37 and grch38 bundles are installed:
~/gvanno-0.7.0$ ll data/grch3*/.vep/
data/grch37/.vep/:
total 12
drwxr-xr-x 3 vep vep 4096 Mar 20 00:14 ./
drwxr-xr-x 10 vep vep 4096 Feb 4 13:43 ../
drwxr-xr-x 3 vep vep 4096 Feb 4 13:41 homo_sapiens/
data/grch38/.vep/:
total 12
drwxr-xr-x 3 vep vep 4096 Mar 20 00:02 ./
drwxr-xr-x 10 vep vep 4096 Feb 4 14:19 ../
drwxr-xr-x 3 vep vep 4096 Feb 4 14:17 homo_sapiens/
However, it doesn't look like the GRCh38 FASTA is present:
~/gvanno-0.7.0$ ll data/grch3*/.vep/*/95_*/*dna*
-rw-r--r-- 1 vep vep 882931599 Feb 4 13:42 data/grch37/.vep/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz
-rw-r--r-- 1 vep vep 2743 Feb 4 13:42 data/grch37/.vep/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz.fai
-rw-r--r-- 1 vep vep 769912 Feb 4 13:41 data/grch37/.vep/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz.gzi
Can the bundle be updated?
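A quick way to confirm which FASTA files are missing before launching a run is sketched below. It only assumes the directory layout visible in the listings above (data/<assembly>/.vep/homo_sapiens/<version>_<build>/); the function name missing_fasta_files and its parameters are hypothetical and not part of gvanno.

```python
from pathlib import Path


def missing_fasta_files(gvanno_dir, assembly, vep_version=95):
    """Return expected FASTA-related files that are absent from the data bundle.

    Hypothetical helper: the layout is inferred from the directory listings
    in this post, not taken from gvanno's own code.
    """
    build = {"grch37": "GRCh37", "grch38": "GRCh38"}[assembly]
    cache_dir = (
        Path(gvanno_dir).expanduser()
        / "data" / assembly / ".vep" / "homo_sapiens"
        / f"{vep_version}_{build}"
    )
    fasta = f"Homo_sapiens.{build}.dna.primary_assembly.fa.gz"
    # The bgzipped FASTA plus its two index files, as seen for grch37.
    expected = [fasta, fasta + ".fai", fasta + ".gzi"]
    return [str(cache_dir / name) for name in expected
            if not (cache_dir / name).is_file()]


if __name__ == "__main__":
    for assembly in ("grch37", "grch38"):
        for path in missing_fasta_files("~/gvanno-0.7.0", assembly):
            print("missing:", path)
```

With the bundle described above, this would be expected to list the three GRCh38 files and nothing for GRCh37.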
Running gvanno on human VCF files without --no_vcf_validate produces the following errors:
ERROR: Line 261298: Sample #1, GP=386,336,0 value does not lie in the interval [0,1].
ERROR: Line 261299: Sample #1, GP=314.3,264.3,0 value does not lie in the interval [0,1].
ERROR: Line 261300: Sample #1, GP=103.7,53.7,1.864e-05 value does not lie in the interval [0,1].
ERROR: Line 261301: Sample #1, GP=95.16,45.16,0.0001325 value does not lie in the interval [0,1].
ERROR: Line 261302: Sample #1, GP=95.16,45.16,0.0001325 value does not lie in the interval [0,1].
ERROR: Line 261303: Sample #1, GP=47.53,4.778,1.758 value does not lie in the interval [0,1].
ERROR: Line 261304: Sample #1, GP=376.3,326.3,0 value does not lie in the interval [0,1].
ERROR: Line 261305: Sample #1, GP=62.31,12.31,0.2629 value does not lie in the interval [0,1].
ERROR: Line 261306: Sample #1, GP=410.5,360.5,0 value does not lie in the interval [0,1].
ERROR: Line 261307: Sample #1, GP=299,249,0 value does not lie in the interval [0,1].
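The validator is complaining that GP values fall outside [0,1]. To see how widespread this is in a file before deciding whether to skip validation, a minimal sketch like the following could list the offending records. The function name out_of_range_gp and the limit parameter are hypothetical, and the code assumes a tab-delimited VCF with a FORMAT column and at least one sample column.

```python
import gzip


def out_of_range_gp(vcf_path, limit=10):
    """Return up to `limit` (line_no, sample_no, GP string) tuples whose GP
    values are not all within [0, 1]. Hypothetical diagnostic helper."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    hits = []
    with opener(vcf_path, "rt") as handle:
        for line_no, line in enumerate(handle, start=1):
            if line.startswith("#"):
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 10:
                continue  # no FORMAT/sample columns on this line
            keys = cols[8].split(":")
            if "GP" not in keys:
                continue
            gp_index = keys.index("GP")
            for sample_no, sample in enumerate(cols[9:], start=1):
                fields = sample.split(":")
                if gp_index >= len(fields):
                    continue  # trailing fields may be omitted
                values = [v for v in fields[gp_index].split(",") if v != "."]
                try:
                    bad = any(not 0.0 <= float(v) <= 1.0 for v in values)
                except ValueError:
                    bad = True  # non-numeric GP entry
                if bad:
                    hits.append((line_no, sample_no, fields[gp_index]))
                    if len(hits) >= limit:
                        return hits
    return hits
```

Calling out_of_range_gp("sample.vcf.gz") on the file in question should report the same line numbers as the validator, which helps confirm that GP is the only field at fault.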
From what I can tell from the README, the only configurables are whether or not to run LoFTee and some parameters affecting memory and CPU usage. In my output, I am not seeing some of the dbNSFP 4.0 annotations, such as PrimateAI, nor does the header suggest that it should be there:
##VEP="v95" time="2019-03-26 16:54:00" cache="/usr/local/share/vep/data/homo_sapiens/95_GRCh38" ensembl=95.4f83453 ensembl-funcgen=95.94439f4 ensembl-io=95.78ccac5 ensembl-variation=95.858de3e 1000genomes="phase3" COSMIC="86" ClinVar="201810" ESP="V2-SSA137" HGMD-PUBLIC="20174" assembly="GRCh38.p12" dbSNP="151" gencode="GENCODE 29" genebuild="2014-07" gnomAD="170228" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|ALLELE_NUM|DISTANCE|STRAND|FLAGS|PICK|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|RefSeq|DOMAINS|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|CLIN_SIG|SOMATIC|PHENO|LoF|LoF_filter|LoF_flags|LoF_info">
##LoF=Loss-of-function annotation (HC = High Confidence; LC = Low Confidence)
##LoF_filter=Reason for LoF not being HC
##LoF_flags=Possible warning flags for LoF
##LoF_info=Info used for LoF annotation
Are there command-line or .toml parameters that we can use to configure this to extract the desired fields from dbNSFP?
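For checking which annotation fields actually made it into the output, the CSQ header line can be parsed directly. The sketch below is a hypothetical helper (csq_fields is not a gvanno or VEP function) that extracts the pipe-separated field names from the "Format:" part of the CSQ INFO description shown above.

```python
import re


def csq_fields(header_lines):
    """Return the list of CSQ sub-field names declared in a VCF header.

    Hypothetical helper: relies on the 'Format: A|B|C' convention visible
    in the header pasted above.
    """
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    return []


if __name__ == "__main__":
    header = [
        '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
        'annotations from Ensembl VEP. Format: Allele|Consequence|LoF">'
    ]
    fields = csq_fields(header)
    print("PrimateAI" in fields)  # False for this shortened example header
```

Run against the full header above, "PrimateAI" would likewise be absent from the returned list, which matches the observation that the field is not annotated.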
Hello,
Thank you for your work on gvanno!
I have gotten gvanno working on my CentOS 7 box, and the example annotation ran well. However, when I try to annotate a simple VCF file like the one below, I run into the following error (probably on every variant line):
...
ERROR: Line ...: Format is not a colon-separated list of alphanumeric strings.
ERROR: Line ....: Format is not a colon-separated list of alphanumeric strings.
ERROR: Line ....: Format is not a colon-separated list of alphanumeric strings.
ERROR: Line ....: Format is not a colon-separated list of alphanumeric strings.
The VCF file looks like the following:
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
1 2115900 . T C . PASS AN=3646;AC=11 GT
1 2115911 . C G . PASS AN=3646;AC=2 GT
1 2115912 . G A . PASS AN=3646;AC=1 GT
1 2115999 . C T . PASS AN=3646;AC=4 GT
1 2116124 . C G . PASS AN=3646;AC=10,0 GT
....
What might be the cause of the error? Is there a way to format the vcf to get it properly annotated?
Thanks in advance!
-- ipstone
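One thing that stands out in the excerpt is that every record has a FORMAT column (GT) but no sample column after it. Whether that is what triggers the validator message is an assumption, not something confirmed here. A minimal sketch for spotting this and related layout problems follows; format_column_problems and FORMAT_RE are hypothetical names, and the pattern for valid FORMAT keys is a simplification.

```python
import re

# Simplified assumption: FORMAT is a colon-separated list of alphanumeric keys.
FORMAT_RE = re.compile(r"^[A-Za-z0-9_]+(:[A-Za-z0-9_]+)*$")


def format_column_problems(lines):
    """Return (line_no, description) tuples for suspicious VCF data lines."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            # Often a sign that spaces were used instead of tabs.
            problems.append(
                (line_no, f"only {len(cols)} tab-separated columns"))
        elif len(cols) == 9:
            problems.append(
                (line_no, "FORMAT column present but no sample column"))
        elif len(cols) > 9 and not FORMAT_RE.match(cols[8]):
            problems.append(
                (line_no, f"unexpected FORMAT value {cols[8]!r}"))
    return problems
```

If the records turn out to be sites-only, one option worth testing is dropping the FORMAT column from both the header line and the records so that each line ends at INFO.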
Inspecting my warnings, I'm seeing the following (thousands of times):
head onesample.gvanno_ready.vep.vcf_warnings.txt
WARNING: 31 : Use of uninitialized value $faidx in split at /opt/vep/src/ensembl-vep/modules/LoF.pm line 499, <$fh> line 5031.
Use of uninitialized value $faidx in split at /opt/vep/src/ensembl-vep/modules/LoF.pm line 499, <$fh> line 5031.
Use of uninitialized value $faidx in split at /opt/vep/src/ensembl-vep/modules/LoF.pm line 499, <$fh> line 5031.
WARNING: 15 : Use of uninitialized value $faidx in split at /opt/vep/src/ensembl-vep/modules/LoF.pm line 499, <$fh> line 5031.
Is this OK to ignore? Or does it imply an issue with something I'm doing, or something in the script?
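To judge whether the warnings file contains anything besides this one repeated message, it can help to collapse duplicates and count them. The sketch below is a hypothetical helper (summarize_warnings is not part of gvanno or VEP); it assumes the "WARNING: <number> : " prefix seen above can be stripped so that otherwise identical messages are grouped together.

```python
import re
from collections import Counter

# Assumed prefix format, taken from the lines quoted above.
PREFIX_RE = re.compile(r"^WARNING: \d+ : ")


def summarize_warnings(lines):
    """Return (message, count) pairs, most frequent first."""
    counts = Counter()
    for line in lines:
        text = line.strip()
        if not text:
            continue
        counts[PREFIX_RE.sub("", text)] += 1
    return counts.most_common()


if __name__ == "__main__":
    with open("onesample.gvanno_ready.vep.vcf_warnings.txt") as handle:
        for message, count in summarize_warnings(handle):
            print(count, message)
```

If the summary shows only the $faidx message, the question narrows to that single LoF.pm warning rather than a mix of issues.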