gvanno's People

Contributors

oskarvid, sigven, tinavisnovska

gvanno's Issues

HGMD annotations

Hi,

Is there currently a way to pull in HGMD annotations? I see that gvanno does rely on VEP, which is capable of doing this.

Thanks,
James

Upgrading to ENSEMBL version 110 fails at building Dockerfile

Hey sigven,
I've been a long-time user of gvanno for all kinds of projects, and I wanted to upgrade to 1.6.0 to get the newest features. There has not been an update since the beginning of the year, and ENSEMBL has moved on from release 109 to 110, so 109 can no longer be downloaded automatically. I saw that the code has variables for the ENSEMBL release version, so I went ahead and changed them to 110, but rebuilding the gvanno Dockerfile failed. Error message from buildSingularity.sh:
=> ERROR [stage-1 18/53] RUN cpanm Test::Object PPI::Document Task::Weaken Test::SubCalls Test::Object DBI DBD::mysql Archive::Zip Perl::Critic Set::IntervalTree 15.2s

[stage-1 18/53] RUN cpanm Test::Object PPI::Document Task::Weaken Test::SubCalls Test::Object DBI DBD::mysql Archive::Zip Perl::Critic Set::IntervalTree:
0.796 --> Working on Test::Object
0.796 Fetching http://www.cpan.org/authors/id/E/ET/ETHER/Test-Object-0.08.tar.gz ... OK
0.943 Configuring Test-Object-0.08 ... OK
1.283 Building and testing Test-Object-0.08 ... OK
2.352 Successfully installed Test-Object-0.08
2.514 PPI::Document is up to date. (1.277)
2.514 Task::Weaken is up to date. (1.06)
2.514 --> Working on Test::SubCalls
2.514 Fetching http://www.cpan.org/authors/id/E/ET/ETHER/Test-SubCalls-1.10.tar.gz ... OK
2.607 Configuring Test-SubCalls-1.10 ... OK
2.985 ==> Found dependencies: Hook::LexWrap
2.985 --> Working on Hook::LexWrap
2.985 Fetching http://www.cpan.org/authors/id/E/ET/ETHER/Hook-LexWrap-0.26.tar.gz ... OK
3.045 Configuring Hook-LexWrap-0.26 ... OK
3.376 Building and testing Hook-LexWrap-0.26 ... OK
4.457 Successfully installed Hook-LexWrap-0.26
4.541 Building and testing Test-SubCalls-1.10 ... OK
5.609 Successfully installed Test-SubCalls-1.10
5.937 DBI is up to date. (1.643)
5.937 --> Working on DBD::mysql
5.937 Fetching http://www.cpan.org/authors/id/D/DV/DVEEDEN/DBD-mysql-5.003.tar.gz ... OK
6.038 Configuring DBD-mysql-5.003 ... ! Configure failed for DBD-mysql-5.003. See /root/.cpanm/work/1703234489.7/build.log for details.
6.358 N/A
6.358 --> Working on Archive::Zip
6.358 Fetching http://www.cpan.org/authors/id/P/PH/PHRED/Archive-Zip-1.68.tar.gz ... OK
6.412 Configuring Archive-Zip-1.68 ... OK
6.726 Building and testing Archive-Zip-1.68 ... OK
15.08 Successfully installed Archive-Zip-1.68
15.19 Perl::Critic is up to date. (1.152)
15.19 Set::IntervalTree is up to date. (0.12)
15.19 4 distributions installed


Dockerfile:215

213 | RUN apt-get update && apt-get -y install apache2 apt-utils build-essential cpanminus curl git libmysqlclient-dev libpng-dev libssl-dev manpages mysql-client openssl perl perl-base unzip vim wget sudo
214 | # install ensembl dependencies
215 | >>> RUN cpanm Test::Object PPI::Document Task::Weaken Test::SubCalls Test::Object DBI DBD::mysql Archive::Zip Perl::Critic Set::IntervalTree
216 | RUN apt-get update && apt-get install apt-transport-https
217 |

ERROR: failed to solve: process "/bin/sh -c cpanm Test::Object PPI::Document Task::Weaken Test::SubCalls Test::Object DBI DBD::mysql Archive::Zip Perl::Critic Set::IntervalTree" did not complete successfully: exit code: `

This is where I would have to dig deep into how gvanno works, and I can't find the time, but I wanted to give you a heads-up.
Thanks for your work on gvanno so far, and hopefully you can fix this too!

Error when running example in gvanno

Hi,

I have installed gvanno using Singularity on my Mac.

I am getting the following error.

My command:
python3 gvanno-1.4.1/gvanno.py --query_vcf gvanno-1.4.1/examples/example.grch38.vcf.gz --gvanno_dir gvanno-1.4.1 --output_dir run_example --sample_id example --genome_assembly grch38 --container singularity --force_overwrite

2021-04-15 15:43:00 - gvanno-start - INFO - --- Germline variant annotation (gvanno) workflow ----
2021-04-15 15:43:00 - gvanno-start - INFO - Sample name: example
2021-04-15 15:43:00 - gvanno-start - INFO - Genome assembly: grch38

2021-04-15 15:43:00 - gvanno-validate-input - INFO - STEP 0: Validate input data
Error for command "exec": unknown shorthand flag: 'W' in -W

Can someone assist?

thanks,

Nandan

docker error response

Hi,

I am attempting to run gvanno 1.4.1. When I run the test data with the example command I am getting an error from Docker. Do you have any idea what might be causing this?

python ./gvanno-1.4.1/gvanno_print.py --query_vcf /scratch/gvanno/gvanno-1.4.1/examples/example.grch38.vcf.gz \
    --gvanno_dir /scratch/gvanno/gvanno-1.4.1 --output_dir /scratch/ \
    --sample_id example --genome_assembly grch38 --container docker --force_overwrite

docker: Error response from daemon: error while creating mount source path '/scratch': mkdir : permission denied.

Why "single sample"?

From the README:

The gvanno workflow accepts a single input file:

An unannotated, single-sample VCF file

Is single-sample specified because this is more efficient than consuming a large multi-sample file? Or is there any additional reason?

run without installing python 3.6

Hello,

Is there an easy way to run gvanno without installing Python 3.6? For example, is it possible with a Docker container that has Python 3.6?

GRCh38 FASTA missing from GRCh38 bundle?

When I run the test on grch37, it works fine:

python ~/gvanno-0.7.0/gvanno.py --input_vcf ~/gvanno-0.7.0/examples/example.vcf.gz ~/gvanno-0.7.0 ~/gvanno-0.7.0/examples grch37 ~/gvanno-0.7.0/gvanno.toml example

But if I try to run grch38, I get the following error:

-------------------- EXCEPTION --------------------
MSG: ERROR: Specified FASTA file/directory /usr/local/share/vep/data/homo_sapiens/95_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz not found

STACK Bio::EnsEMBL::Variation::Utils::FastaSequence::setup_fasta /opt/vep/src/ensembl-vep/Bio/EnsEMBL/Variation/Utils/FastaSequence.pm:182
STACK Bio::EnsEMBL::VEP::BaseVEP::fasta_db /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:477
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:129
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /opt/vep/src/ensembl-vep/vep:225
Date (localtime)    = Wed Mar 20 04:34:09 2019
Ensembl API version = 95
---------------------------------------------------

Both grch37 and grch38 bundles are installed:

~/gvanno-0.7.0$ ll data/grch3*/.vep/
data/grch37/.vep/:
total 12
drwxr-xr-x  3 vep vep 4096 Mar 20 00:14 ./
drwxr-xr-x 10 vep vep 4096 Feb  4 13:43 ../
drwxr-xr-x  3 vep vep 4096 Feb  4 13:41 homo_sapiens/

data/grch38/.vep/:
total 12
drwxr-xr-x  3 vep vep 4096 Mar 20 00:02 ./
drwxr-xr-x 10 vep vep 4096 Feb  4 14:19 ../
drwxr-xr-x  3 vep vep 4096 Feb  4 14:17 homo_sapiens/

However, it doesn't look like the GRCh38 FASTA is present:

~/gvanno-0.7.0$ ll data/grch3*/.vep/*/95_*/*dna*
-rw-r--r-- 1 vep vep 882931599 Feb  4 13:42 data/grch37/.vep/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz
-rw-r--r-- 1 vep vep      2743 Feb  4 13:42 data/grch37/.vep/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz.fai
-rw-r--r-- 1 vep vep    769912 Feb  4 13:41 data/grch37/.vep/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz.gzi

Can the bundle be updated?
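
A quick way to check which bundles lack the reference FASTA is sketched below. It assumes the directory layout shown in the listings above (`data/<assembly>/.vep/homo_sapiens/<version>_<assembly>/`); `find_vep_fasta` and `missing_fasta` are hypothetical helper names, not part of gvanno.

```python
from pathlib import Path


def find_vep_fasta(gvanno_dir, assembly, vep_version="95"):
    """List FASTA-related files (*dna*) in one assembly's VEP cache directory.

    Assumes the layout data/<assembly>/.vep/homo_sapiens/<version>_*/ seen
    in the listings above; adjust if your bundle is organised differently.
    """
    cache_root = Path(gvanno_dir) / "data" / assembly / ".vep" / "homo_sapiens"
    return sorted(cache_root.glob(f"{vep_version}_*/*dna*"))


def missing_fasta(gvanno_dir, assemblies=("grch37", "grch38")):
    """Return the assemblies for which no FASTA-related file was found."""
    return [a for a in assemblies if not find_vep_fasta(gvanno_dir, a)]


if __name__ == "__main__":
    # Hypothetical install location; replace with your own gvanno directory.
    print(missing_fasta(Path.home() / "gvanno-0.7.0"))
```

For the installation described above, this would be expected to report grch38 as missing, matching the `ll` output.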

Running without --no_vcf_validate option gives me an error with human vcf files.

Running gvanno on human VCF files without --no_vcf_validate produces the following errors:

ERROR: Line 261298: Sample #1, GP=386,336,0 value does not lie in the interval [0,1].                                                                                                                  
ERROR: Line 261299: Sample #1, GP=314.3,264.3,0 value does not lie in the interval [0,1].
ERROR: Line 261300: Sample #1, GP=103.7,53.7,1.864e-05 value does not lie in the interval [0,1].
ERROR: Line 261301: Sample #1, GP=95.16,45.16,0.0001325 value does not lie in the interval [0,1].
ERROR: Line 261302: Sample #1, GP=95.16,45.16,0.0001325 value does not lie in the interval [0,1].
ERROR: Line 261303: Sample #1, GP=47.53,4.778,1.758 value does not lie in the interval [0,1].
ERROR: Line 261304: Sample #1, GP=376.3,326.3,0 value does not lie in the interval [0,1].
ERROR: Line 261305: Sample #1, GP=62.31,12.31,0.2629 value does not lie in the interval [0,1].
ERROR: Line 261306: Sample #1, GP=410.5,360.5,0 value does not lie in the interval [0,1].
ERROR: Line 261307: Sample #1, GP=299,249,0 value does not lie in the interval [0,1].

Parameters to include dbNSFP?

From what I can tell from the README, the only configurable options are whether or not to run LOFTEE and some parameters affecting memory and CPU usage. In my output, I am not seeing some of the dbNSFP 4.0 annotations, such as PrimateAI, nor does the header suggest that they should be there:

##VEP="v95" time="2019-03-26 16:54:00" cache="/usr/local/share/vep/data/homo_sapiens/95_GRCh38" ensembl=95.4f83453 ensembl-funcgen=95.94439f4 ensembl-io=95.78ccac5 ensembl-variation=95.858de3e 1000genomes="phase3" COSMIC="86" ClinVar="201810" ESP="V2-SSA137" HGMD-PUBLIC="20174" assembly="GRCh38.p12" dbSNP="151" gencode="GENCODE 29" genebuild="2014-07" gnomAD="170228" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|ALLELE_NUM|DISTANCE|STRAND|FLAGS|PICK|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|RefSeq|DOMAINS|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|CLIN_SIG|SOMATIC|PHENO|LoF|LoF_filter|LoF_flags|LoF_info">
##LoF=Loss-of-function annotation (HC = High Confidence; LC = Low Confidence)
##LoF_filter=Reason for LoF not being HC
##LoF_flags=Possible warning flags for LoF
##LoF_info=Info used for LoF annotation

Are there command-line or .toml parameters that we can use to configure this to extract the desired fields from dbNSFP?
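
To check which annotation fields a given output actually carries, the CSQ header line can be parsed directly. This is a minimal sketch that only reads the `Format:` string from the `##INFO=<ID=CSQ,...>` line shown above; `csq_fields` is a hypothetical helper, and it does not change what gvanno annotates.

```python
import re


def csq_fields(header_lines):
    """Return the list of CSQ sub-field names declared in a VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            # The field names follow "Format: " and run to the closing quote.
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    return []


if __name__ == "__main__":
    import gzip
    import sys

    # Usage: python csq_fields.py annotated.vcf.gz
    with gzip.open(sys.argv[1], "rt") as handle:
        header = [line for line in handle if line.startswith("##")]
    fields = csq_fields(header)
    print(len(fields), "CSQ fields")
    print("PrimateAI present:", any("PrimateAI" in f for f in fields))
```

Applied to the header quoted above, the field list ends with the LoF entries and contains no PrimateAI field, which is consistent with the annotation not having been requested from VEP.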

Issue with annotating a simple VCF that only contains 8 columns of VCFv4.1

Hello,

Thank you for your work on gvanno!
I have gotten gvanno working on my CentOS 7 box, and the example annotation ran well. However, when I try to annotate a simple VCF file like the one below, I run into the following error (probably on every variant line):

...
            ERROR: Line ...: Format is not a colon-separated list of alphanumeric strings.
            ERROR: Line ....: Format is not a colon-separated list of alphanumeric strings.
            ERROR: Line ....: Format is not a colon-separated list of alphanumeric strings.
            ERROR: Line ....: Format is not a colon-separated list of alphanumeric strings.

The VCF file looks like the following:

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
1       2115900 .       T       C       .       PASS    AN=3646;AC=11   GT
1       2115911 .       C       G       .       PASS    AN=3646;AC=2    GT
1       2115912 .       G       A       .       PASS    AN=3646;AC=1    GT
1       2115999 .       C       T       .       PASS    AN=3646;AC=4    GT
1       2116124 .       C       G       .       PASS    AN=3646;AC=10,0 GT
....

What might be the cause of the error? Is there a way to format the VCF so that it gets properly annotated?

Thanks in advance!

-- ipstone
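
One thing that stands out in the excerpt is that the file has a FORMAT column (with `GT`) but no sample columns after it. Assuming that is what the validator objects to (not confirmed in this thread), a sketch like the following turns the file into a sites-only VCF by dropping the dangling FORMAT column; `drop_empty_format_column` is a hypothetical helper, not part of gvanno.

```python
def drop_empty_format_column(vcf_lines):
    """Remove a FORMAT column that is not followed by any sample column.

    Expects tab-separated VCF lines. Meta lines (##...) are passed through
    unchanged; records that do have sample columns are left untouched.
    """
    for line in vcf_lines:
        if line.startswith("##"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Exactly 9 columns means: 8 fixed columns + FORMAT, but no samples.
        if len(fields) == 9:
            fields = fields[:8]
        yield "\t".join(fields) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python drop_format.py < input.vcf > sites_only.vcf
    sys.stdout.writelines(drop_empty_format_column(sys.stdin))
```

Since the gvanno README asks for a single-sample VCF, a sites-only file may still be rejected; in that case the alternative would be to add a sample column rather than remove FORMAT.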

Uninitialized value $faidx

Inspecting my warnings, I'm seeing the following (thousands of times):

head onesample.gvanno_ready.vep.vcf_warnings.txt 
WARNING: 31 : Use of uninitialized value $faidx in split at /opt/vep/src/ensembl-vep/modules/LoF.pm line 499, <$fh> line 5031.
Use of uninitialized value $faidx in split at /opt/vep/src/ensembl-vep/modules/LoF.pm line 499, <$fh> line 5031.
Use of uninitialized value $faidx in split at /opt/vep/src/ensembl-vep/modules/LoF.pm line 499, <$fh> line 5031.
WARNING: 15 : Use of uninitialized value $faidx in split at /opt/vep/src/ensembl-vep/modules/LoF.pm line 499, <$fh> line 5031.

Is this OK to ignore? Or does it imply an issue with something I'm doing, or something in the script?
