vatlab / varianttools

software tool for the manipulation, annotation, selection, and analysis of variants in the context of next-gen sequencing analysis

Home Page: https://vatlab.github.io/vat-docs/

License: GNU General Public License v3.0

C++ 20.16% Shell 0.07% Python 17.81% C 60.26% R 0.06% Dockerfile 0.03% SWIG 0.36% Cython 1.25%

varianttools's Introduction

Variant Tools

A command line tool for the manipulation, annotation, and analysis of genetic variants from next-generation sequencing studies.

Installation

If you are using a conda environment, you can install variant tools with the command

conda install variant_tools -c bioconda -c conda-forge

Option -c conda-forge is required to enforce the use of the conda-forge version of dependencies (e.g. boost-cpp) over their counterparts in the base channel.

Otherwise, you can try to install it through pip

pip install variant_tools

You will need to install

libboost
gsl
numpy
Cython
hdf5
blosc
A C++ compiler such as gcc

which, in a conda environment, could be installed with the command

conda install -c conda-forge boost-cpp gsl numpy cython blosc hdf5

This method can be used if you download or clone the latest version of variant tools from this repository.

Documentation

Please refer to Variant Tools documentation for details.

varianttools's Issues

import error in vt_sqlite3.py

line 26 : from ._vt_sqlite3 import *
There is no module called _vt_sqlite3.
This error caused several errors when running the tests.

Duplicate files in annotation

Are there any differences between ccdsGene_hg19-20111206.ann and ccdsGene-hg19_20111206.ann? They look like duplicates.

ExAC annotation

Hello,

Is variant tools currently capable of annotating variants with ExAC v.0.3.1 frequencies? I only see v.0.2 available. If not, can it be updated to include this newest version?

Thank you,
Ricky Lali

error in testRext and testWeights

As reported by test_associate.TestAssociate.

Compile with gcc 6.1

I recently upgraded my gcc to version 6.1 for some bioconductor packages, but now I have an issue compiling vtools:

sqlite/vt_sqlite3_ext.cpp: At global scope:
sqlite/vt_sqlite3_ext.cpp:2858:1: error: narrowing conversion of ‘4294967168u’ from ‘unsigned int’ to ‘int’ inside { } [-Wnarrowing]
 };
 ^

Apparently this is some code copied from sqlite that is now obsolete. Maybe there are some upstream changes we should incorporate?
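To see why the compiler objects to this literal, the following Python sketch checks the arithmetic. It assumes that int is 32 bits wide on the platform in question (the usual case, but an assumption here); the variable names are illustrative only.

```python
# The literal from the compiler error message.
value = 4294967168

# Largest value a signed 32-bit int can hold (assumption: int is 32 bits).
INT32_MAX = 2**31 - 1

# The literal does not fit, which is why the conversion counts as narrowing.
fits_in_int32 = value <= INT32_MAX

# The same 32-bit pattern read as a signed integer.
as_hex = hex(value)
as_int32 = int.from_bytes(value.to_bytes(4, "little"), "little", signed=True)

print(as_hex, as_int32, fits_in_int32)
```

The bit pattern is 0xffffff80, which reads as -128 when treated as signed, so the original initializer was presumably relying on an implicit unsigned-to-signed conversion that newer compiler defaults reject inside braces.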

could you tell me the file format for flag --group_by?

Sorry to ask about this, but there is too much information on the website and I couldn't find the file format for --group_by. Thank you very much!

Genotype annotations

A typical genotype entry looks like:

0/0:43,0:43:92:0,92,1267

The first part, 0/0, is the actual genotype; the others are genotype annotations. In our current implementation (vat 2.0 hereafter) we import GT by default and the others optionally. We do import everything, because we want to be able to create filters when performing quality control or calculating summary statistics.
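A minimal sketch of how such an entry separates into the genotype and its annotations. The FORMAT keys GT:AD:DP:GQ:PL are an assumption (the post does not give the FORMAT column), and parse_genotype_entry is a hypothetical helper, not part of variant tools.

```python
def parse_genotype_entry(entry, format_keys="GT:AD:DP:GQ:PL"):
    """Pair each colon-separated value with its FORMAT key.

    format_keys is an assumed FORMAT string; a real VCF file states it
    in the FORMAT column of each record.
    """
    keys = format_keys.split(":")
    values = entry.split(":")
    return dict(zip(keys, values))


fields = parse_genotype_entry("0/0:43,0:43:92:0,92,1267")

# The genotype itself, and everything else as genotype annotations.
genotype = fields["GT"]
annotations = {key: value for key, value in fields.items() if key != "GT"}

print(genotype)
print(annotations)
```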

However, in many scenarios the genotype data have already been QC-ed. We may also start from un-QC-ed genotype data, yet after QC we no longer need the other genotype information. That is when we may want to create new projects that keep only the GT info.

Can we make each field in genotype data a separate data matrix? For example we have a project that looks like:

project.variants
project.GT
project.DP

And our filtering would be

vtools samples <various geno_info based filtering> -t project.gmask
vtools select project.gmask project.GT ..

where gmask is a sparse matrix of zeros and ones. Zero means the entry is to be excluded, and one means it is to be included, when computing other statistics.
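The masking idea can be sketched as follows. This is only an illustration under stated assumptions: dense nested lists stand in for the sparse matrix, rows are variants and columns are samples, GT holds alternative allele counts, and masked_alt_count is a hypothetical helper.

```python
def masked_alt_count(gt_row, mask_row):
    """Sum genotype values for entries whose mask is 1; entries with mask 0 are skipped."""
    return sum(g for g, m in zip(gt_row, mask_row) if m == 1)


# Hypothetical data: 2 variants (rows) by 3 samples (columns).
GT = [
    [0, 1, 2],
    [1, 1, 2],
]
gmask = [
    [1, 0, 1],  # second sample excluded for variant 1
    [0, 1, 1],  # first sample excluded for variant 2
]

counts = [masked_alt_count(g, m) for g, m in zip(GT, gmask)]
print(counts)
```

Keeping the mask separate from GT means that the same genotype matrix can be reused with different quality filters.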

gene base annotation

Which database should I use to get gene annotation consequences, such as intron, upstream, missense_variant, splice_acceptor_variant, ...?

vtools remove variants does not work for hdf5

The syntax is vtools remove variants TABLE, where TABLE is a variant table. In sqlite, the implementation is something like

DELETE FROM genotype_2 WHERE variant_id IN (SELECT variant_id from TABLE)

Not sure what API to use though because all the IDs should be passed to HDF5.
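For reference, the sqlite behaviour described above can be reproduced with Python's built-in sqlite3 module. The table layout below is a hypothetical stand-in (to_remove plays the role of TABLE, and the genotype_2 columns are assumed), not the actual project schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: one genotype table for a sample and one variant table.
cur.execute("CREATE TABLE genotype_2 (variant_id INTEGER, GT INTEGER)")
cur.execute("CREATE TABLE to_remove (variant_id INTEGER)")
cur.executemany("INSERT INTO genotype_2 VALUES (?, ?)", [(1, 0), (2, 1), (3, 2)])
cur.executemany("INSERT INTO to_remove VALUES (?)", [(2,), (3,)])

# Same statement as in the post, with to_remove standing in for TABLE.
cur.execute(
    "DELETE FROM genotype_2 WHERE variant_id IN (SELECT variant_id FROM to_remove)"
)

remaining = cur.execute(
    "SELECT variant_id FROM genotype_2 ORDER BY variant_id"
).fetchall()
conn.close()
print(remaining)
```

An HDF5 backend would need the list of variant IDs from the subquery passed in explicitly, which is the difficulty raised above.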

vtools remove samples does not work for hdf5

as title

Set up Travis CI tests for variant tools for continuous integration

This is VERY helpful in ensuring the health of the trunk as we move along.

Error: Existing database has different linking fields

Dear

I ran into some problems during variant association testing. I'm running the following command: vtools associate nonsynFS001 phenotype -m "CFisher --name Fisher --alternative 2" --group_by region_name --to_db cfisher --force -j8 > cfishernonsynFS001.txt

And I get the following error: Existing database has different linking fields (existing: {'*': ['variant_region_name']}, required: {'*': ['variant_genename']}).

But I do not want to use variant_genename, because my intronic/UTR variants do not get any gene name, while variant_region_name also contains gene name annotations for my intronic variants. Is there any way to change this default variable to region_name?

Any suggestions?

Many thanks in advance

Matthias

Compile error on Linux server

The error only happens on the Linux server...

It seems the problem is caused by those .nfsxxxxxxxxxxxxxxx files; I cannot delete any of them, even on the command line...

Processing variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg
removing '/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg' (and everything under it)
error removing /home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg: [Errno 16] Device or resource busy: '/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/.nfs00000002054938b000000026'
error removing /home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg: [Errno 16] Device or resource busy: '/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/.nfs00000002054938a700000024'
error removing /home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg: [Errno 16] Device or resource busy: '/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/.nfs00000002054938a000000022'
error removing /home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg: [Errno 16] Device or resource busy: '/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/.nfs000000020553350600000023'
error removing /home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg: [Errno 16] Device or resource busy: '/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/.nfs00000002054938ae00000025'
error removing /home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg: [Errno 16] Device or resource busy: '/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/.nfs000000020553350e00000027'
error removing /home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg: [Errno 39] Directory not empty: '/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools'
error removing /home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg: [Errno 39] Directory not empty: '/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg'
Extracting variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg to /home/mleong/anaconda3/lib/python3.6/site-packages
variant-tools 3.0.0.dev0 is already the active version in easy-install.pth
Installing vtools script to /home/mleong/anaconda3/bin
Installing vtools_report script to /home/mleong/anaconda3/bin

Tests too large for travisCI

Downloading reference genomes hg18 and hg19 might be unavoidable but downloading hg19 is too much. We should remove some tests that require large resources.

multiple fields in genotype

Working :
vtools select denovo -o chr pos ref alt "genotype('','field=GT')"

Working :
vtools select denovo -o chr pos ref alt "genotype('','field=DP_geno')"

Not working:
vtools select denovo -o chr pos ref alt "genotype('','field=GT&field=DP_geno')"

Am I doing something wrong?

error in test_pipeline.py

vtools execute test_pipeline.pipeline createpop shows ERROR: Failed to execute step createpop_10: name 'CreatePopulation' is not defined. Right now, testCreatePopulation, testEvolvePopulation, testDrawCaseCtrlSample, and testDrawRandomSample are commented out.

Customized reference genome

Not sure if it is possible, but I'm wondering if we can analyze genomes other than human. I think a couple of years ago we got asked about analyzing mouse genome but I forgot how it turned out. Is it possible to init a vtools project given reference genome (in completegenomics crr format, if not fasta)? For example vtools init project --build /path/to/reference/genome? In that way we can annotate / filter data from other species.

Error in association analysis

Hello,
I am trying to run Burden test and I'm running into the following error

vtools associate variant aff -m "LogitRegBurden --alternative 2" -j1 --to_db logit > all.asso.res
INFO: 127 samples are found
INFO: 215112 groups are found
Loading genotypes: 100% [========================================] 127 0.4/s in 00:05:25
GSL Error 11: error
GSL Error 11: gsl_sf_gamma_inc_Q_e(a, x, &result)
GSL Error 11: error
GSL Error 11: gsl_sf_gamma_inc_Q_e(a, x, &result)
GSL Error 11: error
GSL Error 11: gsl_sf_gamma_inc_Q_e(a, x, &result)
GSL Error 11: error
GSL Error 11: gsl_sf_gamma_inc_Q_e(a, x, &result)
GSL Error 11: error
GSL Error 11: gsl_sf_gamma_inc_Q_e(a, x, &result)

Also, when I load the phenotype info it says
ERROR: Invalid or missing value detected for field sex. Allowed values are M/F, 1/2, Male/Female.
But all the genders are coded M/F with no missing value.

Any advice on how to proceed ?

Thank you,

Mysql support

The original paper says that variant tools supports mysql as a backend.
I can't find any mention of that on the website. Could you tell me if I'm wrong?

Status of new hdf5 storage model.

This thread is used to update the implementation of the HDF5 storage model.

Pending

init:
- --store added to allow specification of storage model.
- --parent and --children raise NotImplemented error with --store hdf5
import:
- Import vcf files to hdf5 storage is done.
- import non-vcf files will not be supported. users are asked to use sqlite storage model instead.
show:
- show table works
- PENDING show genotypes not working for hdf5.
update
- There are sqlite extensions
select
exclude
output
- There are sqlite extensions
export
remove
- remove variants is difficult to port
- remove genotype is difficult to port
associate
admin
- merge-samples is difficult to port

Completed

liftover: liftover does not involve genotype storage
use: use does not involve genotype storage
compare: compare does not involve genotype storage
execute: SQL query can only be applied to sqlite storage.
phenotype: phenotype does not involve genotype storage

GnomAD resource and splitting multiallelic entries when importing VCF

Recently, I attempted to add the GnomAD dataset (provided as the sites VCF) to our exome annotation pipeline, which is based on varianttools. Doing so, I encountered an issue with importing multiallelic alternative allele information and also annotations for multiallelic sites. How can one handle multiallelic sites when preparing custom .ann files and importing multiallelic VCFs for annotations? I tried using the ExAC importing rules (ExAC.ann) file, but it is designed for tab-delimited input files in which multiallelic sites have already been split, not for the VCF originally provided by the GnomAD team.

Thank you in advance for any guidance on this!
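As an illustration of what splitting a multiallelic site involves, here is a minimal Python sketch. It is not how variant tools or the ExAC.ann rules work; split_multiallelic is a hypothetical helper, the caller is assumed to know which INFO keys carry one value per alternative allele (declared as Number=A in a VCF header), and no allele trimming or left-alignment is attempted.

```python
def split_multiallelic(chrom, pos, ref, alt, info, per_allele_keys):
    """Split one record with comma-separated ALT alleles into one record per allele.

    info maps INFO keys to their raw string values. Keys listed in
    per_allele_keys are split by allele; all other keys are copied unchanged.
    """
    records = []
    for index, allele in enumerate(alt.split(",")):
        new_info = {}
        for key, value in info.items():
            if key in per_allele_keys:
                new_info[key] = value.split(",")[index]
            else:
                new_info[key] = value
        records.append((chrom, pos, ref, allele, new_info))
    return records


# Hypothetical site with two alternative alleles.
split = split_multiallelic(
    "1", 100, "A", "C,T", {"AC": "3,5", "AN": "200"}, per_allele_keys={"AC"}
)
for record in split:
    print(record)
```

Each output record then has a single alternative allele and single-valued per-allele annotations, which is the shape that a tab-delimited, already-split input is described as having above.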

Association tests failing while processing group: Operator FindGenotypePattern raises an exception (Input genotype matrix does not have a variant)

Hello,
When performing vtools association tests using the WeightedBurdenQt method on a subset of samples and grouping by two large non-contiguous sets of genomic regions, I get an error when performing association tests on both groups:

Operator FindGenotypePattern raises an exception (Input genotype matrix does not have a variant).

Loci are removed "due to having no minor allele or having more than 100.0% missing genotypes", but the number removed is always an order of magnitude less than the total loci in each group.

Do you have any advice on how to proceed?

Thank you

Problem with vtools select

I have been using Variant Tools to annotate damaging variants with dbNSFP (please see the command I used below). I noticed that none of the frameshift insertions and deletions remained after this step. Could you please help me understand why this might be?

vtools use dbNSFP

vtools select missense_nonsense_splice_frameshift 'SIFT_pred_all like "D%" \
OR Polyphen2_HDIV_pred_all like "D%" OR Polyphen2_HDIV_pred_all like "P%" \
OR Polyphen2_HVAR_pred_all like "D%" OR Polyphen2_HVAR_pred_all like "P%" \
OR LRT_pred like "D%" OR MutationTaster_pred like "D%" \
OR MutationTaster_pred like "A%"' \
-t deleterious_missense_nonsense_splice_frameshift "deleterious nonsynonymous, \
stoploss, stopgain, frameshift and splicing variants selected from table \
missense_nonsense_splice_frameshift"

Export then Import via Vcf Format

I tried to export from a vtools project using the following command:

vtools export variants_for_course --format vcf --samples "selected=\"Yes\"" --header CHROM POS ID REF ALT QUAL FILTER INFO FORMAT '%(sample_names)s' -o my.vcf

Then I moved the file to a different folder and imported it into another vtools project:
vtools import --format vcf my.vcf --build hg19
I got this warning

WARNING: Cannot import genotype from the input file: Failed to guess sample name. Please specify sample names for 3000 samples using parameter --sample_name, or add a proper header to your input file that matches the columns of samples. See "vtools import -h" for details.

I tried adding "#" to the header in the vcf file, but it still doesn't work:

INFO: 0 new variants from 0 lines are imported.
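For comparison, the VCF format expects the column header to be a single tab-delimited line that starts with #CHROM, followed by one column per sample. The sketch below builds such a line; vcf_header_line and the sample names are illustrative, and whether a corrected header alone resolves the warning above is not established here.

```python
# The nine fixed columns that precede the sample columns in a VCF file with genotypes.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def vcf_header_line(sample_names):
    """Return the tab-delimited column header line, with a leading '#'."""
    return "#" + "\t".join(FIXED_COLUMNS + list(sample_names))


# Hypothetical sample names.
header = vcf_header_line(["SAMP1", "SAMP2"])
print(header)
```

A header separated by spaces instead of tabs, or one with fewer sample columns than the data lines, would not match the sample columns even with the leading "#".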

Removing python 2 support.

Supporting both python2 and 3 has caused unnecessary maintenance work so we are dropping support for python 2.

Weighted tests

Hi,

I've been trying to use multiple scores from dbNSFP as external weights for my association test, but all of the tests fail with the following output:

DEBUG: Association test WeightedBurdenQt failed while processing 'WRN': Operator WeightedGenotypeTester raises an exception (Cannot find genotype/variant information: dbNSFP.MutationTaster_converted_rankscore)

I've tried multiple scores, eg CADD, polyphen2 etc and the tests always fail the same way.

This is my command

vtools associate rare_variants aff --method WeightedBurdenBt
--name WeightedBurdenBt --extern_weight dbNSFP.MutationTaster_converted_rankscore
-j8 --force --to_db weighted_test --group_by variant.genename >rare_coding_variants_weighted.txt

and I'm using varianttools/2.7.0

I've tried to set the variable using vtools update variant --set 'MT_score_converted=dbNSFP.MutationTaster_score_converted' and use this as the external weight but it makes no difference.

I'd really like to know if there is a way around this?

Best wishes,

Sarah

Here is the output of vtools show:

$ vtools show
Project name: variant_association_tools_analysis_080916
Created on: Thu Sep 8 10:42:44 2016
Primary reference genome: hg19
Secondary reference genome:
Runtime options: verbosity=1, shared_resource=/home/breakthr/smaguire/.variant_tools, local_resource=/home/breakthr/smaguire/.variant_tools
Variant tables: common_var
rare_coding_var
rare_nonsyn
rare_truncating_only
rare_var
variant
Annotation databases: dbSNP (/.variant_tools/annoDB/dbSNP, hg19_141)
dbNSFP (/.variant_tools/annoDB/dbNSFP, hg18_hg19_2_9)
weighted_test (weighted_test, 1.0)

HDF5 stability issues

Perhaps it is worth noting that HDF5 does not have an error recovery mechanism, and there are quite a few complaints if you google "corruption hdf5". Caution is needed here if we decide to use HDF5 as the replacement.

How to export FORMAT fields in vcf format

I have been running vtools on a vcf file containing 3 WES samples (after GATK).

Command:

I used this import command:

vtools import ../../../align/recalibrated_variants_.vcf --var_info DP filter info --geno_info DP_geno --build hg19

I used this export command:

alt dbSNP.name refGene.name2 refGene.name dbNSFP.SIFT_score
dbNSFP.Polyphen2_HDIV_score Polyphen2_HDIV_pred Polyphen2_HVAR_score
Polyphen2_HVAR_pred dbSNP.func kgDesc --order_by chr pos --header chr
pos ref alt rsname gene 'refgene name' 'SIFT score' 'Polyphen2 HDIV
score' 'Polyphen2 HDIV pred' 'Polyphen2 HVAR score' 'Polyphen2 HVAR
pred' 'dbSNP func code' 'pathway' '%(sample_names)s' 'DP_geno'
--output NS.csv

head NS.csv

chr,pos,ref,alt,rsname,gene,refgene name,SIFT score,Polyphen2 HDIV
score,Polyphen2 HDIV pred,Polyphen2 HVAR score,Polyphen2 HVAR
pred,dbSNP func code,pathway,SRR925784,SRR925788,SRR925803,DP_geno
1,723819,T,A,rs11804171,,,,,,,,unknown,,NA,0,2

There is no field with the depth of each sample, even though I imported the --geno_info DP_geno field from the vcf file. Can you please recommend how to get the FORMAT field GT:AD:DP:GQ:PGT:PID:PL of every sample? I need the values of AD:DP for every sample.

Improving variant tools homepage (move to github pages?)

The current wiki-based documentation is not too bad but it would be better if we can test the examples more easily.

reference version

Hi,

My data were aligned using hg37 as the reference genome.
I think hg37 is the same as hg19.
But when I apply vtools to my data, I get error messages.
How can I use vtools with my data?

Thanks,
Youngji

vtools show genotypes does not work for hdf5

As title.

The V3 branch.

I have created a V3 branch and I am getting rid of all of the Python2 stuff. I will go over the subcommands one by one and create a tentative accessor interface with a sqlite backend (the new VarStore module). You will need to continue to work on the HDF5 interface (geno_info branch) and we can merge the sqlite and hdf5 backends after we are both ready.

Phenotype: New columns based on sample genotype and genotype information

Hi there,

I want to add a new column to my phenotype table based on sample genotype information, and I found the following example on the website http://varianttools.sourceforge.net/Vtools/Phenotype#toc5

vtools phenotype --from_stat "sample_alt=#(alt)"

It seems that this function will calculate the alternative count for each subject on all variants. Is there a way to calculate the alternative count on a subset of variants, eg, the alternative count on one gene?

Thanks.

why your left join is so fast ?

I tried to do annotation using postgreSQL the way vtools does.
I wonder why vtools is so fast when I perform a selection on dbNSFP, which is a huge database.
What was your improvement?
I see you use UCSC binning and indexing. Are you using any other improvements?
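For readers unfamiliar with the binning mentioned here, below is a minimal Python sketch of the standard UCSC binning scheme (five levels, finest bins of 128 kb). It illustrates the general technique only; whether variant tools uses exactly these constants is an assumption, and bin_from_range is an illustrative name.

```python
# Offsets of the first bin at each level, finest level first (standard UCSC scheme).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # finest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each coarser level is 8 times wider


def bin_from_range(start, end):
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range does not fit the standard binning scheme (512 Mb)")


print(bin_from_range(723818, 723819))  # a single-base variant
print(bin_from_range(0, 200000))       # a range crossing a 128 kb boundary
```

Because each variant and each annotation range carries a precomputed bin number, a join can first compare indexed integer bins and only then check positions, which avoids scanning the whole annotation table.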

--from_stat VTOOLS v 2.7.0

I have 49 individuals in one sample, but when I use --from_stat to count, for example, the number of WT genotypes, I obtain numbers like 98. My sample is well defined.

When I use:

vtools update variant --from_stat 'N_heterozygalt=#(het)' --samples "pheno='0'" -j8

Vtools INFO is: INFO: 49 samples are selected
which is the correct number of individuals
but, when I do an output, I have:

chr   ref   alt   #(wtGT)  #(het)  #(hom) #(alt)
1        T       A             4       24      70      164

but I have only 49 individuals..

When I used vtools v2.6.2 I did not have this issue.
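The numbers in the output row above can be checked with a few lines of arithmetic. This sketch only tests internal consistency of the reported counts; it does not establish the cause of the discrepancy.

```python
n_samples = 49                       # samples reported as selected
wt, het, hom, alt = 4, 24, 70, 164   # counts from the output row above

# Every selected sample should contribute at most one genotype per variant.
genotype_total = wt + het + hom

# The alternative allele count implied by the genotype counts.
alt_from_genotypes = het + 2 * hom

ratio = genotype_total / n_samples

print(genotype_total, alt_from_genotypes, ratio)
```

The genotype counts sum to 98, exactly twice the 49 selected samples, and #(alt) agrees with het + 2 * hom. That pattern is consistent with each sample's genotypes being counted twice, though other explanations are possible.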

wtools : REST API on top of Variant tools

Hi,

Just to let you know, I'm building a REST API on top of VariantTools, along with a GUI client, to make vtools requests across the web (local server for now).
I have just started the project. If you are interested, it's here:

https://github.com/dridk/wtools
https://github.com/dridk/wtools-client

And here is a preview of the GUI. Of course, I plan to manage tables, intersections and all the other cool stuff you made in Vtools!

Can I use the merged vcf file as an input file?

I have one merged-vcf file from 200 WGS data file. I merged gvcf files of 200 samples using the GATK pipeline and this makes it easier to do joint analysis and is more efficient for storage size. Do I have to split the multiple samples to import in the vtools? Can I use a merged-vcf file including multiple samples in vtools? Thanks.

Question about permutations in variant association tools

I am trying to use variant association tools to do my association analyses with exome data, and I found your method called WeightedBurdenBt and its permutation procedure.
In the documentation, it is mentioned that a p-value is calculated every 1000 permutations. So if we ask for 10000 permutations, only 10 p-values are calculated in the best case?

But I would expect that a p-value is calculated at each permutation and every 1000 p-values, the lower bound of the 95 percent CI of the obtained distribution of p-values is compared to "C".
Is it really what is done ?

Could you please explain this point to me ?

Thank you
Best

Fabienne JABOT-HANIN
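To make the question concrete, here is a sketch of one common form of adaptive permutation testing: the exceedance count is updated at every permutation, while the stopping rule is evaluated only every check_every permutations. This is not confirmed to be what WeightedBurdenBt does; the function name, the normal-approximation confidence interval and all parameter names are assumptions.

```python
import math


def adaptive_permutation_pvalue(observed, permuted_stats, cutoff=0.05, check_every=1000):
    """Return (p_value, permutations_used) for an adaptive permutation test.

    permuted_stats is an iterable of test statistics from permuted data.
    Permutation stops early once the lower bound of an approximate 95%
    confidence interval for the p-value exceeds cutoff.
    """
    exceed = 0
    done = 0
    for stat in permuted_stats:
        done += 1
        if stat >= observed:
            exceed += 1
        if done % check_every == 0:
            p = (exceed + 1) / (done + 1)
            half_width = 1.96 * math.sqrt(p * (1 - p) / done)
            if p - half_width > cutoff:
                break  # clearly not significant, stop permuting
    return (exceed + 1) / (done + 1), done


# Hypothetical inputs: every permuted statistic beats the observed one.
p_null, used_null = adaptive_permutation_pvalue(1.0, [2.0] * 10000)
# Hypothetical inputs: no permuted statistic reaches the observed one.
p_sig, used_sig = adaptive_permutation_pvalue(5.0, [0.0] * 3000)
print(p_null, used_null, p_sig, used_sig)
```

Under this reading, 10000 requested permutations give at most 10 stopping checks, but the final p-value still reflects every permutation that was actually run.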

Associate analysis error in the test for hdf5

I was trying to run the associate test but the error showed up...

Note: the error only shows up on Linux server, it doesn't show up on my laptop.

mleong@q1prpfs04:~/j6htestVarianttools$  vtools associate variant smoking --discard_variants "%(NA)>0.1" --HDF --method "BurdenBt --name BurdenTest --alternative 2" --group_by refGene.name2  -j 6 --force -v 2
DEBUG:
DEBUG: associate variant smoking --discard_variants %(NA)>0.1 --HDF --method "BurdenBt --name BurdenTest --alternative 2" --group_by refGene.name2 -j 6 --force -v 2
DEBUG: Using temporary directory /tmp/tmpsd5fksxl/_tmp_149688
DEBUG: Select phenotype and covariates using query SELECT sample_id, sample_name, smoking FROM sample LEFT OUTER JOIN filename ON sample.file_id = filename.file_id WHERE smoking IS NOT NULL
INFO: 2504 samples are found
DEBUG: Running query INSERT INTO __asso_tmp SELECT DISTINCT variant.variant_id, 0, refGene.refGene.name2  FROM variant, refGene.__rng_refGene_hg19_chr_txStart_txEnd, refGene.refGene WHERE (variant.bin = refGene.__rng_refGene_hg19_chr_txStart_txEnd.bin AND variant.chr = refGene.__rng_refGene_hg19_chr_txStart_txEnd.chr AND variant.pos >= refGene.__rng_refGene_hg19_chr_txStart_txEnd.start AND variant.pos <= refGene.__rng_refGene_hg19_chr_txStart_txEnd.end ) AND (refGene.refGene.rowid = refGene.__rng_refGene_hg19_chr_txStart_txEnd.range_id);
INFO: Grouping variants by 'refGene.name2', please be patient ...
INFO: 573 groups are found
Process GroupHDFGenerator-5:
Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association_hdf5.py", line 77, in run
    for row in cur.execute(select_group):
vt_sqlite3.OperationalError: unable to open database file

('group time: ', 69.71390271186829)
Testing for association:   0.0% [>                    ]  in 00:00:00
Process Phenotype association analysis for a group of variants:
Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 972, in run
    genotype, which, var_info, geno_info = getGenotype_HDF5(self,grp)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association_hdf5.py", line 347, in getGenotype_HDF5
    colnames=accessEngine.get_colnames(chr,geneSymbol)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/accessor.py", line 1155, in get_colnames
    return group.colnames[:]
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 818, in __getattr__
    return self._f_get_child(name)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 698, in _f_get_child
    self._g_check_has_child(childname)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 395, in _g_check_has_child
    % (self._v_pathname, name))
tables.exceptions.NoSuchNodeError: group ``/chr22/ADM2`` does not have a child named ``colnames``

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 993, in run
    except Exception as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Phenotype association analysis for a group of variants:
Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 972, in run
    genotype, which, var_info, geno_info = getGenotype_HDF5(self,grp)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association_hdf5.py", line 347, in getGenotype_HDF5
    colnames=accessEngine.get_colnames(chr,geneSymbol)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/accessor.py", line 1155, in get_colnames
    return group.colnames[:]
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 818, in __getattr__
    return self._f_get_child(name)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 698, in _f_get_child
    self._g_check_has_child(childname)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 395, in _g_check_has_child
    % (self._v_pathname, name))
tables.exceptions.NoSuchNodeError: group ``/chr22/A4GALT`` does not have a child named ``colnames``

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 993, in run
    except Exception as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Phenotype association analysis for a group of variants:
Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 972, in run
    genotype, which, var_info, geno_info = getGenotype_HDF5(self,grp)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association_hdf5.py", line 347, in getGenotype_HDF5
    colnames=accessEngine.get_colnames(chr,geneSymbol)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/accessor.py", line 1155, in get_colnames
    return group.colnames[:]
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 818, in __getattr__
    return self._f_get_child(name)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 698, in _f_get_child
    self._g_check_has_child(childname)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 395, in _g_check_has_child
    % (self._v_pathname, name))
tables.exceptions.NoSuchNodeError: group ``/chr22/ACR`` does not have a child named ``colnames``

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 993, in run
    except Exception as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Phenotype association analysis for a group of variants:
Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 972, in run
    genotype, which, var_info, geno_info = getGenotype_HDF5(self,grp)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association_hdf5.py", line 347, in getGenotype_HDF5
    colnames=accessEngine.get_colnames(chr,geneSymbol)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/accessor.py", line 1155, in get_colnames
    return group.colnames[:]
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 818, in __getattr__
    return self._f_get_child(name)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 698, in _f_get_child
    self._g_check_has_child(childname)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 395, in _g_check_has_child
    % (self._v_pathname, name))
tables.exceptions.NoSuchNodeError: group ``/chr22/ADORA2A`` does not have a child named ``colnames``

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 993, in run
    except Exception as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Phenotype association analysis for a group of variants:
Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 972, in run
    genotype, which, var_info, geno_info = getGenotype_HDF5(self,grp)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association_hdf5.py", line 347, in getGenotype_HDF5
    colnames=accessEngine.get_colnames(chr,geneSymbol)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/accessor.py", line 1155, in get_colnames
    return group.colnames[:]
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 818, in __getattr__
    return self._f_get_child(name)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 698, in _f_get_child
    self._g_check_has_child(childname)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 395, in _g_check_has_child
    % (self._v_pathname, name))
tables.exceptions.NoSuchNodeError: group ``/chr22/ACO2`` does not have a child named ``colnames``

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 993, in run
    except Exception as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Phenotype association analysis for a group of variants:
Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 972, in run
    genotype, which, var_info, geno_info = getGenotype_HDF5(self,grp)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association_hdf5.py", line 347, in getGenotype_HDF5
    colnames=accessEngine.get_colnames(chr,geneSymbol)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/accessor.py", line 1155, in get_colnames
    return group.colnames[:]
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 818, in __getattr__
    return self._f_get_child(name)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 698, in _f_get_child
    self._g_check_has_child(childname)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/tables/group.py", line 395, in _g_check_has_child
    % (self._v_pathname, name))
tables.exceptions.NoSuchNodeError: group ``/chr22/ADORA2A_AS1`` does not have a child named ``colnames``

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 993, in run
    except Exception as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
^CERROR:
Association tests stopped by keyboard interruption (0/573 completed).
refgene_name2	sample_size_BurdenTest	num_variants_BurdenTest	total_mac_BurdenTest	beta_x_BurdenTest	pvalue_BurdenTest	wald_x_BurdenTest
Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 1117, in runAssociation
    res = resQueue.get()
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/mleong/anaconda3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 1128, in runAssociation
    sys.exit(1)
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 1226, in associate
    runAssociation(args,asso,proj,results)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 1144, in runAssociation
    except Exception as e:
TypeError: catching classes that do not inherit from BaseException is not allowed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mleong/anaconda3/bin/vtools", line 11, in <module>
    load_entry_point('variant-tools==3.0.0.dev0', 'console_scripts', 'vtools')()
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/vtools.py", line 266, in main
    args.func(args)
  File "/home/mleong/anaconda3/lib/python3.6/site-packages/variant_tools-3.0.0.dev0-py3.6-linux-x86_64.egg/variant_tools/association.py", line 1228, in associate
    except Exception as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
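
A note on reading the log above: Python 3 raises this TypeError when the class named in an `except` clause does not derive from BaseException, and it does so only while another exception is being matched against that clause. Because the reported lines read `except Exception as e:`, one possible (unconfirmed) explanation is that the name `Exception` is rebound to something else in that module. The sketch below reproduces the message with a hypothetical stand-in class; `NotAnException`, `handle`, and `run_demo` are illustrative names, not part of variant tools.

```python
class NotAnException:
    """Hypothetical stand-in: a class that does not derive from BaseException."""


def handle():
    try:
        # Stands in for the original NoSuchNodeError raised by PyTables.
        raise KeyError("colnames")
    except NotAnException:
        # Python evaluates this clause while matching the KeyError and
        # raises TypeError because the class is not an exception type.
        return "handled"


def run_demo():
    try:
        handle()
    except TypeError as err:
        # __context__ holds the exception that was being handled.
        return str(err), type(err.__context__).__name__
    return None


result = run_demo()
print(result)
```

The original error is preserved as the context of the TypeError, which matches the "During handling of the above exception, another exception occurred" lines in the log.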

vtools admin merge-samples does not work for hdf5

Sample tables are merged in command vtools admin merge-samples.

Export genotype() and samples() in vcf output.

I am storing a lot of variants from different samples within a vtools project. Now, I need to export these variants in vcf-format, but I also need to have a field with genotype information per sample within the info-field.
With vtools output I can use the functions genotype(,'missing=.') and samples(), which give me exactly what I want. However, the output is then not in the VCF-specific variant format (in which insertions and deletions are represented with an additional reference base).

How can I use these functions with vtools export in order to get VCF format?
Or is it possible to produce VCF format with vtools output by any means?

Support for summary data

I'd like to start some brainstorming here for the next stage of the project. The first thing that comes to mind is support for summary data. Numerous methods these days are based on summary statistics, and data integration of this kind will continue to be popular. Currently only vtools_report meta_analysis uses summary statistics data, and it behaves more like a standalone meta-analysis program branded with VAT. We need more work on the design and implementation to support summary data.

Limitations of vtools genotype database #11

Moved from vatlab/VarStore#11

It is not sparse for simple scenarios (where genotypes are coded 0/1/2/NA)
It does not distinguish between 0 and NA unless we specifically ask it to
It does not store phase information
It only accepts integers, not imputed genotypes
Storing various annotations makes the genotype database painfully complex
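
The first two limitations are related, and a small sketch may make the trade-off concrete. It assumes a hypothetical sparse encoding that stores only non-reference calls per variant; it is not how vtools stores genotypes. The names `encode_sparse`, `decode_sparse`, and `keep_missing` are made up for illustration.

```python
MISSING = None  # assumption: NA is represented as None in this sketch


def encode_sparse(genotypes, keep_missing=True):
    """Store only non-reference calls as {sample_index: value}."""
    sparse = {}
    for index, call in enumerate(genotypes):
        if call is MISSING:
            if keep_missing:
                sparse[index] = MISSING  # record NA explicitly
        elif call != 0:
            sparse[index] = call
    return sparse


def decode_sparse(sparse, n_samples):
    """Expand back to a dense list; absent entries are read as 0."""
    return [sparse.get(index, 0) for index in range(n_samples)]


dense = [0, 1, None, 2, 0]

# Dropping NA entries saves space but turns NA into 0 on the way back.
lossy = decode_sparse(encode_sparse(dense, keep_missing=False), len(dense))

# Recording NA explicitly keeps the two states apart.
exact = decode_sparse(encode_sparse(dense), len(dense))

print(lossy)
print(exact)
```

A sparse layout only distinguishes 0 from NA if missing calls are stored as their own entries, which is the cost hinted at by "unless we specifically ask it to".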

filter by genotypes

I have 3 samples : mother, father, child

I can easily show genotypes for each table sample using the following:

vtools select variant -o chr ref alt "genotype()"

which returns, for instance:

chr A T 2,1,1

2,1,1 => 2 for father, 1 for mother, 1 for child.

I would now like to filter and select variants that are heterozygous for the mother and father, and homozygous for the child.
This should work, but it takes too much time:

vtools select variant "genotype('father') == 1 & genotype('mother') == 1" -o chr ref alt

Is there a better approach?
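
One possible workaround, sketched here under stated assumptions rather than as a built-in vtools feature, is to post-filter the text output of the first command in Python. The sketch assumes whitespace-separated lines of the form shown above (chr, ref, alt, comma-separated genotypes), the sample order father, mother, child, and that "homozygous" means a genotype coded as 2. The function names are hypothetical.

```python
SAMPLES = ("father", "mother", "child")  # assumed order of the genotype() column


def parse_line(line, samples=SAMPLES):
    """Split one output line into (chr, ref, alt, {sample: genotype})."""
    chrom, ref, alt, genotypes = line.split()
    calls = {}
    for sample, value in zip(samples, genotypes.split(",")):
        # Anything that is not an integer (e.g. a missing marker) becomes None.
        calls[sample] = int(value) if value.isdigit() else None
    return chrom, ref, alt, calls


def is_candidate(calls):
    """Both parents heterozygous (1), child homozygous alternate (2)."""
    return calls["father"] == 1 and calls["mother"] == 1 and calls["child"] == 2


lines = [
    "chr A T 2,1,1",
    "chr G C 1,1,2",
    "chr C T 1,.,2",
]

selected = [parse_line(line)[:3] for line in lines if is_candidate(parse_line(line)[3])]
print(selected)
```

In practice the lines would come from the saved output of the vtools command instead of the hard-coded list.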

vtools remove genotype does not work for hdf5

This one is even more difficult than remove variant because we are using a condition on genotype fields to remove genotype (code). The query is something like

DELETE FROM genotype_1 WHERE DP > 10

for condition DP > 10.

Perhaps we should leave this for later.
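
For reference, the SQLite behavior that an HDF5 implementation would need to match can be reproduced with an in-memory database. This is only a sketch: the table layout (variant_id, GT, DP) and the sample rows are assumptions made for illustration, not the actual vtools schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical per-sample genotype table with a depth (DP) field.
conn.execute("CREATE TABLE genotype_1 (variant_id INTEGER, GT INTEGER, DP INTEGER)")
conn.executemany(
    "INSERT INTO genotype_1 VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 2, 12), (3, 1, 10), (4, 0, 30)],
)

# The condition is spliced into the statement, as in the query above.
# This is only acceptable for trusted input.
condition = "DP > 10"
cursor = conn.execute(f"DELETE FROM genotype_1 WHERE {condition}")
removed = cursor.rowcount

remaining = [
    row[0]
    for row in conn.execute("SELECT variant_id FROM genotype_1 ORDER BY variant_id")
]
conn.close()

print(removed, remaining)
```

An HDF5 version would have to evaluate the same condition against the stored genotype fields itself, since no SQL engine is available to do it.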

Installation issues on Debian8 3.16

I have been trying to install Variant Tools on my Debian 8 system and can't seem to do so whether I download the source code or install via python (2.7.9) pip. With pip I keep getting the "error: command 'x86_64-linux-gnu-gcc' failed with exit status 1" report. When I look back into the log file I can't seem to find which library is missing. I've tried updating python with python-dev and several other packages. I've also tried with python3 (3.4.2). I'm using GNU bash 4.3.30 on Debian8 3.16 with Xfce 4.10 Desktop environment.

No reference to github from official website

I didn't know varianttools had a GitHub repository.
Could you add a link to it from the main website?

vtools use dbNSFP results in 'WARNING: Failed to download database or downloaded database unusable'

Dear Variant Tools Developers
I attempted running 'vtools use dbNSFP' and the download process ended with:
"WARNING: Failed to download database or downloaded database unusable: Error -3 while decompressing: invalid stored block lengths"

A Google search for "WARNING: Failed to download database or downloaded database unusable" took me to an expired SourceForge page for variant tools, with no cached version.

How do I resolve this error?

Thank you for your time

Change options of %pull and %push

Currently we have

%run -r (remote)
%push -t (to)
%pull -f (from)

but it would be better to use -r for all three.

New project for VAT?

The new development will take a long time to complete because we will replace the core storage model and we might need to change the interface in an incompatible way. It is possible to create a branch of varianttools, but perhaps we can start a new project? I suppose we can rename the command vtools to vat, start from the import command, and gradually move useful parts of variant tools to the new project. In the meantime, we can keep variant tools up to date with new annotation databases, etc.