Comments (11)
Ensembl and UCSC use different feature values. By default, sansa uses "gene" as the feature value (fine for Ensembl) but for UCSC you need to specify "transcript".
sansa annotate -f transcript -g hg38.knownGene.gtf.gz ADB_LL.delly.SV.vcf.gz
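Before choosing a value for -f, it can help to check which feature types actually appear in column 3 of the annotation file. Below is a minimal Python sketch of that check. It is not part of sansa; the function names and the two sample records are made up for illustration.

```python
import gzip
import io
from collections import Counter


def open_annotation(path):
    """Open a plain or gzip-compressed annotation file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def count_feature_types(handle):
    """Count the values in column 3 (feature type) of a GTF/GFF3 stream."""
    counts = Counter()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # GTF and GFF3 records have nine tab-separated columns
            counts[fields[2]] += 1
    return counts


# Made-up records in the style of a transcript-level GTF (illustration only).
sample = io.StringIO(
    'chr1\tknownGene\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tknownGene\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
)
feature_counts = count_feature_types(sample)
print(feature_counts.most_common())
```

For a real file, something like count_feature_types(open_annotation("hg38.knownGene.gtf.gz")) should work. If "gene" is absent from the result, the default feature value presumably matches nothing, and -f needs one of the values that is present.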
from sansa.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT JSH_LN_Lt ADB_Blood LMS_Blood SJH_Blood JSH_Blood 10847652_NT 12140782_Blood
chr1 63400385 DEL00000000 A <DEL> 0 LowQual IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chr1;END=209990649;PE=2;MAPQ=37;CT=3to5;CIPOS=-519,519;CIEND=-519,519;RDRATIO=1.0334;SOMATIC GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV 0/1:-4.58867,0,-44.725:46:PASS:0:18476967:0:-1:8:3:0:0 0/0:0,-8.1278,-162:81:PASS:0:18508703:0:-1:27:0:0:0 0/0:0,-6.02059,-120:60:PASS:0:17239644:0:-1:20:0:0:0 0/0:0,-2.40824,-48:24:PASS:0:16914471:0:-1:8:0:0:0 0/0:0,-4.81643,-93.9999:48:PASS:0:17879783:0:-1:16:0:0:0 0/0:0,-3.91338,-78:39:PASS:0:18005816:0:-1:13:0:0:0 0/0:0,-8.42883,-167.7:84:PASS:0:17091360:0:-1:28:0:0:0
chr1 246232881 DUP00000001 C <DUP> . PASS IMPRECISE;SVTYPE=DUP;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=chr1;END=246233246;PE=3;MAPQ=36;CT=5to3;CIPOS=-393,393;CIEND=-393,393;RDRATIO=1.4739;SOMATIC GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV 0/1:-4.88483,0,-59.5916:49:PASS:3645:12321:3421:3:12:3:0:0 0/0:0,-4.50384,-71.5884:45:PASS:4352:6450:3251:2:15:0:0:0 0/0:0,-1.16884,-9.36473:12:LowQual:1356:1997:332:2:4:0:0:0 0/0:0,-2.40646,-41.0982:24:PASS:3547:10813:2313:4:8:0:0:0 0/0:0,-2.35972,-37.8515:24:PASS:3319:10219:2997:3:8:0:0:0 0/0:0,-5.29267,-88.5741:53:PASS:5360:7338:6727:1:18:0:0:0 0/0:0,-3.57627,-56.4639:36:PASS:3338:4977:3309:1:12:0:0:0
This is what my VCF file looks like, excluding the header region.
from sansa.
Hi,
Are you using the release version or the latest github code? I think I fixed this already in the latest code.
Thanks, Tobias
from sansa.
Thanks for the fast reply.
I installed sansa with conda, and my sansa version is 0.0.7, which is the release version.
Should I delete it and reinstall from the latest GitHub code?
**********************************************************************
Program: Sansa
This is free software, and you are welcome to redistribute it under
certain conditions (BSD License); for license details use '-l'.
This program comes with ABSOLUTELY NO WARRANTY; for details use '-w'.
Sansa (Version: 0.0.7)
Contact: Tobias Rausch ([email protected])
**********************************************************************
Usage: sansa <command> <arguments>
Commands:
annotate annotate VCF file
from sansa.
Yes, that fix happened after the release of v0.0.7. I have now updated the bioconda version with a new release, v0.0.8.
https://anaconda.org/bioconda/sansa
from sansa.
Hi @tobiasrausch,
Thanks for your kindness.
Yes, I downloaded the newly updated conda version (0.0.8),
and the error message I mentioned is gone.
I'm really grateful.
But I have another issue when I run sansa.
My command:
sansa annotate -g hg38.knownGene.gtf.gz ADB_LL.delly.SV.vcf.gz
and the error message:
[2021-Apr-06 11:40:35] sansa annotate -g hg38.knownGene.gtf.gz ADB_LL.delly.SV.vcf.gz
[2021-Apr-06 11:40:35] Parse SV annotation database
[2021-Apr-06 11:40:35] Parsed 1 out of 1 VCF/BCF records.
[2021-Apr-06 11:40:35] GTF feature parsing
Error parsing GTF/GFF3/BED file!
My GTF file was downloaded from here:
https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/
and my VCF looks like this:
.....header region
##contig=<ID=HLA-DRB1*15:01:01:02,length=11571>
##contig=<ID=HLA-DRB1*15:01:01:03,length=11056>
##contig=<ID=HLA-DRB1*15:01:01:04,length=11056>
##contig=<ID=HLA-DRB1*15:02:01,length=10313>
##contig=<ID=HLA-DRB1*15:03:01:01,length=11567>
##contig=<ID=HLA-DRB1*15:03:01:02,length=11569>
##contig=<ID=HLA-DRB1*16:02:01,length=11005>
##INFO=<ID=RDRATIO,Number=1,Type=Float,Description="Read-depth ratio of tumor vs. normal.">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic structural variant.">
##bcftools_viewVersion=1.8+htslib-1.8
##bcftools_viewCommand=view /users/hslee/cancer/WGS_hg38/SV/delly/ADB_LL.delly.SV.bcf; Date=Wed Mar 31 00:20:53 2021
#CHROM[1] POS[2] ID[3] REF[4] ALT[5] QUAL[6] FILTER[7] INFO[8]
chr5 648742 DUP00000003 G <DUP> . LowQual IMPRECISE;SVTYPE=DUP;SVMETHOD=EMBL.DELLYv0.8.1;CHR2=
Thanks in advance again.
from sansa.
Oh, I solved this problem by using the Ensembl GFF3.gz format.
I can't figure out why the general GTF format doesn't work, but it works with the GFF3.gz format anyway.
Thanks @tobiasrausch,
have a nice day!
I'll close this issue.
from sansa.
Oh, you're right.
It runs perfectly.
Thanks for your advice,
and thanks for this beautiful work as well.
from sansa.
I downloaded the gff3.gz from here: http://ftp.ensemblgenomes.org/pub/plants/release-51/gff3/zea_mays/
and ran the command as follows:
sansa_v0.0.8 annotate -g Zea_mays.Zm-B73-REFERENCE-NAM-5.0.51.gtf.gz ../03-SV/P1_vs_P2.somatic.vcf
I also tried '-f transcript' and the GTF file, but they all give the same error, as follows:
[2021-Dec-17 18:21:06] Parse SV annotation database
[2021-Dec-17 18:21:07] Parsed 13276 out of 13276 VCF/BCF records.
[2021-Dec-17 18:21:07] GTF feature parsing
Error parsing GTF/GFF3/BED file!
from sansa.
I think the problem is the attribute (-i). It's not called gene_name in your case but gene_id.
sansa annotate -i gene_id -f gene ...
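To see which attribute keys are available for -i, one option is to list the keys found in column 9 for the chosen feature type. Below is a minimal Python sketch, not part of sansa. The helper names and sample records are made up, and the parser is deliberately simple: it assumes no ';' inside attribute values.

```python
import io
from collections import Counter


def attribute_keys(column9):
    """Return the attribute keys from a GTF ('key "value";') or GFF3 ('key=value;') column 9."""
    keys = []
    for part in column9.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        head = part.split(None, 1)[0]  # first whitespace-delimited token
        # GFF3 joins key and value with '='; GTF separates them with a space.
        keys.append(head.split("=", 1)[0] if "=" in head else head)
    return keys


def keys_for_feature(handle, feature):
    """Count attribute keys over all records whose column 3 equals `feature`."""
    counts = Counter()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == feature:
            counts.update(attribute_keys(fields[8]))
    return counts


# Made-up records (illustration only): one GFF3-style and one GTF-style gene line.
sample = io.StringIO(
    "1\tensembl\tgene\t100\t900\t.\t+\t.\tID=gene:G1;biotype=protein_coding;gene_id=G1\n"
    '1\tensembl\tgene\t950\t990\t.\t-\t.\tgene_id "G2"; gene_biotype "protein_coding";\n'
)
gene_keys = keys_for_feature(sample, "gene")
print(sorted(gene_keys))
```

If gene_name does not show up among the keys for the chosen feature, passing a key that does (here gene_id) via -i is the fix suggested above.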
from sansa.