fritzsedlazeck / sniffles Goto Github PK
View Code? Open in Web Editor NEWStructural variation caller using third generation sequencing
License: Other
Structural variation caller using third generation sequencing
License: Other
I am just wondering if I run Sniffles twice with exactly the same data and settings, such as
${SNIFFLES} -m ${inputfile} -v ${outputfile} -t 20 --report_BND --genotype -s 5
Should the results be exactly the same?
Thanks.
Hi, thanks for developing such a nice tool. It's also very easy to use.
Just wondering whether it is possible to call SV using multiple samples?
For example
./sniffles -m sample1.bam sample2.bam -v output.vcf
Thanks again!
Hi, first of all, thanks for this great tool!
I wonder whether it would be possible to add a feature for allowing regenotyping of SVs?
One of the great features of the new Delly2 (for Illumina data SV detection) is that you can first call variants independently in all individuals of a family/cohort, then merge the site-list and regenotype in all individuals.
Also, this would make it easier to decide whether an SV is present in a child's parent beyond interval overlaps...
Hi,
thanks for sniffles, it appears to be nice, quick and flexible on my very low coverage ONT data.
I am having an issue with the VCF. Tabix claims it is not sorted. However vt sort will not read it since the header is lacking many fields. I am sure I can find a workaround to sort with, but this might be helpful.
bwa_sniffles$ vt sort BRCA2_dup_sniffles.vcf
[W::vcf_parse] contig 'chrM' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] INFO 'STD_quant_start' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'STD_quant_stop' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'Kurtosis_quant_start' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'Kurtosis_quant_stop' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'SUPTYPE' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'STRANDS' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'STRANDS2' is not defined in the header, assuming Type=String
[W::vcf_parse] contig 'chr1' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr2' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr3' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr4' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr5' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr6' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr7' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr8' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr9' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr10' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr11' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr12' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr13' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr14' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr15' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr16' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr17' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr18' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr19' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr20' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr21' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr22' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chrX' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chrUn_gl000220' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chrUn_gl000225' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chrY' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr4_gl000193_random' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr7_gl000195_random' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr9_gl000198_random' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr17_gl000204_random' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr17_gl000205_random' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chr19_gl000208_random' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chrUn_gl000219' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chrUn_gl000224' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chrUn_gl000230' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] contig 'chrUn_gl000232' is not defined in the header. (Quick workaround: index the file with tabix.)
[vcf.c:1220 bcf_write] Unchecked error (2), exiting.
Colin
Dear Fritz,
We have assembled a large mammalian genome using 90X of SMRT platform data. The base-accuracy of the assembly is often assessed using alignment of HiseqX paired-end reads to the assembly and calculating the number of homozygous snps and homozygous indels.
I wanted to use long-reads to assess the assembly in a similar manner. I am using ngmlr and sniffles to align the long-reads back to the assembly and to genotype homozygous structural variations against the assembly. I have used the following cmd for this purpose.
${SNIFFLES_PATH}/sniffles -m input.bam -v output.vcf -t 10 -l 50 --report_seq --cluster --genotype
I have found approximately ~20,000 structural variations against the de novo assembly. I am quite surprised by the fact that there the genotype is 0/0 for the structural variation. I was expecting the genotype to be 0/1 or 1/1. Could you provide some insights into this?
I look forward to your response.
Best,
Sangjin Lee
Hi,
I have not been sorting the alignments (e.g. via samtools sort) that I am giving sniffles. I just do this:
bwa index lambdaGenomeSequence.fa
bwa mem -M -x pacbio lambdaGenomeSequence.fa simreads.fasta | samtools view -bSh - > simreads.bam
sniffles -m simreads.bam -v simreads.vcf
Sniffles does not complain about the file not being sorted and it seems to give expected results on simulations. However, I was just curious if there will be unexpected effects of not sorting when I move on to real data on a much larger genome.
best,
John
Hi Fritz,
I work with Arabidopsis thaliana which we dont expect to see so much heterozygosity. But Half of my SV calls basically heterozygous and I dont want to filter them out.
Is there any parameter that I can use for genotyping considering this problem?
I have found these but I dont know which I am supposed to use:
-f , --allelefreq
--min_het_af
--min_homo_af
Best Regards,
Mehmet
Hi,
Thanks for developing this tool. It seems to do what you says it does. :)
I was curious about the "--use_MD_Cigar" flag. Could you give me any insight of when I would or would not want to use it?
Many thanks.
--John
Hi,
I tried installing Sniffles on my Mac, without success. I was wondering whether you could help me, please?
I have followed these steps:
wget https://github.com/fritzsedlazeck/Sniffles/archive/master.tar.gz -O Sniffles.tar.gz
tar xzvf Sniffles.tar.gz
cd Sniffles-master/
mkdir -p build/
cd build/
Then, for Mac, you suggest this: cmake -D CMAKE_C_COMPILER=/opt/local/bin/gcc-mp-4.7 -D CMAKE_CXX_COMPILER=/opt/local/bin/g++-mp-4.7 ..
However, I don't have a 'local' folder in 'opt', only a 'X11' folder. However, in my '/usr/bin
', I have 'gcc' and 'g++'. So I tried the following command:
cmake -D CMAKE_C_COMPILER=/usr/bin/gcc -D CMAKE_CXX_COMPILER=/usr/bin/g++ ..
Compiling the C compiler identification source file "CMakeCCompilerId.c" failed.
Compiler: /opt/local/bin/gcc-mp-4.7
Build flags:
Id flags: -c
The output was:
No such file or directory
Compiling the C compiler identification source file "CMakeCCompilerId.c" failed.
Compiler: /opt/local/bin/gcc-mp-4.7
Build flags:
Id flags: -Aa
The output was:
No such file or directory
Compiling the C compiler identification source file "CMakeCCompilerId.c" failed.
Compiler: /opt/local/bin/gcc-mp-4.7
Build flags:
Id flags: -D__CLASSIC_C__
The output was:
No such file or directory
etc.
It seems that I do need to get '/opt/local/bin/gcc-mp-4.7' and '/opt/local/bin/g++-mp-4.7'. Do you know how I could do this?
Thanks.
Best wishes,
Cristian.
Hi,
First, how much value of STD_quant_start and STD_quant_stop is precise and how much value of STD_quant_start and STD_quant_stop is imprecise.
Second: use bam(ngm-lr) to caculate depth(samtools),depth is very low,less than 2x, raw data 60x, how difference the pacbio bam and the illumina bam?how do you caculate depth?
Thanks.
using a current build adding --genotype flag produces AF entries in INFO but they aren't defined in the header. I assume this is allele frequency so am adding
##INFO=<ID=AF,Number=.,Type=Integer,Description="Allele Frequency.">
is that right?
Hi, I've just run sniffles to call structral variants with pacbio data. Now I want to filter some variants which look like impossilbe. For example,
1 724782 0 N <DUP> . PASS IMPRECISE;SVMETHOD=Snifflesv1.0.7;CHR2=1;END=224202044;ZMW=19;STD_quant_start=59.084034;STD_quant_stop=93.640885;Kurtosis_quant_start=-1.482464;Kurtosis_quant_stop=-1.658414;SVTYPE=DUP;SUPTYPE=SR;SVLEN=223477262;STRANDS=-+;RE=26 GT:DR:DV ./.:.:26
The length of this SV is 223477262, almost equal to the length of chromesome 1. And in the INFO field it is marked as IMPRECISE. I wonder whether I can use this mark to exclude it ? If not , is there any other way to filter raw variants?
Hi Fritz,
on my machine, Sniffles crashes with a segfault when given an invalid path to the output vcf file:
sniffles -m input.bam -v non_exisiting_directory/output.vcf
Segmentation fault
I noticed this when I forgot to create the output directory ("non_exisiting_directory") before running Sniffles. Although this was my fault of course, it would be great if Sniffles would give a proper error message ;-)
Cheers
David
Hi again,
Yesterday, I git cloned the latest version (1.0.3) -- I had been using 1.0.0.
I noticed it called more SVs given the same data. Most of the additional SV calls were of the type: TRA
. I assume this means "translocation". ((edit: were there TRA calls in 1.0.0 -- or is this a newer feature since then?))
I am running this on a de novo assembly and also observed that the majority of the TRA calls (if not all) strictly map to the ends of separate contigs (different contig pairs for each TRA call) -- i.e. potentially scaffolding them. Have you explored using this information for scaffolding de novo assemblies? If so, any thoughts?
Best,
John
Hi.
Which is the recommended pipeline for SV by PacBio reads
for small var, 1) seems better. pacbio reads are more close to reference genome.
for big SV, 2) seems better. pacbio reads are more close to each other?
Is there some public test result?
I am trying to use sniffle to detect sv on human genomes. The guide says "requires output from BWA-MEM with the optional SAM attributes enabled". What sam attributes are required? Thanks!
Hi,
I have just run Snifflesv1.0.7 to call structural variants with pacbio data. However, when I checked my results, I found many translocation calls are redundant. For example , there are 8 records located on chromosome 9 that seems to represent the same variant.
9 110248031 28615 N <TRA> . PASS PRECISE;SVMETHOD=Snifflesv1.0.7;CHR2=22;END=23388487;ZMW=15;STD_quant_start=0.000000;STD_quant_stop=0.632456;Kurtosis_quant_start=6.944904;Kurtosis_quant_stop=-1.664143;SVTYPE=TRA;SUPTYPE=SR;SVLEN=1203722464;STRANDS=--;RE=18 GT:DR:DV ./.:.:18
9 110248031 28616 N <TRA> . PASS PRECISE;SVMETHOD=Snifflesv1.0.7;CHR2=22;END=23524163;ZMW=15;STD_quant_start=0.000000;STD_quant_stop=0.632456;Kurtosis_quant_start=1.716204;Kurtosis_quant_stop=-0.498174;SVTYPE=TRA;SUPTYPE=SR;SVLEN=1203858140;STRANDS=-+;RE=18 GT:DR:DV ./.:.:18
9 110249307 28617 N <TRA> . PASS PRECISE;SVMETHOD=Snifflesv1.0.7;CHR2=22;END=23388486;ZMW=28;STD_quant_start=0.000000;STD_quant_stop=0.516398;Kurtosis_quant_start=10.688393;Kurtosis_quant_stop=2.387808;SVTYPE=TRA;SUPTYPE=SR;SVLEN=1203721187;STRANDS=--;RE=30 GT:DR:DV ./.:.:30
9 110249307 28619 N <TRA> . PASS PRECISE;SVMETHOD=Snifflesv1.0.7;CHR2=22;END=23524163;ZMW=19;STD_quant_start=0.288675;STD_quant_stop=0.645497;Kurtosis_quant_start=1.002878;Kurtosis_quant_stop=2.246256;SVTYPE=TRA;SUPTYPE=SR;SVLEN=1203856864;STRANDS=-+;RE=24 GT:DR:DV ./.:.:24
9 110250549 28621 N <TRA> . PASS PRECISE;SVMETHOD=Snifflesv1.0.7;CHR2=22;END=23388492;ZMW=34;STD_quant_start=0.000000;STD_quant_stop=0.677003;Kurtosis_quant_start=-0.195078;Kurtosis_quant_stop=-1.416803;SVTYPE=TRA;SUPTYPE=SR;SVLEN=1203719951;STRANDS=++;RE=48 GT:DR:DV ./.:.:48
9 110250549 28622 N <TRA> . PASS PRECISE;SVMETHOD=Snifflesv1.0.7;CHR2=22;END=23524158;ZMW=30;STD_quant_start=0.000000;STD_quant_stop=0.000000;Kurtosis_quant_start=0.167256;Kurtosis_quant_stop=5.765222;SVTYPE=TRA;SUPTYPE=SR;SVLEN=1203855617;STRANDS=+-;RE=38 GT:DR:DV ./.:.:38
9 133681711 28629 N <TRA> . PASS PRECISE;SVMETHOD=Snifflesv1.0.7;CHR2=22;END=23632361;ZMW=40;STD_quant_start=0.935414;STD_quant_stop=0.500000;Kurtosis_quant_start=4.041601;Kurtosis_quant_stop=6.036755;SVTYPE=TRA;SUPTYPE=SR;SVLEN=1180532658;STRANDS=-+;RE=49 GT:DR:DV ./.:.:49
9 133681337 29767 N <TRA> . PASS PRECISE;SVMETHOD=Snifflesv1.0.7;CHR2=22;END=23632359;ZMW=37;STD_quant_start=0.759555;STD_quant_stop=0.960769;Kurtosis_quant_start=2.699066;Kurtosis_quant_stop=7.087048;SVTYPE=TRA;SUPTYPE=SR;SVLEN=1180533030;STRANDS=+-;RE=53 GT:DR:DV ./.:.:53
According to other research, there are only one reciprocal translocation between chromosome 9 and 22, so I wonder why this happened, and what I can do to get a better result.
I aligned Oxford Nanopore (ONP) samples using ngmlr with this command
ngmlr -t 4 -r <FASTA> -q <ONP_FASTQ> -o <out.sam> -x ont
I ran Sniffles with the following command
sniffles --genotype -m <sorted.bam> -s 5 -v <sniffles_sv.vcf>
The output VCF I get does not have genotypes. The sites appear as such
./.:.:13
I notice get only .
for the DR format field as well.
To iterate, every site gets a ./.
genotype and .
DR value.
Running sniffles --version
returns
sniffles version: 1.0.3
Is genotyping currently supported for this version? Also can you run sniffles
jointly with multiple samples? I tried sniffles -m <1.bam> <2.bam> ..
and sniffles -m <1.bam> -m <2.bam> ..
and got error messages.
Thanks for any help!
Hi Sniffles developers,
What does the DUPcalls in Sniffles result file refer to, interspersed duplication or tandem duplication?
Thanks.
George
Hi Fritz,
I would like to get even one single nucleotide deletions or insertions. When I set -l parameter to 1. It gets so slow. 15 hours was not enough even to process couple of thousands reads.
Normally if -l is higher than 5 it takes 20 min.
Is it normal?
Best,
Mehmet
IGV 2.3.68 throws an error on a vcf file I created with sniffles:
Unable to parse header with error: Your input file has a malformed header: Unexpected tag "SUPTYPE" in line /path/to/file.vcf
At first I thought it was a typo, but changing SUPTYPE to SUBTYPE gave a similar error.
Hi again, sorry for the second and maybe dumb question.
I just noticed that sniffles only worked on my first 13 contigs, but I have 500+ contigs in total. I did not saw any option to restrict the analysis only on the first contigs... But maybe I did something wrong ? Any idea ?
Thanks again,
Roxane
Switch Chr Contig_11
# Processed reads: 2680000
# Processed reads: 2690000
# Processed reads: 2700000
# Processed reads: 2710000
# Processed reads: 2720000
# Processed reads: 2730000
# Processed reads: 2740000
# Processed reads: 2750000
# Processed reads: 2760000
Switch Chr Contig_12
# Processed reads: 2770000
# Processed reads: 2780000
# Processed reads: 2790000
Switch Chr Contig_13
# Processed reads: 2800000
# Processed reads: 2810000
# Processed reads: 2820000
# Processed reads: 2830000
Finalizing ..
The bug I was referring to in #30
After some debugging I don't think that the genotyping is the problem.
The program segfaults because the variant is defined as DUP/INS (sv_type = 6), causing the following line (Breakpoint.cpp:452) to be false (variant is treated as an insertion):
if (!(this->get_SVtype() & INS)) { //all types but Insertions:
and thus moving into the else clause which tries to compute the median of the lengths2 vector (Breakpoint.cpp:468), but this segfaults because lengths2 is empty.
When I change line 452 to
if (!(this->get_SVtype() == INS)) { //all types but Insertions:
the program terminates successfully.
Hi.
I tested Sniffles with 3 contigs that have 100/1000/2000 bp INDEL.
but Sniffles reported 'Too few reads' even if '--min_support 1'.
# sniffles -m sort.bam --min_support 1 -v output.vcf
Estimating parameter...
Too few reads detected in sort.bam
The contig are mapped by ngmlr and minimap2.
sniffles version:(git source)
# sniffles --version
sniffles version: 1.0.8
Hi Fritz,
I m willing to compare SVs which were called by sniffles with pacbio and short read read caller. (I use pindel for that).
Everything worked pretty well, there is one thing that I do not understand. The way that you give positions for indels.
For example for pindel it is always one nucleotide difference between start and stop of sv. Even if it is a SNP or 100 nucleotide insertions
.
But for Sniffles it varies a lot. I didnt understand the logic behind it. That is why currently I can not compare these two tools' output for indels.
Is it possible to tell me why insertion positions varies that much? Are those complex indels or what?
For example this one;
NC_003075.7 7231255 7968 N . PASS PRECISE;SVMETHOD=Snifflesv1.0.6;CHR2=NC_003075.7;END=7231260;STD_quant_start=0.000000;STD_quant_stop=1.140175;Kurtosis_quant_start=17.144694;Kurtosis_quant_stop=12.218618;SVTYPE=INS;SUPTYPE=AL;SVLEN=11;STRANDS=+-;RE=21 GT:DR:DV ./.:.:21
There is 5 nucleotides distance between start and stop position. I believe it should be something like that start:7231255 and stop 7231256
Best,
Mehmet
Hi,
thank you for fixing bugs. i sincerely appreciate.
i have some questions about "strands" in INFO column.
there is "STRANDS=xx" records.
i extracted SV type and STRANDS pairs from Sniffles output.
i used cancer reads generated by MinION.
result is as below:
DEL + -
DEL/INV + -
DEL/INV/INVDUP + +
DUP - +
DUP/INS + -
INS + -
INV + +
INV + -
INV - -
INV/INVDUP + +
INV/INVDUP - -
INVDUP + +
INVDUP - -
TRA + +
TRA + -
TRA - +
TRA - -
i read this ##INFO line:
##INFO=<ID=STRANDS,Number=.,Type=String,Description="Strand orientation of the adjacency in BEDPE format (DEL:+-, DUP:-+, INV:++/--)">
i think INV will have only STRANDS=--
or STRANDS=++
but in my .vcf
, INV
sometimes have STRANDS=+-
.
i would like to know what +/- means.
Hi,
As usual - thanks for the great tool.
In the BEDPE output, the header specifies the columns as:
#Chrom start stop chrom2 start2 stop2 variant_name/ID score (smaller is better) strand1 strand2 type number_of_split_reads best_chrom best_pos best_chrom2 best_pos2 predicted_length
The last column is called predicted_length
. I naively assumed this was the predicted length of the SV -- e.g. of a deletion. However, I recently noticed that the predicted_length
column always has the same value as the number_of_split_reads
column. For example:
11 11
17 17
15 15
11 11
11 11
12 12
23 23
11 11
12 12
13 13
So my question is -- is that intentional (that both columns have the same value)? If so, is it safe to say that predicted_length
is meaningless for now?
Best,
John
Hi Fritz,
I am working a bacterial genome, which is a circular genome. I would like to call SV using Sniffle.
I have mapped my PacBio read to ref genome using NG-MLR.
ngmlr -r $REF -q $FILE -o $OUTPATH/$SAMNAME -t 30
Then I sort the sam file and convert it to sorted bam format using picard tool.
java -Djava.io.tmpdir=
pwd/tmp -jar $PICARD/picard.jar SortSam \ I=$OUTPATH/$SAMNAME O=$OUTPATH/${SAMNAME%%.sam}_sorted.bam \ SO=coordinate TMP_DIR=
pwd/tmp
The sorted bam file was fed to Sniffles:
sniffles -m file_sorted.bam -v file.vcf
After running this command, I got an error message:
Segmentation fault (core dumped)
Could you help me with this problem?
Jie
Hi,
Thanks a lot for sniffles, this is a really efficient, easy to install and to run tool... I'm so glad such a tool exist.
I've recently used sniffles after ngmlr alignement in order to detect SV in my higly diploid assembly.
I'm not really used to VCF format, but I think I can easily infer what SVTYPE means : INS = insertion, DEL = deletion, INV = inversion...
But I've got a lot of theses huge SV (like several Millon bases long) that are written as TRA, which I have no idea what it stands for.
I couldn't find any information regarding this in VCF specifications format . Do you have any idea about that subject ?
Cheers,
Roxane
Hi,
I am curious how can we cite Sniffles if we used it?
Do you have a publication for Sniffles?
Thanks!
When I run sniffles-1.0.6 with the --report_seq option I get:
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Aborted (core dumped)
This does not happen when I don't use the --report_seq option. Without that option the program gives interesting results, thanks.
The problem seems to occur around the time the first vcf record it being output, 1-5 minutes into the run of the program. The error occurs for 4 out of 5 bam files I was working with, while the fifth one worked OK.
Hi Fritz,
For BWA it is obvious -x and -M but I am not 100% sure which parameters for ngmlr that I should enable to use for sniffles?
Best,
Mehmet
I am running the command as follow:
sniffles-core-1.0.6/sniffles -n 5 --genotype -s 2 -m ../pacbio_align_sorted.bam -v ./sniffle_mem_min2.vcf > ./sniffle_out_mem_min2.log 2>&1
after some time of running it raise up this error:
*** Error in `/home/user/source/Sniffles_1.0.6/bin/sniffles-core-1.0.6/sniffles': double free or corruption (out): 0x00007ffe70fb9a70 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f44dadd47e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f44daddd37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f44dade153c]
/home/medhat/source/Sniffles_1.0.6/bin/sniffles-core-1.0.6/sniffles(_ZN9Genotyper11compute_covER15Breakpoint_TreeRP15breakpoint_node+0x20f)[0x42d67f]
/home/medhat/source/Sniffles_1.0.6/bin/sniffles-core-1.0.6/sniffles(_ZN9Genotyper10update_SVsEv+0x3e)[0x43141e]
/home/user/source/Sniffles_1.0.6/bin/sniffles-core-1.0.6/sniffles(main+0x1ff)[0x426f5f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f44dad7d830]
/home/user/source/Sniffles_1.0.6/bin/sniffles-core-1.0.6/sniffles(_start+0x29)[0x42aad9]
======= Memory map: ========
00400000-0052a000 r-xp 00000000 08:22 2009114 /home/medhat/source/Sniffles_1.0.6/bin/sniffles-core-1.0.6/sniffles
0072a000-0072b000 r--p 0012a000 08:22 2009114 /home/medhat/source/Sniffles_1.0.6/bin/sniffles-core-1.0.6/sniffles
0072b000-0072c000 rw-p 0012b000 08:22 2009114 /home/medhat/source/Sniffles_1.0.6/bin/sniffles-core-1.0.6/sniffles
0072c000-0072f000 rw-p 00000000 00:00 0
0232d000-11371000 rw-p 00000000 00:00 0 [heap]
7f44c4000000-7f44c5be4000 rw-p 00000000 00:00 0
....
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
[1] 15054 abort (core dumped) sniffles-core-1.0.6/sniffles -n 5 --genotype -s 2
any suggestion?
Hello
I am curious regarding the way the results are represented.
I have for example a position which was confirmed through another independent analysis:
000000F 11430646 11430740 000000F 11580771 11580779 8 -1 + - DEL 15 000000F 11430739 000000F 11580772 150033
So I understand the format of the *.bede file here the following way:
At position 11430646 there is a potential deletion on one of the alleles which ranges till 11580772.
This is similarly as well graphically represented in the VCF file in IGV.
Now I have another candidate:
006538F 3752 3780 006956F 783 1387 7718 -1 + + TRA 16 006538F 3779 006956F 1222 1406222
This is now a translocation which supposedly starts at 3752 on 006538F. But where does it no stop ?
I guess if I understood correctly is stops on 006956F @ 1222 ?
These are not visualized in the VCF if visualized by IGV .
Hello
It seems that the parallel output in VCF and BEDE is not working . E.g. if I try:
sniffles -m file.sorted.bam -t 10 -v file.sniffles.vcf -b file.sniffles.bede
Only the VCF is generated. If I try on the other hand:
sniffles -m file.sorted.bam -t 10 -b file.sniffles.bede &
Everything works well. Is is possible to generate both outputs the same time?
Cheers
Hi,
I am wondering what exactly sniffles is going to do when I specify the option --Ivcf
?
In the wiki, it says that this 'enables forced calling'. Does this mean sniffles will limit variant calling to regions of previously seen variants? If so, are the regions exact positions (variants in vcf file) or do they include flanking regions for variability?
Sorry if I misunderstood.
Dear Fritz
When I try to upload the vcf file into IGV I'm getting an error complaining about missing columns. In the example below the 3rd line is missing the GT:DR:DV field and the field behind that. I'm not too worried about these fields missing but having a couple of tab-separated dots in the lines before the line break where these fields would be would make IGV's problem go away, I reckon.
ENA|CM001196|CM001196.1 83605 0 N <DUP> . PASS PRECISE;SVMETHOD=Snifflesv1.0.8;CHR2=ENA|CM001196|CM001196.1;END=4960077;STD_quant_start=7.925792;STD_quant_stop=0.797724;Kurtosis_quant_start=-1.159180;Kurtosis_quant_stop=-1.284778;SVTYPE=DUP;SUPTYPE=SR;SVLEN=4876472;STRANDS=-+;RE=22;AF=0.956522 GT:DR:DV 1/1:1:22 ENA|CM001196|CM001196.1 85847 1 N <DEL> . PASS PRECISE;SVMETHOD=Snifflesv1.0.8;CHR2=ENA|CM001196|CM001196.1;END=4960071;STD_quant_start=2.102630;STD_quant_stop=1.076055;Kurtosis_quant_start=5.490807;Kurtosis_quant_stop=-1.172381;SVTYPE=DEL;SUPTYPE=SR;SVLEN=4874224;STRANDS=+-;RE=39;AF=0.975 GT:DR:DV 1/1:1:39 ENA|CM001196|CM001196.1 116744 2 N <INS> . PASS IMPRECISE;SVMETHOD=Snifflesv1.0.8;CHR2=ENA|CM001196|CM001196.1;END=116744;STD_quant_start=5.024938;STD_quant_stop=14.515509;Kurtosis_quant_start=-1.800901;Kurtosis_quant_stop=-1.888449;SVTYPE=INS;SUPTYPE=AL;SVLEN=71;STRANDS=+-;RE=40 ENA|CM001196|CM001196.1 117804 3 N <INS> . PASS IMPRECISE;SVMETHOD=Snifflesv1.0.8;CHR2=ENA|CM001196|CM001196.1;END=117804;STD_quant_start=4.156655;STD_quant_stop=12.062338;Kurtosis_quant_start=-1.102856;Kurtosis_quant_stop=-1.052519;SVTYPE=INS;SUPTYPE=AL,SR;SVLEN=610;STRANDS=+-;RE=37;AF=0.902439 GT:DR:DV 1/1:4:37 ENA|CM001196|CM001196.1 146939 4 N <DEL> . PASS PRECISE;SVMETHOD=Snifflesv1.0.8;CHR2=ENA|CM001196|CM001196.1;END=150338;STD_quant_stop=-1.052519;SVTYPE=INS;SUPTYPE=AL,SR;SVLEN=610;STRANDS=+-;RE=37;AF=0.902439 GT:DR:DV 1/1:4:37
Regards,
A. Webb
Hello, I am working on a human genome, and I want to call SV using Sniffle. I have just mapped pacbio data to human reference genome using ngmlr and then sort the bam file using samtools. Before feeding the sorted bam file to sniffle, I wonder whether I need to mark duplicates and select only a single subread as non-duplicate?
Hello,
I am using nanopore (1D) human data, I have aligned using ngmlr and minimap2. Sniffles for SV calling, but ngmlr bam of the same sample is giving straucrtual variant vcf but minimap2 bam is not giving any variant output.
Do yo have any idea about it?
One more question, can I use Illumina bam to extract structural variants using Sniffles.
Expand Sniffles to use the CS tag proposed by @lh3
https://github.com/lh3/minimap2#the-cs-optional-tag
Hi - are there any plans to include sequences in the VCF output? For example, if it's an insertion, what sequence (from the reads) has been inserted?
If this isn't planned, can you suggest other tools to extract sequence from a VCF, reference, and reads?
Thanks,
~Joe
Hi Fritz,
I was wondering if Sniffles is suitable for calling variants on pooled sample data, i.e. where there might be quite low frequency SVs? Or is it best suited to data from individuals?
Thanks very much in advance, and best wishes!
Sam
You have object files and autogenerated makefiles committed to the repository here, which isn't an efficient use of git. Please consider removing them from the repository and adding lines to .gitignore
to prevent them getting added back in inadvertently.
Hi,
I am using Sniffles for my research.
I added --report_BND
option but there may be bug.
In Sniffles/src/print/VCFPrinter.cpp
line 95,
if (strands[0] == '-' && strands[0] == '+')
I think this if
statement is always false.
Hi, can you create a release tag for Sniffles? I'd just like this for reproducibility as my users seem quite enamored with your software!
Hi Fritz,
I applied Sniffles (1.0.6) on one simulated dataset of mine (100 deletions simulated in hg19, chr21+chr22). Sniffles detected many deletions correctly but also outputted several super-large (and in my opinion incorrect) SVs. See the IGV screenshot below (top row: implanted deletions, bottom row: detected SVs from Sniffles)
Two examples from the VCF output:
chr21 11004164 7 N <DEL> . PASS PRECISE;SVMETHOD=Snifflesv1.0.6;CHR2=chr21;END=32799278;STD_quant_start=0.935414;STD_quant_stop=0.707107;Kurtosis_quant_start=0.102041;Kurtosis_quant_stop=-1.000000;SVTYPE=DEL;SUPTYPE=SR;SVLEN=21795114;STRANDS=+-;RE=8 GT:DR:DV ./.:.:8
chr21 11142297 11 N <DUP> . PASS PRECISE;SVMETHOD=Snifflesv1.0.6;CHR2=chr21;END=31923593;STD_quant_start=5.196152;STD_quant_stop=1.154701;Kurtosis_quant_start=0.000000;Kurtosis_quant_stop=0.000000;SVTYPE=DUP;SUPTYPE=SR;SVLEN=20781296;STRANDS=-+;RE=3 GT:DR:DV ./.:.:3
I probably made some mistake here but cannot find it. Have you seen something similar before?
This is my pipeline:
Hi.
please change 'Number=.' to 'Number=A' or 'Number=R' in VCF for more strict VCF.
then 'bcftools norm --multiallelics' can be used to split/merge VCF record.
This will be a minor enhancement.
Best Regards
Hi,
I have got many structral variants called by Sniffles. Now I want to extract copy number variants from them, but I don't know what to do . Is it proper to just treat DUP and DEL as copy number variants ?
I'll appreciate your help!
Hello,
I tested Sniffles on a small sample. I used NGM-LR for mapping pacbio long reads to the reference.
Sinffles reports about 13,000 variants, but they are all imprecise.
Is this expected? Any chance that I get precise calls by changing parameters of the mapper or Sniffles?
Dear Fritz,
I was wondering if the following issue is an error in sniffles. I have aligned 5X worth of PacBio sequel data using NGMLR and called structural variations using SNIFFLES. I have found cases where insertions are called and insertion sequences are provided, but the SVLEN is missing as the example below shows.
I can approximate the SVLEN from the given SEQ, but I wanted to know what was causing this issue.
5 64184964 7714 N . PASS PRECISE;SVMETHOD=Snifflesv1.0.8;CHR2=5;END=64184964;ZMW=1;STD_quant_start=0.707107;STD_quant_stop=0.707107;Kurtosis_quant_start=-1.000000;Kurtosis_quant_stop=-1.000000;SVTYPE=INS;SUPTYPE=AL;SVLEN=NA;STRANDS=+-;SEQ=TGATCACGTTGGAAACACTCTTTTGTAGTGTCTGGAAGGGAAAGATTTGGAGCGCTTTGATGCCCTTTGGTGAACAAGGGCATTTCTTCCCATAAAAATTAGACAGAAGCATTTCTAGAAACTTGTTGTGATGGTGACCCAGCTAAGGAGGACAGTTGAACATTCGTATGATAGAGCAGTGTGAAACACTATTTTTGTGGAACAATGCAAGCAATTGGATATTTGGATAGCTGGAGGGGATTTCGTTGAAAGCGGGATATTCAAATAAAAAGGTAGACAGGAGCATTCTCAGAAATTTTCTTTCTGATGTACTGATATTCAACTCATAGAGTGAAGATTCCCGTTCATAGAGGCAGTTTTGAAACACCTTCTGGAGTATCTGGATGTGGACATGTGGAGCGCTTTGATTGCCTACGGTGAATGAAAAAGGGCAAATATCTTCCCATTAAAACGAGACAGGAAGATTCGGAGAGACAGTGGATGTGTTGTACATCAAGATAGCTAACAGAGTGCGAACCCTTTCTTTGTTACAGAGCAGCTTTGAAACGCTATTTTTGTTGGATCTGCAAATGGATATTTAGAGTTGCTTTTTACAGATATCGTTGGAATAGGGAAAATCGTCAACAAAATCTGGACAGAGCATTCTCACCAAAACTTCTTGTTGATGTGTGTCCTCAACGTAACAGAGTGAAAACTTTCTTTTCGATGCCTAGCAGTTTGGAAAAACCTTGTTGTAGAACACTGTAAGGGGATATTGGGATTGCTCTAAGATTTCGTTGGAAACGGGAATATCAATCATCTAAAATCTAGACAGAAGCACTTATTTAGAAAAATCTTGGTGATATCTGCATTCAAGTCAGAAGAGTTGAACATTCCTTACTTGAGATCGTTTGAAACACCCTTTGGAAAGAATCTGGAATGTGGAATTTGGAGCGCTTTATGCCTTTGGTTGAAAAGGAAACGTCTTCCAATAAAAGCTAGACAGAAAGCATTCTCAGAAATTGTTTCCTTGATGTGTGTACTCAACTAAAAGAGTTGAACCTTTCTATTGATTGTAGAGCAGTTTAACACACTCTTTTTGTGGATTCTGCAAGTGGATATTTGGATTGCTTTGAGGAGGTTCGTTGGAAGCGGGAATTCGTTAAACACACTAGAAGCACATTTCCCAGAAATGTTCTGGATATTTCAATTCAACTCATAGAGATGAACATGGCCTTCATAGAGCAGGTTGAAACACTCTGTTTTGTAGTTGTGGAAGTTGGACATTGTCGATCGCCTGGACGCCTACGGTTGAATGTGAAAAAAGGAGGAAAATTATATTCCAGAAAAAGGACAGAAGCATTCTCAGGAAACTGTGGTTGATTTGGTCCTCAACTAACAGAGTTGACTTTGCCTATGATAGGAGAGCAGTTTTTGAAACCCACTCTATTTTCATGATCTTGCAAGGGGATATTTGGATAGCTTGGAGATTTAGTTGGAAGCGGGCATTCAAAGTAAAGTAGACAGCAGCATTCGCAGATAAATTATTTTCTGATCGATGCATCAAATCATTAGAGTTGAACATTTCCTTTGTCATAGGGCAGGTTTGAAAGACCTCTTTTCTCGTAGTATCTGGCTGTGGAATTTGTGAGCGTTGAGCCTACGGTGAAAAAGTAAATATCTTCCCATTTAAAAACGAGACAGAAGGATTTGAGAAAACAAGTTGTGAATGGTGTACTCAGGCTAACAGAGTGAACCTTCTCTTTGATGCCAGTAGTTTGGAAACACTCTTTTTTGTAGAATAGGGAAACTGTAAGGGTGATTGGGATAGCTCTAATGATTTCGTTGGAAACGGAATATAATACACTAAAAATCGAGACAGAAGCCCTTTCAGAAAATACTTGTGATATCTGCATCAAGTCAACAGCGGTGTTCGAACATCAGCTTTCTTAGACACGTGGAAAACGTCTTGTTGTAGGGGGTCTGGAAGTGGTGGACATTTGGAAGCGCTTTGATGCCTT;RE=2;AF=0.4 GT:DR:DV 0/1:3:2
Best,
Sangjin Lee
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.