svaba's Introduction

SvABA - Structural variation and indel analysis by assembly

[This project was formerly "Snowman"]

License: GNU GPLv3

Installation

We recommend compiling with GCC-4.8 or greater. svaba now uses CMake instead of autotools. Note: svaba no longer bundles htslib; a system installation of htslib must be pointed to during compilation.

git clone --recursive https://github.com/walaj/svaba
cd svaba
mkdir build
cd build
## replace the paths below with the paths on your own system
cmake .. -DHTSLIB_DIR=/home/jaw34/software/htslib-1.16
make

## QUICK START (eg run tumor / normal on Chr22, with 4 cores)
build/svaba -t tumor.bam -n normal.bam -k 22 -G ref.fa -a test_id -p 4

## get help
svaba --help
svaba run --help

SvABA uses the SeqLib API for BAM access, BWA-MEM alignments, interval trees and operations, and several other auxiliary operations.

Description

SvABA is a method for detecting structural variants in sequencing data using genome-wide local assembly. Under the hood, SvABA uses a custom implementation of SGA (String Graph Assembler) by Jared Simpson, and BWA-MEM by Heng Li. Contigs are assembled for every 25kb window (with some small overlap) for every region in the genome. The default is to use only clipped, discordant, unmapped and indel reads, although this can be customized to any set of reads at the command line using VariantBam rules. These contigs are then immediately aligned to the reference with BWA-MEM and parsed to identify variants. Sequencing reads are then realigned to the contigs with BWA-MEM, and variants are scored by their read support.

SvABA is currently configured to provide indel and rearrangement calls (and anything "in between"). It can jointly call any number of BAM/CRAM/SAM files, and has built-in support for case-control experiments (e.g. tumor/normal, or trios or quads). In case/control mode, any number of cases and controls (but a minimum of 1 case) can be input, and all sequences are assembled jointly. If both a case and control are present, variants are output separately in "somatic" and "germline" VCFs. If only a single BAM is present (input with the -t flag), a single SV and a single indel VCF will be emitted.
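
For example, a joint case/control run might look like the following (a sketch only; the BAM names are placeholders, and it assumes one -t or -n flag per case or control BAM, which is not spelled out above):

## sketch: two cases and two controls assembled and called jointly
svaba run -t case1.bam -t case2.bam -n control1.bam -n control2.bam -G ref.fa -a joint_run -p 4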

A BWA-MEM index reference genome must also be supplied with -G.
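
If the reference has not yet been indexed, the index can be built ahead of time with standard BWA/samtools commands (ref.fa is a placeholder; the samtools .fai index is commonly expected alongside the FASTA as well):

## build the BWA-MEM index (and optionally the FASTA index) for the reference
bwa index ref.fa
samtools faidx ref.fa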

Output file description

*.bps.txt.gz

Raw, unfiltered variants. This file is parsed at the end of the run to produce the VCF files. With the bps.txt.gz, one can define a new set of filtering criteria (depending on sensitivity/specificity needs) using svaba refilter (see Filtering and Refiltering below).

*.contigs.bam

All assembly contigs as aligned to the reference with BWA-MEM. Note that this is an unsorted file. To view in IGV, it must first be sorted and indexed (e.g. samtools sort -m 8G id.contigs.bam -o id.contigs.sort.bam && samtools index id.contigs.sort.bam)

*.discordants.txt.gz

Information on all clusters of discordant reads identified with 2+ reads.
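
For a quick look at the clusters, the file can be streamed directly (assuming the somatic_run ID used elsewhere in this README; adjust to your actual output prefix):

## view the first few discordant read clusters
gunzip -c somatic_run.discordants.txt.gz | head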

*.log

Log file giving run-time information, including CPU and Wall time (and how it was partitioned among the tasks), number of reads retrieved and contigs assembled for each region.

*.alignments.txt.gz

An ASCII plot of variant-supporting contigs and the BWA-MEM alignment of reads to the contigs. This file is incredibly useful for debugging and visually inspecting the exact information SvABA saw when it performed the variant-calling. This file is typically quite large. The recommended usage is to first identify the contig name of your variant of interest from the VCF file (SCTG=contig_name), then do gunzip -c id.alignments.txt.gz | grep contig_name > plot.txt. It is highly recommended that you view the result in a text editor with line truncation turned OFF, so as not to jumble the alignments.

*.vcf

VCF of rearrangements and indels parsed from bps.txt.gz, with somatic_score == 1 (somatic) or 0 (germline) and quality == PASS. NOTE that the cutoff for rearrangement vs indel is taken from BWA-MEM, depending on whether it produces a single gapped alignment or two separate alignments. This is an arbitrary cutoff, just as there is no clear consensus distinction between what constitutes an "indel" and a "structural variant". The unfiltered VCF files include non-PASS variants.
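
Because the unfiltered VCFs retain non-PASS records, a quick way to see how calls were filtered is to tally the FILTER column (a sketch; the file name is a placeholder based on a somatic_run ID and the id.svaba.unfiltered.somatic.sv.vcf naming pattern):

## count records per FILTER value in the unfiltered somatic SV VCF
grep -v "^#" somatic_run.svaba.unfiltered.somatic.sv.vcf | cut -f 7 | sort | uniq -c | sort -rn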

Filtering and Refiltering

SvABA performs a series of log-likelihood calculations for each variant. The purpose is to first classify a variant as real vs artifact, and then to determine if the variant is somatic or germline. These log-likelihoods are output in the VCF and bps.txt.gz file and described here:

  • LOD (LO) - Log of the odds that the variant is real vs an artifact. For indels, the likelihood of an artifact read is proportional to the length of local repeats (repeat units up to 5 bp long)
  • LR - Log of the odds that the variant has an allelic fraction (AF) of 0 vs >=0.5. This is used for somatic vs germline classification
  • SL - Scaled LOD. The LOD score is heuristically scaled as: (min(Mapping quality #1, Mapping quality #2) - 2 * NM) / 60 * LOD
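
As a worked example of the SL scaling above: a variant supported by a contig with mapping qualities 60 and 60 and NM = 1 at LOD = 10 gets SL = (60 - 2) / 60 * 10 ≈ 9.7, while a contig with mapping qualities 60 and 20 and NM = 4 is down-weighted to SL = (20 - 8) / 60 * 10 = 2.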

SvABA can refilter the bps.txt.gz file to produce new VCFs with different stringency cutoffs. To run, the following are required:

  • -b - a BAM from the original run, which is used just for its header
  • -i - input bps.txt.gz file
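
For example, a minimal refilter invocation using the two flags above might look like this (a sketch; file names are placeholders, and it assumes the -a run-ID flag used by svaba run is accepted by svaba refilter as well):

## regenerate the VCFs from an existing bps.txt.gz with new filtering criteria
svaba refilter -i somatic_run.bps.txt.gz -b tumor.bam -a somatic_run_refiltered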

Examples and recipes

Whole genome somatic SV and indel detection

wget "https://data.broadinstitute.org/snowman/dbsnp_indel.vcf" ## get a DBSNP known indel file
DBSNP=dbsnp_indel.vcf
CORES=8 ## set any number of cores
REF=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta
## -a is any string you like, which gives the run a unique ID
svaba run -t $TUM_BAM -n $NORM_BAM -p $CORES -D $DBSNP -a somatic_run -G $REF

Whole genome germline SV and indel detection

## Set -I to not do mate-region lookup if mates are mapped to a different chromosome.
##   This is appropriate for germline analysis, where we don't have a built-in control
##   to guard against mapping artifacts, and we don't want to get bogged down with
##   mate-pair lookups.
## Set -L to 6, which means that 6 or more mate reads must be clustered to
##   trigger a mate lookup. This also reduces spurious lookups as above, and is more
##   appropriate for the expected ALT counts found in a germline sample
##   (as opposed to impure, subclonal events in cancer that may have few discordant reads).
svaba run -t $GERMLINE_BAM -p $CORES -L 6 -I -a germline_run -G $REF

Targeted detection

## eg targets.bed is a set of exome capture regions
svaba run -t $BAM -k targets.bed -a exome_cap -G $REF

Targeted local assembly

## -k can be a chromosome, a samtools/IGV style string 
##     (e.g. 1:1,000,000-2,000,000), or a BED file
k=chr17:7,541,145-7,621,399
svaba run -t $TUM_BAM -n $NORM_BAM -p $CORES -k $k  -a TP53 -G $REF

Assemble all reads

## default behavior is to assemble only clipped/discordant/unmapped/gapped reads
## This can be overridden with the -r all flag
svaba run -t $BAM -r all -G $REF

Snapshot of where svaba run is currently operating

tail somatic_run.log

Debug a local assembly and produce the assembly graph

k=chr17:7,541,145-7,621,399
svaba run -t $BAM -a local_test -k $k -G $REF --write-asqg

## plot the graph
$GIT/svaba/R/svaba-asqg.R

View all of the ASCII alignments

## Make a read-only and no-line-wrapping version of emacs.
## Very useful for *.alignments.txt.gz files
function ev { 
  emacs $1 --eval '(setq buffer-read-only t)' -nw --eval '(setq truncate-lines t)';
  }
ev somatic_run.alignments.txt.gz 

View a particular contig with read to contig alignments

gunzip -c somatic_run.alignments.txt.gz | grep c_1_123456789_123476789 > c_1_123456789_123476789.alignments.txt
ev c_1_123456789_123476789.alignments.txt

Make a function to sort and index contigs

function sai() {
  if [[ -f $1.contigs.bam ]]; then
     samtools sort -m 4G $1.contigs.bam -o $1.contigs.sort.bam
     mv $1.contigs.sort.bam $1.contigs.bam
     samtools index $1.contigs.bam
  fi
}
## for example, for somatic_run.contigs.bam:
sai somatic_run

Attributions

SvABA is developed and maintained by Jeremiah Wala ([email protected]) -- Rameen Beroukhim lab -- Dana-Farber Cancer Institute, Boston, MA.

This project was developed in collaboration with the Cancer Genome Analysis team at the Broad Institute.

Additional thanks to Jared Simpson for SGA, Heng Li for htslib and BWA, and to the other developers whose code contributed to SeqLib.

svaba's People

Contributors

gaog94, rapsssito, seedgeorge, shuang-broad, walaj

svaba's Issues

Compiling v0.2.1 (bioconda recipe)

I am trying to make a bioconda recipe for SvABA v0.2.1, but I keep getting compiler errors:

bioconda/bioconda-recipes#12943

The errors are reproducible when I do the compilation locally. However, if I check out the current master branch, everything works fine. Do you have any plans to make a new release incorporating the latest changes?

LOWMATCHLEN is a FILTER value in sv.vcf but not described in header

There is no description of LOWMATCHLEN included in the VCF header. Lines 738-741 in BreakPoint.cpp define the criteria for this FILTER value, but a description for LOWMATCHLEN is missing from the sv_header.addFilterField call within vcf.cpp. This causes errors for bcftools:

[W::vcf_parse] FILTER 'LOWMATCHLEN' is not defined in the header
Error: The filter is not defined in the header: LOWMATCHLEN

Example dbsnp_indel.vcf no longer available

The example dbsnp_indel.vcf is no longer available: wget "https://data.broadinstitute.org/snowman/dbsnp_indel.vcf" ## get a DBSNP known indel file

Is this known indel dbsnp available to download somewhere else? Thanks!

Any plan of replacing bwa-mem with minimap2?

Hi Jeremiah,
Thanks for your great work! Svaba is awesome! I wonder if there is any plan to replace bwa-mem with minimap2, which is compatible with paired-end short reads and also supports long reads. Furthermore, do you have any plan to support long reads for SV detection, like Sniffles and Picky? Thanks!

Snowman VCF Visualisation in IGV

HI,

I am processing ICGC Broad Snowman indel calls and I don't see the genotype columns filled. But when I check the snv_mnv.vcf, I see that each genotype has information under it. When I remove those two genotype columns from the indel vcf, the problem is solved (I visualised the vcf in IGV).

My question is;
Do I lose any information by removing those two empty genotype columns in IGV? Also, are these columns left empty as a result of an error or something?

Best regards,

Tunc.

this is indel vcf sample.

Tuncs-MacBook-Pro:CallerComparison morova$ more /Users/morova/Desktop/CallerComparison/patient.somatic.indel.vcf
##fileformat=VCFv4.2
##fileDate=20151212
##source=snowmanSV
##reference=hg19
##INFO=<ID=NFRAC,Number=1,Type=String,Description="Normal allelic fraction at break. -1 for undefined">
##INFO=<ID=NSPLIT,Number=1,Type=Integer,Description="Number of normal reads with read-to-contig alignments supporting this indel">
##INFO=<ID=SPAN,Number=1,Type=Integer,Description="Size of the indel">
##INFO=<ID=TCOV,Number=1,Type=Integer,Description="Tumor coverage at break">
##INFO=<ID=MAPQ,Number=1,Type=Integer,Description="Mapping quality (BWA-MEM) of the assembled contig">
##INFO=<ID=TCIGAR,Number=1,Type=Integer,Description="Number of tumor reads with cigar strings supporting this indel">
##INFO=<ID=BLACKLIST,Number=1,Type=String,Description="Normal allelic fraction at break. -1 for undefined">
##INFO=<ID=NCIGAR,Number=1,Type=Integer,Description="Number of normal reads with cigar strings supporting this indel">
##INFO=<ID=REPSEQ,Number=1,Type=String,Description="Repeat sequence near the event">
##INFO=<ID=TSPLIT,Number=1,Type=Integer,Description="Number of tumor reads with read-to-contig alignments supporting this indel">
##INFO=<ID=NCOV,Number=1,Type=Integer,Description="Normal coverage at break">
##INFO=<ID=TFRAC,Number=1,Type=String,Description="Tumor allelic fraction at break. -1 for undefined">
##FILTER=<ID=PASS,Description="3+ split reads, 60 contig MAPQ">
##FILTER=<ID=REPEAT,Description="3+ split reads, 60 contig MAPQ">
##FILTER=<ID=WEAKCIGARMATCH,Description="For indels <= 5 bp, require 8+ split reads or 4+ and 3+ cigar matches">
##FILTER=<ID=WEAKASSEMBLY,Description="4 or fewer split reads">
##FILTER=<ID=LOWMAPQ,Description="Assembly contig has less than MAPQ 60">
##FORMAT=<ID=NALT,Number=1,Type=Integer,Description="Number of ALT support reads or pairs">
##FORMAT=<ID=NREF,Number=1,Type=Integer,Description="Number of REF support Reads">
##FORMAT=<ID=NALT_SR,Number=1,Type=Integer,Description="Number of ALT support Split Reads">
##FORMAT=<ID=NALT_RP,Number=1,Type=Integer,Description="Number of ALT support aberrant Read Pairs">
##FORMAT=<ID=READ_ID,Number=.,Type=String,Description="ALT supporting Read IDs">
##SAMPLE=<ID=###T>
##SAMPLE=<ID=###N>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ###N   ###T
1       3789178 1741:no_id      G       GGGCCCTGTCCCTGCCGC      .       PASS    BLACKLIST=1;EVDNC=INDEL;INSERTION=GGCCCTGTCCCTGCCGC;MAPQ=60;NCIGAR=0;NCOV=39;NDISC=0;NFRAC=0;NSPLIT=0;PONCOUNT=0;SCTG=c_1_3785001_3791001_48;SPAN=17;TCIGAR=37;TCOV=73;TDISC=0;TFRAC=0.506849;TSPLIT=31

snv_mnv vcf

Tuncs-MacBook-Pro:0db91c73-b2e7-4f00-8cea-0b7b3d48451e morova$ more /Users/morova/Desktop/CallerComparison/samepatient.somatic.snv_mnv.vcf
##fileformat=VCFv4.1
##oncotator_version=Oncotator_v1.8.0.0_|
##INFO=<ID=Verification_Status,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Match_Norm_Seq_Allele2,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Validation_Method,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Match_Norm_Seq_Allele1,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Score,Number=.,Type=String,Description="Unknown">
##INFO=<ID=sequencer,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Validation_Status,Number=.,Type=String,Description="Unknown">
##INFO=<ID=tumor_f,Number=.,Type=String,Description="Unknown">
##INFO=<ID=BAM_file,Number=.,Type=String,Description="Unknown">
##INFO=<ID=source,Number=.,Type=String,Description="Unknown">
##INFO=<ID=t_lod_fstar,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Strand,Number=.,Type=String,Description="Unknown">
##INFO=<ID=status,Number=.,Type=String,Description="Unknown">
##INFO=<ID=init_t_lod,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Center,Number=.,Type=String,Description="Unknown">
##INFO=<ID=PoN_remove,Number=.,Type=String,Description="Unknown">
##INFO=<ID=judgement,Number=.,Type=String,Description="Unknown">
##INFO=<ID=NCBI_Build,Number=.,Type=String,Description="Unknown">
##INFO=<ID=phase,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Tumor_Validation_Allele1,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Tumor_Validation_Allele2,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Match_Norm_Validation_Allele2,Number=.,Type=String,Description="Unknown">
##INFO=<ID=Match_Norm_Validation_Allele1,Number=.,Type=String,Description="Unknown">
##FORMAT=<ID=alt_count,Number=.,Type=String,Description="Unknown">
##FORMAT=<ID=ref_count,Number=.,Type=String,Description="Unknown">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  CPCG0206-B1-CPCG0206-F1
1       729633  .       G       A       .       PASS    PoN_remove=False;Validation_Method=none;sequencer=Illumina_GAIIx;Validation_Status=Untested;tumor_f=0.166667;source=WGS;Strand=+;status=Somatic;init_t_lod=19.297413;Center=broad.mit.edu;t_lod_fstar=26.240323;judgement=KEEP;NCBI_Build=37;phase=Phase_I      GT:alt_count:ref_count  ./.:11:55

Example BAM file and output file are available?

Hi Jeremiah,

I wonder if you provide example BAM files and an output file (.vcf) so that I can test svaba and replicate the result, since I would like to check whether I am running svaba correctly.

I downloaded BAM files from ICGC and ran svaba. Please see the attached files that I got from the svaba run:

  1. DO6350.svaba.somatic.sv.vcf.txt - This is filtered file.
  2. DO6350.svaba.unfiltered.somatic.sv.vcf.txt - This is unfiltered file.

The FILTERED file is empty. All of the variant calls in the UNFILTERED vcf file have low quality. I didn't use any extra parameters when running svaba. Please see the command I used below:

svaba run -t 474e3b36be9aba5e893234f2d4891e6e.bam -n 3a27910d544adbeab95b2b73c20ab6e5.bam
bwa index ./BWA_DNA_RefSeq_Fasta_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
-G ./BWA_DNA_RefSeq_Fasta_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
-p 16 -D ./DBSNP_knownIndel/dbsnp_indel.vcf
-a DO6350

I don't think the BAM files I used have a problem or the wrong file format. Therefore, I would like to test svaba with example BAM files to see whether I can replicate the result and whether I can get FILTERED (high-quality) variant calls. Could you help?

Sanghoon

DO6350.svaba.somatic.sv.vcf.txt
DO6350.svaba.unfiltered.somatic.sv.vcf.txt

Splitting input file to small files / Directing output directory

Hi,

I have two questions.

  1. Splitting input file to small files.
    My input files (BAM files) are 100-130 GB. When I ran svaba for the first time on a supercomputer (16 cores, multi-threaded), I set a walltime limit of 72 hours, but the run exceeded 72 hours and was terminated. I didn't know the run would take this long. I wonder if I can split the BAM file into smaller files and run svaba on each. I googled whether I can split a BAM file, and I found some information:
    https://www.biostars.org/p/46327/ Once I can split the BAM file and run svaba on the small input files, I think I can get multiple outputs from the multiple runs and then combine the outputs. Would that be okay?

  2. Directing output directory.
    I tried to direct the output to a different directory, so I looked for a flag that can set the output location, but it seems there isn't one. "-a" is just the output file name prefix. Does svaba have a flag that can direct the output files?

Thank you,
Sanghoon

calculate Variant Allele Frequency (VAF)

Hello,

I was wondering what the correct way to calculate the VAF is? I tried AD/DP, but a lot of the regions have a value greater than 1, and they also don't correspond with the VAF that some other callers are giving.

Thanks!

#29

Strange output with in-house reference

Hi,
I am trying to use svaba to detect bacterial SVs in metagenomics data with a reference that contains ~600 bacterial genomes. In the *.discordant.txt.gz output, I notice the chromosome ids 'chr1' and 'chr2' range from 1 to 24, while in the *.alignments.txt.gz output the BP information has some chromosomes named 'X', 'Y' and 'M', like:
Global BP: : X:2,968,914(+) to 457:3,952,360(-)
...
Global BP: : Y:349,942(-) to 104:1,266,871(-)
...
Global BP: : M:10,276(+) to 552:23,800(-)
which are not in my reference.

I guess this may be caused by svaba treating the reference as a human genome.
Is that the cause? And is there anything I can do to make svaba work with this reference?

Thanks a lot!

Query regarding SV calls

Hello,

I ran svaba on a few BAM files (similar samples). I observed that an SV call is made in some samples and not in others, even though the discordant reads can be seen in IGV in the BAM files for all samples. Is there any filtering criterion or cut-off used to decide when an SV call is made and written to the *sv.vcf files? (Mode: tumor only)

I would appreciate your help.

Thanks!

a question in 'alignment.txt'

Hi Walaj,

I have a question regarding interpreting the alignment file.
I am looking at one contig's alignment.

The "Global BP" line shows the presence of a breakpoint between the - strand at 7:145,643,321 and the + strand at 7:145,913,731:
Global BP: : 7:145,643,321(-) to 7:145,913,731(+) SPAN 270410

However, the corresponding alignment says the breakpoint is between both + strands:
[screenshot of the contig alignment omitted]

Would you please give me a short description on what those strands mean?

Thank you in advance,

Challenge of running svaba with FFPE tumor/normal pairs

Svaba is awesome! For frozen WGS samples, we got a genome done within 4-5 hours with a low CPU/memory footprint.

However, when we run tumor-normal pairs on FFPE samples, it is taking an extra long time and not finishing. Is there a way to deal with FFPE tumor/normal pairs using svaba? Suggestions are really appreciated.

Here are some of the log information I have:
The first number is the line number of the sample.log:

930147 Ran GL000192.1:    441,001-    466,001 | T:  2856 N:   127 C:    94 | R: 49% M:  0% T:  5% C: 12% A: 31% P:  0% | CPU: 10282m53s Wall: 1326m25s
930148 ...BFC attempted correct 5314 train: 25402 kcov: 80.874680 kmer: 17
930149 ...assembled 227 contigs for c_84_465501_490501
930150 ...BFC attempted correct 6433 train: 33261 kcov: 100.696404 kmer: 17
930151 writing contigs etc on thread 47362852042496 with limit hit of 7520
930152 Ran GL000192.1:    465,501-    490,501 | T:  4418 N:   895 C:   159 | R: 51% M:  0% T:  6% C: 13% A: 27% P:  0% | CPU: 10283m01s Wall: 1326m27s
930153 ...assembled 268 contigs for c_84_490001_515001
930154 ...assembled 307 contigs for c_84_514501_539501
930155 writing contigs etc on thread 47362854143744 with limit hit of 7295
930156 Ran GL000192.1:    490,001-    515,001 | T:  4677 N:   637 C:   192 | R: 55% M:  0% T:  5% C: 12% A: 25% P:  0% | CPU: 10283m10s Wall: 1326m29s
930157 writing contigs etc on thread 47362839435008 with limit hit of 10967
930158 Ran GL000192.1:    514,501-    539,501 | T:  5463 N:   970 C:   203 | R: 60% M:  0% T:  4% C:  8% A: 25% P:  0% | CPU: 10283m11s Wall: 1326m29s
930159 writing contigs etc on thread 47362847840000 with limit hit of 20518
930160 ...assembled 1199 contigs for c_83_171501_196501
930161 Ran GL000225.1:    196,001-    211,173 | T:  7289 N:  4674 C:   388 | R: 53% M:  0% T:  7% C: 10% A: 27% P:  0% | CPU: 10283m14s Wall: 1326m31s
930162 writing contigs etc on thread 47362841536256 with limit hit of 51970
930163 Ran GL000225.1:    171,501-    196,501 | T: 12692 N:  8323 C:   732 | R: 58% M:  0% T:  6% C: 15% A: 18% P:  0% | CPU: 10283m24s Wall: 1326m41s

The error log of the job (the one killed before) looks like this:

...
21858     stopping read lookup at 83:85,871(+) in window 83:73,501-98,501(*) with 50,001 weird reads. Limit: 50,000
21859     stopping read lookup at 83:50,907(+) in window 83:49,001-74,001(*) with 50,001 weird reads. Limit: 50,000
21860     stopping read lookup at 83:131,670(-) in window 83:122,501-147,501(*) with 50,001 weird reads. Limit: 50,000
21861     stopping read lookup at 83:102,433(-) in window 83:98,001-123,001(*) with 50,001 weird reads. Limit: 50,000
21862     stopping read lookup at 83:126,115(+) in window 83:122,501-147,501(*) with 50,001 weird reads. Limit: 50,000
21863     stopping read lookup at 83:82,785(+) in window 83:73,501-98,501(*) with 50,001 weird reads. Limit: 50,000

Parameter for limiting weird reads

Hi Jeremiah,

I am trying to apply SvABA to high-depth targeted panel data (~1000X), especially from highly fragmented FFPE tissue. Due to the numerous split reads in FFPE data, SvABA printed these messages when it ran:

breaking at 2:11,166,812(-) in window 2:11,166,378-11,167,271(*) with 401 weird reads. Limit: 400
breaking at 2:11,171,138(-) in window 2:11,170,834-11,171,424(*) with 401 weird reads. Limit: 400
breaking at 2:11,173,008(+) in window 2:11,172,546-11,173,220(*) with 401 weird reads. Limit: 400
.
.

I could not find the proper parameter to tune for this issue. How do I optimize SvABA for FFPE samples with many weird reads? And does SvABA have any problems running without control data?

Thanks

Hyun-Tae

[E::hts_open_format] fail to open file

Hello,

I'm sure this is something small and insignificant but I am trying to run SvABA for the first time and the following command gives me the following error return:

svaba/bin/svaba run -t /mnt/scratch/users/40057929/MCLHighNov2016/Batch1/04-0309-01-36351377/04-0309-01_S2.bam/ n- /mnt/scratch/users/40057929/MCL*/MCL*/12-2164*/12-2164-01_MRDneg_S1.bam/ -G /mnt/scratch/users/40057929/wg.fa/ -a svaba_Batch1_0309 -p 4

--- Running svaba SV and indel detection on 4 threads ---
--- (inspect *.log for real-time progress updates) ---

[E::hts_open_format] fail to open file '/mnt/scratch/users/40057929/MCLHighNov2016/Batch1/04-0309-01-36351377/04-0309-01_S2.bam/'
ERROR: Cannot open main bam file: /mnt/scratch/users/40057929/MCLHighNov2016/Batch1/04-0309-01-36351377/04-0309-01_S2.bam/

I've checked the file path for the main tumour bam file and it is definitely correct. Any other advice? Let me know if you need any other information.

Thanks,
rmurphy49

AD larger than DP?

Hi,

When I check the vcf header information, I suppose DP > AD = SR + DR.

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Depth of coverage: Number of reads covering site.">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allele depth: Number of reads supporting the variant">

However, I found that some AD values are larger than DP in a normal-tumor pair. Is this a bug?

GT:AD:DP:GQ:PL:SR:DR:LR:LO      0/1:17:29:24.8:47.9,0,24.8:17:0:-48.36:48.36    1/1:31:28:8.4:92.4,8.4,0:31:0:-92.42:92.42

How the -k option affects SV call results

Hi Jeremiah,

I was wondering how the -k option affects SV call results. When specified, will local assembly results or any other stats derived from input data be affected? Or does the option only affect reporting?

In other words, will the results of the following runs be the same?

  1. Run SvABA with -k ${target.bed}
  2. Run SvABA without -k and then extract the SV calls overlapping with ${target.bed}

Thanks!
-Junko

Annotate svaba output vcf

Hi,

Is there a way to annotate the svaba output vcf to get gene names, genomic regions, etc.? I tried SnpEff but it failed to recognize the svaba vcf format.

Many thanks,
Rahul

Comparing SvABA running result with ICGC StSM somatic.sv.vcf file

Hi,

I downloaded the tumor and normal BAM files of ICGC donor ID "DO48701" from ICGC. The tumor BAM file ID is FI44968, and the normal BAM file ID is FI44967. I ran SvABA on the BAM files and got "svaba.unfiltered.germline.sv.vcf" (541 vcf calls) and "svaba.unfiltered.somatic.sv.vcf" (789 vcf calls). Here is my command line:

svaba run -t 6abba3f621d5442ad1e893d1ea11e9c6.bam
-n 1ddf1b0c325527318767571b8b076e13.bam
bwa index ./Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
-G ./Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
-p 16 -D ./DBSNP_knownIndel/dbsnp_indel.vcf
-a DO48701

Question 1. But, the "svaba.germline.sv.vcf" and "svaba.somatic.sv.vcf" files are empty. This means that all vcf calls were filtered out because they are low quality... am I correct?

Question 2. I downloaded the "00c27940-c623-11e3-bf01-24c6515278c0.broad-snowman-10.20151218.germline.sv.vcf" file from ICGC. This file is also from the same donor ID, "DO48701". As you can see from the file name, it was analyzed by 'snowman'. I compared the vcf calls between this file and "svaba.unfiltered.germline.sv.vcf" (541 vcf calls), but I couldn't find any calls in common. I expected my SvABA run and the ICGC StSM somatic.sv.vcf file to show similar results. Could you advise what I did wrong?

Thank you,
Sanghoon

Metadata ID Format

Hi,

I've noticed that the paths to the bam files I'm using in my svaba runs are used as the sample IDs in the vcf files. I'm wondering if this is the intended output, or if there will be an option to specify each sample ID for vcf formatting purposes? Having flags such as --tumor-sample-name and --normal-sample-name would be really helpful so that we don't run into any format issues due to the slashes in the bam file paths.

"Failed to load index for ..." error while running SvABA.

Hi Jeremiah,

I am testing SvABA but I got an error message about "loading index" and didn't get '.vcf' output files. The command line I used is like below.

svaba run -t ../54_dbGaP_RNA-seq_FusionCapture_BWA/FusionCaptureTest_ReadID_NewFormat_test_20180118/aln_fq_paired_SRR988437_1_200M.fastq.gz.bam -G ./BWA_Reference_fasta/GRCh38.primary_assembly.genome.fa -p 16 -D ./DBSNP_knownIndel/dbsnp_indel.vcf -a SRR988437_200M

The log file is attached. In the log file, the main error messages are "Failed to load index for....." and the last line is:

svaba: run_svaba.cpp:1788: CountPair collect_mate_reads(WalkerMap&, const MateRegionVector&, int, SeqLib::GRC&): Assertion `w.second.SetMultipleRegions(gg)' failed.
/var/spool/slurmd/job624012/slurm_script: line 20: 15817 Aborted svaba run -t /pghbio/dbmi/xwangp/sal170/55_Snowman_54729-5_AIR/2_54729-5_WGS_AIR_BRCA_BAM/aln_fq_paired_SRR476765.bam -G ./BWA_Reference_fasta/GRCh38.primary_assembly.genome.fa -p 16 -D ./DBSNP_knownIndel/dbsnp_indel.vcf -a SRR476765

As I read the SvABA manual again, it says "A BWA-MEM index reference genome must also be supplied with -G". But, as I see in the example command line, "-G" takes a reference fasta file, so I just used GRCh38.primary_assembly.genome.fa. Should I use a BWA-MEM indexed reference genome with -G? If so, could you provide the BWA-MEM indexed reference genome?

Thank you,
Sanghoon

svaba_tumorOnly_SRR476765.log

Chromosome name is 23 assertion error

Hi @walaj. I've encountered a problem when running svaba on non-human samples, namely that the program exits with an assertion error when it finds a breakpoint on chromosome 23:

svaba: BreakPoint.cpp:600: BreakEnd::BreakEnd(const SeqLib::BamRecord&): Assertion `chr_name != "23"' failed.

My samples are dog samples, mapped to CanFam3.1, so they have 38 chromosomes (+ X). What is the quickest way for me to get around this problem?

Thanks,
Kevin

calculate SR DR through alignment file

Hi,

This is the alignment information for a somatic SV event.

Global BP: : 1:1,038,901(-) to 1:1,039,297(+) SPAN 396 c_1_1029001_1054001_367C  n001:0 t000:6 ins_aginst_contig 0 del_against_contig 0  c_1_1029001_1054001_367C
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>................................................................................................................      C[0,1
14] G[1039184,1039297]  Local: 1        Aligned to: chr1:1039183(+) CIG: 114M112S MAPQ: 60 SUBN 0 Disc: 1:1038873(-)-1:1039346(+) Tcount: 7 Ncount: 0 Mean MAPQ: 60 Mean Mate MAPQ: 53 Valid: TRUE  -- c_1_1029001_1054001_367C
..................................................................................................................>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>...........      C[114
,215] G[1038901,1039001]        Local: 1        Aligned to: chr1:1038900(+) CIG: 114S101M11S MAPQ: 60 SUBN 0 Disc: 1:1038873(-)-1:1039346(+) Tcount: 7 Ncount: 0 Mean MAPQ: 60 Mean Mate MAPQ: 53 Valid: TRUE  -- c_1_1029001_1054001_367C
ACCCCGGCCTCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTATACACAGGACTCCCCTCGGCTCTT    c_1_102
9001_1054001_367C
ACCCCGGCCTCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCAT                                                                            t000_161_E0
0514:191:H3YHNCCXY:4:1117:4350:58778--1:1039183 r2c CIGAR: 150M, c_1_1029001_1054001_367C
ACCCCGGCCTCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACTCCTGCGGTGAGCGGTGGCTCAGGTCACTGCAT                                                                            t000_161_E0
0516:172:H3H3YCCXY:7:1112:17929:2557--1:1039183 r2c CIGAR: 150M, c_1_1029001_1054001_367C
         TCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGTCCAGGCTGGTCTCAAATGCCTGG      TCACACCTGCGGTGAGCGGTGGCCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCC                                          t000_83_E00
514:191:H3YHNCCXY:4:2112:27113:50779--1:1039192 r2c CIGAR: 99M28S,n001_83_E00516:197:H5MGTCCXY:7:2224:26027:37612--1:1038871 r2c CIGAR: 29S22M1D76M, c_1_1029001_1054001_367C
                               GTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGG                                                                    t000_97_E00
514:191:H3YHNCCXY:4:2116:24383:72280--1:1039214 r2c CIGAR: 127M, c_1_1029001_1054001_367C
                               GTGCACACCATTCCACCCAAGTAACGTCCTTTTCTTTTGTAGAGACTGGGTTGCCCAGGCTGCTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGC                                                                                          t000_65_E00
514:191:H3YHNCCXY:5:1201:31822:35766--1:1039214 r2c CIGAR: 105M22S, c_1_1029001_1054001_367C
                                                     AACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTAT                       t000_163_E0
0514:191:H3YHNCCXY:5:1112:19816:46437--1:1038900 r2c CIGAR: 150M, c_1_1029001_1054001_367C
                                                                                                   AATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTATACACAGGACTCCCCTCGGCTCTT     t000_8
1_E00514:191:H3YHNCCXY:4:1105:28260:14125--1:1038900 r2c CIGAR: 127M, c_1_1029001_1054001_367C
                                                                                                                  CACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACG                                   n001_105_E0
0516:197:H5MGTCCXY:8:1222:3884:34834--1:1038876 r2c CIGAR: 24S101M2S, c_1_1029001_1054001_367C
                                                                                                                  CACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACG                                   t000_99_E00
514:191:H3YHNCCXY:4:1213:13027:38051--1:1038894 r2c CIGAR: 24S101M2S, c_1_1029001_1054001_367C
                                                                                                                  CACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTATA                      t000_99_E00
514:191:H3YHNCCXY:5:1219:25631:63472--1:1038899 r2c CIGAR: 11S101M15S, c_1_1029001_1054001_367C

I want to know how "Tcount: 7 Ncount: 0" is counted.

Meanwhile, the vcf record for this variation is:

chr1    1038901 6183:1  c       ]chr1:1039297]c 0       PASS    BX=ACGCCAGGTCCGAATT-1_1,TACTAGGGTGATTACC-1_1,CAACAACAGCATGACG-1_4;DISC_MAPQ=60;EVDNC=ASDIS;MAPQ=60;MATEID=6183:2;MATENM=1;NM=0;NUMPARTS=2;SCTG=c_1_1029001_1054001_367C;SPAN=
396;SVTYPE=BND     GT:AD:DP:GQ:PL:SR:DR:LR:LO      0/0:0:46:12.6:0,12.6,138.6:0:0:12.92:0  0/0:8:108:3.6:0,3.6,277.2:6:7:3.936:14.29
chr1    1039297 6183:2  a       a[chr1:1038901[ 0       PASS    BX=ACGCCAGGTCCGAATT-1_1,TACTAGGGTGATTACC-1_1,CAACAACAGCATGACG-1_4;DISC_MAPQ=53;EVDNC=ASDIS;MAPQ=60;MATEID=6183:1;MATENM=0;NM=1;NUMPARTS=2;SCTG=c_1_1029001_1054001_367C;SPAN=
396;SVTYPE=BND     GT:AD:DP:GQ:PL:SR:DR:LR:LO      0/0:0:46:12.6:0,12.6,138.6:0:0:12.92:0  0/0:8:108:3.6:0,3.6,277.2:6:7:3.936:14.29

How are SR=6, DR=7, and AD=8 calculated based on the alignment information?
I didn't see 7 discordant reads in the alignment file.

Thanks a lot

Reads supporting Variant call

Hello,

Please can you tell me how to get information about number of reads supporting the variant listed in the *sv.vcf output file?

I would really appreciate it. Thanks!

VCF files incomplete?

Greetings, I was recently running SvABA on data from the ICGC and was comparing the vcf files to older data. I noticed that my indel vcf files only show results from chromosome 1. The SV vcf files show other chromosomes, but it also feels as if most of the file is dedicated to chromosome 1 too. I'm not sure if it's a problem with the way I ran SvABA, the data that I'm analyzing, etc. From reading past issues I think the problem mostly lies in the parameters, but I honestly don't know which ones I should change and how. I'm including the log from my run in case it's of any help for troubleshooting for anyone who would like to help.

Many thanks in advance.
svaba_DO48701_LiverC.log

LOD (LO)

Are there any materials that explain how LO works? I want to do some filtering by LO and I'm not sure what threshold I should put my filter at.

Thanks,

Yosef

Calls Specific to High Coverage Data

Hi,

We have been running multiple SV callers on high coverage data (275x tumor / 156x normal). We have observed that SvABA seems to be making quite a few SV calls that are about 500bp in size and only have a couple supporting reads. SvABA does not make these calls on the same tumor/normal pair with lower coverage (80x tumor / 40x normal). The other SV tools (Manta and Lumpy) do not find these calls at high coverage or low coverage, and after looking at these variants in IGV they do not seem real. We are wondering if this might have something to do with the reassembly being affected by high coverage data?

Thanks,
Molly

a question about TSI?

Hi Walaj,

I wonder if you can give more explanation of TSI_L and TSI_G in the "bps.txt" file.
The VCF file says:

TSI_L templated-sequence insertion (local, e.g. AB or BC of an ABC), TSI_G global (e.g. AC of ABC)

Here, could you clarify what "AB, BC, AC, ABC" represent?
I guess that ABC means an assembled sequence is aligned in three pieces (A-B-C), and the bp pairs (pos1 and pos2) represent AB, BC, or AC. Is this a correct interpretation?

Thank you in advance,

DP=0, AD>0 and Incorrect Genotypes?

Hi Jeremiah,

I've been seeing a few examples of somatic SVs where AD>0 and DP=0. I saw from issue #29 that AD can be greater than DP since discordant reads spanning the breakpoint are counted towards AD, but not necessarily DP. However, looking in IGV this variant seems to also have non-discordantly mapped reads lining up on the breakpoint, so I'm wondering why DP=0? I'm also curious why the genotype would be 1/1 here, and not 0/1?

[IGV screenshots omitted; top track - tumor, bottom track - normal]

I've also seen some other examples where the genotype doesn't seem to match what I'm seeing in IGV. For example, this variant is called as somatic with genotype of 0/0 in normal and 0/1 in tumor, but in IGV it looks like the genotype should be 1/1 in both the normal and the tumor.

[IGV screenshots omitted; top track - tumor, bottom track - normal]

Thanks in advance!

STCommon.h:99:46: error: left operand of shift expression

Trying to compile with gcc (GCC) 6.1.0,

I got

../../../src/SGA/SuffixTools/STCommon.h:99:49: warning: left shift of negative value [-Wshift-negative-value]
  static const uint64_t HIGH_MASK = ~0 << POS_BITS;
../../../src/SGA/SuffixTools/STCommon.h:99:46: error: left operand of shift expression '(-1 << 28)' is negative [-fpermissive]
  static const uint64_t HIGH_MASK = ~0 << POS_BITS;

I changed the line to

static const uint64_t HIGH_MASK = (~((uint64_t)(0))) << POS_BITS;

It solved the compilation problem.

Is it possible to resume a prematurely stopped analysis?

Hi Jeremiah

I have an analysis that stopped ~80% of the way through, with a few chromosomes to go. Is there any way to continue the analysis from the point it got up to, and append the results from the last chromosomes to the existing .bps and other output files?

Best,
Kevin.

Specific types of BND

Hi, could you please provide a script for converting BND svtype to specific SV types, e.g. DEL, DUP. I don't think the one in #4 is accurate.
Thanks,

Problem with long deletion

Hi Walaj,

I tried to use SvABA on long deletions ranging from 1 Mbp to 150 Mbp in a simulated BAM file. SvABA works well when only small indels are included. However, when I start to include those long deletions in the BAM, SvABA fails to detect anything, not even the small indels.
I checked the contig BAM file, and those contigs do include the correct breakpoints.
BTW, default germline settings were used for the analysis.

Error with simulated data

Hi Walaj,

I am working with some simulated paired-end data generated using wgsim (coverage=100X, read length=70bp, mean insert size=150 and sd insert size=15). When I run svaba, I come across the following errors:

!!!! WARNING. Multiple max mapq mixed: NC_001137.3_164083_164206_2--60 max possible 60

not enough paired-end reads to get insert-size distribution. skipping discordant analysis
       read group: NC_001136.10_331155_331313_1 Insert-sizes sampled: 1 visited 2
 [M::bwa_idx_load_from_disk] read 0 ALT contigs
         --- Loaded non-read data. Starting detection pipeline

Also, no output files are generated. Trying the following commands:

module load svaba/intel/20170210 
svaba run -p 16 -G $ref -t $bam1

Can you help me figure out what could be going on and how to fix it? Also, let me know if you require any additional information.

Thanks,
Gunjan

about svaba version

Hi,
Has the svaba software been updated? Is there any information about the software version number?
Thank you!

Tumor only analysis

Hi @walaj,

I'm trying out svaba for the first time looking to call somatic mutations on tumor only data. I was wondering if there is a recommended way to run svaba in those cases?

Thanks,
Carlos

Run on exomes?

Hi, @walaj. This looks like something that I would like to try out. We have a project with about 50/50 mix of genomes and exomes. I understand that the exome SV/indel calls will be limited, but is there any particular reason that snowman could not be applied to exomes?

can't install on Mac

Hello
I have tried on two macs running El Capitan
it fails after the "make" command

../../depcomp: line 611: /Users/christos/svaba/compile: No such file or directory
../../depcomp: line 611: exec: /Users/christos/svaba/compile: cannot execute: No such file or directory
make[2]: *** [libseqlib_a-ssw.o] Error 126
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

any suggestions?
thanks
christos

this is the whole output in case it helps
Christoss-MacBook:~ christos$ cd svaba
Christoss-MacBook:svaba christos$ ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
/Users/christos/svaba/missing: Unknown `--is-lightweight' option
Try `/Users/christos/svaba/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
checking for a thread-safe mkdir -p... ./install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc-5.3
checking whether we are using the GNU C compiler... no
checking whether gcc-5.3 accepts -g... no
checking for gcc-5.3 option to accept ISO C89... unsupported
checking whether gcc-5.3 understands -c and -o together... no
checking dependency style of /Users/christos/svaba/compile gcc-5.3... none
checking for ranlib... ranlib
checking how to run the C++ preprocessor... g++ -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
checking for library containing gzopen... -lz
checking for library containing clock_gettime... no
checking google/sparse_hash_set usability... no
checking google/sparse_hash_set presence... no
checking for google/sparse_hash_set... no
checking google/sparse_hash_map usability... no
checking google/sparse_hash_map presence... no
checking for google/sparse_hash_map... no
checking unordered_map usability... yes
checking unordered_map presence... yes
checking for unordered_map... yes
checking tr1/unordered_map usability... no
checking tr1/unordered_map presence... no
checking for tr1/unordered_map... no
checking ext/hash_map usability... yes
checking ext/hash_map presence... yes
checking for ext/hash_map... yes
checking unordered_set usability... yes
checking unordered_set presence... yes
checking for unordered_set... yes
checking tr1/unordered_set usability... no
checking tr1/unordered_set presence... no
checking for tr1/unordered_set... no
checking ext/hash_set usability... yes
checking ext/hash_set presence... yes
checking for ext/hash_set... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating SeqLib/src/Makefile
config.status: creating src/SGA/Util/Makefile
config.status: creating src/SGA/SQG/Makefile
config.status: creating src/SGA/Bigraph/Makefile
config.status: creating src/SGA/Algorithm/Makefile
config.status: creating src/SGA/StringGraph/Makefile
config.status: creating src/SGA/SuffixTools/Makefile
config.status: creating src/SGA/SGA/Makefile
config.status: creating src/svaba/Makefile
config.status: creating config.h
config.status: executing depfiles commands
Christoss-MacBook:svaba christos$ make
/Applications/Xcode.app/Contents/Developer/usr/bin/make all-recursive
Making all in SeqLib/fermi-lite
gcc -c -g -Wall -O2 -Wno-unused-function kthread.c -o kthread.o
gcc -c -g -Wall -O2 -Wno-unused-function misc.c -o misc.o
gcc -c -g -Wall -O2 -Wno-unused-function bseq.c -o bseq.o
gcc -c -g -Wall -O2 -Wno-unused-function htab.c -o htab.o
gcc -c -g -Wall -O2 -Wno-unused-function bfc.c -o bfc.o
gcc -c -g -Wall -O2 -Wno-unused-function rle.c -o rle.o
gcc -c -g -Wall -O2 -Wno-unused-function rope.c -o rope.o
gcc -c -g -Wall -O2 -Wno-unused-function mrope.c -o mrope.o
gcc -c -g -Wall -O2 -Wno-unused-function rld0.c -o rld0.o
gcc -c -g -Wall -O2 -Wno-unused-function unitig.c -o unitig.o
gcc -c -g -Wall -O2 -Wno-unused-function mag.c -o mag.o
gcc -c -g -Wall -O2 -Wno-unused-function bubble.c -o bubble.o
gcc -c -g -Wall -O2 -Wno-unused-function f_ksw.c -o f_ksw.o
ar -csru libfml.a kthread.o misc.o bseq.o htab.o bfc.o rle.o rope.o mrope.o rld0.o unitig.o mag.o bubble.o f_ksw.o
gcc -c -g -Wall -O2 -Wno-unused-function example.c -o example.o
gcc -g -Wall -O2 -Wno-unused-function libfml.a example.o -o fml-asm -L. -lfml -lm -lz -lpthread
Making all in SeqLib/htslib
echo '/* Empty config.h generated by Makefile */' > config.h
gcc -g -Wall -O2 -I. -c -o kfunc.o kfunc.c
gcc -g -Wall -O2 -I. -c -o knetfile.o knetfile.c
gcc -g -Wall -O2 -I. -c -o kstring.o kstring.c
gcc -g -Wall -O2 -I. -c -o bgzf.o bgzf.c
gcc -g -Wall -O2 -I. -c -o errmod.o errmod.c
gcc -g -Wall -O2 -I. -c -o faidx.o faidx.c
gcc -g -Wall -O2 -I. -c -o hfile.o hfile.c
gcc -g -Wall -O2 -I. -c -o hfile_net.o hfile_net.c
echo '#define HTS_VERSION "1.3.2-147-g293a426"' > version.h
gcc -g -Wall -O2 -I. -c -o hts.o hts.c
hts.c:51:5: warning: unused function 'ks_getc' [-Wunused-function]
KSTREAM_INIT2(, BGZF
, bgzf_read, 65536)
^
./htslib/kseq.h:152:2: note: expanded from macro 'KSTREAM_INIT2'
__KS_INLINED(__read)
^
./htslib/kseq.h:68:20: note: expanded from macro '__KS_INLINED'
static inline int ks_getc(kstream_t *ks)
^
1 warning generated.
gcc -g -Wall -O2 -I. -c -o md5.o md5.c
gcc -g -Wall -O2 -I. -c -o probaln.o probaln.c
gcc -g -Wall -O2 -I. -c -o realn.o realn.c
gcc -g -Wall -O2 -I. -c -o regidx.o regidx.c
gcc -g -Wall -O2 -I. -c -o sam.o sam.c
gcc -g -Wall -O2 -I. -c -o synced_bcf_reader.o synced_bcf_reader.c
gcc -g -Wall -O2 -I. -c -o vcf_sweep.o vcf_sweep.c
gcc -g -Wall -O2 -I. -c -o tbx.o tbx.c
gcc -g -Wall -O2 -I. -c -o thread_pool.o thread_pool.c
gcc -g -Wall -O2 -I. -c -o vcf.o vcf.c
gcc -g -Wall -O2 -I. -c -o vcfutils.o vcfutils.c
gcc -g -Wall -O2 -I. -c -o cram/cram_codecs.o cram/cram_codecs.c
gcc -g -Wall -O2 -I. -c -o cram/cram_decode.o cram/cram_decode.c
gcc -g -Wall -O2 -I. -c -o cram/cram_encode.o cram/cram_encode.c
gcc -g -Wall -O2 -I. -c -o cram/cram_external.o cram/cram_external.c
gcc -g -Wall -O2 -I. -c -o cram/cram_index.o cram/cram_index.c
gcc -g -Wall -O2 -I. -c -o cram/cram_io.o cram/cram_io.c
gcc -g -Wall -O2 -I. -c -o cram/cram_samtools.o cram/cram_samtools.c
gcc -g -Wall -O2 -I. -c -o cram/cram_stats.o cram/cram_stats.c
gcc -g -Wall -O2 -I. -c -o cram/files.o cram/files.c
gcc -g -Wall -O2 -I. -c -o cram/mFILE.o cram/mFILE.c
gcc -g -Wall -O2 -I. -c -o cram/open_trace_file.o cram/open_trace_file.c
gcc -g -Wall -O2 -I. -c -o cram/pooled_alloc.o cram/pooled_alloc.c
gcc -g -Wall -O2 -I. -c -o cram/rANS_static.o cram/rANS_static.c
gcc -g -Wall -O2 -I. -c -o cram/sam_header.o cram/sam_header.c
gcc -g -Wall -O2 -I. -c -o cram/string_alloc.o cram/string_alloc.c
gcc -g -Wall -O2 -I. -c -o cram/vlen.o cram/vlen.c
gcc -g -Wall -O2 -I. -c -o cram/zfio.o cram/zfio.c
ar -rc libhts.a kfunc.o knetfile.o kstring.o bgzf.o errmod.o faidx.o hfile.o hfile_net.o hts.o md5.o probaln.o realn.o regidx.o sam.o synced_bcf_reader.o vcf_sweep.o tbx.o thread_pool.o vcf.o vcfutils.o cram/cram_codecs.o cram/cram_decode.o cram/cram_encode.o cram/cram_external.o cram/cram_index.o cram/cram_io.o cram/cram_samtools.o cram/cram_stats.o cram/files.o cram/mFILE.o cram/open_trace_file.o cram/pooled_alloc.o cram/rANS_static.o cram/sam_header.o cram/string_alloc.o cram/vlen.o cram/zfio.o
ranlib libhts.a
gcc -dynamiclib -install_name /usr/local/lib/libhts.2.dylib -current_version 1.3.255 -compatibility_version 2 -o libhts.dylib kfunc.o knetfile.o kstring.o bgzf.o errmod.o faidx.o hfile.o hfile_net.o hts.o md5.o probaln.o realn.o regidx.o sam.o synced_bcf_reader.o vcf_sweep.o tbx.o thread_pool.o vcf.o vcfutils.o cram/cram_codecs.o cram/cram_decode.o cram/cram_encode.o cram/cram_external.o cram/cram_index.o cram/cram_io.o cram/cram_samtools.o cram/cram_stats.o cram/files.o cram/mFILE.o cram/open_trace_file.o cram/pooled_alloc.o cram/rANS_static.o cram/sam_header.o cram/string_alloc.o cram/vlen.o cram/zfio.o -lz
ln -sf libhts.dylib libhts.2.dylib
gcc -g -Wall -O2 -I. -c -o bgzip.o bgzip.c
gcc -pthread -o bgzip bgzip.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o htsfile.o htsfile.c
gcc -pthread -o htsfile htsfile.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o tabix.o tabix.c
gcc -pthread -o tabix tabix.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/fieldarith.o test/fieldarith.c
gcc -pthread -o test/fieldarith test/fieldarith.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/hfile.o test/hfile.c
gcc -pthread -o test/hfile test/hfile.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/sam.o test/sam.c
gcc -pthread -o test/sam test/sam.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/test-regidx.o test/test-regidx.c
gcc -pthread -o test/test-regidx test/test-regidx.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/test_view.o test/test_view.c
gcc -pthread -o test/test_view test/test_view.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/test-vcf-api.o test/test-vcf-api.c
gcc -pthread -o test/test-vcf-api test/test-vcf-api.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
gcc -g -Wall -O2 -I. -c -o test/test-vcf-sweep.o test/test-vcf-sweep.c
gcc -pthread -o test/test-vcf-sweep test/test-vcf-sweep.o libhts.a -lz
clang: warning: argument unused during compilation: '-pthread'
Making all in SeqLib/bwa
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS utils.c -o utils.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS kthread.c -o kthread.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS ksw.c -o ksw.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwt.c -o bwt.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bntseq.c -o bntseq.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwa.c -o bwa.o
bwa.c:147:18: warning: absolute value function 'abs' given an argument of type 'long long' but has parameter of
type 'int' which may cause truncation of value [-Wabsolute-value]
w = (max_gap + abs(rlen - l_query) + 1) >> 1;
^
bwa.c:147:18: note: use function 'llabs' instead
w = (max_gap + abs(rlen - l_query) + 1) >> 1;
^~~
llabs
bwa.c:149:11: warning: absolute value function 'abs' given an argument of type 'long long' but has parameter of
type 'int' which may cause truncation of value [-Wabsolute-value]
min_w = abs(rlen - l_query) + 3;
^
bwa.c:149:11: note: use function 'llabs' instead
min_w = abs(rlen - l_query) + 3;
^~~
llabs
2 warnings generated.
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwamem.c -o bwamem.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwamem_pair.c -o bwamem_pair.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwamem_extra.c -o bwamem_extra.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS malloc_wrap.c -o malloc_wrap.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS is.c -o is.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtindex.c -o bwtindex.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS rope.c -o rope.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS rle.c -o rle.o
ar -csru libbwa.a utils.o kthread.o ksw.o bwt.o bntseq.o bwa.o bwamem.o bwamem_pair.o bwamem_extra.o malloc_wrap.o is.o bwtindex.o rope.o rle.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwashm.c -o bwashm.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwase.c -o bwase.o
bwase.c:181:57: warning: absolute value function 'abs' given an argument of type 'long long' but has parameter
of type 'int' which may cause truncation of value [-Wabsolute-value]
ksw_global(len, seq, rlen, rseq, 5, mat, 5, 1, SW_BW > abs(rlen - len) * 1.5? SW_BW : abs(rlen ...
^
bwase.c:181:57: note: use function 'llabs' instead
ksw_global(len, seq, rlen, rseq, 5, mat, 5, 1, SW_BW > abs(rlen - len) * 1.5? SW_BW : abs(rlen ...
^~~
llabs
bwase.c:181:88: warning: absolute value function 'abs' given an argument of type 'long long' but has parameter
of type 'int' which may cause truncation of value [-Wabsolute-value]
...seq, rlen, rseq, 5, mat, 5, 1, SW_BW > abs(rlen - len) * 1.5? SW_BW : abs(rlen - len) * 1.5, n_cigar, &...
^
bwase.c:181:88: note: use function 'llabs' instead
...seq, rlen, rseq, 5, mat, 5, 1, SW_BW > abs(rlen - len) * 1.5? SW_BW : abs(rlen - len) * 1.5, n_cigar, &...
^~~
llabs
2 warnings generated.
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwaseqio.c -o bwaseqio.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtgap.c -o bwtgap.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtaln.c -o bwtaln.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bamlite.c -o bamlite.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwape.c -o bwape.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS kopen.c -o kopen.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS pemerge.c -o pemerge.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS maxk.c -o maxk.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtsw2_core.c -o bwtsw2_core.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtsw2_main.c -o bwtsw2_main.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtsw2_aux.c -o bwtsw2_aux.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwt_lite.c -o bwt_lite.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtsw2_chain.c -o bwtsw2_chain.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS fastmap.c -o fastmap.o
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwtsw2_pair.c -o bwtsw2_pair.o
bwtsw2_pair.c:239:13: warning: using floating point absolute value function 'fabs' when argument is of integer
type [-Wabsolute-value]
diff = fabs(G[0] - G[1]) / (opt->a + opt->b) / ((hits[i]->hits[...
^
bwtsw2_pair.c:239:13: note: use function 'abs' instead
diff = fabs(G[0] - G[1]) / (opt->a + opt->b) / ((hits[i]->hits[...
^~~~
abs
1 warning generated.
gcc -c -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS main.c -o main.o
gcc -g -Wall -Wno-unused-function -O2 -DHAVE_PTHREAD -DUSE_MALLOC_WRAPPERS bwashm.o bwase.o bwaseqio.o bwtgap.o bwtaln.o bamlite.o bwape.o kopen.o pemerge.o maxk.o bwtsw2_core.o bwtsw2_main.o bwtsw2_aux.o bwt_lite.o bwtsw2_chain.o fastmap.o bwtsw2_pair.o main.o -o bwa -L. -lbwa -lm -lz -lpthread
Making all in SeqLib/src
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-FastqReader.o -MD -MP -MF .deps/libseqlib_a-FastqReader.Tpo -c -o libseqlib_a-FastqReader.o `test -f 'FastqReader.cpp' || echo './'FastqReader.cpp`
mv -f .deps/libseqlib_a-FastqReader.Tpo .deps/libseqlib_a-FastqReader.Po
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-BFC.o -MD -MP -MF .deps/libseqlib_a-BFC.Tpo -c -o libseqlib_a-BFC.o `test -f 'BFC.cpp' || echo './'BFC.cpp`
mv -f .deps/libseqlib_a-BFC.Tpo .deps/libseqlib_a-BFC.Po
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-ReadFilter.o -MD -MP -MF .deps/libseqlib_a-ReadFilter.Tpo -c -o libseqlib_a-ReadFilter.o `test -f 'ReadFilter.cpp' || echo './'ReadFilter.cpp`
mv -f .deps/libseqlib_a-ReadFilter.Tpo .deps/libseqlib_a-ReadFilter.Po
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-SeqPlot.o -MD -MP -MF .deps/libseqlib_a-SeqPlot.Tpo -c -o libseqlib_a-SeqPlot.o `test -f 'SeqPlot.cpp' || echo './'SeqPlot.cpp`
mv -f .deps/libseqlib_a-SeqPlot.Tpo .deps/libseqlib_a-SeqPlot.Po
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-jsoncpp.o -MD -MP -MF .deps/libseqlib_a-jsoncpp.Tpo -c -o libseqlib_a-jsoncpp.o `test -f 'jsoncpp.cpp' || echo './'jsoncpp.cpp`
mv -f .deps/libseqlib_a-jsoncpp.Tpo .deps/libseqlib_a-jsoncpp.Po
g++ -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -g -Wall -Wextra -Wno-unknown-pragmas -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -g -O2 -MT libseqlib_a-ssw_cpp.o -MD -MP -MF .deps/libseqlib_a-ssw_cpp.Tpo -c -o libseqlib_a-ssw_cpp.o `test -f 'ssw_cpp.cpp' || echo './'ssw_cpp.cpp`
ssw_cpp.cpp:449:19: warning: unused parameter 'score_matrix_size' [-Wunused-parameter]
const int& score_matrix_size,
^
1 warning generated.
mv -f .deps/libseqlib_a-ssw_cpp.Tpo .deps/libseqlib_a-ssw_cpp.Po
source='ssw.c' object='libseqlib_a-ssw.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../../depcomp \
/Users/christos/svaba/compile gcc-5.3 -DHAVE_CONFIG_H -I. -I../.. -I../ -I../htslib -Wno-sign-compare -c -o libseqlib_a-ssw.o `test -f 'ssw.c' || echo './'ssw.c`
../../depcomp: line 611: /Users/christos/svaba/compile: No such file or directory
../../depcomp: line 611: exec: /Users/christos/svaba/compile: cannot execute: No such file or directory
make[2]: *** [libseqlib_a-ssw.o] Error 126
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
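
The final error above is not about the C/C++ sources at all: the automake helper script `compile` is missing from the checkout, so depcomp cannot exec it. One common fix on this old autotools-based source tree (a sketch; it assumes autoconf and automake are installed) is to regenerate the auxiliary scripts and rebuild:

## regenerate the missing autotools helper scripts (compile, depcomp, ...) in the source root, then rebuild
cd /Users/christos/svaba
autoreconf -fiv
./configure
make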

Somatic rearrangement scores: `LO` or `SL`

Hi Jeremiah,

I have a question about the scores for somatic rearrangements. I was checking the variants detected in "*.svaba.somatic.sv.vcf" and was wondering whether "LO" in the FORMAT column is the right score to look at.

Or should I use the scaled score "SL" for somatic rearrangements instead? I found that "SL" is not included in the VCF produced by svaba run. Should I run svaba refilter to get the SL scores, or would it be fine to compute (min(MAPQ, MATEMAPQ) - 2 * NM) / 60 * LO myself?
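
For reference, the arithmetic proposed in the question can be written out as a quick awk calculation. The field values here are hypothetical, and whether this formula matches what svaba refilter actually computes for SL would need confirmation from the author:

## scaled score as proposed above: (min(MAPQ, MATEMAPQ) - 2*NM) / 60 * LO, with made-up inputs
awk -v mapq=60 -v matemapq=37 -v nm=1 -v lo=14.2 'BEGIN {
  m  = (mapq < matemapq) ? mapq : matemapq   # min(MAPQ, MATEMAPQ)
  sl = (m - 2 * nm) / 60 * lo                # scaled score per the formula in the question
  print sl
}'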

Question about tumor bam file for germline SV calling

Dear author
I would like to compare the overall SV landscape of normal and tumor tissue. I know that somatic calling is the intended way to identify somatic SVs. However, could I instead run SvABA on the tumor and normal tissues separately (e.g. 'svaba run -a germline_run' with a single BAM each) and use the resulting germline_run.svaba.sv.vcf files to represent the SV landscape of each tissue?
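
For context, a run like the one described would use single-sample mode, with each tissue supplied on its own via -t; the sketch below uses placeholder file names and analysis ids:

## separate single-sample runs for each tissue (placeholders: normal.bam, tumor.bam, ref.fa)
svaba run -t normal.bam -G ref.fa -a germline_run_normal -p 4
svaba run -t tumor.bam  -G ref.fa -a germline_run_tumor  -p 4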

There is a somatic SV, but the alignment file is empty

Hi,

I was able to detect a single somatic SV in a sample (according to svaba.sv.vcf), but the alignment file (alignments.txt and therefore contigs.fa) is empty.
Can you explain when those files can be empty?

Thank you in advance,

Multi-threading not working

Hello,
I've installed SvABA with GCC 4.9.2 on CentOS 6.2.
If I run without the -p option, I get:

-----------------------------------------------------------
--- Running svaba SV and indel detection on 1 threads ---
---    (inspect *.log for real-time progress updates)   ---
-----------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
--- Loaded non-read data. Starting detection pipeline
 Caught exception for lregion with reg 1:1-25,001(*)
Caught exception in BreakPoint:setRefAlt for indel ref: RefGenome::queryRegion - Returning empty query on 1:10396-10402
Caught exception in BreakPoint:setRefAlt for indel ref: RefGenome::queryRegion - Returning empty query on 1:10402-10439
Caught exception in BreakPoint:setRefAlt for indel ref: RefGenome::queryRegion - Returning empty query on 1:10408-10439
...

but with the -p 8 option, the program fails:

-----------------------------------------------------------
--- Running svaba SV and indel detection on 4 threads ---
---    (inspect *.log for real-time progress updates)   ---
-----------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
--- Loaded non-read data. Starting detection pipeline
 Caught exception for lregion with reg 1:49,001-74,001(*)
Floating point exception

Do you have any idea where the problem comes from?
I've attached the log files from the configure and make commands in case they help.

Thanks in advance for any help !

Anne-Sophie

svaba_config.log
svaba_install.log

Deterministic Output?

Hi,

I'm wondering if svaba is expected to have deterministic output, or if it's possible to get deterministic output? I've run svaba with the same settings on the same tumor-normal pairing and am noticing a few differences between runs. For example, I found a structural variant that's listed in the PASS somatic calls in one run, but not in the PASS somatic calls in the second run. However, this SV is found in the germline PASS calls in the second run even though it shows a somatic genotype.

Another SV was missed due to a low alignment score in one run, but not the other.

Thanks,
Molly

output directory

Hi Jeremiah,

would it be possible to add an option to specify where the output files are written, instead of always using the current directory?
I don't think this would be a very complicated change (maybe a bit time-consuming, I agree), but it would really help with integrating the software into broader pipelines.

Thank you
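
Until such an option exists, a simple workaround (a sketch with placeholder paths; BAM and reference paths should be absolute so they resolve from the new working directory) is to run svaba from inside the desired output directory:

## write all output files into /path/to/outdir without changing the caller's working directory
mkdir -p /path/to/outdir
(cd /path/to/outdir && svaba run -t /abs/path/tumor.bam -n /abs/path/normal.bam -G /abs/path/ref.fa -a my_id -p 4)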

SVTYPE classification

Hi,

I have been testing out your new program, which I was hoping would fill a niche in our tumor SV pipeline. After running svaba on a number of in-house benchmarking datasets, I've noticed that although your bioRxiv paper classifies SVs as DUP, DEL, etc., in the SV VCF outputs all records have SVTYPE=BND. Are there any plans to further classify the output SVs by SVTYPE, and/or is there already a mechanism to do so?

Thanks in advance.

Best,
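
In the meantime, SVTYPE=BND records can be re-labelled after the fact from the breakend orientation encoded in the ALT field. The awk sketch below is a heuristic, not svaba's own logic, and the VCF name is a placeholder: same-chromosome pairs are labelled DEL-like, DUP-like, or INV-like from the bracket notation, and cross-chromosome pairs are left as interchromosomal breakends.

## heuristic SV-type labels from BND bracket notation (see the VCF spec for the t[p[ forms)
gunzip -c placeholder.svaba.somatic.sv.vcf.gz | awk 'BEGIN { OFS="\t" }
  /^#/ { next }
  {
    alt = $5
    split(alt, a, /[][]/); split(a[2], b, ":")    # mate locus sits between the brackets, e.g. A[chr2:321682[
    mchrom = b[1]; mpos = b[2] + 0
    if (mchrom != $1)             type = "interchromosomal"
    else if ($2 + 0 > mpos)       next            # report each same-chromosome pair once, from the lower breakend
    else if (alt ~ /^[ACGTN]+\[/) type = "DEL-like"   # t[p[
    else if (alt ~ /^\]/)         type = "DUP-like"   # ]p]t
    else                          type = "INV-like"   # t]p] or [p[t
    print $1, $2, $3, alt, type
  }'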

Installation error

I am trying to compile your tool using GCC 4.8.5 and the following error is reported:

make[2]: Entering directory `svaba/SeqLib/htslib'
gcc -g -Wall -O2 -I. -c -o thread_pool.o thread_pool.c
thread_pool.c: In function ‘hts_tpool_init’:
thread_pool.c:658:5: warning: implicit declaration of function ‘pthread_mutexattr_settype’ [-Wimplicit-function-declaration]
     pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
     ^
thread_pool.c:658:38: error: ‘PTHREAD_MUTEX_RECURSIVE’ undeclared (first use in this function)
     pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
                                      ^
thread_pool.c:658:38: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [thread_pool.o] Error 1
make[2]: Leaving directory `svaba/SeqLib/htslib'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `svaba'
make: *** [all] Error 2

It seems the function/macro has not been declared. Can you please suggest a way around this?
Thanks
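
For what it's worth, PTHREAD_MUTEX_RECURSIVE (and pthread_mutexattr_settype) can be hidden behind feature-test macros on older glibc/GCC combinations. One possible workaround, assuming the bundled-htslib build shown above and run from the top-level svaba directory, is to rebuild htslib with _GNU_SOURCE defined and then resume the build; this is a sketch, not a confirmed fix:

## rebuild the bundled htslib with _GNU_SOURCE so the pthread declarations are visible, then continue
make -C SeqLib/htslib CFLAGS="-g -Wall -O2 -D_GNU_SOURCE"
make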

MAPQ zero in unfiltered file

Do these entries in the unfiltered VCF file with QUAL=LOWMAPQ, where one mate has MAPQ=0, mean that the mate maps to multiple locations?
