adamewing / bamsurgeon Goto Github PK

View Code? Open in Web Editor NEW

228.0 228.0 86.0 235.16 MB

tools for adding mutations to existing .bam files, used for testing mutation callers

License: MIT License

Python 85.84% Shell 13.69% Dockerfile 0.47%

bamsurgeon's People

Contributors

Stargazers

Watchers

Forkers

huyvuong jeff-k chaelir hammer lbergelson rtvt123 csb5 gillet andreas-wilm jpfeil simexin bioinformaticsarchive elephanthunter nordhuang clalansingh mischalundberg xtmgah gpcr zhmz90 shadriellee mjko1210 bioinfotools snashraf zamaudio litaifang australian-translational-genomics duydls msahraeian rebeccaforrester haoziyeung dancooke cdarby wenmm healthvivo songtr yf0205 zpeng1989 shahirb googlingthecancergenome stefankjartansson forrester-bex sunbymoon janyaaiq trinityctat cgpu da-i stephanholgerd aferriz sebastianhollizeck zzygyx9119 linhxxx boegel leornardzhou jopineda darichter87 debarka gerde n-damo ktan8 mnmdiagnostics wckdouglas yussab guruprasada cutebeaeast nickier0510 jelisavetailic yemilawal limingxiang spkatiyar franziskasundermann ivopieniak mesky1238 xiahaohao genostack drtconway xinghuq jts liuyanbioinfo williambrandler ramonamerong jereiard tim15-tech kclem trangng-th jdidion

bamsurgeon's Issues

'pysam.calignmentfile.AlignedSegment' object has no attribute 'next_reference_name'

Dear users,
I'm trying to perform Bamsurgeon's tests for snvs but I get the following error message.

$ ./test_snv.sh 4 5 /data/references/human/genome/GRCh37_hg19/hg19_wochrs.fasta
addsnv.py -v ../test_data/random_snvs.txt -f ../test_data/testregion_realign.bam -r /data/references/human/genome/GRCh37_hg19/hg19_wochrs.fasta -o ../test_data/testregion_mut.bam -n 4 -p 5 --ignoresnps --ignorepileup --seed 1234

INFO 2016-11-07 05:34:06.947648 starting /usr/local/bin/addsnv.py called with args: /usr/local/bin/addsnv.py -v ../test_data/random_snvs.txt -f ../test_data/testregion_realign.bam -r /data/references/human/genome/GRCh37_hg19/hg19_wochrs.fasta -o ../test_data/testregion_mut.bam -n 4 -p 5 --ignoresnps --ignorepileup --seed 1234

Debug, haploclusters:[[{'start': 33714964, 'end': 33714965, 'chrom': '22', 'altbase': None, 'vaf': None}], [{'start': 33769483, 'end': 33769484, 'chrom': '22', 'altbase': None, 'vaf': None}], [{'start': 33908770, 'end': 33908771, 'chrom': '22', 'altbase': None, 'vaf': None}], [{'start': 34166720, 'end': 34166721, 'chrom': '22', 'altbase': None, 'vaf': None}]]
INFO 2016-11-07 05:34:07.043328 haplo_22_33714965_33714965 creating tmp bam: addsnv.tmp/haplo_22_33714965_33714965.tmpbam.b7077202-4470-4165-be96-01367e35cd4a.bam
INFO 2016-11-07 05:34:07.043409 haplo_22_33769483_33769483 creating tmp bam: addsnv.tmp/haplo_22_33769483_33769483.tmpbam.2466f7b7-9fe6-45b1-af2c-cce8598c9a13.bam
INFO 2016-11-07 05:34:07.043451 haplo_22_33908770_33908770 creating tmp bam: addsnv.tmp/haplo_22_33908770_33908770.tmpbam.de40905a-9b29-4c56-8da1-d999c6f6a133.bam
INFO 2016-11-07 05:34:07.043542 haplo_22_34166721_34166721 creating tmp bam: addsnv.tmp/haplo_22_34166721_34166721.tmpbam.76575af3-18ec-4099-878c-3b8eb073420a.bam

ERROR 2016-11-07 05:34:07.345068 encountered error in mutation spikein: ['22_33769483_33769484_None_None']
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 156, in makemut
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 186, in mutate
mate = find_mate(pread.alignment, bammate)
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 121, in find_mate
chrom = read.next_reference_name
AttributeError: 'pysam.calignmentfile.AlignedSegment' object has no attribute 'next_reference_name'

ERROR 2016-11-07 05:34:07.441006 encountered error in mutation spikein: ['22_34166720_34166721_None_None']
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 156, in makemut
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 186, in mutate
mate = find_mate(pread.alignment, bammate)
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 121, in find_mate
chrom = read.next_reference_name
AttributeError: 'pysam.calignmentfile.AlignedSegment' object has no attribute 'next_reference_name'

ERROR 2016-11-07 05:34:07.541356 encountered error in mutation spikein: ['22_33908770_33908771_None_None']
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 156, in makemut
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 186, in mutate
mate = find_mate(pread.alignment, bammate)
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 121, in find_mate
chrom = read.next_reference_name
AttributeError: 'pysam.calignmentfile.AlignedSegment' object has no attribute 'next_reference_name'

ERROR 2016-11-07 05:34:07.624716 encountered error in mutation spikein: ['22_33714964_33714965_None_None']
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 156, in makemut
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 186, in mutate
mate = find_mate(pread.alignment, bammate)
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 121, in find_mate
chrom = read.next_reference_name
AttributeError: 'pysam.calignmentfile.AlignedSegment' object has no attribute 'next_reference_name'

INFO 2016-11-07 05:34:07.654180 no succesful mutations
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[E::hts_open_format] fail to open file '../test_data/testregion_mut.bam'
[mpileup] failed to open ../test_data/testregion_mut.bam: No such file or directory
Failed to open -: unknown file type

Any help would be much appreciated.

Best regards,

Sheila

Suggestions for coverdiff?

Hi Adam,

I am not interested in there being a difference between input and output coverage. Is 0.9 (default) a good value in terms of the threshold for coverdiff, in that case?

Thanks,
Alva

Can this be put in pip?

Adam - can this be put in pip so pip2 install bamsurgeon works?

countBaseAtPos() fails for mpileup entries with zero coverage

In cases where mpileup returns an entry for a position with zero coverage/no read bases field the index into 'c' on line 98 of addsnv.py will fail. It seems safe to break or return on len(c) < 5 in the line above. This will trigger the "Could not pileup for region" case in the calling function.

mate not found while using addsnv.py

Hi Adam,
i'm trying to spike in SNVs mutations in the NA12878 bam file derived from 1000Genomes, in particular the exome realigned to the GRCh38DH human assembly and the illumina platinum pedigree genome.
This is the input:
python /path/to/bamsurgeon/bamsurgeon-master/bin/addsnv.py -v regions.bed -f NA12878.alt_bwamem_GRCh38DH.20150826.CEU.exome.srt.bam -r GRCh38_full_analysis_set_plus_decoy_hla.fa -o output.bam -s 0.1 -m 0.5 --aligner mem --alignopts t:1,B:4,O:6,E:1,M: --picardjar /opt/tools/picard-tools-1.140/picard.jar --tagreads
And this is what happens:
INFO 2016-10-19 10:07:00.965334 haplo_chr1_2604262_2604262 creating tmp bam: addsnv.tmp/haplo_chr1_2604262_2604262.tmpbam .2ce674c7-436d-442d-b2d2-b943d072335c.bamINFO 2016-10-19 10:07:00.965334 haplo_chr1_3756813_3756813 creating tmp bam:
addsnv.tmp/haplo_chr1_3756813_3756813.tmpbam.794a4f15-2f2f-4314-bd26-9a334193e4ed.bam
INFO 2016-10-19 10:07:00.965757 haplo_chr1_6254008_6254008 creating tmp bam: addsnv.tmp/haplo_chr1_6254008_6254008.tmpbam .35974f03-463a-4bc0-8be3-b0a5c4ebc0b1.bam
INFO 2016-10-19 10:07:00.965882 haplo_chr1_8015351_8015351 creating tmp bam: addsnv.tmp/haplo_chr1_8015351_8015351.tmpbam .75982e73-85fa-4517-bc34-e2facc9e5ea5.bam
INFO 2016-10-19 10:07:00.965947 haplo_chr1_3844974_3844974 creating tmp bam: addsnv.tmp/haplo_chr1_3844974_3844974.tmpbam .07f20227-6940-4d61-8df2-8de9b952b67c.bam
INFO 2016-10-19 10:07:00.968445 haplo_chr1_1668397_1668397 creating tmp bam: addsnv.tmp/haplo_chr1_1668397_1668397.tmpbam .0f0c1948-f969-4816-bf2d-a0d551625e4b.bam
INFO 2016-10-19 10:07:00.974558 haplo_chr1_8422758_8422758 creating tmp bam: addsnv.tmp/haplo_chr1_8422758_8422758.tmpbam .e0133028-2f07-4e7b-81af-58c24bda375f.bam
WARN 2016-10-19 10:07:01.605163 haplo_chr1_1668397_1668398 dropped for proximity to SNP, nearby SNP MAF: 0.25625 (max snv frac: 0.1)
WARN 2016-10-19 10:07:01.860107 chr1_6253717_6254147_None_None warning: no mate for SRR098401.8225801
WARN 2016-10-19 10:07:01.864627 chr1_3844906_3844983_None_None warning: no mate for SRR1517848.5321
WARN 2016-10-19 10:07:01.865476 chr1_3844906_3844983_None_None warning: no mate for SRR098401.87905115
WARN 2016-10-19 10:07:01.870470 chr1_3844906_3844983_None_None warning: no mate for SRR1518151.5881
WARN 2016-10-19 10:07:01.871256 chr1_3844906_3844983_None_None warning: no mate for SRR1517848.5326
WARN 2016-10-19 10:07:01.872458 chr1_3844906_3844983_None_None warning: no mate for SRR098401.17828958
WARN 2016-10-19 10:07:01.873968 chr1_3844906_3844983_None_None warning: no mate for SRR1517974.5815
WARN 2016-10-19 10:07:01.876574 chr1_3844906_3844983_None_None warning: no mate for SRR098401.730667
WARN 2016-10-19 10:07:01.877116 chr1_1668326_1668472_None_None warning: no mate for SRR1517991.2930
WARN 2016-10-19 10:07:01.878038 chr1_1668326_1668472_None_None warning: no mate for SRR098401.21076287
WARN 2016-10-19 10:07:01.878261 chr1_3844906_3844983_None_None warning: no mate for SRR098401.18771219
WARN 2016-10-19 10:07:01.878517 chr1_6253717_6254147_None_None warning: no mate for SRR098401.29619889
WARN 2016-10-19 10:07:01.878564 chr1_3844906_3844983_None_None warning: no mate for SRR098401.26323221
WARN 2016-10-19 10:07:01.880482 chr1_3844906_3844983_None_None warning: no mate for SRR098401.67518704
WARN 2016-10-19 10:07:01.881927 chr1_1668326_1668472_None_None warning: no mate for SRR098401.84773996
WARN 2016-10-19 10:07:01.882513 chr1_1668326_1668472_None_None warning: no mate for SRR1517968.3393
WARN 2016-10-19 10:07:01.883274 chr1_1668326_1668472_None_None warning: no mate for SRR098401.45180288
WARN 2016-10-19 10:07:01.884434 chr1_1668326_1668472_None_None warning: no mate for SRR098401.78156498
WARN 2016-10-19 10:07:01.884514 chr1_3844906_3844983_None_None warning: no mate for SRR098401.102934682
WARN 2016-10-19 10:07:01.884730 chr1_1668326_1668472_None_None warning: no mate for SRR098401.44119978
WARN 2016-10-19 10:07:01.886218 chr1_3844906_3844983_None_None warning: no mate for SRR1517974.5814
WARN 2016-10-19 10:07:01.886530 chr1_3844906_3844983_None_None warning: no mate for SRR098401.27337592
WARN 2016-10-19 10:07:01.886733 chr1_1668326_1668472_None_None warning: no mate for SRR098401.88631178
WARN 2016-10-19 10:07:01.886832 chr1_3844906_3844983_None_None warning: no mate for SRR1517938.5944
It gaves me this warning for every chromosome and when the process ends the output bam doesn't have any mutation.
I've got the latest version of pysam and samtools installed except for picard-tools.
I was wondering if the problem is that BamSurgeon doesn't support GRCh38? or if the problem may be picard?

addsnv doesn't add SNVs to some locations even though very high read depth and no overlapping reads

Hi adam,
I used the following command:

addsnv.py -p 10 --ignoresnps --force --maxdepth 10000 -v locations.bed -f sortedbam -r reference -o output > output.log 2>&1 &
As can be noticed from the options, I am taking care of lot of things that could possibly stop bamSurgeon from adding a mutation. The only thing missing is the 5th column in spike-in file with a specific base.

But anyway, I notice many locations where mutations weren't made and from the logs I was surprised about a particular warning:

``
WARN 2016-05-17 11:21:15.388444 haplo_1_115256537_115256538 could not pileup for region: 1:115256399

WARN 2016-05-17 11:21:15.388939 haplo_1_115256442_115256443 could not pileup for region: 1:115256399

WARN 2016-05-17 11:21:15.480708 haplo_1_115256537_115256538 could not pileup for region: 1:115256410

INFO 2016-05-17 11:21:15.482319 haplo_1_115256442_115256443 mpileup failed, no coverage for base: 1:115256410
``
I am already taking care of min depth by making sure the spike-in sites in my bed file all have >1000 read depth. Why would it try to add mutations at places other than what I specify? Why is it worried about 1:115256399 and 1:115256410 when that is not what I ask for? For both the locations that I get warning for above, no mutations are added in modified Bam ( I expected at least one). Also, no overlapping reads are causing issues (no close locations other than these 2)

Thank you for answering.

bwa command line parameter bug in addsv.py

The command line parameters are incorrect for bwa, at least for the newest version of bwa. Here is a diff with a working version:

diff --git a/addsv.py b/addsv.py
index cafb2ff..65c30ee 100755
--- a/addsv.py
+++ b/addsv.py
@@ -23,8 +23,8 @@ def remap(fq1, fq2, threads, bwaref, outbam, deltmp=True):
tmpbam = basefn + ".bam"
tmpsrt = basefn + ".sort"

-    sai1args = ['bwa', 'aln', bwaref, '-q', '5', '-l', '32', '-k', '2', '-t', str(threads), '-o', '1', '-f', sai1fn, fq1]
-    sai2args = ['bwa', 'aln', bwaref, '-q', '5', '-l', '32', '-k', '2', '-t', str(threads), '-o', '1', '-f', sai2fn, fq2]
+    sai1args = ['bwa', 'aln', '-q', '5', '-l', '32', '-k', '2', '-t', str(threads), '-o', '1', '-f', sai1fn, bwaref, fq1]
+    sai2args = ['bwa', 'aln', '-q', '5', '-l', '32', '-k', '2', '-t', str(threads), '-o', '1', '-f', sai2fn, bwaref, fq2]
     samargs  = ['bwa', 'sampe', '-P', '-f', samfn, bwaref, sai1fn, sai2fn, fq1, fq2]
     bamargs  = ['samtools', 'view', '-bt', refidx, '-o', tmpbam, samfn]
     sortargs = ['samtools', 'sort', tmpbam, tmpsrt]

Translocations using randomsites.py

Hi Adam,

Do you have any progress with implementing choosing random translocations using the randomsites.py script?

Thanks
Shadrielle

Error when trying to spike in snvs

I am getting the following error

ERROR   2015-08-20 12:26:54.730689  encountered error in mutation spikein: ['II_36327_36327_None_None']
Traceback (most recent call last):
  File "bamsurgeon/addsnv.py", line 155, in makemut
    mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, min(mutpos_list), max(mutpos_list)+1, mutpos_list, avoid=avoid, mutid_list=mutid_list, is_snv=True, mutbase_list=mutbase_list, reffile=reffile)
  File "/Users/dancook/Documents/git/variant-caller-analysis/bamsurgeon/bs/mutation.py", line 212, in mutate
    return False, hasSNP, maxfrac, outreads, mutreads, mutmates # todo: convert to class
UnboundLocalError: local variable 'maxfrac' referenced before assignment

Any idea what the cause might be?

not sure how to troubleshoot error

Hi there. I'm getting the following error but I'm not sure why since I'm passing --ignorepileup. Any advice?

INFO    2016-07-26 22:14:12.345381  ol_1_5_9_0.1:DEL    creating tmp bam:  addindel.tmp/ol_1_5_9_0.1:DEL.tmpbam.f52f1d5b-d1c9-4021-8a18-13eb14059958.bam
************************************************************
encountered error in mutation spikein: ol_1_5_9_0.1:DEL
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addindel.py", line 116, in makemut
  File "build/bdist.macosx-10.10-x86_64/egg/bamsurgeon/mutation.py", line 225, in mutate
    assert maxfrac is not None, "Error: could not pile up over region: %s" % region
AssertionError: Error: could not pile up over region: haplo_ol_1_5_10

Default VAF vs. specifying VAF in BED file

Hi Adam,

I know I can specify the VAF in the input BED file, but I also know there is a -m option and am not sure how to deal with that. If I want to specify individual VAFs for each site in the BED file, do I omit the -m option?

Thanks,
Alva

OSError from call to subprocess

Hi there,

I have seen other people having similar issues to this, but haven't been able to resolve mine after reading those threads. I am encountering this error with both my own files and test_addindel. I am running:

python addindel.py -v ../test_data/test_indels.txt -f ../test_data/testregion_realign.bam -r ../test_data/Homo_sapiens_chr22_assembly19.fasta -o ../test_data/testregion_mut.bam -c ../test_data/test_cnvlist.txt.gz --seed 1234

And am getting this error for all attempted spikeins:

encountered error in mutation spikein: 22_33958087_33958090_0.4:DEL
Traceback (most recent call last):
File "addindel.py", line 119, in makemut
mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, mutpos, mutpos+del_ln+1, mutpos_list, avoid=avoid, mutid_list=[mutid], is_insertion=is_insertion, is_deletion=is_deletion, ins_seq=ins, reffile=reffile, indel_start=start, indel_end=end)
File "/home/junzli_lab/lilseo/strfinder/BAMSurgeon/bamsurgeon/mutation.py", line 208, in mutate
basepile = countBaseAtPos(args.bamFileName,chrom,pcol.pos,mutid=region)
File "/home/junzli_lab/lilseo/strfinder/BAMSurgeon/bamsurgeon/mutation.py", line 14, in countBaseAtPos
p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
File "/usr/local/mbni/garage/junzli_lab/Python-2.7.3/lib/python2.7/subprocess.py", line 679, in init
errread, errwrite)
File "/usr/local/mbni/garage/junzli_lab/Python-2.7.3/lib/python2.7/subprocess.py", line 1249, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

You said in another thread that you shouldn't have to use shell=True for subprocess. How else can I resolve this OSError?

To temporarily patch over this, I added shell=True to all subprocess calls. Upon running with this change, I saw that there were some reads being mutated but zero being written. For example:

INFO 2016-11-10 09:54:12.576521 22_33694303_33694304_0.5:INS:ATGGC copy number in snp region: 22 33694303 33694304 = 3.0
INFO 2016-11-10 09:54:12.576582 22_33694303_33694304_0.5:INS:ATGGC adjusted VAF: 0.333333333333
INFO 2016-11-10 09:54:12.576752 22_33694303_33694304_0.5:INS:ATGGC picked: 18 reads
INFO 2016-11-10 09:54:12.577711 22_33694303_33694304_0.5:INS:ATGGC wrote: 0 reads, mutated: 18 read

To circumvent this, I ran it with --force, and proceeded to get another OSError, this time raised by os.rename as follows:

encountered error in mutation spikein: 22_33694303_33694304_0.5:INS:ATGGC
Traceback (most recent call last):
File "addindel.py", line 238, in makemut
aligners.remap_bam(args.aligner, tmpoutbamname, args.refFasta, alignopts, mutid=mutid, paired=(not args.single), picardjar=args.picardjar)
File "/home/junzli_lab/lilseo/strfinder/BAMSurgeon/bamsurgeon/aligners.py", line 59, in remap_bam
remap_backtrack_bam(bamfn, threads, fastaref, mutid=mutid, paired=paired)
File "/home/junzli_lab/lilseo/strfinder/BAMSurgeon/bamsurgeon/aligners.py", line 125, in remap_backtrack_bam
os.rename(sortfn, bamfn)
OSError: [Errno 2] No such file or directory

I'm not sure if the latter 2 issues are related to the subprocess/shell issue, which is why I thought I'd also ask about them. Let me know if anything is unclear. Thank you in advance for your help!

Error when trying to mark affected reads

Here is another error I am getting when trying to mark affected reads (--tagreads) using the addsnv.py script

Traceback (most recent call last):
  File "bamsurgeon/addsnv.py", line 467, in <module>
    run()
  File "bamsurgeon/addsnv.py", line 464, in run
    main(args)
  File "bamsurgeon/addsnv.py", line 421, in main
    markreads(mergedtmp, tmp_tag_bam)
NameError: global name 'mergedtmp' is not defined

error in mutation spikein while running test_snv.py

I am running the following:
./test_snv_bwamem.sh 1 1 ../test_data/Homo_sapiens_chr22_assembly19.fasta ~/BioSoftware/picard-tools-1.134/picard.jar

It seems that some of the samtools steps are running correctly and some are not. I am running pysam 0.9.2 and have verified all dependencies are installed and on my path.

The result is:
INFO 2016-10-17 15:44:04.542162 starting /usr/local/bin/addsnv.py called with args: /usr/local/bin/addsnv.py -v ../test_data/random_snvs.txt -f ../test_data/testregion_realign.bam -r ../test_data/Homo_sapiens_chr22_assembly19.fasta -o ../test_data/testregion_mut.bam -n 1 -c ../test_data/test_cnvlist.txt.gz -p 1 --picardjar /Users/rebeccabatorsky/BioSoftware/picard-tools-1.134/picard.jar --aligner mem --seed 1234

INFO 2016-10-17 15:44:04.543324 created tmp directory: addsnv.tmp
INFO 2016-10-17 15:44:04.543457 created directory: addsnv_logs_testregion_mut.bam
Debug, haploclusters:[[{'start': 34166720, 'end': 34166721, 'chrom': '22', 'altbase': None, 'vaf': 1.0}]]
INFO 2016-10-17 15:44:04.555897 haplo_22_34166721_34166721 creating tmp bam: addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam
INFO 2016-10-17 15:44:05.746773 haplo_22_34166721_34166721 len(readlist): 58
INFO 2016-10-17 15:44:05.747037 haplo_22_34166721_34166721 copy number in snp region: 22 34166721 34166721 = 3.0

adjusted VAF: 0.333333333333

INFO 2016-10-17 15:44:05.747269 haplo_22_34166721_34166721 picked: 19
INFO 2016-10-17 15:44:05.748240 haplo_22_34166721_34166721 wrote: 58 mutated: 19
INFO 2016-10-17 15:44:05.751673 haplo_22_34166721_34166721 converting addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam to fastq

INFO 2016-10-17 15:44:05.752133 converting BAM addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam to FASTQ
[Mon Oct 17 15:44:06 EDT 2016] picard.sam.SamToFastq INPUT=addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam FASTQ=addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.fastq INTERLEAVE=true INCLUDE_NON_PRIMARY_ALIGNMENTS=false VALIDATION_STRINGENCY=SILENT OUTPUT_PER_RG=false RG_TAG=PU RE_REVERSE=true INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Mon Oct 17 15:44:06 EDT 2016] Executing as [email protected] on Mac OS X 10.10.5 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16; Picard version: 1.134(a7a08c474e4d99346eec7a9956a8fe71943b5d80_1434033355) JdkDeflater
[Mon Oct 17 15:44:06 EDT 2016] picard.sam.SamToFastq done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=257425408
INFO 2016-10-17 15:44:06.088677 haplo_22_34166721_34166721 aligning addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.fastq with bwa mem

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 116 sequences (11600 bp)...
[M::process] 0 single-end sequences; 116 paired-end sequences
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 58, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat](25, 50, 75) percentile: (469, 501, 535)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (337, 667)
[M::mem_pestat] mean and std.dev: (502.26, 51.98)
[M::mem_pestat] low and high boundaries for proper pairs: (271, 733)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 116 reads in 0.006 CPU sec, 0.006 real sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa mem -t 1 -M -p ../test_data/Homo_sapiens_chr22_assembly19.fasta addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.fastq
[main] Real time: 0.058 sec; CPU: 0.060 sec
INFO 2016-10-17 15:44:06.150695 haplo_22_34166721_34166721 writing addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam.realign.sam to BAM...
INFO 2016-10-17 15:44:06.156371 haplo_22_34166721_34166721 deleting SAM: addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam.realign.sam
INFO 2016-10-17 15:44:06.156536 haplo_22_34166721_34166721 sorting output: samtools sort -@ 1 -m 10000000000 -T addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam.realign.sorted.bam -o addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam.realign.sorted.bam addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam
INFO 2016-10-17 15:44:06.161041 haplo_22_34166721_34166721 remove original bam:addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam
INFO 2016-10-17 15:44:06.161194 haplo_22_34166721_34166721 rename sorted bam: addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam.realign.sorted.bam to original name: addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam
INFO 2016-10-17 15:44:06.161334 haplo_22_34166721_34166721 indexing: samtools index addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.bam
[bam_index] corrupted or unsorted input file
INFO 2016-10-17 15:44:06.165385 haplo_22_34166721_34166721 removing addsnv.tmp/haplo_22_34166721_34166721.tmpbam.735cd520-a5e4-4d4b-98b3-b145819813ab.fastq

ERROR 2016-10-17 15:44:06.167530 encountered error in mutation spikein: ['22_34166720_34166721_1.0_None']
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 280, in makemut
File "/Library/Python/2.7/site-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 56, in countReadCoverage
File "pysam/calignmentfile.pyx", line 1116, in pysam.calignmentfile.AlignmentFile.pileup (pysam/calignmentfile.c:12769)
NotImplementedError: pileup of samfiles not implemented yet

INFO 2016-10-17 15:44:06.168823 no succesful mutations
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
[E::hts_open_format] fail to open file '../test_data/testregion_mut.bam'
[mpileup] failed to open ../test_data/testregion_mut.bam: No such file or directory
Failed to open -: unknown file type

alignment sanity check failed

Not sure what's going on here.. see error msg below. The error seems to be bam readcount < fastq readcount.

For info: the bam file is based on Haloplex, i.e. an amplicon strategy. Coverage in the area where the variant is introduced is ~2000x. The tested position sits directly at the end of one amplicon (of several) that overlaps it.

Any suggestions are greatly appreciated.

//Pär

[genetik@v01s979 simulBAM]$ run_addsnv.pl test.bed 1-11-ready.bam
INFO    2016-02-02 09:17:37.074275  starting /usr/bin/addsnv.py called with args: /usr/bin/addsnv.py -v test.bed -f 1-11-ready.bam -r /home/genetik/bcbio/share/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa --aligner mem --picardjar /home/genetik/Downloads/bamsurgeon/picard/picard-tools-1.131/picard.jar -o test.bed.bam

Debug, haploclusters:[[{'start': 25880412, 'end': 25880412, 'chrom': 'chr1', 'altbase': None, 'vaf': None}]]
INFO    2016-02-02 09:17:37.116624  haplo_chr1_25880412_25880412    creating tmp bam:  addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam
WARN    2016-02-02 09:17:37.134438  haplo_chr1_25880412_25880413    could not pileup for region: chr1:25880316
INFO    2016-02-02 09:17:52.464997  haplo_chr1_25880412_25880412    len(readlist): 1998
INFO    2016-02-02 09:17:52.466797  haplo_chr1_25880412_25880412    selected VAF: 0.5

INFO    2016-02-02 09:17:52.471142  haplo_chr1_25880412_25880412    picked: 1000
INFO    2016-02-02 09:17:52.599467  haplo_chr1_25880412_25880412    wrote:  1998 mutated: 1000
INFO    2016-02-02 09:17:52.608542  haplo_chr1_25880412_25880412    converting addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam to fastq

INFO    2016-02-02 09:17:52.608865  converting BAM addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam to FASTQ
[Tue Feb 02 09:17:52 CET 2016] picard.sam.SamToFastq INPUT=addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam FASTQ=addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.fastq INTERLEAVE=true INCLUDE_NON_PRIMARY_ALIGNMENTS=false VALIDATION_STRINGENCY=SILENT    OUTPUT_PER_RG=false RG_TAG=PU RE_REVERSE=true INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue Feb 02 09:17:52 CET 2016] Executing as [email protected] on Linux 3.10.0-327.4.4.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_65-b17; Picard version: 1.131(cd60f90fdca902499c70a4472b6162ef37f919ce_1431022382) IntelDeflater
[Tue Feb 02 09:17:53 CET 2016] picard.sam.SamToFastq done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2022178816
INFO    2016-02-02 09:17:53.192863  haplo_chr1_25880412_25880412    aligning addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.fastq with bwa mem

[E::bwa_idx_load_from_disk] fail to locate the index files
INFO    2016-02-02 09:17:53.197474  haplo_chr1_25880412_25880412    writing addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam.realign.sam to BAM...
INFO    2016-02-02 09:17:53.202247  haplo_chr1_25880412_25880412    deleting SAM: addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam.realign.sam
INFO    2016-02-02 09:17:53.202353  haplo_chr1_25880412_25880412    sorting output: samtools sort -@ 1 -m 10000000000 -T addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam.realign.sorted.bam -o addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam.realign.sorted.bam addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam
INFO    2016-02-02 09:17:53.207738  haplo_chr1_25880412_25880412    remove original bam:addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam
INFO    2016-02-02 09:17:53.207860  haplo_chr1_25880412_25880412    rename sorted bam: addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam.realign.sorted.bam to original name: addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam
INFO    2016-02-02 09:17:53.208120  haplo_chr1_25880412_25880412    indexing: samtools index addsnv.tmp/haplo_chr1_25880412_25880412.tmpbam.4999c452-32be-47c6-a4a4-e352ca495201.bam
************************************************************
ERROR   2016-02-02 09:17:53.215854  encountered error in mutation spikein: ['chr1_25880412_25880412_None_None']
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 275, in makemut
  File "build/bdist.linux-x86_64/egg/bamsurgeon/aligners.py", line 58, in remap_bam
    remap_bwamem_bam(bamfn, threads, fastaref, picardjar, mutid=mutid, paired=paired)
  File "build/bdist.linux-x86_64/egg/bamsurgeon/aligners.py", line 195, in remap_bwamem_bam
    raise ValueError("ERROR\t" + now() + "\t" + mutid + "\tbam readcount < fastq readcount, alignment sanity check failed!\n")
ValueError: ERROR   2016-02-02 09:17:53.215762  haplo_chr1_25880412_25880412    bam readcount < fastq readcount, alignment sanity check failed!

************************************************************
INFO    2016-02-02 09:17:53.233485  no succesful mutations

Mutated reads not being added to bam?

Hello again,

You helped me before to finally get addindel running, but after running my post-bamsurgeon BAM file through a program that detects expansive repeats, I began to question whether the BAM actually contained the mutated reads. I ran bamsurgeon as follows:
python2.7 ./addindel.py -v ../test_data/shortRepeats.txt -f input.bam -r human_g1k_v37.fasta -o ../test_data/shortRepeats_mut.bam --seed 1234 --ignorepileup

I thought it was odd that the secondary program was catching virtually none of the insertions that I spiked-in using bamsurgeon, so I ran samtools view on the shortRepeats_mut.bam output file. It looks like the mutated reads contained in the .log files are not present in the output BAM. For example, I have a log that contains a mutated read starting at chr 22 coordinate 18445701 with a "GC" repeat expansion inserted (bolded below):
read ST-E00144:74:H2KFLCCXX:1:1113:24233:26800,18445701,S AGAGACAGGGTCTCACTATGTTGCCCAGGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCAGTCACGAACTCCTACCTTCAAGCAATCTTCCTGCCTCCGCCTCCCAAAGTGCTGAGACTAGAGGGGAGAGCCGCCACACCTAGCC

All the other log files for this particular GC-insertion support that other reads were successfully mutated as well. However when I try to find the mutated read in the shortRepeats_mut.bam file (using samtools view), I only find the un-mutated sequence at this position:
22 18445701 TAGAGACAGGGTCTCACTATGTTGCCCAGGC**AGTCACGAACTCCTACCTTCAAGCAATCTTCCTGCCTCAGCCTCCCAAAGTGCTGAGATTAGAGGTGTGAGCCACCACACCTAGCCAGCGAGAAACACTTCTATTCCAATCAATCATCT
Note: I inserted a ** where the GC expansion should have been spiked-in.

Maybe I am misunderstanding something about bamsurgeon's functionality. Could you help explain what is going on here?

structural variants are not simulated exactly as input

SVs appear to simulated only approximately, usually within a 500-1000bp of the input regions. It would be nice if this were fixed. However, since this appears to be as designed rather than a bug, as a less preferable alternative it would be nice to have an output of the actual sv locations that were simulated.

input of star aligner

Hello,
We are using the star aligner for RNAseq reads. Now we wanted to validate our methods of variant calling by spiking variants using bamsurgeon. But .bam files from the star aligner are not submitted. Can you input the star aligner? That would be very helpful.
Thanks

Exonerate has moved

Adam,

Thanks for this awesome tool.

I noticed the path you have for exonerate on the readme and elsewhere is incorrect.

I was able to download from here if you want to update.
http://www.ebi.ac.uk/sites/ebi.ac.uk/files/groups/flicek/exonerate/exonerate-2.2.0-x86_64.tar.gz

SNV insertion creates a discordant read pair hotspot

Inserting a SNV results in more discordant read pairs containing reads spanning the SNV than would be expected from an actual SNV in the data. Is this a data issue where the bamsurgeon aligner was not the same aligner used for the alignment of originating BAM, a bamsurgeon algorithm issue, or an issue caused by the aligner behaving differently when only dealing with the bamsurgeon subset of the data (eg bwa calculation of fragment size distribution could be problematic if there's lots of SV in the bamsurgeon subset).
From the IS6 DREAM challenge data set:

AttributeError: 'csamtools.PileupRead' object has no attribute 'query_position'

I am getting the an error running bamsurgeon (see below). Please let me know if you need any more information. Thanks in advance for your help!

$ INFO 2015-10-09 15:27:31.643881 starting /packages/bamsurgeon/addsnv.py called with args: /packages/bamsurgeon/addsnv.py -v randStrexomePos.txt -f Control.bwa.final.bam -r hs37d5.fa -o output.bam -m 0.25

Debug, haploclusters:[[{'start': 1133208, 'end': 1133208, 'chrom': '1', 'altbase': None, 'vaf': 0.25}]]
INFO 2015-10-09 15:27:31.749428 haplo_1_1133208_1133208 creating tmp bam: addsnv.tmp/haplo_1_1133208_1133208.tmpbam.ea057829-9464-4363-8fde-72e8ef7eea39.bam

ERROR 2015-10-09 15:27:31.756894 encountered error in mutation spikein: ['1_1133208_1133208_0.25_None']
Traceback (most recent call last):
File "/packages/bamsurgeon/addsnv.py", line 155, in makemut
mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, min(mutpos_list), max(mutpos_list)+1, mutpos_list, avoid=avoid, mutid_list=mutid_list, is_snv=True, mutbase_list=mutbase_list, reffile=reffile)
File "/packages/bamsurgeon/bs/mutation.py", line 141, in mutate
if pread.query_position is not None and not pread.alignment.is_secondary and bin(pread.alignment.flag & 2048) != bin(2048):
AttributeError: 'csamtools.PileupRead' object has no attribute 'query_position'

INFO 2015-10-09 15:27:31.770180 no succesful mutations

tmap realignement - NOID read group not in the header

Dear Adam,

I am trying to use MuTect to perform variant calling on IonTorrent data mutated with bamsurgeon.
I've just checked that the final bamsurgeon bam file does not contain in the header the "NOID" read group from tmap realignement, and MuTect doesn't like this.
To fix this problem I add a posteriori a new read group line in the final bam file header (you can see my commands here https://github.com/IARC-bioinfo/BAM-tricks).
Do you see any other solution?
Does this problem arise with the other aligners?

realignment problem with BWAMEM

Try to run the test (test_snv_bwamem.sh), resulting following error message:
Command:

../test/test_snv_bwamem.sh 1 1 /net/kodiak/volumes/river/shared/users/ltfang/link_to_reference/human_g1k_v37_decoy.fasta /home/yaol12/opt/picard-tools-1.130/picard.jar

Logs:

INFO 2015-11-17 14:37:49.936238 starting /home/yaol12/river/software/PythonLibs/bamsurgeon/bin/addsnv.py called with args: /home/yaol12/river/software/PythonLibs/bamsurgeon/bin/addsnv.py -v ../test_data/random_snvs.txt -f ../test_data/testregion.bam -r /net/kodiak/volumes/river/shared/users/ltfang/link_to_reference/human_g1k_v37_decoy.fasta -o ../test_data/testregion_mut.bam -n 1 -c ../test_data/test_cnvlist.txt.gz -p 1 --picardjar /home/yaol12/opt/picard-tools-1.130/picard.jar --aligner mem

Debug, haploclusters:[[{'start': 34166720, 'end': 34166721, 'chrom': '22', 'altbase': None, 'vaf': 1.0}]]
INFO 2015-11-17 14:37:49.991232 haplo_22_34166721_34166721 creating tmp bam: addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam
INFO 2015-11-17 14:37:51.910871 haplo_22_34166721_34166721 len(readlist): 58
INFO 2015-11-17 14:37:51.911185 haplo_22_34166721_34166721 copy number in snp region: 22 34166721 34166721 = 3.0

adjusted VAF: 0.333333333333

INFO 2015-11-17 14:37:51.911363 haplo_22_34166721_34166721 picked: 19
INFO 2015-11-17 14:37:51.912878 haplo_22_34166721_34166721 wrote: 58 mutated: 19
INFO 2015-11-17 14:37:51.928637 haplo_22_34166721_34166721 converting addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam to fastq

INFO 2015-11-17 14:37:51.929061 converting BAM addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam to FASTQ
[Tue Nov 17 14:37:52 PST 2015] picard.sam.SamToFastq INPUT=addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam FASTQ=addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.fastq INTERLEAVE=true INCLUDE_NON_PRIMARY_ALIGNMENTS=false VALIDATION_STRINGENCY=SILENT OUTPUT_PER_RG=false RG_TAG=PU RE_REVERSE=true INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Tue Nov 17 14:37:52 PST 2015] Executing as yaol12@tehran-11 on Linux 3.2.0-4-amd64 amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14; Picard version: 1.130(8b3e8abe25f920f5aa569db482bb999f29cc447b_1427207353) IntelDeflater
[Tue Nov 17 14:37:52 PST 2015] picard.sam.SamToFastq done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1521483776
INFO 2015-11-17 14:37:52.499853 haplo_22_34166721_34166721 aligning addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.fastq with bwa mem

[main] unrecognized command 'mem'
INFO 2015-11-17 14:37:52.528034 haplo_22_34166721_34166721 writing addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam.realign.sam to BAM...
INFO 2015-11-17 14:37:52.553217 haplo_22_34166721_34166721 deleting SAM: addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam.realign.sam
INFO 2015-11-17 14:37:52.572843 haplo_22_34166721_34166721 sorting output: samtools sort -@ 1 -m 10000000000 addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam.realign.sorted
INFO 2015-11-17 14:37:52.595364 haplo_22_34166721_34166721 remove original bam:addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam
INFO 2015-11-17 14:37:52.611344 haplo_22_34166721_34166721 rename sorted bam: addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam.realign.sorted.bam to original name: addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam
INFO 2015-11-17 14:37:52.619693 haplo_22_34166721_34166721 indexing: samtools index addsnv.tmp/haplo_22_34166721_34166721.tmpbam.e50b82c5-5bca-4047-846a-bc5df9a950fc.bam

ERROR 2015-11-17 14:37:52.645301 encountered error in mutation spikein: ['22_34166720_34166721_1.0_None']
Traceback (most recent call last):
File "/home/yaol12/river/software/PythonLibs/bamsurgeon/bin/addsnv.py", line 258, in makemut
aligners.remap_bam(args.aligner, tmpoutbamname, args.refFasta, alignopts, mutid=hapstr, paired=(not args.single), picardjar=args.picardjar)
File "build/bdist.linux-x86_64/egg/bamsurgeon/aligners.py", line 58, in remap_bam
remap_bwamem_bam(bamfn, threads, fastaref, picardjar, mutid=mutid, paired=paired)
File "build/bdist.linux-x86_64/egg/bamsurgeon/aligners.py", line 199, in remap_bwamem_bam
raise ValueError("ERROR\t" + now() + "\t" + mutid + "\tbam readcount < fastq readcount, alignment sanity check failed!\n")
ValueError: ERROR 2015-11-17 14:37:52.645257 haplo_22_34166721_34166721 bam readcount < fastq readcount, alignment sanity check failed!

INFO 2015-11-17 14:37:52.677268 no succesful mutations
[main] Unrecognized command.
[mpileup] 1 samples in 1 input files

exonerate no longer supported?

Hello,

Do you plan to support another recent alignment software alternative to exonerate?

Thanks,
Sirisha

Question about warning messages

Hi Adam,

I had a couple of questions about some warning messages: what exactly does too few reads in region (6) skipping... mean (region (6) being an example), and what does forced 3 reads (3 being an example) mean? What about could not pileup for region: 5:100015928? Do all these error messages result in the mutation not being spiked in? If not, which ones do? Does the warning message dropped for outcover/incover < 0.9 mean that the mutation gets dropped, or that it does not get spiked in?

Also, you were referring to tests of bamsurgeon, but not sure exactly which ones you were talking about. In the manual it says:

The script test/test snv.sh may be run with no parameters to display a usage statement:
usage: ./test_snv.sh <number of SNPs> <number of threads> <reference indexed with bwa index>

Is this what you were referring to? Is it possible to test this out with actual data?

Picard prerequisite

Picard is now up to 2.7.0.

Is there a reason why 1.131 is chosen instead of, at least, 1.141 (the last to support Java 1.7, if that's what you need).

There are almost 180 changes between 131 and 141. 500+ changes bn 131 and 2.7.0

Postprocess.py not updating TLEN when cleaning up inconsistencies.

The postprocess.py does not completely clean up the inconsistencies in a the bam produced by Bam Surgeon. While the PNEXT is updated with the mate's position, the new TLEN is not also being updated.

Below is an example.

With Inconsistencies.

samtools view flagged_reads_c.bam
HWI-ST791:143380313:C2TU0ACXX:4:1208:17404:46680 163 1 26879185 60 100M = 26879234 149 CCAGGCTGGTCTCGAACTCCTGACCTCGGGTGATCCGCCTGCCTCAGCCTCCCAAACTGCTGGCTGGGATTATAGGTGTGAGCCACCGTGCCCGGCTTAA >?:@<?<>??8=?9=7=??=>=:>?==>:9=;?>>>?@;;@?>;A@@?>=@@:?@@:9>;:=<:<;@=?@;@=@@<@b28<>><@A=:A@=:?B XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100 RG:Z:2893716229
HWI-ST791:143380313:C2TU0ACXX:4:1208:17404:46680 83 1 26879326 17 100M = 26879043 -383 AAAGAAAAATATTCATGAAGAAGTCTTCCTCCCACCCCAGCCCCCTCTATTCCCATCTATAGGTAAACATTTTTATTGGTTTCTTGTTTAACTTTCCCAT ?<:9:@>??:<>A@@;@:@:AB<@@@?A?@>>?;@===;;=;;;;>?@>?>??>>?:??A@9@>??=>=@?=;:@;;>?9>?>?;?;::;=?=B?A==>? XT:A:U NM:i:0 SM:i:17 AM:i:17 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100 RG:Z:2893716229

After processing with postprocess.py...
samtools view flagged_reads_c.postprocessed.bam
HWI-ST791:143380313:C2TU0ACXX:4:1208:17404:46680 163 1 26879185 60 100M = 26879326 149 CCAGGCTGGTCTCGAACTCCTGACCTCGGGTGATCCGCCTGCCTCAGCCTCCCAAACTGCTGGCTGGGATTATAGGTGTGAGCCACCGTGCCCGGCTTAA >?:@<?<>??8=?9=7=??=>=:>?==>:9=;?>>>?@;;@?>;A@@?>=@@:?@@:9>;:=<:<;@=?@;@=@@<@b28<>><@A=:A@=:?B NM:i:0 MD:Z:100 RG:Z:f12ce237-8fac-4073-8c8e-d74821b6a4d8
HWI-ST791:143380313:C2TU0ACXX:4:1208:17404:46680 83 1 26879326 17 100M = 26879185 -383 AAAGAAAAATATTCATGAAGAAGTCTTCCTCCCACCCCAGCCCCCTCTATTCCCATCTATAGGTAAACATTTTTATTGGTTTCTTGTTTAACTTTCCCAT ?<:9:@>??:<>A@@;@:@:AB<@@@?A?@>>?;@===;;=;;;;>?@>?>??>>?:??A@9@>??=>=@?=;:@;;>?9>?>?;?;::;=?=B?A==>? NM:i:0 MD:Z:100 RG:Z:f12ce237-8fac-4073-8c8e-d74821b6a4d8

While the mate position is updated as expected, the TLEN is not and the TLEN still disagrees between the pair-ends.

Picard FixSamFIle produces the following reads where the TLEN is updated to 241...

HWI-ST791:143380313:C2TU0ACXX:4:1208:17404:46680 163 1 26879185 60 100M = 26879326 241 CCAGGCTGGTCTCGAACTCCTGACCTCGGGTGATCCGCCTGCCTCAGCCTCCCAAACTGCTGGCTGGGATTATAGGTGTGAGCCACCGTGCCCGGCTTAA >?:@<?<>??8=?9=7=??=>=:>?==>:9=;?>>>?@;;@?>;A@@?>=@@:?@@:9>;:=<:<;@=?@;@=@@<@b28<>><@A=:A@=:?B X0:i:1 X1:i:0 MD:Z:100 RG:Z:2893716229 XG:i:0 AM:i:37 NM:i:0 SM:i:37 XM:i:0 XO:i:0 MQ:i:17 XT:A:U
HWI-ST791:143380313:C2TU0ACXX:4:1208:17404:46680 83 1 26879326 17 100M = 26879185 -241 AAAGAAAAATATTCATGAAGAAGTCTTCCTCCCACCCCAGCCCCCTCTATTCCCATCTATAGGTAAACATTTTTATTGGTTTCTTGTTTAACTTTCCCAT ?<:9:@>??:<>A@@;@:@:AB<@@@?A?@>>?;@===;;=;;;;>?@>?>??>>?:??A@9@>??=>=@?=;:@;;>?9>?>?;?;::;=?=B?A==>? X0:i:1 X1:i:0 MD:Z:100 RG:Z:2893716229 XG:i:0 AM:i:17 NM:i:0 SM:i:17 XM:i:0 XO:i:0 MQ:i:60 XT:A:U

How to use autoreconf without root permission

Hi,

I am having trouble using autoreconf without having root permission. When I try doing autoreconf -i when installing exonerate, I get "Permission denied". What should I do to resolve this issue?

Thanks,
Alva

AssertionError

The error message is:

ERROR   2016-03-30 18:05:41.418927  encountered error in mutation spikein: ['chr1_11169752_11169752_0.53_None']
Traceback (most recent call last):
  File "/home/guo/anaconda/lib/python2.7/site-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 274, in makemut
  File "build/bdist.linux-x86_64/egg/bamsurgeon/aligners.py", line 58, in remap_bam
    remap_bwamem_bam(bamfn, threads, fastaref, picardjar, mutid=mutid, paired=paired)
  File "build/bdist.linux-x86_64/egg/bamsurgeon/aligners.py", line 154, in remap_bwamem_bam
    fastq = bamtofastq(bamfn, picardjar, threads=threads, paired=paired)[0]
  File "build/bdist.linux-x86_64/egg/bamsurgeon/common.py", line 116, in bamtofastq
    assert os.path.exists(outfq) # conversion failed
AssertionError

My VarFile is:

chr1    11169752        11169752        0.53
chr1    11187075        11187075        0.503
chr1    11188125        11188125        0.735
chr1    11217241        11217241        0.548
chr1    11298593        11298593        0.898
chr1    11303201        11303201        0.651
chr1    11316087        11316087        0.659
chr1    20945016        20945016        0.53
chr1    26231164        26231164        0.875
chr1    65301817        65301817        0.616
chr1    65323420        65323420        0.518
chr1    65332561        65332561        0.809
chr1    97981456        97981456        0.557
chr1    115252279       115252279       0.59
chr1    144923733       144923733       0.563

My cmd is:

addsnv.py -v $varfile -f $bamfile -r $ref -p 1 -o $outbam --requirepaired --aligner mem --alignopts t:10 --picardjar /tools/GATK/picard\
/picard-1.119.jar

Allowing for variation in coverage given specified allelic fraction?

Hi Adam,

I see in the code that you calculate existing coverage of aligned reads over a region of interest (and you also calculate output coverage). Do you allow for some variation in coverage when mutations are introduced (when you flip reads from covering the reference base to an alternate base)? Or, say if coverage is 20 and VAF is 0.5, does that simply mean that the bases in 10 reads aligned to the reference position are changed to an alternate base?

Thanks in advance,
Alva

High coverage - alignment sanity check failed

Dear Adam,

With an high coverage (around 7000), I get this error (not everytime but around 1 time out of 10) :

ERROR 2015-08-31 10:01:45.879129 encountered error in mutation spikein: ['chr17_7573000_7573000_0.2_A']
Traceback (most recent call last):
File "../bamsurgeon-master/addsnv.py", line 231, in makemut
aligners.remap_bam(args.aligner, tmpoutbamname, args.refFasta, alignopts, mutid=hapstr, paired=(not args.single), picardjar=args.picardjar)
File "/home/delhommet/Documents/Softs/bamsurgeon/bamsurgeon-master/bs/aligners.py", line 70, in remap_bam
remap_tmap_bam(bamfn, threads, fastaref, picardjar, mutid=mutid, paired=paired)
File "/home/delhommet/Documents/Softs/bamsurgeon/bamsurgeon-master/bs/aligners.py", line 476, in remap_tmap_bam
raise ValueError("ERROR\t" + now() + "\t" + mutid + "\tbam readcount < fastq readcount, alignment sanity check failed!\n")
ValueError: ERROR 2015-08-31 10:01:45.879037 haplo_chr17_7573000_7573000 bam readcount < fastq readcount, alignment sanity check failed!

I think this is in the case where some reads after being mutated can not be aligned (I check the number of aligned reads after and before alignment with samtools idxstats and this shows that (fastq readcount - bamfn readcount) = unmapped readcount in the final bam).
Do you think I'm right? Is it possible to just have a warning in this case?
The data I used can be found here : https://github.com/tdelhomme/test_bamsurgeon (001tmapAligned_options_sorted.*) in additon to the whole error output (error_output.txt).

ERROR: encountered error in mutation spikein

bamsurgeon stopped running and I got the following error message:

ERROR 2016-04-13 00:48:07.825074 encountered error in mutation spikein: ['5_104998603_104998603_0.444444444444_None']
Traceback (most recent call last):
File "/ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py", line 156, in makemut
mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, min(mutpos_list), max(mutpos_list)+1, mutpos_list, avoid=avoid, mutid_list=mutid_list, is_snv=True, mutbase_list=mutbase_list, reffile=reffile)
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 153, in mutate
if pread.query_position is not None and not pread.alignment.is_secondary and bin(pread.alignment.flag & 2048) != bin(2048):
AttributeError: 'csamtools.PileupRead' object has no attribute 'query_position'

INFO 2016-04-13 00:48:07.839295 haplo_5_104999315_104999315 creating tmp bam: addsnv.tmp/haplo_5_104999315_104999315.tmpbam.2879ceab-8fe0-498e-8221-7c2bed3bb6a9.bam

ERROR 2016-04-13 00:48:07.857766 encountered error in mutation spikein: ['5_104999315_104999315_0.588235294118_None']
Traceback (most recent call last):
File "/ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py", line 156, in makemut
mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, min(mutpos_list), max(mutpos_list)+1, mutpos_list, avoid=avoid, mutid_list=mutid_list, is_snv=True, mutbase_list=mutbase_list, reffile=reffile)
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 153, in mutate
if pread.query_position is not None and not pread.alignment.is_secondary and bin(pread.alignment.flag & 2048) != bin(2048):
AttributeError: 'csamtools.PileupRead' object has no attribute 'query_position'

INFO 2016-04-13 00:48:07.874683 no succesful mutations

These are only the last few lines in the log file, there are more errors if you keep going back.

These are the inputs:
INFO 2016-04-13 00:29:11.330065 starting /ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py called with args: /ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py -v /ifs/scratch/c2b2/mp_lab/ams2432/hutt/dnm_files/matching/bamsurgeon/bed_input/sites.non_cpgs.84.02.trio3.chr5.20000.txt -f /ifs/scratch/c2b2/mp_lab/ams2432/hutt/best_practices/recal_bams_split/recal_Hutt3-chr5_5mb_region.bam -r /ifs/data/c2b2/mp_lab/ams2432/files_needed/human_ref/hs37d5.fa -o /ifs/scratch/c2b2/mp_lab/ams2432/hutt/dnm_files/matching/bamsurgeon/spiked_bams/spikedin_20000_Hutt3-chr5_5mb_region.bam --picardjar /ifs/home/c2b2/mp_lab/ams2432/picard-tools-1.141/ --mindepth 11 --maxdepth 48 --minmutreads 3 --tagreads --aligner mem

Not sure what is going here; would you know how to fix this? Is there a problem with my input file?

Error in evaluator.py

Hi,

When running evaluator.py on a Strelka generated VCF, I am getting the following:

python ~/bamsurgeon/bamsurgeon/scripts/evaluator.py -v passed.somatic.indels.vcf -t ../synthetic.challenge.set5.tumour.truth.30pctmasked.vcf.gz -m INDEL
Traceback (most recent call last):
File "/home/lmose/bamsurgeon/bamsurgeon/scripts/evaluator.py", line 330, in
main(args)
File "/home/lmose/bamsurgeon/bamsurgeon/scripts/evaluator.py", line 313, in main
debug=args.debug)
File "/home/lmose/bamsurgeon/bamsurgeon/scripts/evaluator.py", line 183, in evaluate
truvcfh = vcf.Reader(filename=truth)
File "/usr/lib64/python2.7/site-packages/PyVCF-0.6.0-py2.7-linux-x86_64.egg/vcf/parser.py", line 216, in init
self._parse_metainfo()
File "/usr/lib64/python2.7/site-packages/PyVCF-0.6.0-py2.7-linux-x86_64.egg/vcf/parser.py", line 254, in _parse_metainfo
key, val = parser.read_meta(line.strip())
File "/usr/lib64/python2.7/site-packages/PyVCF-0.6.0-py2.7-linux-x86_64.egg/vcf/parser.py", line 166, in read_meta
return self.read_meta_hash(meta_string)
File "/usr/lib64/python2.7/site-packages/PyVCF-0.6.0-py2.7-linux-x86_64.egg/vcf/parser.py", line 161, in read_meta_hash
val = dict(item.split("=") for item in hashItems)
ValueError: dictionary update sequence element #4 has length 71; 2 is required

Any idea as to what I'm doing wrong here? I'm running against the bamsurgeon code checked out today. The truth set is from the Dream Challenge 5.

addsnv has errors in the pysam step

hello,
i'm trying to spike a bam file but i keep getting the following errors and i need to know how to debug this:
python addsnv.py -v ../test_data/random_snvs.txt -f ../test_data/testregion_novo_realign.bam -r ../test_data/Homo_sapiens_chr22_assembly19.fasta -o ../test_data/testregion_novo_spikedMut.bam -p 5 --picardjar /nfs/Public/picard/1.140/picard.jar --ignoresnps --force --tmpdir /tmp/BamspikingTmp

when using the test data and using the default aligner (or when im using novoalign as my aligner and provide the index for novoalign, i get the following output for my process:

(..only copying the last couple logs as there are many, i presume, one for each snv that it tried to spike. )

ERROR 2016-05-27 07:05:58.862516 encountered error in mutation spikein: ['22_34314307_34314308_None_None']
Traceback (most recent call last):
File "addsnv.py", line 156, in makemut
mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, min(mutpos_list), max(mutpos_list)+1, mutpos_list, avoid=avoid, mutid_list=mutid_list, is_snv=True, mutbase_list=mutbase_list, reffile=reffile)
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 186, in mutate
mate = find_mate(pread.alignment, bammate)
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 121, in find_mate
chrom = read.next_reference_name
AttributeError: 'pysam.calignmentfile.AlignedSegment' object has no attribute 'next_reference_name'

ERROR 2016-05-27 07:05:59.529534 encountered error in mutation spikein: ['22_34322818_34322819_None_None']
Traceback (most recent call last):
File "addsnv.py", line 156, in makemut
mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, min(mutpos_list), max(mutpos_list)+1, mutpos_list, avoid=avoid, mutid_list=mutid_list, is_snv=True, mutbase_list=mutbase_list, reffile=reffile)
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 186, in mutate
mate = find_mate(pread.alignment, bammate)
File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 121, in find_mate
chrom = read.next_reference_name
AttributeError: 'pysam.calignmentfile.AlignedSegment' object has no attribute 'next_reference_name'

INFO 2016-05-27 07:05:59.531348 no succesful mutations

ValueError: reference_id -1 out of range 0<=tid<84

Hiya Adam

I am having issues running the addsnp.py file. I am using the human_g1k_v37 reference from the 1000 genomes project along with a BAM file from the project and I am getting the following error:

ERROR 2016-10-04 10:27:05.346346 encountered error in mutation spikein: ['1_115251145_115251145_0.5_C']
Traceback (most recent call last):
File "addsnv.py", line 156, in makemut
mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, min(mutpos_list), max(mutpos_list)+1, mutpos_list, avoid=avoid, mutid_list=mutid_list, is_snv=True, mutbase_list=mutbase_list, reffile=reffile)
File "/usr/local/lib/python2.7/dist-packages/bamsurgeon-1.0-py2.7.egg/bamsurgeon/mutation.py", line 186, in mutate
mate = find_mate(pread.alignment, bammate)
File "/usr/local/lib/python2.7/dist-packages/bamsurgeon-1.0-py2.7.egg/bamsurgeon/mutation.py", line 121, in find_mate
chrom = read.next_reference_name
File "pysam/calignedsegment.pyx", line 941, in pysam.calignedsegment.AlignedSegment.next_reference_name.get (pysam/calignedsegment.c:11814)
File "pysam/calignmentfile.pyx", line 1654, in pysam.calignmentfile.AlignmentFile.getrname (pysam/calignmentfile.c:18204)
File "pysam/calignmentfile.pyx", line 679, in pysam.calignmentfile.AlignmentFile.get_reference_name (pysam/calignmentfile.c:9070)
ValueError: reference_id -1 out of range 0<=tid<84

I have checked all the reference files etc and I cant see what is wrong. Any help would be appreciated as I am a bit stumped!

Ta!
Kirsty

error using addsnv.py

ERROR   2015-02-02 13:21:00.370856  encountered error in mutation spikein: IV_8282495_8282495_0.5_None
Traceback (most recent call last):
  File "/bamsurgeon/addsnv.py", line 166, in makemut
    basepile += pread.alignment.seq[pread.qpos-1]
AttributeError: 'pysam.calignmentfile.PileupRead' object has no attribute 'qpos'

Here is an error I am getting - any idea what might be causing it?

invalid bam output

Hi again,

Got it to work, bamsurgeon now outputs a bam file. The problem was indeed the missing index (#42). However, a new problem has now emerged.

When I try to open the bam in igv or with picard's validator, things fail with this error message:

java.lang.ClassCastException: java.lang.Character cannot be cast to java.lang.String

samtools quickcheck however does not find any problems.

(I have sorted and indexed the bam file).

Is this something that you have seen before?

art_illumina instead of wgsim

I was wondering if art_illumina would be more useful compared to wgsim?

http://www.niehs.nih.gov/research/resources/software/biostatistics/art/

It has more flexible parameters, and has profiles (and a profile builder) for different machines.

encountering error on multibam branch

Adam,

My name is John Malamon. I am a Data Analyst at UPenn. I have been working with Charlie on doing a data analysis plan for Structural Variance. I am encountering an error (below) when running the following command :

python2.7 ../addsv.py -p 4 -i 1_259_22:1_371_28:0.4_540_50,0.6_143_50 -v ../test_data/test_multi.sv.txt -f ../test_data/test_multi.lib1.bam:../test_data/test_multi.lib2.bam:../test_data/test_multi.lib3.bam -r /mnt/genomics/REF/hg19/hg19.fasta.fai -o ../test_data/test_multi.lib1.sv.bam:../test_data/test_multi.lib2.sv.bam:../test_data/test_multi.lib3.sv.bam -c ../test_data/test_multi.cnv.txt.gz

STDOUT

Final graph has 10 nodes and n50 of 3663, max 3680, total 14416, using 4721/4818 reads
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:7:1311:16560:77547
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:7:1205:5232:77537
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:7:2201:2090:41095
found mates for 5000 reads, 0.0076 discordant.
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:8:1304:1815:41552
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:7:2307:18516:48188
found mates for 5000 reads, 0.0082 discordant.
found mates for 4000 reads, 0.0065 discordant.
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:7:1107:6691:68262
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:7:1310:4567:72705
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:8:2105:11128:56118
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:8:1304:17592:74924
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:8:2203:12721:11697
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:8:1310:14915:75626
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:8:2203:7836:90098
found mates for 6000 reads, 0.00866666666667 discordant.
found mates for 5000 reads, 0.0058 discordant.
found mates for 6000 reads, 0.0105 discordant.
warning, cannot find mate for read marked paired: HWI-ST446:143358984:C2TYLACXX:8:2109:11083:82689

encountered error in mutation spikein: 11 70071043 70084754 INV

Traceback (most recent call last):
File "../addsv.py", line 554, in makemut
qrystart, qryend = map(int, alignstats[2:4])
ValueError: need more than 0 values to unpack

interval: ['11', '70271043', '70284754', 'DUP', '3']
length: 13711

Please let me know if you require any additional information.
You help is certainly most appreciated.

Partial reads

Hi, Adam.

BAMSurgeon reported errors as below:
error replacing quality score for read: SRR359108.180851272 : quality and sequence mismatch: 34 != 100

It looks like BAMSurgeon requests all reads having same length.
Is it a feature of BAMSurgeon or just a bug?
Should the bam file be filtered first?

Thank you very much.

Henry

tests on test_data fail

Installed bamsurgeon with no problems.
Ran programs in surgeon-master/tests -- all failed with error:

Target and donor are aligned to incompatable reference genomes!
Text below shows for program "test_svn.sh"
o command for "test_snv.sh"
o output from "test_snv.sh" to STDOUT
o directory listing of out put files
The other two test programs showed similar results.

What can I do to correct the installation?
Thanks
--sandy

Sanford M.Orlow Phd
National Institutes of Health
Bethesda MD
[email protected]
#301-496-5362

$ (cd testbin;./test_snv.sh 10 1 /fdb/igenomes/Homo_sapiens/Ensembl
/GRCh37 /Sequence/BWAIndex/genome.fa)
^ created directory: addsnv_logs_testregion_mut.bam
| creating tmp bam: tmpbam0.973639139619.bam
| len(readlist): 58
| copy number in snp region: 22 34166720 34166721 = 3.0
| adjusted VAF: 0.333333333333
| picked: 19
| wrote: 58 mutated: 19
|
TOTAL OF 638 LINES
|
| DEBUG: len(bamlist) 10
| merging, cmd: ['samtools', 'merge', '-f', 'tmp.merging.0.tmp.0.605064592051.muts.bam', 'tmpbam0.973639139619.bam', 'tmpbam0.864850773523.bam', 'tmpbam0.577943620935.bam', 'tmpbam0.155017835179.bam', 'tmpbam0.993849196129.bam', 'tmpbam0.0611915676721.bam', 'tmpbam0.740942761619.bam', 'tmpbam0.795170425721.bam', 'tmpbam0.360219175697.bam', 'tmpbam0.627406859454.bam']
| merge finished, renaming: tmp.merging.0.tmp.0.605064592051.muts.bam --> tmp.0.605064592051.muts.bam
| done making mutations, merging mutations into ../test_data/testregion.bam --> ../test_data/testregion_mut.bam
| Target and donor are aligned to incompatable reference genomes!

v addsnv.py failed.

TEST OUTPUT FILES

TEST_SNV_1:
drw-r-xr-x 2 4096 Feb 4 12:16 addsnv_logs_testregion_mut.bam
-rw-r--r-- 1 39627 Feb 3 13:17 test_snv.log
-rw-r--r-- 1 33951 Feb 3 13:16 tmp.0.605064592051.muts.bam

TEST_SNV_1/addsnv_logs_testregion_mut.bam:
-rw-r--r-- 1 10346 Feb 3 13:15 testregion_mut.bam.22_33702084_33702085.log
-rw-r--r-- 1 8611 Feb 3 13:15 testregion_mut.bam.22_33714964_33714965.log
-rw-r--r-- 1 8612 Feb 3 13:15 testregion_mut.bam.22_33769483_33769484.log
-rw-r--r-- 1 7935 Feb 3 13:15 testregion_mut.bam.22_33908770_33908771.log
-rw-r--r-- 1 8616 Feb 3 13:15 testregion_mut.bam.22_33926586_33926587.log
-rw-r--r-- 1 8605 Feb 3 13:16 testregion_mut.bam.22_33943910_33943911.log
-rw-r--r-- 1 10156 Feb 3 13:15 testregion_mut.bam.22_33958087_33958088.log
-rw-r--r-- 1 8809 Feb 3 13:15 testregion_mut.bam.22_34141184_34141185.log
-rw-r--r-- 1 10011 Feb 3 13:14 testregion_mut.bam.22_34166720_34166721.log
-rw-r--r-- 1 9491 Feb 3 13:15 testregion_mut.bam.22_34264774_34264775.log

add SNVs on close locations failed

Hello,
I try to add two SNVs on a bam file (IonTorrent data).
The varfile looks like the following :

chr17   7577552 7577552 0.2722953   A
chr17   7577590 7577590 0.0521716   A

NB : the distance between the two SNVs is < 100pb so probably we have overlapping reads.
The problem is : at the end, 0.05 VAF is respected, 0.27 is not. If we do the steps independently, it works. I also noticed that intermediate merged mutbam respects the two VAFs. After some researches, I am convinced the failed step is ReplaceReads but I still don't understand why...

Here is a link where you can find my files (MLT019SK_2*). The command line I used is :
addsnv.py -f MLT019SK_2.bam -v MLT019SK_2.var -r hg19.fasta -o MLT019SK_2_mut.bam --maxdepth 150000 --picardjar picard.jar --aligner tmap --single

Thank you in avance!

Can you start doing versioning/releases?

We would like to use this software in a modular environment, but for reproducibility, we need the software to have versions.

While it is possible to use git hashes, they neither sort well nor give a true indication of how up to date a version is.

bamsurgeon gives error while creating snvs for bowtie2

While using
bamsurgeon-master/bin/addsnv.py -f /gpfs/gpfs0/projects/Drm/1000_genome_fastq/SRR1291026.sorted.bam -v test2.var -r ucsc.hg19.fasta -o SRR1291026_mutc.bam

I get the error below:

ERROR 2016-08-31 22:28:51.479370 encountered error in mutation spikein: ['chr1_144923733_144923733_0.563_None']
Traceback (most recent call last):
File "bamsurgeon-master/bin/addsnv.py", line 159, in makemut
mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, min(mutpos_list), max(mutpos_list)+1, mutpos_list, avoid=avoid, mutid_list=mutid_list, is_snv=True, mutbase_list=mutbase_list, reffile=reffile)
File "/gpfs/gpfs0/home/mjalalzai/bamsurgeon-master/bamsurgeon/mutation.py", line 207, in mutate
basepile = countBaseAtPos(args.bamFileName,chrom,pcol.pos,mutid=region)
File "/gpfs/gpfs0/home/mjalalzai/bamsurgeon-master/bamsurgeon/mutation.py", line 14, in countBaseAtPos
p = subprocess.Popen(args,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 710, in init
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

Reads flagged by Picard's ValidateSamFile after BAM surgeon

After inserting SV into a bam file, there seems to be an issue with problem read pairs being flagged by Picard's ValidateSamFile. Picard's ValidateSamFile.

http://picard.sourceforge.net/command-line-overview.shtml#ValidateSamFile

Below are a sample of the validation errors from the validation output. When the mate alignment does not agree for a read pair, the insert size calculated for each end of the read pair also does not agree. These errors are not being flagged in the original bam file so they are somehow being introduced by bam surgeon. Any ideas on what is going on with these reads?

ERROR: Record 3181916, Read name HWI-ST791:143380313:C2TU0ACXX:5:1106:3880:9468, Mate alignment does not match alignment start of mate
ERROR: Record 3181940, Read name HWI-ST791:143380313:C2TU0ACXX:5:1106:3880:9468, Mate alignment does not match alignment start of mate
ERROR: Record 3181968, Read name HWI-ST791:143380313:C2TU0ACXX:5:1315:10037:40271, Mate alignment does not match alignment start of mate
ERROR: Record 3182005, Read name HWI-ST791:143380313:C2TU0ACXX:5:1315:10037:40271, Mate alignment does not match alignment start of mate
ERROR: Record 3182116, Read name HWI-ST791:143380313:C2TU0ACXX:5:2113:7819:36357, Mate alignment does not match alignment start of mate

missing setup.py

there is no setup.py for package installation.

Infinite time when using addsnv.py on single-end IonTorrent data

I am trying to use bamsurgeon on BAM files generated by our own IonTorrent machine which produces single-end reads, and I get an infinite time execution.
All the files needed to test this are available here on GitHub : tdelhomme/test_bamsurgeon.
I am trying addsnv.py on a small BAM file (001_fragment.bam), and I would like only one mutation from a T reference to an A (test_var_file1.txt).
This is the command line :
../bamsurgeon-master/addsnv.py --single -v test_var_file1.txt -f 001_fragment.bam -r references/chr17_torrent.fasta -o test_result.bam

This is the messages I get, and after that the program is running and running with no end :

INFO 2015-08-24 15:56:10.907789 starting ../bamsurgeon-master/addsnv.py called with args: ../bamsurgeon-master/addsnv.py --single -v test_var_file1.txt -f BAM/001_fragment.bam -r references/chr17_torrent.fasta -o results/test.bam

Debug, haploclusters:[[{'start': 7573000, 'end': 7573001, 'chrom': 'chr17', 'altbase': 'A', 'vaf': 0.2}]]
INFO 2015-08-24 15:56:11.434572 haplo_chr17_7573000_7573000 creating tmp bam: addsnv.tmp/haplo_chr17_7573000_7573000.tmpbam.b8238288-be60-407a-8577-0923918b57f1.bam
WARN 2015-08-24 15:56:11.791626 haplo_chr17_7573000_7573001 could not pileup for region: chr17:7572794
INFO 2015-08-24 15:56:16.468507 haplo_chr17_7573000_7573001 mpileup failed, no coverage for base: chr17:7572836
WARN 2015-08-24 15:56:16.468885 haplo_chr17_7573000_7573001 could not pileup for region: chr17:7572836
INFO 2015-08-24 15:56:45.990263 haplo_chr17_7573000_7573000 len(readlist): 828
INFO 2015-08-24 15:56:45.993464 haplo_chr17_7573000_7573000 selected VAF: 0.2

INFO 2015-08-24 15:56:45.993591 haplo_chr17_7573000_7573000 picked: 165

Thank you in advance...

OSError: [Errno 17] File exists: 'addsnv.tmp'

When I run bamsurgeon on one of my BAM files, I get the following error message:

Traceback (most recent call last):
File "/ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py", line 513, in
run()
File "/ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py", line 510, in run
main(args)
File "/ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py", line 352, in main
os.mkdir(args.tmpdir)
OSError: [Errno 17] File exists: 'addsnv.tmp'

The program works for my other bam files. I am using identical parameters for all of the runs. These are the inputs:

INFO 2016-04-13 00:29:11.257192 starting /ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py called with args: /ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py -v /ifs/scratch/c2b2/mp_lab/ams2432/hutt/dnm_files/matching/bamsurgeon/bed_input/sites.non_cpgs.84.02.trio3.chr2.20000.txt -f /ifs/scratch/c2b2/mp_lab/ams2432/hutt/best_practices/recal_bams_split/recal_Hutt3-chr2_5mb_region.bam -r /ifs/data/c2b2/mp_lab/ams2432/files_needed/human_ref/hs37d5.fa -o /ifs/scratch/c2b2/mp_lab/ams2432/hutt/dnm_files/matching/bamsurgeon/spiked_bams/spikedin_20000_Hutt3-chr2_5mb_region.bam --picardjar /ifs/home/c2b2/mp_lab/ams2432/picard-tools-1.141/ --mindepth 11 --maxdepth 48 --minmutreads 3 --tagreads --aligner mem

I'm not sure I understand what the error is. How do I fix it?

When to use addindel versus addsv?

At what size of a deletion/insertion should the addsv.py be used instead of addindel.py?

Specify read SNP's to avoid

Cool tool! More of a feature request than an issue... It would be nice to be able to give a list of known SNP's, and then any read that contains any of these SNP's is avoided. This would make it easier to avoid certain haplotypes. Cheers, Dan.

adamewing / bamsurgeon Goto Github PK

bamsurgeon's People

Contributors

Stargazers

Watchers

Forkers

bamsurgeon's Issues

STDOUT

v addsnv.py failed.

TEST OUTPUT FILES

Recommend Projects

Recommend Topics

Recommend Org